Getting 1000 documents by id from MySQL / Sphinx

Dear Percona Community:

I have a successful sphinx installation that is returning 1000 document IDs on common queries, and it performs the search in less that 0.2 seconds in nearly every case.

However, when I take these document ids and perform a query like this:

select data_point_1, data_point_2 from my_table where id in (14084528, 14140843,
14140471, 14140817, 14160141, 14096445, … goes on for 1000 document ids)

Here is the explain of the query:

±—±------------±------±------±----------------------- -±------------±--------±----------------------±-----±-- ----------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±------±------±----------------------- -±------------±--------±----------------------±-----±-- ----------+
| 1 | SIMPLE | um | range | PRIMARY | PRIMARY | 8 | NULL | 1000 | Using where |
| 1 | SIMPLE | up | ref | unique_pics,idx_meta_id | unique_pics | 8 | flixz.um.urls_meta_id | 1 | |
±—±------------±------±------±----------------------- -±------------±--------±----------------------±-----±-- ----------+

MySQL is taking more than 30 seconds to return the results on the EXACT ids (which of course are also the primary key), and the explain above looks solid.

I hired a MySQL consultant to figure out what was going on here, and his response was that the majority of the time MySQL is operating it’s parsing the very long query (1000 document ids) into a parse plan, and then optimizing, and then finally returning the query (which happens pretty quickly).

Our database is pretty large (300GB and growing)…

We also tried writing the document ids to a temporary table, and doing a join so that it’s the same query every time (keeping the parse path for the query in memory), and that ran even slower.

My question is this:

What do other users of large databases do when they query MySQL for the document_ids generated by sphinx to optimize query result times from MySQL?

All this kinda takes out the purpose of sphinx returning the 1000 document ids so fast, if MySQL is just going to choke on the query to pull the needed data points on the query for those 1000 documents!

Thanks for any help you can provide…

Nero

‘Parsing the query’, 30s? Bollocks )

Try faster disks.

8 x 500 GB SATA2 … striped…

[B]nerolabs wrote on Wed, 21 October 2009 23:29[/B]
8 x 500 GB SATA2 ... striped...
It is probably doing other stuff in the meanwhile. 8x 500 GB regular 7200 rpm disks is not that impressive, a single ssd would outperform it for this query. If only most frequently accessed document id's could fit in memory, it would be much quicker.

Nerolabs its very interesting what you writing because i have EXACTLY same problem :D. Grabbing id’s by sphinx very fast, but later all search queries are on slave which choking very much with queries like this.

query:

User@Host: root[root] @ localhost # Query_time: 17 Lock_time: 0 Rows_sent: 16 Rows_examined: 80SELECT pt.post_text, etc, p., f.forum_id, etc, t., u.username FROM phpbb_forums f, phpbb_topics t, phpbb_users u, phpbb_posts p, phpbb_posts_text pt WHERE p.post_id IN (1243453, 1243480, 1243644, 1243809, 1243896, … overall 658 id’s are here, counter by Online Word Count Tool) AND pt.post_id = p.post_id AND f.forum_id = p.forum_id AND p.topic_id = t.topic_id AND p.poster_id = u.user_id ORDER BY pt.post_id DESC LIMIT 0, 16;

explain:

±—±------------±------±-------±-----------------------------------------------±--------±--------±---------------±-----±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-------±-----------------------------------------------±--------±--------±---------------±-----±------------+| 1 | SIMPLE | pt | range | PRIMARY | PRIMARY | 3 | NULL | 658 | Using where || 1 | SIMPLE | p | eq_ref | PRIMARY,forum_id,topic_id,poster_id,topic_id_2 | PRIMARY | 3 | pt.post_id | 1 | || 1 | SIMPLE | f | eq_ref | PRIMARY | PRIMARY | 2 | p.forum_id | 1 | || 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 3 | p.poster_id | 1 | || 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 3 | p.topic_id | 1 | |±—±------------±------±-------±-----------------------------------------------±--------±--------±---------------±-----±------------+

Dunno why so long, no lock_time (it says 0) and when i executed it at night it takes “only” 0.11.

I got “race condition” sometimes with this queries and then is horror like this (probably myisam LOCKS because quite long queries much at once almost):

User@Host: root[root] @ localhost # Query_time: 68 Lock_time: 55 Rows_sent: 40 Rows_examined: 3120# Time: 091102 15:44:13# User@Host: root[root] @ localhost # Query_time: 71 Lock_time: 57 Rows_sent: 16 Rows_examined: 80

etc., etc., dunno why and where this Lock_time started…

Locking is horrible, switch to InnoDB.

Can’t you circumvent this entire problem by storing those other fields in sphinx?