Dear Percona Community:
I have a successful Sphinx installation that returns 1000 document IDs on common queries, and it performs the search in less than 0.2 seconds in nearly every case.
However, I then take these document IDs and perform a query like this:
select data_point_1, data_point_2 from my_table where id in (14084528, 14140843,
14140471, 14140817, 14160141, 14096445, … goes on for 1000 document ids)
Here is the explain of the query:
+----+-------------+-------+-------+-------------------------+-------------+---------+-----------------------+------+-------------+
| id | select_type | table | type  | possible_keys           | key         | key_len | ref                   | rows | Extra       |
+----+-------------+-------+-------+-------------------------+-------------+---------+-----------------------+------+-------------+
|  1 | SIMPLE      | um    | range | PRIMARY                 | PRIMARY     | 8       | NULL                  | 1000 | Using where |
|  1 | SIMPLE      | up    | ref   | unique_pics,idx_meta_id | unique_pics | 8       | flixz.um.urls_meta_id |    1 |             |
+----+-------------+-------+-------+-------------------------+-------------+---------+-----------------------+------+-------------+
MySQL is taking more than 30 seconds to return the results for those exact IDs (which, of course, are also the primary key), even though the EXPLAIN above looks solid.
I hired a MySQL consultant to figure out what was going on here, and his response was that MySQL spends the majority of its time parsing the very long query (1000 document IDs) into a parse plan and then optimizing it; actually returning the rows happens pretty quickly.
Our database is pretty large (300GB and growing)…
We also tried writing the document IDs to a temporary table and joining against it, so that MySQL sees the same query text every time (keeping the parse plan for the query in memory), but that ran even slower.
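For reference, the temporary-table variant we tried looked roughly like this (the temp table name and its exact definition are placeholders, not our production schema):

```sql
-- sphinx_ids is a placeholder name for the temp table.
CREATE TEMPORARY TABLE sphinx_ids (id BIGINT UNSIGNED NOT NULL PRIMARY KEY);

-- Bulk-insert the 1000 document IDs that Sphinx returned.
INSERT INTO sphinx_ids (id) VALUES (14084528), (14140843), (14140471), ...;

-- Join instead of a long IN list, so the SELECT text stays constant.
SELECT t.data_point_1, t.data_point_2
FROM sphinx_ids s
JOIN my_table t ON t.id = s.id;

DROP TEMPORARY TABLE sphinx_ids;
```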
My question is this:
What do other users of large databases do when they query MySQL for the document IDs generated by Sphinx, to keep MySQL's result times down?
All this kind of defeats the purpose of Sphinx returning the 1000 document IDs so quickly, if MySQL is just going to choke on the query that pulls the needed data points for those 1000 documents!
Thanks for any help you can provide…
Nero