Dear Percona Community:
I have a successful Sphinx installation that returns 1000 document IDs on common queries, and it performs the search in less than 0.2 seconds in nearly every case.
However, I then take these document IDs and perform a query like this:
select data_point_1, data_point_2 from my_table where id in (14084528, 14140843,
14140471, 14140817, 14160141, 14096445, … goes on for 1000 document ids)
Here is the explain of the query:
+----+-------------+-------+-------+-------------------------+-------------+---------+-----------------------+------+-------------+
| id | select_type | table | type  | possible_keys           | key         | key_len | ref                   | rows | Extra       |
+----+-------------+-------+-------+-------------------------+-------------+---------+-----------------------+------+-------------+
|  1 | SIMPLE      | um    | range | PRIMARY                 | PRIMARY     | 8       | NULL                  | 1000 | Using where |
|  1 | SIMPLE      | up    | ref   | unique_pics,idx_meta_id | unique_pics | 8       | flixz.um.urls_meta_id |    1 |             |
+----+-------------+-------+-------+-------------------------+-------------+---------+-----------------------+------+-------------+
MySQL is taking more than 30 seconds to return the results for those exact IDs (which, of course, are also the primary key), even though the EXPLAIN above looks solid.
I hired a MySQL consultant to figure out what was going on here, and his response was that MySQL spends the majority of its time parsing the very long query (1000 document IDs) into a parse plan and then optimizing it; actually returning the rows happens pretty quickly.
Our database is pretty large (300GB and growing)…
We also tried writing the document IDs to a temporary table and joining against it, so that MySQL sees the same query text every time (keeping the parse plan for the query in memory), but that ran even slower.
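For reference, the temporary-table variant we tried looked roughly like this (the temp table name and its exact definition are placeholders, not our production schema):

```sql
-- sphinx_ids is a placeholder name for the temp table.
CREATE TEMPORARY TABLE sphinx_ids (id BIGINT UNSIGNED NOT NULL PRIMARY KEY);

-- Bulk-insert the 1000 document IDs that Sphinx returned.
INSERT INTO sphinx_ids (id) VALUES (14084528), (14140843), (14140471), ...;

-- Join instead of a long IN list, so the SELECT text stays constant.
SELECT t.data_point_1, t.data_point_2
FROM sphinx_ids s
JOIN my_table t ON t.id = s.id;

DROP TEMPORARY TABLE sphinx_ids;
```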
My question is this:
What do other users of large databases do when they query MySQL for the document IDs generated by Sphinx, to keep MySQL's result times down?
All this kind of defeats the purpose of Sphinx returning the 1000 document IDs so quickly, if MySQL is just going to choke on the query that pulls the needed data points for those 1000 documents!
Thanks for any help you can provide…
Nero