Can someone please clarify how to apply the following note about avoiding LIMIT from page 16 of Peter’s presentation MySQL Queries Optimization?
[B]Quote:[/B]
If you can precompute positions do it
– WHERE POS BETWEEN 1001 and 1010 works much
better than LIMIT 1000,10
I have a situation where I need to select a random question from a filtered list. I found the performance of ORDER BY RAND() to be terrible. Currently, I am (A) counting the number of results, (B) using PHP to select a random number of results to skip, (C) skipping through the result set [via LIMIT] to find the corresponding Primary Key value, and then (D) pulling the single row.
Is there a faster way to do this using BETWEEN?
PHP:
// Count the number of available questions in the topic$CountQsql = “SELECT QuestionIDFROM questionsWHERE topic = 1132AND Published = 1”;$CountQresult = mysql_query($CountQsql, $conn);$CountQnum = mysql_row_num($CountQresult);// Select a random number of rows to skip$SkipRows = rand(0,($CountQnum - 1));// Select the ID of the random question chosen$SelectRandQsql = “SELECT QuestionID AS QChosenFROM questionsWHERE topic = 1132AND Published = 1LIMIT {$SkipRows} , 1”;$SelectRandQresult = mysql_query($SelectRandQsql, $conn)$SelectRandQarray = mysql_fetch_array($SelectRandQresult)$QChosen = $SelectRandQarray[‘QChosen’];// Pull Question Info$sql = “SELECT QText, Answ1, Answ2, Answ3, Answ4FROM questionsWHERE QuestionID = {$QChosen}”;
One way is to simply cache the full set of QuestionIDs in memory. I’m not sure how many bytes an integer takes up in memory for php, but let’s play it safe and say its something large like 32 bytes. If you’re willing to use 10 MB of memory, you can hold over 300,000 questions. You can organize that data structure however you like, to facilitate filtering by topic and only load those questions which have been published. Then you just generate random numbers to index your set of arrays. This works best if the questions don’t change very often, but even if they do you can either reload the entire cache periodically or track the differences and reload those.
Here’s another way to do this that uses a bit more SQL:
SHOW TABLE STATUS LIKE ‘questions’;
This will get you the maximum value of your auto-incrementing primary key. Once you have that, then you can use php to generate a random number between 1 and the primary key max. Then your query becomes:
SELECT * FROM questions WHERE QuestionID <= $random_id LIMIT 1;
Note that this method isn’t perfect, since there is a possibility that this statement returns zero rows, either due to deleted rows or additional restrictions applied to the query (like those you gave in your example). If that happens, you simply re-generate a random number larger than the previous and try again. If the data in the table is randomly ordered, then you shouldn’t run into that too often. However, over time, this probably won’t really be the case. For example, you add a new category and all of its questions in one big chunk, then this method will work horribly for that topic.