Selecting where id in (list of 1000 ids)

Hi,

I’m trying to do the following select:
SELECT * FROM table WHERE id IN
(1,2,3…999,1000); (with other id numbers of course)

However, this query takes about 2 seconds. Is there any way to speed this up?

Does nobody know a solution?

SELECT * FROM table WHERE entries.id IN (8631, …, 351485) LIMIT 1000
Takes: 14 sec

SELECT * FROM table LIMIT 1000
Takes: 0.8 sec

Do you have a unique index on id?

Some questions:

How many entries did you have in the 14 second query?

Why do you need to single out and select so many entries?

How large is the database?

And how much memory have you reserved for the key cache (key_buffer_size, if MyISAM) or for innodb_buffer_pool_size if you are using InnoDB tables?

[B]sterin wrote on Tue, 17 April 2007 00:06[/B]
Some questions:
  1. How many entries did you have in the 14 second query?

  2. Why do you need to single out and select so many entries?

  3. How large is the database?

  4. And how much memory have you reserved for the key cache (key_buffer_size, if MyISAM) or for innodb_buffer_pool_size if you are using InnoDB tables?

  1. 1000 ids (entries) were selected…
  2. I get the full-text results from SphinxSearch, which returns only the ids… I have to fetch that many (at least 1000) to get a good result.
  3. Half a million rows.
  4. The table is MyISAM, with key_buffer_size = 128M.

Any suggestions?

No, MySQL doesn’t put the complete index in memory.
But it will try to use as much of the cache as possible (until the limit that you have set is reached).
And you want it to use as much as possible.

The ideal situation is when the entire DB, with both index and data, fits into RAM, because as soon as it has to read from disk, everything slows down by orders of magnitude.

So the best solution is to fit the entire index into key_buffer_size while still keeping about 75% of memory free for the OS cache to hold the table data.

So RAM is generally considered a DBMS's best friend.
"As much as possible" is the rule.

First,

Using Sphinx, you should not retrieve all 1000 rows; use SetLimits to retrieve only the ids you’re going to show on the page.

Second, the difference is expected: the IN query needs 1000 random I/O operations on a large data set, so it takes what it takes.

The other query does a full table scan, so it can find 1000 rows and stop pretty fast.

I’m not a MySQL expert, but I’m not a newbie, either.

As said before, I’d first reduce your indexes and only keep them where you need them. Additionally, knowing that MySQL will only use ONE index per table per query, you should be able to examine your queries and decide which indexes should go (if any).

Second… essentially you have 1,000 OR conditions. I hate OR statements almost as much as LEFT JOINs, which I refuse to use (I just use standard joins 100%).

I know how to speed it up (assuming you’re doing PHP):

  1. Take all the IDs you’re searching for and store them in an array, with the key as the ID and the value as the ID:
    for ($a = 0; $a < $numberOfIDsToSearch; $a++) {
        $val = $myOrigIDArray[$a];
        $newIDArray[$val] = $val; // key and value are both the ID
    }

  2. Before executing the query, get the lowest ID # and the highest ID # and store them in $min and $max variables. It’s easy to grab them from an array through PHP’s built-in min() and max() functions.

  3. Now build your SQL statement to get all IDs between your lowest and highest ($min and $max). You’ll fetch every row whose id could be one of those 1,000 and filter them in memory!

Then just decide which ones to keep. Here’s the code from the SELECT statement on (do you really need *? That’ll slow it down):

$getidData = "SELECT id, info1, info2, info3 FROM table WHERE id >= $min AND id <= $max";
$result = mysql_query($getidData, $db_conn) or die("I screwed up");
while ($row = mysql_fetch_array($result)) {
    $id = $row[0];
    if (isset($newIDArray[$id])) {
        // I found it! I can build a new array of found ones here, to process later.
    }
} // end while
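The same range-scan-plus-hash-filter idea can be sketched in Python. This is only an illustration against an in-memory SQLite table (the thread's table is MySQL/MyISAM, and the table/column names and ids here are made up):

```python
import sqlite3

# Hypothetical data: a table of entries keyed by id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, info TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?)",
                 [(i, "row %d" % i) for i in range(1, 10001)])

wanted = {8631, 9040, 9977}      # ids returned by the search engine
lo, hi = min(wanted), max(wanted)

# One range scan between the lowest and highest wanted id,
# then filter in application memory with a hash lookup.
found = [row for row in conn.execute(
            "SELECT id, info FROM entries WHERE id BETWEEN ? AND ?", (lo, hi))
         if row[0] in wanted]
```

The trade-off is the same one described above: the database does one cheap range scan, and the application pays RAM to filter out the rows between $min and $max that it does not actually want.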

[B]JGilbert wrote on Tue, 08 May 2007 15:31[/B]

I don’t know what Sphinx is, but if it were my table and my query, I might try removing any indexes that aren’t needed. It sounds like you may have too many.

But removing excess indexes would not speed up SELECTs. For write operations, sure: choose them carefully to make sure there’s no I/O wasted.

[B]toddjnsn wrote on Tue, 27 May 2008 01:59[/B]

I hate OR statements, almost as much as I refuse to use LEFT JOINs (I just use standard joins 100%).

I don’t understand this. If your query only needs an INNER JOIN, then great, but I can think of many queries where a LEFT JOIN is necessary and desirable.

Don’t be afraid of LEFT JOIN; it is more important to return the right results first, and then worry about performance, than to have a fast query that misses out some records.
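A minimal illustration of that point, using SQLite and hypothetical users/orders tables: an inner join silently drops users with no orders, while a left join keeps them with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1);  -- bob has no orders
""")

inner = conn.execute(
    "SELECT u.name, o.id FROM users u JOIN orders o ON o.user_id = u.id"
).fetchall()
left = conn.execute(
    "SELECT u.name, o.id FROM users u LEFT JOIN orders o ON o.user_id = u.id"
).fetchall()

# inner contains only alice; left also contains ('bob', None)
```

If the row without a match must appear in the result, only the LEFT JOIN is correct, whatever its cost.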

[B]toddjnsn wrote on Tue, 27 May 2008 01:59[/B]

3. Now, we’ll make your SQL statement getting all IDs in between your highest and lowest ($max and $min)

I think the IN (1,2,3…999) he pasted was just an example.
The real IDs are unlikely to be a nice contiguous range, since they are document ids returned by the Sphinx search daemon.

I agree with Peter: try to limit the number of ids that Sphinx returns. The Sphinx PHP API (and all the other Sphinx APIs) supports paging, and asking Sphinx for 100 records and then fetching those 100 from the database should be much faster than using 1000. Unless you really need 1000 results?
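The paging idea in sketch form (Python with SQLite standing in for MySQL; the Sphinx call is simulated, and the page of ids is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, info TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?)",
                 [(i, "doc %d" % i) for i in range(1, 501)])

# Pretend this one page of ids came back from the search daemon
# (e.g. after limiting the result set with SetLimits).
page_of_ids = [42, 7, 311]

# Fetch only that page with a parameterized IN list.
placeholders = ",".join("?" * len(page_of_ids))
rows = conn.execute(
    "SELECT id, info FROM entries WHERE id IN (%s)" % placeholders,
    page_of_ids).fetchall()
```

With only a page's worth of ids, the IN list stays short and the database does a handful of primary-key lookups instead of a thousand.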

There are a few options you can consider.

If you’re querying Sphinx for 1000 results, and then doing a JOIN to filter those results down further, you may be better off considering SphinxSE, which integrates with MySQL.
The bad news is that it’s not as scalable, and Sphinx must run on the same box as MySQL. Because of this, I’d personally avoid SphinxSE, but sometimes it’s a valid choice.

Secondly, could any of the content be cached, either to the filesystem or to memcached? Typically most search features have a subset of basic queries that are run frequently; you may be able to cache these to alleviate the load.

I can’t think of much else, really. You’re essentially selecting a ‘random’ set of records where the ID is the primary key, so there’s not much scope for improvement.

Maybe you need to alter your table schema, or hive some fields off into another table.

You could also consider InnoDB, which in my experience provides vastly improved performance for selects.

"I don’t understand this. If your query only needs an INNER JOIN, then great, but I can think of many queries where a LEFT JOIN is necessary and desirable.

Don’t be afraid of LEFT JOIN; it is more important to return the right results, and then worry about performance, than to have a fast query that misses out some records."

I avoid doing SLOW (left) joins when I’d have to do them a lot on my system. So I denormalize things and put redundant data on pieces that WON’T change after writing (i.e. linking the rows by association). That way, I avoid the left join at the cost of a once-in-a-while write, against a million queries with avoided left joins. I’d never do such a thing if I had to put a redundant piece somewhere and then have to change it after the fact.

As far as my example goes, I think it’d solve his problem just fine. I ran into situations where I had to do matching: 100,000+ IDs to find matches across multiple tables with 100,000+ IDs each. I couldn’t do an OR statement; that’d be ridiculous.

So I did what I showed in the example. It turned 30-45 minutes of query time into about 20-30 seconds (it swallowed RAM during the run, but only for those 20-30 seconds).

I don’t think MySQL optimizes the IN list well (at least in MySQL 5.1); I also get very slow queries when using this approach…

However, I found a workaround: create a temporary table that holds your ids and then use a JOIN statement. This works pretty fast.
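That workaround can be sketched like this (SQLite here for a self-contained example; in MySQL you would use CREATE TEMPORARY TABLE plus a multi-row INSERT, and all the names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, info TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?)",
                 [(i, "row %d" % i) for i in range(1, 1001)])

# Load the search-result ids into a temporary table ...
conn.execute("CREATE TEMPORARY TABLE wanted_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_ids VALUES (?)",
                 [(i,) for i in (5, 123, 998)])

# ... then JOIN against it instead of building a huge IN (...) list.
rows = conn.execute("""
    SELECT e.id, e.info
    FROM wanted_ids AS w
    JOIN entries    AS e ON e.id = w.id
""").fetchall()
```

The JOIN lets the optimizer drive the lookup from the small id table, and the statement itself stays the same size no matter how many ids you have.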