Problems getting ORDER BY to use index

I am running into a situation where MySQL is not using the index for ORDER BY, and instead dropping to filesort, which is quite slow for what I need to do. I am using MySQL 5.0.16-max on Windows (but I have also witnessed this with MySQL 5.x on Linux).

I have a simple table with 4 columns and some compound indices on the first 3 columns:

CREATE TABLE gen ( a int(11) default NULL, b int(11) default NULL, c int(11) default NULL, d int(11) default NULL, KEY abc (a,b,c), KEY bc (b,c), KEY c ©) ENGINE=MyISAM;

I wish to run the following query over this table:

select a, b, c, d from gen order by a;

It seems that this query should use the first index (‘abc’) for the sort, instead of a ‘filesort’; however, that’s not what actually happens:

mysql> explain select a, b, c, d from gen order by a;±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| 1 | SIMPLE | gen | ALL | NULL | NULL | NULL | NULL | 200000 | Using filesort |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+1 row in set (0.00 sec)

Even if I supply “use index(abc)” on the query above, it still does filesort.

The only way I can get the query to use the index and not do the filesort is by doing a “select a from gen order by a;” (or, more generally, if the “select” clause picks only data already in the index).

This is quite puzzling to me, it seems there’s no reason why MySQL would not use the index for the sort in this case, since it’s basically free!

I have spent considerable time reading various posts on this topic, and I thought I would open up this discussion to more people on the list to see if anyone has other ideas.

For reference, here are some of the more relevant articles I read on the topic:

http://www.mysqlperformanceblog.com/2006/09/01/order-by-limi t-performance-optimization/
http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization .html
http://dev.mysql.com/doc/refman/5.0/en/join.html

Thanks in advance for any thoughts/comments,

Razvan.

Hi Surdules,

I’m not sure, but I imagine if you don’t select column d then it’ll use the index just fine.

With col d present in the select, MySQL has 2 options (and maybe some others too I imagine):

  1. Table scan the whole table, filesort it, return the result
  2. Index scan the whole (abc) index, and for each leaf do a lookup on the table to grab the value of d.

I think that to avoid the possibility of random disk seeking when grabbing the d values in option 2, it prefers option 1

It’s a shame that MySQL doesn’t seem to be smart enough to factor-in whether data is in memory or not, because if it is then the cost of random lookups in the table/datafile for d would be very low, and so I’m pretty sure 2) would be the best execution plan by far.

If your table was on physical disk then I imagine that 1) would be faster.

Anyone know if there’s a nice way to hint to MySQL that it should use option2? (other than force index which should work I think)

HTH,
Toasty

Exactly, I agree that 2) seems more sensible ESPECIALLY if MySQL may already have the data in memory.

Now, let’s say MySQL doesn’t have it cached. In that case, a filesort seems to mean:

  • copy the entire table to a temporary file (this is presumably a linear read/write, so should be fast)
  • sort the temporary file (this probably requires tons of non-linear seeks, reads, and writes)
  • start streaming the temporary file back to the user (this is also presumably a linear read, so should be fast)

At the end of the day, it’s not clear to me that following the index leaves generates MORE disk seeks than the filesort.

Is the MySQL filesort algorithm described somewhere?

BTW, even with a “FORCE INDEX(abc)”, the query above still does a filesort I believe.

Thanks,

Razvan.

Surdules,

Force index unfortunately does not apply to order by.

But my first question is is it really faster to use index based retrieval in this case ? For larger tables full table is better accessed with filesort than full table scan by index, this is why MySQL optimizes it.

Try adding some large LIMIT to your query it is good way to force it to use index for order by.

You also can add d as last column to the index in this case MySQL should see it will be index covered query and use index even if it is full table sort.