poor performance ORDERing BY indexed column

Hello!

I need help with the performance of a particular SQL query (provided below). Could anyone please give me a clue why the execution time of the first query is so long, while the second query runs almost instantly? Maybe there is some way to optimize the first one, for example by adding an index on one or several columns, running some table optimization routine, or something else.

All of this happens in a single table.

First query:
select distinct [PRIMARY_KEY_BIGINT(20)] from [TABLE] where lower(concat_ws(' ', [VARCHAR(30)], [VARCHAR(60)])) like 'const_chars%' order by [BTREE_INDEXED_MEDIUMINT(9)] asc limit 20, 20

Exec time: ~30 secs

EXPLAIN statement on this query:
id: 1
select_type: SIMPLE
table: [TABLE]
type: index
possible_keys: NULL
key: [BTREE_INDEXED_MEDIUMINT(9)]
key_len: 4
ref: NULL
rows: 202848
Extra: Using where

Second query:
Identical, except that the ORDER BY criterion is [UNINDEXED_BIGINT(14)]:

select distinct [PRIMARY_KEY_BIGINT(20)] from [TABLE] where lower(concat_ws(' ', [VARCHAR(30)], [VARCHAR(60)])) like 'const_chars%' order by [UNINDEXED_BIGINT(14)] asc limit 20, 20

Exec time: ~0.5 sec

EXPLAIN statement on this query:
id: 1
select_type: SIMPLE
table: [TABLE]
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 202848
Extra: Using where; Using filesort

Some other facts.

  1. If the ORDER BY clause is removed from the first query, its execution time also becomes almost instant (~0.5 sec). I will provide EXPLAIN results upon request.
  2. There are ~200,000 entries in the table.
  3. The MySQL server version is 5.0.41-community.
  4. It seems that older copies of the database (with lots of inserts/deletes performed) take more time to process the first query (up to several minutes on a db created several months ago), while younger copies take about 1.5 sec.
  5. OPTIMIZE/ANALYZE statements on this table did not help.

Thank you very much!

First of all, one of your main problems is that you are using … WHERE concat_ws(col1, col2) LIKE 'something%'.
This means that you are always forcing MySQL to perform a table scan, since it can't use an index when you have a function on the column.
So you should start by thinking about rewriting this query so that an index can be used for the LIKE condition.
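For example, here is a rough sketch of that idea, with made-up names standing in for your anonymized schema (my_table for [TABLE], name_part1/name_part2 for the two VARCHAR columns, full_name for a new helper column, id for the primary key, sort_col for the indexed MEDIUMINT; the width 91 = 30 + 60 + 1 separator is just a guess): store the concatenation in its own indexed column and search on that.

alter table my_table add column full_name varchar(91);
update my_table set full_name = concat_ws(' ', name_part1, name_part2);
create index idx_full_name on my_table (full_name);

select distinct id from my_table where full_name like 'const_chars%' order by sort_col asc limit 20, 20;

With idx_full_name in place, the prefix LIKE can be answered by a range scan on that index instead of evaluating concat_ws() for every row. The optimizer then has to pick between this index for the WHERE and the sort_col index for the ORDER BY, but filtering first is usually the better deal when the prefix is selective. The helper column of course has to be kept in sync with the two source columns.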

Second, you are using a function that is redundant.
lower() is not needed here, because LIKE comparisons are case-insensitive by default (with the usual _ci collations).
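For instance, under the default case-insensitive collation (e.g. latin1_swedish_ci) this already returns 1 without any lower():

select 'Const_Chars and more' like 'const_chars%';

Only if your columns use a binary or case-sensitive (_cs) collation would the lower() call actually change the result.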

Now to what probably takes time.
The difference between the two queries is that query 1 is using an index to retrieve the rows in sorted order, while query 2 sorts the rows afterwards.
Usually it is advantageous to avoid that post-sort, which is why the database chooses to use an index to retrieve the rows in sorted order.

This means that the steps to perform query 1 are:

  1. Get the next id from the index (in sorted order).
  2. Jump to that position in the table.
  3. Read the VARCHAR(30) and VARCHAR(60) fields.
  4. Concatenate these and test the condition.
  5. Throw away rows that do not match.
  6. Repeat until enough matching rows have been found to satisfy the LIMIT.

While the execution plan for query 2 is:

  1. Read all rows sequentially from the table.
  2. Test the condition.
  3. Throw away rows that do not match.
  4. Sort the remaining rows and apply the LIMIT.

In this case, different things potentially take time in the two execution plans.
In the query 1 plan you have a "jump to the position in the table and read the two columns" step, which is highly dependent on the speed of the disks (unless you have enough RAM for the entire table to sit in the OS file cache).
A random disk access typically takes around 8-10 ms, which means you can generally only fetch about 125 rows/second from disk.
In query 2 the server instead reads the entire table sequentially and then discards the rows that don't match.
Reading sequentially from a disk is typically 50-100 MB/s today.
This means we get a break-even point at about 50,000,000 / 125 = 400,000 bytes per row.
So, as you can see, when a lot of rows match, you only gain anything from using the index if the rows are larger than about 400 KB.
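To put your own numbers into that: at roughly 125 random reads per second, the ~30 seconds you measured corresponds to on the order of 125 * 30 ≈ 3,750 index-driven row fetches before the LIMIT was satisfied, while sequentially scanning all 202,848 rows amounts to only a few tens of MB if the rows are, say, a couple of hundred bytes each (a guess on my part), which streams off the disk in well under a second at 50-100 MB/s.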

And in your case, as you can see, reading the table sequentially is much faster than reading it randomly through the index.

That was a long explanation for something that should take very little time. :)

Start by looking at how you can change your WHERE, especially how you can get rid of the concat_ws. Then you can get an index to work for the WHERE and then you can start to look at the rest.

Thank you very much, sterin, for such a nice explanation!
It is really frustrating that there seems to be no way to make the MySQL optimizer avoid the index in cases like this, where the query becomes so slow.

However, as a way to solve this, our team has chosen to add an additional column to [TABLE], in which the concatenated [VARCHAR(30)] and [VARCHAR(60)] values are stored and kept up to date by triggers on table inserts and updates (a rough sketch follows below). We have then indexed this column. We really need these values to meet our functional requirements.
This solves the slowness problem for like 'const_chars%', but it does not solve the problem for like '%const_chars%' (or '%const_chars'), and I do not see a way to do that right now, at least not for MySQL.
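In case it helps someone else, the trigger part of that sketch (reusing the made-up names from sterin's example above: my_table, name_part1, name_part2, full_name; the real definitions differ):

create trigger my_table_bi before insert on my_table
for each row set new.full_name = concat_ws(' ', new.name_part1, new.name_part2);

create trigger my_table_bu before update on my_table
for each row set new.full_name = concat_ws(' ', new.name_part1, new.name_part2);

With those in place, full_name stays consistent with the two source columns, and the prefix search can keep using the index on full_name.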