Indexing oddity

Hi all,

I am running into some weird behavior with the mysql optimizer choosing to do a full table scan instead of choosing the primary key index for this query:

mysql> explain SELECT * FROM route WHERE route_id = 30809866;
±—±------------±------±-----±--------------±-----±- -------±-----±-------±------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±------±-----±--------------±-----±- -------±-----±-------±------------+
| 1 | SIMPLE | route | ALL | PRIMARY | NULL | NULL | NULL | 383400 | Using where |
±—±------------±------±-----±--------------±-----±- -------±-----±-------±------------+

The table looks like this:

CREATE TABLE route (
route_id char(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
type tinyint(4) NOT NULL,
event_type tinyint(4) NOT NULL,
title varchar(128) COLLATE utf8_unicode_ci NOT NULL DEFAULT ‘’,
description text COLLATE utf8_unicode_ci,
start_instructions text COLLATE utf8_unicode_ci,
user_id varchar(64) COLLATE utf8_unicode_ci NOT NULL,
site_id varchar(64) COLLATE utf8_unicode_ci NOT NULL,
<snip – more columns where>
PRIMARY KEY (route_id),
KEY site_id (site_id,user_id),
KEY type (type),
KEY event_type (event_type),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

As you can see, the primary key has an index on route_id, however mysql seems to skip it and instead do a full table scan. Any ideas why this could be?

Hi,

Try enclosing the value in quotes. The primary key is CHAR() column, not an integer column.

When I use quotes, it comes up with a different issue:

mysql> explain SELECT * FROM route WHERE route_id = ‘30809866’;
±—±------------±------±-----±--------------±-----±- -------±-----±-----±------------------------------------- ---------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±------±-----±--------------±-----±- -------±-----±-----±------------------------------------- ---------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
±—±------------±------±-----±--------------±-----±- -------±-----±-----±------------------------------------- ---------------+
1 row in set (0.00 sec)

One thing I noticed is that if I don’t quote the route_id value and use an order by clause it seems to use the index correctly:

mysql> explain SELECT * FROM route WHERE route_id = 30809866 order by route_id;
±—±------------±------±------±--------------±------- -±--------±-----±-------±------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±------±------±--------------±------- -±--------±-----±-------±------------+
| 1 | SIMPLE | route | index | PRIMARY | PRIMARY | 30 | NULL | 356424 | Using where |
±—±------------±------±------±--------------±------- -±--------±-----±-------±------------+
1 row in set (0.00 sec)

That makes sense. Thanks Justin.

-Carlos

[B]Quote:[/B]
Your second query will range scan the primary key (due to the ORDER BY) and only return the rows that match (which will be no rows).
The primary key is the table itself. Both with and without ORDER BY, the entire table will be read. The query with WHERE route_id = 30809866 can return many rows.

The problem is that you are comparing a string with an integer. Evaluation of the clause WHERE route_id = 30809866 will convert route_id to an integer before performing the comparison. Many different strings will equal 30809866 after conversion. Moreover, you will run into problems because the numbers you use can be larger than the maximum size of an integer, resulting in the use of (inaccurate) floats. Either use integers and use an integer field, or switch to char(10) and use strings in your query. E.g. if you know that route_id’s are padded with zeros, use WHERE route_id = ‘0030809866’