Inconsistent use of time index

I am experiencing strange behavior out of MySQL 5.1.30. I have a table with (among others) a datetime field. The field has an index on it. My query does a join of two tables: image that contains time data, and image_change that is associated with image through a foreign key constraint. The image_change table contains about 2M records.

When I execute this query:

explain select ic.* from image_change ic, image iwhere ic.image_id=i.id and i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-15’)

I get the following response:

1, ‘SIMPLE’, ‘i’, ‘range’, ‘PRIMARY,FK5FAA95B637C4FBA,Index_Time’, ‘Index_Time’, ‘8’, ‘’, 746, 'Using where’1, ‘SIMPLE’, ‘ic’, ‘ref’, ‘FK10B73B14D0FB3FBA,Index_Image’, ‘FK10B73B14D0FB3FBA’, ‘8’, ‘reboard.i.id’, 558, ‘’

Notice that ‘Index_Time’ is being used. When I run the same query with a slightly different date:

explain select ic.* from image_change ic, image iwhere ic.image_id=i.id and i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-16’)

I get a response that indicates that the index is not being used.

1, ‘SIMPLE’, ‘ic’, ‘ALL’, ‘FK10B73B14D0FB3FBA,Index_Image’, ‘’, ‘’, ‘’, 2065041, ''1, ‘SIMPLE’, ‘i’, ‘eq_ref’, ‘PRIMARY,FK5FAA95B637C4FBA,Index_Time’, ‘PRIMARY’, ‘8’, ‘reboard.ic.image_id’, 1, ‘Using where’

If I actually run the query that doesn’t use the index, it takes about two orders of magnitude slower to run.

Any thoughts on what to check?

Thanks,

Gene

Could you post the result of SHOW INDEXES for the two tables? Also, try running the following:

explain select i.id from image i where i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-15’)

explain select i.id from image i where i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-16’)

I’m guessing the results of the second EXPLAIN will estimate very large number of rows. This could actually be correct if you have a lot of rows with a time value between 4/15 and 4/16, but more than likely there’s a problem with the index statistics. Try running ANALYZE TABLE on image and image_change and see if that helps.

If that doesn’t help, you can use STRAIGHT_JOIN to force the optimizer to join the tables in a particular order. That should avoid the nasty table scan of image_change you’re doing. In general its better to let the optimizer make the decision by itself, but if its just being stupid, then you can force it.

Here are the results of the queries:

explain select i.id from image i where i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-15’);1, ‘SIMPLE’, ‘i’, ‘range’, ‘FK5FAA95B637C4FBA,Index_Time’, ‘Index_Time’, ‘8’, ‘’, 746, 'Using where’explain select i.id from image i where i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-16’)1, ‘SIMPLE’, ‘i’, ‘ref’, ‘FK5FAA95B637C4FBA,Index_Time’, ‘FK5FAA95B637C4FBA’, ‘8’, ‘const’, 4484, ‘Using where’

As you suspected, one estimates a somewhat larger number of records, 4484 vs. 746 out of 27252 total in the table.
But the query

select count(1) from image i where i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-16’)

returns 70, and the corresponding query ending on ‘2009-14-15’ returns 57 records.

So it seems like something is fouling up the optimizer. ANALYZE TABLE produces nothing surprising:

‘reboard.image’, ‘analyze’, ‘status’, ‘OK’‘reboard.image_change’, ‘analyze’, ‘status’, ‘OK’

Unfortunately, I do not have direct access to this query, as it comes from the Hibernate layer. So I am not sure I know how to force it to do a straight join. I will look into that, however.

Thanks again,

Gene

The estimates in the results of EXPLAIN are meant to capture the number of rows examined by the database engine, not the number of rows in the result set. So its actually more fair to compare

explain select i.id from image i where i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-15’);

with

select count(*) from image where (i.time > ‘2009-03-01’ and i.time <‘2009-04-15’);

and

explain select i.id from image i where i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-16’)

with

select count(*) from image i where i.camera_id=2

Is the output of EXPLAIN for your two original queries the same both before and after you ran ANALZYE? That command actually recomputes the index stats, so if those where somehow stale, running this command could affect the query plan.

What’s the output of SHOW INDEXES for your two tables, by the way?

Try this:

ALTER TABLE image ADD INDEX image_ix_cameraid_time_id( camera_id, time, id);

Since you have a:

…WHERE i.camera_id=2 AND ( i.time > ‘2009-03-01’ AND i.time <‘2009-04-16’ )

Adding a index on the columns (camera_id, time) makes it possible for the optimizer to use it for both the conditions so it doesn’t have to choose between on or the other index.
And by adding the id as the last column, which is needed for the join, it doesn’t even have to read anything from the table since all information exists in the index. This saves us extra seek times.

How many rows does the larger date range return in the result set?

SELECT COUNT(*) from image_change ic, image iwhere ic.image_id=i.id and i.camera_id=2 and (i.time > ‘2009-03-01’ and i.time <‘2009-04-16’)

And compared to the smaller date range?

Since if there are a lot of matching records in the image_change table (your 2M table) and you have selected ic.* it could be faster to read all these rows sequentially and then throw away the non matching ones.
Not using indexes aren’t always a bad thing since todays disks can pump a lot of data sequentially but show a pretty poor performance when reading data randomly.

BTW I suggest using the INNER JOIN syntax instead:

SELECT ic.*FROM image_change icINNER JOIN image i ON ic.image_id=i.id AND i.camera_id=2 AND ( i.time > ‘2009-03-01’ AND i.time <‘2009-04-16’ )

It makes the query much easier to read and the risk of forgetting a join condition is virtually impossible.
And it integrates seamlessly (both visually and logically) with LEFT JOIN’s.

Thanks for the great suggestion about how to modify the index – that seems to have done the trick. I am not sure I will be able to change the query that’s produced, however, as I am using Hibernate. Also, I will need to poke around a bit to find out how to get the three-field index to be generated automatically by Hibernate (as opposed to my mucking with the MySQL tools).

Thanks again for your help!

Gene