45 million row table - Two BETWEENs not using a composite index optimally

45 million row MyISAM table in MySQL 5.1.20.

select * from M
where B between 726642 and 733649
and C between 8224 and 8230;

Result set has 311 rows.

Full table scan: 32.08 seconds.
Using index on (B): 37.90 seconds.
Using index on (B,C): 39.75 seconds.
Potential performance: 0.01 seconds.

I’d hoped that MySQL would apply the composite index (B,C), with an index range scan on B, and a subindex range scan on C (for each matching B). But it seems not. The similarities between the response times for using the index on (B) and the index on (B,C), indicate MySQL is probably index range scanning on B, and then looking at every C element and applying a WHERE filter. That is, MySQL does not seem to take advantage of the fact that for each matching B, the C elements are available in the index in sorted order, and so could be subindex range scanned, rather than a full subindex scan.

It seems MySQL scans 4,776,432 index entries, rather than just the 74,865 index entries it needs to examine.

To check this, I collapsed the ranges, by making all the numbers in each range just the first number:
UPDATE m SET b = 726642 WHERE b BETWEEN 726642 AND 733649;
UPDATE m SET c = 8224 WHERE c BETWEEN 8224 AND 8230;

Now I run the query as: select * from M where B = 726642 and C = 8224;
This returns in 0.01 seconds.
If the BETWEENS were more cleverly optimized by MySQL, the original query should return in the same time - 0.01 seconds.

Is my understanding correct?
Is there any way to get MySQL to optimally use a composite index (index range scan plus subindex range scan) to satisfy two BETWEEN statements?

We cannot easily flatten these ranges into single numbers. These ranges encode a complex data structure. Flattening the ranges would involve many code changes and many extra columns and many extra indexes.

Thanks for your expertise and illumination,
Tasman

p.s. Note this is a cross post from the MySQL Optimizer forum; I hadn’t gotten any response there in over a day, so I thought I’d try here.

MySQL as of MySQL 5.0 can only have one range per index - as soon as there is the range everything in that range will be just scanned.

Sometimes you can replace first range with IN

A IN (5,6,7) and B between 5 and 10 will use key A,B

If the range is too large you can sometimes divide it in number of intervals and use that instead

Say now you need 545 to 789 in the query above - you can use

A100 IN (5,6,7) and A BETWEEN 545 to 789 and B BETWEEN 5 and 10

with key(A100,B)
when A100 keps A/100 numbers.

Thank you very much for the excellent response Peter! :smiley:

I had got as far as using IN for short ranges, and that worked really well - got down to 0.0022s for the query (from 32s).

I was worried about the optimizer taking more time with long IN clauses. I found that FORCE INDEX cuts out another 20% off the query time. (I suspect this saving will depend on the number of indexes on the table and the number of values in the IN clause.) I didn’t get as far as to try limiting the optimizer’s query plan search depth.

Thanks for the possible solution for long ranges (using a column like “B / 100” to reduce the number of IN values). I hadn’t thought of this - very clever! There are a lot of long ranges like this in the application.

I’ve logged an enhancement request to support two BETWEENs over a composite index with MySQL AB (bug id 30264: http://bugs.mysql.com/bug.php?id=30264 ).

Thank you again Peter.