Performance using BETWEEN

I’m using a large database (ip2location) with about 5.2 million rows. The first two columns are “ip numbers” denoting a range of IP addresses. Each row contains information about the location (zip code, etc.) where this range of IP numbers is located.

If ip = 71.158.134.217, then ipnumber = 71 * (256^3) + 158 * (256^2) + 134 * 256 + 217.

Simple enough, right? The ip number for this ip is 1201571545. To find a location given this IP number, the query is:

SELECT * FROM ip2location WHERE 1201571545 BETWEEN ip_from AND ip_to;

The query takes about 3.5 seconds to resolve on my machine. That’s too slow. What’s odd is that the time is essentially invariant with:

ALTER TABLE ip2location ADD INDEX ip_from_index( ip_from );

…or the suggested:

ALTER TABLE ip2location ADD PRIMARY KEY( ip_from, ip_to);

EXPLAINing the query shows that the indexes are never used-- it’s always a full table search. I’ve tried UNIONs and subqueries and whatnot, and nothing makes a whit of difference in the performance. By comparison, a zip code lookup (with an index on the zip code column) executes in a few milliseconds.

There must be a way to make this query faster. Any ideas?

The problem you have is that in your query you are limiting a const value between two columns.

columnA > const AND columnB < const

Not a column value between two const values.

columnA > const1 AND columnA < const2

And I think that the problem is that the optimizer chooses a index range scan since you have an expression that doesn’t close in between two values.

Check out my post #37 on this forum:
http://forums.devshed.com/showpost.php?p=1749143&postcou nt=37

It was the same problem as you had and I solved with a sub-select that first finds the start_ip and then uses the combined index to verify that the end_ip is bigger than the ip address you searched on.