optimize IP range join: postgresql is 496 times faster

I am very curious to know how to write the following join query for MySQL so it will perform as well as PostgreSQL. I never used PostgreSQL before now. I like MySQL and I always use it. However I was disappointed by its performance on this query and I installed PostgreSQL to compare. The purpose of this query is to map ip addresses to countries. On my CentOS 5 machine, MySQL 5.0.45 takes 10 mins 45 seconds, PostgreSQL 8.1.11 takes 0 mins 1.3 seconds. Yes MySQL is 496 times slower. I’m sure there must be a way to hint MySQL to run faster. Here is the query:

mysql> select range.id_country from address join range on address.address between range.begin_num and range.end_num;mysql> describe address;±--------±-----------------±-----±----±--------±------+| Field | Type | Null | Key | Default | Extra |±--------±-----------------±-----±----±--------±------+| address | int(10) unsigned | YES | | NULL | |±--------±-----------------±-----±----±--------±------+mysql> describe range;±------------±--------------------±-----±----±--------±------+| Field | Type | Null | Key | Default | Extra |±------------±--------------------±-----±----±--------±------+| begin_num | int(10) unsigned | NO | PRI | | || end_num | int(10) unsigned | YES | UNI | NULL | || id_country | tinyint(3) unsigned | YES | MUL | NULL | |±------------±--------------------±-----±----±--------±------+

Both tables are MyISAM type. Table address is 2124 rows (all distinct). Table range is 105920 rows (all distinct).

The best answer I found so far is here. Steinbrink gives a way to write it as a join on subquery that my MySQL finished in 6 min 42 sec (63% the time of the simple join version). That is still 310 times slower than PostgreSQL. Too slow!

Actually there is a small error in the SQL at that url:

ORDER BY ip_address DESC
should be:

ORDER BY start DESC

Here is MySQL’s explanation of the original query:

mysql> explain select range.id_country from address join range on address.address between range.begin_num and range.end_num;±—±------------±--------±-----±----------------±-----±--------±-----±-------±-----------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±--------±-----±----------------±-----±--------±-----±-------±-----------------------------------------------+| 1 | SIMPLE | address | ALL | NULL | NULL | NULL | NULL | 2124 | || 1 | SIMPLE | range | ALL | PRIMARY,end_num | NULL | NULL | NULL | 105920 | Range checked for each record (index map: 0x7) |±—±------------±--------±-----±----------------±-----±--------±-----±-------±-----------------------------------------------+

Here is PostgreSQL’s explanation of the original query:

postgresql# explain select range.id_country from address join range on address.address between range.begin_num and range.end_num; QUERY PLAN----------------------------------------------------------------------------------------------------------------- Nested Loop (cost=5.72…7061709.90 rows=83990942 width=2) → Seq Scan on range (cost=0.00…3316.47 rows=185547 width=18) → Bitmap Heap Scan on address (cost=5.72…31.25 rows=453 width=8) Recheck Cond: ((address.address >= “outer”.begin_num) AND (address.address <= “outer”.end_num)) → Bitmap Index Scan on addresses_pkey (cost=0.00…5.72 rows=453 width=0) Index Cond: ((address.address >= “outer”.begin_num) AND (address.address <= “outer”.end_num))

Here is MySQL’s explanation showing that a simple query can use the index on begin_num in a “range” type query plan:

mysql> explain select id_country from range where 123123123 between begin_num and end_num;±—±------------±--------±------±----------------±--------±--------±-----±-----±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±--------±------±----------------±--------±--------±-----±-----±------------+| 1 | SIMPLE | range | range | PRIMARY,end_num | PRIMARY | 4 | NULL | 35 | Using where |±—±------------±--------±------±----------------±--------±--------±-----±-----±------------+

Others encountered this problem before.
References:

[LIST]
[] MySQL Forums :: Optimizer :: How to optimize a range (best answer I found but not good enough)
[
] MySQL Performance Blog: Optimize IP Range Join / Daniweb: Optimize IP Range Join - MySQL
[] MySQL Performance Blog: Range Optimization
[
] MySQL :: inner join on range criteria: unable to use index?
[] Sergey Petrunia’s blog Use of join buffer is now visible in EXPLAIN
[
] MySQL :: MySQL 5.0 Reference Manual :: 7.2.5 Range Optimization
[] MySQL Bugs: #26963: Incorrect query results with CONST join compared to RANGE or ALL
[
] MySQL Bugs: #9693: bug in trying to join a value on a range (using indexes) identical post on mysql performance forum
[/LIST]

Hi,
You can try > and < instead of between.

ON address.address > ranges.begin_num AND address < ranges.end_num

MySQL chooses the wrong join order. Try this:

SELECT range.id_country
FROM range
STRAIGHT_JOIN address ON (address.address between range.begin_num and range.end_num);

Note that the time this query takes, depends on the number of ranges and not so much on the number of addresses.

The range scan on an index that you give is nice, but not very useful. It is simply not selective enough.

Old thread alert )

Thanks for the replies. I’ll try it.

Let me know your findings.