I am very curious to know how to write the following join query for MySQL so it will perform as well as PostgreSQL. I never used PostgreSQL before now. I like MySQL and I always use it. However I was disappointed by its performance on this query and I installed PostgreSQL to compare. The purpose of this query is to map ip addresses to countries. On my CentOS 5 machine, MySQL 5.0.45 takes 10 mins 45 seconds, PostgreSQL 8.1.11 takes 0 mins 1.3 seconds. Yes MySQL is 496 times slower. I’m sure there must be a way to hint MySQL to run faster. Here is the query:
mysql> select range.id_country from address join range on address.address between range.begin_num and range.end_num;mysql> describe address;±--------±-----------------±-----±----±--------±------+| Field | Type | Null | Key | Default | Extra |±--------±-----------------±-----±----±--------±------+| address | int(10) unsigned | YES | | NULL | |±--------±-----------------±-----±----±--------±------+mysql> describe range;±------------±--------------------±-----±----±--------±------+| Field | Type | Null | Key | Default | Extra |±------------±--------------------±-----±----±--------±------+| begin_num | int(10) unsigned | NO | PRI | | || end_num | int(10) unsigned | YES | UNI | NULL | || id_country | tinyint(3) unsigned | YES | MUL | NULL | |±------------±--------------------±-----±----±--------±------+
Both tables are MyISAM type. Table address
is 2124 rows (all distinct). Table range
is 105920 rows (all distinct).
The best answer I found so far is here. Steinbrink gives a way to write it as a join on subquery that my MySQL finished in 6 min 42 sec (63% the time of the simple join version). That is still 310 times slower than PostgreSQL. Too slow!
Actually there is a small error in the SQL at that url:
ORDER BY ip_address DESC
should be:
ORDER BY start DESC
Here is MySQL’s explanation of the original query:
mysql> explain select range.id_country from address join range on address.address between range.begin_num and range.end_num;±—±------------±--------±-----±----------------±-----±--------±-----±-------±-----------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±--------±-----±----------------±-----±--------±-----±-------±-----------------------------------------------+| 1 | SIMPLE | address | ALL | NULL | NULL | NULL | NULL | 2124 | || 1 | SIMPLE | range | ALL | PRIMARY,end_num | NULL | NULL | NULL | 105920 | Range checked for each record (index map: 0x7) |±—±------------±--------±-----±----------------±-----±--------±-----±-------±-----------------------------------------------+
Here is PostgreSQL’s explanation of the original query:
postgresql# explain select range.id_country from address join range on address.address between range.begin_num and range.end_num; QUERY PLAN----------------------------------------------------------------------------------------------------------------- Nested Loop (cost=5.72…7061709.90 rows=83990942 width=2) → Seq Scan on range (cost=0.00…3316.47 rows=185547 width=18) → Bitmap Heap Scan on address (cost=5.72…31.25 rows=453 width=8) Recheck Cond: ((address.address >= “outer”.begin_num) AND (address.address <= “outer”.end_num)) → Bitmap Index Scan on addresses_pkey (cost=0.00…5.72 rows=453 width=0) Index Cond: ((address.address >= “outer”.begin_num) AND (address.address <= “outer”.end_num))
Here is MySQL’s explanation showing that a simple query can use the index on begin_num in a “range” type query plan:
mysql> explain select id_country from range where 123123123 between begin_num and end_num;±—±------------±--------±------±----------------±--------±--------±-----±-----±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±--------±------±----------------±--------±--------±-----±-----±------------+| 1 | SIMPLE | range | range | PRIMARY,end_num | PRIMARY | 4 | NULL | 35 | Using where |±—±------------±--------±------±----------------±--------±--------±-----±-----±------------+
Others encountered this problem before.
References:
[LIST]
[] MySQL Forums :: Optimizer :: How to optimize a range (best answer I found but not good enough)
[] MySQL Performance Blog: Optimize IP Range Join / Daniweb: Optimize IP Range Join - MySQL
[] MySQL Performance Blog: Range Optimization
[] MySQL :: inner join on range criteria: unable to use index?
[] Sergey Petrunia’s blog Use of join buffer is now visible in EXPLAIN
[] MySQL :: MySQL 5.0 Reference Manual :: 7.2.5 Range Optimization
[] MySQL Bugs: #26963: Incorrect query results with CONST join compared to RANGE or ALL
[] MySQL Bugs: #9693: bug in trying to join a value on a range (using indexes) identical post on mysql performance forum
[/LIST]