optimize IP range join: postgresql is 496 times faster

nluv4hs · September 22, 2008, 8:51pm

I am very curious to know how to write the following join query for MySQL so it will perform as well as PostgreSQL. I never used PostgreSQL before now. I like MySQL and I always use it. However I was disappointed by its performance on this query and I installed PostgreSQL to compare. The purpose of this query is to map ip addresses to countries. On my CentOS 5 machine, MySQL 5.0.45 takes 10 mins 45 seconds, PostgreSQL 8.1.11 takes 0 mins 1.3 seconds. Yes MySQL is 496 times slower. I’m sure there must be a way to hint MySQL to run faster. Here is the query:

mysql> select range.id_country from address join range on address.address between range.begin_num and range.end_num;mysql> describe address;±--------±-----------------±-----±----±--------±------+| Field | Type | Null | Key | Default | Extra |±--------±-----------------±-----±----±--------±------+| address | int(10) unsigned | YES | | NULL | |±--------±-----------------±-----±----±--------±------+mysql> describe range;±------------±--------------------±-----±----±--------±------+| Field | Type | Null | Key | Default | Extra |±------------±--------------------±-----±----±--------±------+| begin_num | int(10) unsigned | NO | PRI | | || end_num | int(10) unsigned | YES | UNI | NULL | || id_country | tinyint(3) unsigned | YES | MUL | NULL | |±------------±--------------------±-----±----±--------±------+

Both tables are MyISAM type. Table address is 2124 rows (all distinct). Table range is 105920 rows (all distinct).

The best answer I found so far is here. Steinbrink gives a way to write it as a join on subquery that my MySQL finished in 6 min 42 sec (63% the time of the simple join version). That is still 310 times slower than PostgreSQL. Too slow!

Actually there is a small error in the SQL at that url:

ORDER BY ip_address DESC
should be:

ORDER BY start DESC

Here is MySQL’s explanation of the original query:

mysql> explain select range.id_country from address join range on address.address between range.begin_num and range.end_num;±—±------------±--------±-----±----------------±-----±--------±-----±-------±-----------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±--------±-----±----------------±-----±--------±-----±-------±-----------------------------------------------+| 1 | SIMPLE | address | ALL | NULL | NULL | NULL | NULL | 2124 | || 1 | SIMPLE | range | ALL | PRIMARY,end_num | NULL | NULL | NULL | 105920 | Range checked for each record (index map: 0x7) |±—±------------±--------±-----±----------------±-----±--------±-----±-------±-----------------------------------------------+

Here is PostgreSQL’s explanation of the original query:

postgresql# explain select range.id_country from address join range on address.address between range.begin_num and range.end_num; QUERY PLAN----------------------------------------------------------------------------------------------------------------- Nested Loop (cost=5.72…7061709.90 rows=83990942 width=2) → Seq Scan on range (cost=0.00…3316.47 rows=185547 width=18) → Bitmap Heap Scan on address (cost=5.72…31.25 rows=453 width=8) Recheck Cond: ((address.address >= “outer”.begin_num) AND (address.address <= “outer”.end_num)) → Bitmap Index Scan on addresses_pkey (cost=0.00…5.72 rows=453 width=0) Index Cond: ((address.address >= “outer”.begin_num) AND (address.address <= “outer”.end_num))

Here is MySQL’s explanation showing that a simple query can use the index on begin_num in a “range” type query plan:

Others encountered this problem before.
References:

[LIST]
[] MySQL Forums :: Optimizer :: How to optimize a range (best answer I found but not good enough)
[] MySQL Performance Blog: Optimize IP Range Join / Daniweb: Optimize IP Range Join - MySQL
[] MySQL Performance Blog: Range Optimization
[] MySQL :: inner join on range criteria: unable to use index?
[] Sergey Petrunia’s blog Use of join buffer is now visible in EXPLAIN
[] MySQL :: MySQL 5.0 Reference Manual :: 7.2.5 Range Optimization
[] MySQL Bugs: #26963: Incorrect query results with CONST join compared to RANGE or ALL
[] MySQL Bugs: #9693: bug in trying to join a value on a range (using indexes) identical post on mysql performance forum
[/LIST]

shantanuo · July 17, 2009, 10:40pm

Hi,
You can try > and < instead of between.

ON address.address > ranges.begin_num AND address < ranges.end_num

gmouse · July 20, 2009, 5:50am

MySQL chooses the wrong join order. Try this:

SELECT range.id_country
FROM range
STRAIGHT_JOIN address ON (address.address between range.begin_num and range.end_num);

Note that the time this query takes, depends on the number of ranges and not so much on the number of addresses.

The range scan on an index that you give is nice, but not very useful. It is simply not selective enough.

gmouse · July 20, 2009, 5:51am

Old thread alert )

nluv4hs · July 29, 2009, 5:45pm

Thanks for the replies. I’ll try it.

gmouse · July 30, 2009, 4:53am

Let me know your findings.

Topic		Replies	Views
Optimize IP Range Join Other MySQL® Questions	0	316	June 22, 2008
Query Speed Help Other MySQL® Questions	1	343	January 10, 2007
Performance using BETWEEN Other MySQL® Questions	1	445	March 31, 2007
Optimize join query Other MySQL® Questions	1	349	March 21, 2009
Profile query to find slowest join Other MySQL® Questions	1	522	April 16, 2013

optimize IP range join: postgresql is 496 times faster

Related topics