How can I speed up SELECTs on a static table with 20M rows?

zpool · May 20, 2013, 5:48am

Hi!

	I have a IP location lookup table with 20 million rows. I can perform about 1 sequential lookup per second and I need to get it up to about 50 per second.

CREATE TABLE `ip_lookup` (
		`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
		`start_ip_int` bigint(20) NOT NULL,
		`end_ip_int` bigint(20) NOT NULL,
		`country_name` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
		`city_name` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
		`custom_field1` varchar(30) CHARACTER SET latin1 DEFAULT NULL,
		`custom_field2` varchar(20) CHARACTER SET latin1 DEFAULT NULL,
		`custom_field3` varchar(30) CHARACTER SET latin1 DEFAULT NULL,
		PRIMARY KEY (`id`),
		KEY `start_ip_int` (`start_ip_int`),
		KEY `end_ip_int` (`end_ip_int`),
		) ENGINE=InnoDB AUTO_INCREMENT=18245826 DEFAULT CHARSET=utf8;

	I am processing a large number of historical web server logs. I can do queries in parallel (e.g. 10 processes all getting 1 query per second) but parallelising the log processing code is a much bigger task than I can handle right now.
	
	Maxmind supply a compiled lookup tool and their database in binary format. Lookups using that are lightning fast.
	
	How can I create something similar for an arbitrary dataset? I'd like to load the 20M rows into some magic binary indexer and get the code to make API calls to some daemon instead of MySQL lookups. I can't use Maxmind - the data I have is customised heavily.
	
	A lookup is simply:

select country_name, custom_field1, custom_field2 from ip_lookup where start_ip_int >= inet_aton('74.125.28.99') and inet_aton('74.125.28.99') <= end_ip_int limit 1

	Theoretically, I could split up the 20M rows by netblock and store them as text files on disk and have a lookup table in the code to tell it which folder to look in. But there must be a better way! I have Percona Server 5.5.28-29.2
	
	I'd be very grateful for any advice.
	Thanks!

gmouse · May 21, 2013, 7:37am

Due to the nature of B-tree indices, all rows satisfying start_ip_int >= inet_aton(‘74.125.28.99’) are retrieved, and checked for inet_aton(‘74.125.28.99’) <= end_ip_int (or vice versa).

Is the inequality for start_ip_int correct? (>= instead of <=)

If ranges do not overlap, use this:
select country_name, custom_field1, custom_field2 from ip_lookup where start_ip_int <= inet_aton(‘74.125.28.99’) order by start_ip_int desc limit 1

zpool · May 21, 2013, 8:55am

Hi gmouse,

Thanks for your response. You’re correct, it was supposed to be start_ip_int >= inet_aton(). Good suggestion to leave the end_ip_int out of the query but I just tried it and there was no performance improvement, and it introduced some incorrect results (presumably I need to sort the results which slow things down).

If I replace the MySQL lookup with an API call to the Maxmind binary database I get a massive performance increase (like 300/second). I didn’t think MySQL could compete with that for sequential lookups? As i’m trying to process a huge amount of web logs (probably over a million unique IPs), I was hoping for some way to create my own version of whatever Maxmind do in their binary… I can’t use maxmind because I have my own custom fields but i’d really like something that I can export the MySQL table into that magically speeds it up.

I’ve been messing around with Sphinx which is good for getting the hits off my busy MySQL server, but it didn’t give me the performance boost I need either.

Thanks!

gmouse · May 21, 2013, 11:30am

The query should be as fast as maxmind and you can even include the restriction on end_ip if you get wrong results. The extra ORDER BY is the most important part (use ASC for >= instead of <=).

Check EXPLAIN whether the index on start_ip_int is used, and try FORCE INDEX if necessary.

zpool · May 21, 2013, 6:52pm

You’re correct it needed the indexed forced… It’s given me about an 8x increase so thanks very much!

SELECT country_name, custom_field1, custom_field2 FROM ip_lookup FORCE INDEX (ix_natural) WHERE inet_aton("$ip") >= startip AND inet_aton("$ip") <= endip AND " LIMIT 1

Unfortunately it’s still way slower than hitting up something like the API to maxmind binary DB.

gmouse · May 22, 2013, 6:41am

If you’re not reading my messages, what do you expect on these forums?

Topic		Replies	Views
Query Speed Help Other MySQL® Questions	1	346	January 10, 2007
Speed up large table SQL select query. Other MySQL® Questions	3	759	September 28, 2008
Optimisation of SELECT on a very large table Other MySQL® Questions	2	441	March 1, 2007
Need Faster query results Other MySQL® Questions	5	489	July 17, 2009
13GB DB - Performance Question Other MySQL® Questions	22	1237	May 31, 2007

How can I speed up SELECTs on a static table with 20M rows?

Related topics