I have an IP location lookup table with 20 million rows. I can currently do about 1 sequential lookup per second, and I need to get that up to about 50 per second.
CREATE TABLE `ip_lookup` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `start_ip_int` bigint(20) NOT NULL,
  `end_ip_int` bigint(20) NOT NULL,
  `country_name` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
  `city_name` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
  `custom_field1` varchar(30) CHARACTER SET latin1 DEFAULT NULL,
  `custom_field2` varchar(20) CHARACTER SET latin1 DEFAULT NULL,
  `custom_field3` varchar(30) CHARACTER SET latin1 DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `start_ip_int` (`start_ip_int`),
  KEY `end_ip_int` (`end_ip_int`)
) ENGINE=InnoDB AUTO_INCREMENT=18245826 DEFAULT CHARSET=utf8;
I am processing a large number of historical web server logs. I can run queries in parallel (e.g. 10 processes each doing 1 query per second), but parallelising the log-processing code is a much bigger task than I can take on right now. Maxmind supply a compiled lookup tool and their database in a binary format, and lookups using that are lightning fast. How can I create something similar for an arbitrary dataset? I'd like to load the 20M rows into some magic binary indexer and have the code make API calls to a daemon instead of doing MySQL lookups. I can't use Maxmind itself - the data I have is heavily customised. A lookup is simply:
select country_name, custom_field1, custom_field2
from ip_lookup
where inet_aton('18.104.22.168') between start_ip_int and end_ip_int
limit 1
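To show what I mean by a "binary indexer": conceptually it's just the ranges held in memory, sorted by start_ip_int, with a binary search per lookup. A rough Python sketch of the idea (the three sample ranges are made up; real code would load all 20M rows from MySQL once at daemon startup, and I'm assuming the ranges don't overlap):

```python
import bisect
import socket
import struct

def ip_to_int(ip):
    # Same conversion as MySQL's INET_ATON for IPv4
    return struct.unpack("!I", socket.inet_aton(ip))[0]

# Hypothetical in-memory table: parallel lists sorted by start_ip_int.
starts = [ip_to_int("10.0.0.0"), ip_to_int("172.16.0.0"), ip_to_int("192.168.0.0")]
ends   = [ip_to_int("10.255.255.255"), ip_to_int("172.31.255.255"), ip_to_int("192.168.255.255")]
rows   = [("CountryA", "x1", "y1"), ("CountryB", "x2", "y2"), ("CountryC", "x3", "y3")]

def lookup(ip):
    n = ip_to_int(ip)
    # Rightmost range whose start is <= n, then check it actually contains n
    i = bisect.bisect_right(starts, n) - 1
    if i >= 0 and n <= ends[i]:
        return rows[i]
    return None
```

Each lookup is then O(log n) with no SQL round-trip at all, e.g. `lookup("10.1.2.3")` returns the first sample row.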
Theoretically, I could split the 20M rows up by netblock, store them as text files on disk, and keep a lookup table in the code telling it which folder to look in. But there must be a better way! I have Percona Server 5.5.28-29.2. I'd be very grateful for any advice. Thanks!
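In pseudocode, the netblock-split idea could also stay in memory rather than on disk: bucket the ranges by first octet so each lookup only scans its own bucket. A Python sketch with made-up sample ranges (and assuming, for simplicity, that no range crosses a /8 boundary):

```python
from collections import defaultdict
import socket
import struct

def ip_to_int(ip):
    # Same conversion as MySQL's INET_ATON for IPv4
    return struct.unpack("!I", socket.inet_aton(ip))[0]

# Hypothetical sample ranges; a real build step would read all 20M rows.
ranges = [
    (ip_to_int("18.0.0.0"),  ip_to_int("18.63.255.255"),  "CountryA"),
    (ip_to_int("18.64.0.0"), ip_to_int("18.127.255.255"), "CountryB"),
    (ip_to_int("44.0.0.0"),  ip_to_int("44.255.255.255"), "CountryC"),
]

# Group ranges by first octet (the "netblock" key)
buckets = defaultdict(list)
for start, end, data in ranges:
    buckets[start >> 24].append((start, end, data))

def lookup(ip):
    n = ip_to_int(ip)
    # Scan only the bucket for this IP's first octet
    for start, end, data in buckets.get(n >> 24, ()):
        if start <= n <= end:
            return data
    return None
```

With 20M rows spread over up to 256 buckets, each scan touches a small fraction of the data; sorting each bucket and binary-searching it would cut that further.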