varchar index killing csv import

I have a large CSV file of about 2 million domain names from urlblacklist.com.

I have an app that queries the table a lot, and in order to look up the domain names fast they need to be indexed. The table is InnoDB, with columns for domain and categoryID; the combination of these two is the unique key.
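
Roughly, the table looks like this (a minimal sketch; the table name and column types are placeholders, only the unique key on domain + categoryID is as described):

    -- placeholder name and types; the real table may differ
    CREATE TABLE blacklist (
        domain     VARCHAR(255) NOT NULL,
        categoryID INT NOT NULL,
        UNIQUE KEY uq_domain_category (domain, categoryID)
    ) ENGINE=InnoDB;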

When I try to load the 2 million records into the table (LOAD DATA INFILE), it gets to about 1.5 million and then starts to have a real hard time. I let it run for 12 hours but it didn't finish, so I gave up.

My guess is that this is because it has to check the unique key on every insert, so as the number of records grows I am put in a worse and worse situation.

Is there something I can do to make this work? Is MySQL just not the right tool? The Oracle buffs out there are saying I am just using the wrong platform. Is there something I can do to reasonably use MySQL for this application?

You’re right.

During LOAD DATA INFILE, MySQL checks UNIQUE indexes as each row is inserted, while other indexes are built after the load is done.

To optimize it you can either:

  • make your key_buffer large enough for the index to fit into it, or
  • load the URLs sorted by URL; in that case the unique-key checks will be fast (see the sketch below).
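
A rough sketch of the second option (the table name, column names, and file path are placeholders; key_buffer_size is the MyISAM key cache, so it only applies if the table is, or is converted to, MyISAM):

    -- placeholder names and path; adjust to your schema
    SET GLOBAL key_buffer_size = 256 * 1024 * 1024;  -- MyISAM key cache, sized to hold the index

    -- sort the file by domain beforehand, e.g.:
    --   sort -t, -k1,1 domains.csv > domains_sorted.csv

    LOAD DATA INFILE '/tmp/domains_sorted.csv'
    INTO TABLE blacklist
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (domain, categoryID);
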
[B]ievolve wrote on Sat, 24 February 2007 19:48[/B]
Is MySQL just not the right tool? The Oracle buffs out there are saying I am just using the wrong platform. Is there something I can do to reasonably use MySQL for this application?

I think InnoDB is the wrong tool for this. As soon as your active data set size outgrows the buffer pool, you start getting very serious performance problems.
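
One way to check how close you are to that point (a sketch; the schema and table names are placeholders):

    -- current InnoDB buffer pool size
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

    -- approximate data + index size of the table, to compare against it
    SELECT data_length + index_length AS total_bytes
    FROM information_schema.TABLES
    WHERE table_schema = 'mydb' AND table_name = 'blacklist';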

If your data set outgrows the caches you'll have problems in any case. InnoDB is similar to MyISAM in this respect; its data set just can be much larger because the indexes are not packed.

There is no silver bullet out there, and each choice has benefits and drawbacks.

In this case the concern was not performance but the fact that alerts are dropped because of MyISAM table locks.