Slow bulk loads innodb


I am running out of ideas how to speed up bulk loads of my data (approx 30m rows) into a single innodb table. The load is running at about 200r/s. The same bulk load in an otherwise identical MyISAM table is running at about 3k r/s.

Here is what I am running (I am using mk-fifo-split script to load in chunks of various sizes, currently at 10k per chunk so I can see progress readily)

set foreign_key_checks=0;
set sql_log_bin=0;
set unique_checks=0;
LOAD DATA LOCAL INFILE ‘/tmp/mk-fifo-split’ INTO TABLE cdr_test2 CHARACTER SET ‘UTF8’ FIELDS TERMINATED BY ‘,’ ENCLOSED BY ‘"’ LINES TERMINATED BY ‘\n’ (CallID,ParentCallID,SessionID,ParentSessionID,SipSessionID, AccountID,ApplicationID,PPID,StartTime,EndTime,Duration,Outb ound,Status,Network,Channel,StartUrl,CalledID,CallerID,Servi ceID,PhoneNumberSid,Disposition,RecordingDuration,DateCreate d,BrowserIP,ScriptThrowable,applicationType)

The box is on ec2 (ebs volume) with 15G ram and 1TB disk with 4 (HT) cores. The source data and database files are both on the same ebs volume.

I will add that I have also made the following variable changes which have not improved the load speed much at all, if they have it’s not been very noticeable.

innodb_buffer_pool_size = 5125M
innodb_log_file_size = 1024M
innodb_log_buffer_size = 8M

I have also tried setting this prior to a load:

set global innodb_flush_log_at_trx_commit=0;

Any ideas or help would be much appreciated,

What data type is the PK column?

It appears that InnoDB requires a PK even for bulk loads (?), and if the clustered key is non-sequential, then engine could be performing a lot of maintenance work.

Well actually i did not specify a PK for this table (nor does it have any indexes). Reading on innodb is telling me I should be explicit about the PK. If I were it would be a GUID which is probably not ideal. I’m going to try another more sensible column for a PK and see what happens