System: Ubuntu 2.6.22-14.46-server (i386 variant) Athlon 64 X2 Dual Core 2.1GHz, 4GB Ram, 250GB WD2500AAJS system drive, 2 x 500GB WD5000AAKS in software RAID-0 for databases, MySQL Server version: 5.0.45-Debian_1ubuntu3.1-log Debian etch distribution.
I have a database with 10 identically defined MyISAM tables containing from 6.6M to 731M rows (down from 1 table with 1.9B rows in first incarnation!).
There will be no insertions into the tables once created - it is a research project where the investigator wants to look for relationships; changes will be accommodated by rebuilding all tables from a new analysis.
I used a scheme I saw documented by Peter somewhere (can’t find it now) of creating the un-indexed tables with “load data infile …”, then copying the .frm and .MYI files from an empty table with the indexes defined, followed by “myisamchk --quick --recover …”
The following table is used as a source of .MYI and .frm files:
CREATE TABLE Events_Empty_Indexed ( x smallint(5) NOT NULL, y smallint(5) NOT NULL, start_day int(11) NOT NULL default '0', year smallint, month tinyint, day tinyint, num_days tinyint unsigned default '1', total_precip smallint(5) unsigned default '0', max_precip smallint(5) unsigned default '0', dist char(11) default '', deltas char(11) default '', KEY idx1 (x, y, year, month), KEY idx3 (start_day), KEY idx6 (total_precip), KEY idx7 (dist) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
and this is the definition of all the initial, un-indexed tables:
CREATE TABLE Events_9 ( x smallint(5) NOT NULL, y smallint(5) NOT NULL, start_day int(11) NOT NULL default '0', year smallint, month tinyint, day tinyint, num_days tinyint unsigned default '9', total_precip smallint(5) unsigned default '0', max_precip smallint(5) unsigned default '0', dist char(11) default '', deltas char(11) default '' ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
While I’m not too unhappy with the performance of the “load data…” things can be improved later: I’m currently seeing from 215K to 288K records/sec insertions.
Question 1 is “what parameters affect “load data infile…” operations?”
There are no indexes defined at the time so are any key_* parameters important?
How about read_buffer* and write_buffer*?
The records are mainly integers so compression probably isn’t necessary?
My main concern is the performance of myisamchk. I stop mysql entirely while I build the tables and no other processes are consuming much resources. I’ve tried a bunch of different parameter settings for sort_buffer_size (from 100M to 3G), key_buffer_size (from 100M to 3G), read_buffer_size (from 25M to 50M), write_buffer_size (from 25M to 50M), --recover and --parallel-recover, always --quick, and I am disappointed with the performance (36K rows/sec on the big tables up to 92K/sec on the smallest). It takes 5-1/2 hours to build indexes for 731M rows! and all night for the whole database.
I experimented with a bunch of different settings for buffer sizes etc. on 1 table repeatedly, but found little difference in the time taken, perhaps 10% at the most (parallel won by a very slim margin).
I recall reading in the forum somewhere that a sort on disk might be faster in some cases than trying to cache a large amount of index. These creation runs need to process full tables, so maybe I’m allocating too much memory for buffers?
So the accumulated question 2 is: what parameters affect myisamchk the most, and any suggestions for speeding up the index creatation would be gratefully accepted.