Not the answer you need?
Register and ask your own question!

myisamchk performance issues.

QvidNonQvidNon EntrantInactive User Role Participant
System: Ubuntu 2.6.22-14.46-server (i386 variant) Athlon 64 X2 Dual Core 2.1GHz, 4GB Ram, 250GB WD2500AAJS system drive, 2 x 500GB WD5000AAKS in software RAID-0 for databases, MySQL Server version: 5.0.45-Debian_1ubuntu3.1-log Debian etch distribution.

I have a database with 10 identically defined MyISAM tables containing from 6.6M to 731M rows (down from 1 table with 1.9B rows in first incarnation!).

There will be no insertions into the tables once created - it is a research project where the investigator wants to look for relationships; changes will be accommodated by rebuilding all tables from a new analysis.

I used a scheme I saw documented by Peter somewhere (can't find it now) of creating the un-indexed tables with "load data infile ...", then copying the .frm and .MYI files from an empty table with the indexes defined, followed by "myisamchk --quick --recover ..."


The following table is used as a source of .MYI and .frm files:

CREATE TABLE Events_Empty_Indexed (
x smallint(5) NOT NULL,
y smallint(5) NOT NULL,
start_day int(11) NOT NULL default '0',
year smallint,
month tinyint,
day tinyint,
num_days tinyint unsigned default '1',
total_precip smallint(5) unsigned default '0',
max_precip smallint(5) unsigned default '0',
dist char(11) default '',
deltas char(11) default '',
KEY idx1 (x, y, year, month),
KEY idx3 (start_day),
KEY idx6 (total_precip),
KEY idx7 (dist)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

and this is the definition of all the initial, un-indexed tables:

CREATE TABLE Events_9 (
x smallint(5) NOT NULL,
y smallint(5) NOT NULL,
start_day int(11) NOT NULL default '0',
year smallint,
month tinyint,
day tinyint,
num_days tinyint unsigned default '9',
total_precip smallint(5) unsigned default '0',
max_precip smallint(5) unsigned default '0',
dist char(11) default '',
deltas char(11) default ''
) ENGINE=MyISAM DEFAULT CHARSET=latin1;


While I'm not too unhappy with the performance of the "load data..." things can be improved later: I'm currently seeing from 215K to 288K records/sec insertions.

Question 1 is "what parameters affect "load data infile..." operations?"
There are no indexes defined at the time so are any key_* parameters important?
How about read_buffer* and write_buffer*?
The records are mainly integers so compression probably isn't necessary?

My main concern is the performance of myisamchk. I stop mysql entirely while I build the tables and no other processes are consuming much resources. I've tried a bunch of different parameter settings for sort_buffer_size (from 100M to 3G), key_buffer_size (from 100M to 3G), read_buffer_size (from 25M to 50M), write_buffer_size (from 25M to 50M), --recover and --parallel-recover, always --quick, and I am disappointed with the performance (36K rows/sec on the big tables up to 92K/sec on the smallest). It takes 5-1/2 hours to build indexes for 731M rows! and all night for the whole database.

I experimented with a bunch of different settings for buffer sizes etc. on 1 table repeatedly, but found little difference in the time taken, perhaps 10% at the most (parallel won by a very slim margin).

I recall reading in the forum somewhere that a sort on disk might be faster in some cases than trying to cache a large amount of index. These creation runs need to process full tables, so maybe I'm allocating too much memory for buffers?

So the accumulated question 2 is: what parameters affect myisamchk the most, and any suggestions for speeding up the index creatation would be gratefully accepted.
Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.