ALTER TABLE performance with MyISAM

I need to add a column to a table with 3 million rows, a 400MB data file and a 600MB index file. I tried it on my - powerful enough - development box with several MySQL configurations, and the ALTER TABLE took about 23 minutes. Most of the time goes on copying the data file, NOT on recreating the indexes: 18 versus 5 minutes. Is this the best I can expect?

I find it strange that copying 400MB of data takes 18 minutes. It is not a fair comparison, but simply copying that file takes 12 seconds. Moreover, the whole thing could comfortably fit in memory.

Indeed, I think the task is CPU bound, although processor usage stays at about 50% (on both cores). While the data file is being copied, Windows Task Manager doesn’t show much I/O, and I don’t hear the hard disk either.

The machine is a dual-core AMD 3800 with 2GB RAM and a 10,000RPM SATA HD. An example MySQL 5.0.37 configuration:
myisam_sort_buffer_size=256M
key_buffer_size=256M (MySQL Admin shows only half of it is used)
read_buffer_size=2M (increasing it doesn’t help)
read_rnd_buffer_size=8M (increasing it doesn’t help)
sort_buffer_size=2M (increasing it doesn’t help)
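One thing worth checking while the ALTER runs is whether MySQL is rebuilding the indexes by sort or through the key cache; the latter is dramatically slower. A sketch (the table and column names are made up for illustration):

```sql
-- The kind of ALTER under discussion: adding a column to a 3M-row MyISAM table.
ALTER TABLE mytable ADD COLUMN new_col INT NULL;

-- In a second session, watch the State column for the ALTER's thread:
-- "Repair by sorting" is the fast path, "Repair with keycache" the slow one.
SHOW PROCESSLIST;
```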

[B]Peter Zaitsev[/B]
The workaround which I found so far is really ugly; however, I've seen users apply it with good success:
  • Create a table of the same structure without keys,
  • load data into it to get a correct .MYD,
  • create a table with all keys defined and copy over the .frm and .MYI files from it,
  • followed by FLUSH TABLES.
  • Now you can use REPAIR TABLE to rebuild all keys by sort, including UNIQUE keys.
From here: [B]MySQL: Loading large tables with Unique Keys[/B] [URL]http://peter-zaitsev.livejournal.com/11772.html[/URL]

Others have had the same question, so it may be helpful to have the entire post here.

[B]Quote:[/B]
peter_zaitsev (peter_zaitsev) wrote, @ 2005-01-10 16:43:00

MySQL: Loading large tables with Unique Keys

Normally MySQL is rather fast at loading data into a MyISAM table, but there is an exception: when it can’t rebuild the indexes by sort and builds them row by row instead. This can happen due to wrong configuration (i.e. too small myisam_max_sort_file_size or myisam_max_extra_sort_file_size), or it can simply be a lack of optimization, if you have large (not fitting in memory) PRIMARY or UNIQUE indexes.
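A sketch of the my.cnf settings Peter mentions (the values shown are illustrative assumptions, not recommendations; myisam_max_extra_sort_file_size exists in MySQL 4.x/5.0 but was removed in later versions):

```ini
# my.cnf fragment (illustrative values only)
[mysqld]
# If the temporary sort file would exceed this size, MyISAM falls back
# to the slow row-by-row (key cache) index rebuild.
myisam_max_sort_file_size       = 10G
myisam_max_extra_sort_file_size = 10G
# Buffer used when rebuilding indexes by sort.
myisam_sort_buffer_size         = 256M
```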

Below is the internal letter I sent out on this subject, which I guessed would be good to share.

Today on my play box I tried to load data into a MyISAM table (which was previously dumped with mysqldump --tab).

The data was some 1.3GB, 15,000,000 rows, with 512MB of memory on the box, running MySQL 4.1.8.

One could call it a trivially fast task; unfortunately, I had a unique key on a varchar(128) column as part of the schema.

The load took some 3 hours before I aborted it, having found out it was only about 20% done.

The problem is that unique keys are always rebuilt using the key cache, which means we’re down to some 100-200 rows/sec as soon as the index becomes significantly larger than memory.

You can’t get away with ALTER TABLE ... DISABLE KEYS, as it does not affect unique keys.

Actually, even InnoDB, with all its transactional overhead and inability to repair keys by sort, can do better if SET UNIQUE_CHECKS=0 is used.
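For comparison, the InnoDB path Peter mentions looks roughly like this (a sketch; the table name and input path are hypothetical, and you must be certain the data really is unique, e.g. when restoring a backup):

```sql
-- Skip unique-key checking during a bulk load into InnoDB.
SET unique_checks = 0;
LOAD DATA INFILE '/tmp/dump.txt' INTO TABLE my_innodb_table;
-- Restore normal behaviour afterwards.
SET unique_checks = 1;
```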

What do I guess could be done for MyISAM? At the least, it could support UNIQUE_CHECKS=0, leaving it to the user's responsibility to ensure the data really is unique. If I’m restoring from a backup, I’m pretty sure about that.

If we want a neat solution, the check that the key is really unique could be done after the data is loaded. That is much faster, especially for the physically sorted key we end up with.

The workaround which I found so far is really ugly; however, I’ve seen users apply it with good success.

  • Create a table of the same structure without keys,
  • load data into it to get a correct .MYD,
  • create a table with all keys defined and copy over the .frm and .MYI files from it,
  • followed by FLUSH TABLES.
  • Now you can use REPAIR TABLE to rebuild all keys by sort, including UNIQUE keys.
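In practice, the steps above might look like this (a sketch only; the table name big_table, the column list, and the paths are hypothetical, and copying .frm/.MYI files around is unsupported, at-your-own-risk surgery):

```sql
-- 1. Create a table with the same columns but no keys.
CREATE TABLE big_table_nokeys (
  id   INT NOT NULL,
  name VARCHAR(128) NOT NULL
  -- ...same columns as the target table, but no PRIMARY/UNIQUE/KEY clauses
) ENGINE=MyISAM;

-- 2. Load the data to get a correct .MYD file.
LOAD DATA INFILE '/tmp/big_table.txt' INTO TABLE big_table_nokeys;

-- 3. Create an empty table with all keys defined.
CREATE TABLE big_table_keys (
  id   INT NOT NULL,
  name VARCHAR(128) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uk_name (name)
) ENGINE=MyISAM;

-- 4. At the OS level, in the database directory, copy
--    big_table_keys.frm and big_table_keys.MYI over
--    big_table_nokeys.frm / big_table_nokeys.MYI, then:
FLUSH TABLES;

-- 5. Rebuild all keys by sort, including the UNIQUE keys.
REPAIR TABLE big_table_nokeys;
```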

Unbelievable.
(Anonymous)
2005-03-09 11:44 pm UTC
I have started using your trick: creating a table, copying over the .frm/.MYI files, and doing a repair with myisamchk.

Let me tell you that this is the most impressive, slickest way of getting around MySQL’s lack of speed. I don’t think MySQL honors large myisam_max_* settings: although SHOW VARIABLES will show them set very high (600G), it always falls back to repair by key cache.

I used to wait around 4-6 hours for my tables to convert. Today, I swear to god, it was 15 minutes TOPS, and that included a --sort-records pass afterwards.

Now the tables are blazing. Very slick Peter.

I hope you keep writing these good articles. I come by and read them… I never posted until now.

Thanks,
J.

That ALTER TABLE workaround should work in your situation.

I have successfully used your proposed workaround on 3 very large tables partitioned thanks to MySQL 5.1.

For two tables (20 million records) I used the workaround as-is to add a column, and the ALTER TABLE finished in minutes.
Then I rebuilt the indexes, looping over each of the .MYI files of my partitioned tables.

For the biggest table (1 billion records), I wanted to drop a (non-indexed) column. It was so slow I had to stop MySQL, even though the table used no keys at all (not even a primary key) and I had explicitly disabled keys.

I have absolutely no idea why MySQL took such an infinite amount of time trying to alter this table.

Since I haven’t found any other way, I decided to write a piece of C code to do the job.
I used the MyISAM file format specs from the MySQL documentation.

Fortunately, I was in the easiest case, as my table has fixed-length records and no NULL fields. In that case you just have to take into account the one-byte MyISAM record header and your fixed-length fields.

I ran this code on each of the .MYD files of my partitioned table.
An hour later the job was done, and one day later the indexes were rebuilt using myisamchk.

Thanks for this workaround!