Best way to alter primary key on large InnoDB tables?


We’re trying to alter the primary key on a large table to be more efficient with InnoDB (we’ve noticed a significant drop in performance on the table since migrating from MyISAM, and we’re hypothesizing that the PK is a large factor).

The current primary key is a 2-column pair: a randomly generated varchar(16) “id” and an integer “version”. We’d like to replace this with an auto-increment integer and make (id, version) a unique index instead.

The other columns of the table are all longtext or longblob (full schema below). There are ~300k records, totaling ~20GB.

We’ve been experimenting with different approaches in our test environment to try to find the fastest way to make the change, but it’s going a lot slower than we’d like (ideally we’d like to make the change on our production db with max 2 hours downtime).

So far we’ve tried two approaches:

  1. Change the existing table
  • drop the pk
  • alter table to add the new auto_inc pk
  • add the unique index
  2. Create a new table and copy the records
  • create the new table with an auto_inc pk and no indexes
  • INSERT INTO newTable (…) SELECT (…) FROM oldTable
  • add the unique index
  • rename the tables
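Spelled out as SQL, the two approaches look roughly like this (a sketch, not our actual statements: `new_id`, `uniq_id_version`, and the abbreviated column lists are placeholders):

```sql
-- Approach 1: change the existing table in place
ALTER TABLE largeTable DROP PRIMARY KEY;
ALTER TABLE largeTable ADD COLUMN new_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;
ALTER TABLE largeTable ADD UNIQUE INDEX uniq_id_version (id, version);

-- Approach 2: copy into a new table, then swap
CREATE TABLE newTable (
  new_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  id varchar(16) COLLATE utf8_unicode_ci NOT NULL,
  version int(11) NOT NULL
  -- remaining longblob/longtext columns as in the schema below
);
INSERT INTO newTable (id, version /* , ... */)
  SELECT id, version /* , ... */ FROM largeTable;
ALTER TABLE newTable ADD UNIQUE INDEX uniq_id_version (id, version);
RENAME TABLE largeTable TO largeTable_old, newTable TO largeTable;
```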

The first approach took 5 hours, 3 of which just to drop the primary key. We’re running the second test now, but it’s already taken over 1.5 hours and hasn’t finished copying the rows over.

Can anyone suggest any other ways which might be faster? Is 2 hours just completely unrealistic?

Server details:
8GB ram
quad core


schema (columns renamed):

CREATE TABLE largeTable (
  id varchar(16) collate utf8_unicode_ci NOT NULL,
  version int(11) NOT NULL,
  col_1 longblob NOT NULL,
  col_2 longblob NOT NULL,
  col_3 longblob NOT NULL,
  col_4 longblob NOT NULL,
  col_5 longblob NOT NULL,
  col_6 longblob NOT NULL,
  col_7 longblob NOT NULL,
  col_8 longblob NOT NULL,
  col_9 longblob NOT NULL,
  col_10 longtext collate utf8_unicode_ci NOT NULL,
  col_11 longtext collate utf8_unicode_ci NOT NULL,
  col_12 longtext collate utf8_unicode_ci NOT NULL,
  col_13 longblob NOT NULL,
  col_14 longblob NOT NULL,
  col_15 longblob NOT NULL,
  col_16 longblob NOT NULL,
  col_17 text collate utf8_unicode_ci NOT NULL,
  col_18 longblob NOT NULL,
  PRIMARY KEY (id,version)
)

a. create the new table with pk and index
b. fetch 100 (or 1,000/10,000) rows from the old table
c. no more rows? exit
d. insert them into the new table
e. jump to b

Steps b to d could be run in a script executed by cron.
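One iteration of steps b to d as a single statement (a sketch; the 1000-row batch size and the :offset placeholder are illustrative, and note that an offset-based scan still rereads all the skipped rows on each run):

```sql
-- One cron-run batch: copy the next 1000 rows into the new table.
-- :offset is a placeholder the script would advance by 1000 each run.
INSERT INTO newTable (id, version, col_1 /* , ... remaining columns */)
SELECT id, version, col_1 /* , ... */
FROM largeTable
ORDER BY id, version
LIMIT :offset, 1000;
```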

Thanks! Copying rows in batches sped it up a bit, but unfortunately it still took 3.5 hours in total.

It looks like the bottleneck is reading the rows from disk. It takes 40 seconds on average to copy a batch of 1000 rows (we tried different batch sizes from 10-10,000 but it didn’t make much difference).

Our best guess is that the table has become badly fragmented due to the poor primary key, which means table scans involve a lot of jumping around on disk.

Looks like we’ll have to resort to something more fiddly while the service is running.

current plan:

  • get the ids of all records at time X (there’s a small secondary table with a 1-to-1 relationship we can use to do this quickly)
  • in the background copy records with these ids to a new table with the new pk and index. Probably in small batches with sleep time inbetween to limit service impact.
  • once finished, lock the table and copy over any new/updated records which have come in since time X (the fiddly bit).
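The lock-and-catch-up step might be sketched like this (assumptions: how rows changed since time X are identified is not specified in the plan, so the key list here is a placeholder, as are `largeTable_old` and the abbreviated column list):

```sql
-- Block writes while the last changes are copied across.
LOCK TABLES largeTable WRITE, newTable WRITE;

-- REPLACE relies on the new unique (id, version) index to overwrite
-- rows that were updated after time X.
REPLACE INTO newTable (id, version, col_1 /* , ... */)
SELECT id, version, col_1 /* , ... */
FROM largeTable
WHERE (id, version) IN (/* keys new or changed since time X */);

UNLOCK TABLES;
RENAME TABLE largeTable TO largeTable_old, newTable TO largeTable;
```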

What was the query that fetched the rows?
Could you post it, plus the EXPLAIN result?


Thanks for your help with this.

I tried two ways:
a) Select as part of insert: “INSERT INTO … (SELECT … LIMIT offset, X);”
b) Separate select and insert queries:
“SELECT … LIMIT offset,X”
<process result set>

With b) the select was taking 99% of the total time per batch. Here’s the per-batch select with explain output:

EXPLAIN SELECT id, version, col_1, col_2, col_3, col_4, col_5, col_6, col_7, col_8, col_9, col_10, col_11, col_12, col_13, col_14, col_15, col_16, col_17, col_18 FROM largeTable LIMIT 1000, 3000;

+----+-------------+------------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows   | Extra |
+----+-------------+------------+------+---------------+------+---------+------+--------+-------+
|  1 | SIMPLE      | largeTable | ALL  | NULL          | NULL | NULL    | NULL | 288412 |       |
+----+-------------+------------+------+---------------+------+---------+------+--------+-------+

I tried adding an explicit “ORDER BY id, version” clause to the query, but it didn’t give any improvement in query time. Here’s the explain:

+----+-------------+------------+-------+---------------+---------+---------+------+--------+-------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows   | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+--------+-------+
|  1 | SIMPLE      | largeTable | index | NULL          | PRIMARY | 56      | NULL | 288412 |       |
+----+-------------+------------+-------+---------------+---------+---------+------+--------+-------+

NB. the size of the records doesn’t seem to be a major factor: if the records are in the query cache, the time drops from 40 seconds to 0.4 seconds per batch (we can see this when we forget to reset the cache in between test runs).

InnoDB has an internal primary key (as I remember); maybe you could use it?

Or you could combine ORDER BY with a WHERE clause that resumes from the last row fetched. The first run would still be slow, but the second, third, etc. should have to scan fewer rows than the previous runs.
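That WHERE-plus-ORDER-BY idea is essentially keyset pagination over the existing (id, version) primary key. A sketch, where the @last_id/@last_version user variables stand for the last row of the previous batch:

```sql
-- Resume each batch from the last key seen instead of LIMIT offset, N,
-- so MySQL walks the PK index from that point rather than rescanning
-- all the skipped rows.
SELECT id, version, col_1 /* , ... remaining columns */
FROM largeTable
WHERE id > @last_id
   OR (id = @last_id AND version > @last_version)
ORDER BY id, version
LIMIT 1000;
```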



As far as I understand, you’re doing all those steps separately; this is far from optimal. Since InnoDB uses a clustered index, when the “drop pk” is executed, the resulting table has its data stored on an internally generated PK, and all the indices are rebuilt as well, since they point to the PK. In step 2, the table is rebuilt yet again, with data stored on the specified PK, and the indices are regenerated again.

Try executing it all in a single ALTER TABLE, and see how much time it takes. MySQL should be smart enough to take the shortest path.
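Combined into one statement, that would be something like the following (a sketch; `new_id` and the index name are placeholders):

```sql
-- One table rebuild instead of three: drop the old PK, add the new
-- auto-increment PK, and create the unique index in a single pass.
ALTER TABLE largeTable
  DROP PRIMARY KEY,
  ADD COLUMN new_id INT NOT NULL AUTO_INCREMENT,
  ADD PRIMARY KEY (new_id),
  ADD UNIQUE INDEX uniq_id_version (id, version);
```

AUTO_INCREMENT columns must be part of a key, but since the PRIMARY KEY clause is applied in the same statement, this is valid.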