
Best way to alter primary key on large innodb tables?

stephenhenderson Entrant Inactive User Role Beginner
Hi,

We're trying to alter the primary key on a large table to be more efficient with InnoDB (we've noticed a significant drop in performance on the table since migrating from MyISAM, and we're hypothesizing that the primary key is a large factor).

The current primary key is a 2-column pair - a randomly generated varchar(16) "id" and an integer "version". We'd like to replace this with an auto-increment integer and make (id, version) a unique index instead.

The other columns of the table are all longtext or longblob (full schema below). There are ~300k records, totaling ~20GB.

We've been experimenting with different approaches on our test environment to try and find the fastest way to make the change, but it's going a lot slower than we'd like (ideally we'd like to make the change on our production db with max 2 hours downtime).

So far we've tried two approaches:
1. Change existing table
- drop pk,
- alter table add new auto_inc pk
- add unique index

2. create new table and copy records
- create new table with auto_inc pk, no indexes
- INSERT INTO newTable (...) SELECT (...) FROM oldTable
- add unique index
- rename tables
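
In SQL, the second approach would look something like this (the `newTable` name, the `uk_id_version` index name, and the elided column list are placeholders, not from the original post):

```sql
-- 1. New table with the auto-increment PK and no secondary indexes yet
CREATE TABLE newTable LIKE largeTable;
ALTER TABLE newTable
    DROP PRIMARY KEY,
    ADD COLUMN `new_id` INT NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY (`new_id`);

-- 2. Bulk-copy the rows
INSERT INTO newTable (`id`, `version`, `col_1` /* , ... remaining columns */)
SELECT `id`, `version`, `col_1` /* , ... remaining columns */
FROM largeTable;

-- 3. Add the unique index, then swap the tables atomically
ALTER TABLE newTable ADD UNIQUE KEY `uk_id_version` (`id`, `version`);
RENAME TABLE largeTable TO largeTable_old, newTable TO largeTable;
```

RENAME TABLE swaps both names in a single atomic operation, so readers never see a missing table.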

The first approach took 5 hours, 3 of which just to drop the primary key. We're running the second test now, but it's already taken over 1.5 hours and hasn't finished copying the rows over.

Can anyone suggest any other ways which might be faster? Is 2 hours just completely unrealistic?

Server details:
mysql-5.0.72-enterprise-gpl-log
8GB ram
quad core


Thanks,
Stephen


schema (columns renamed):

CREATE TABLE `largeTable` (
  `id` varchar(16) collate utf8_unicode_ci NOT NULL,
  `version` int(11) NOT NULL,
  `col_1` longblob NOT NULL,
  `col_2` longblob NOT NULL,
  `col_3` longblob NOT NULL,
  `col_4` longblob NOT NULL,
  `col_5` longblob NOT NULL,
  `col_6` longblob NOT NULL,
  `col_7` longblob NOT NULL,
  `col_8` longblob NOT NULL,
  `col_9` longblob NOT NULL,
  `col_10` longtext collate utf8_unicode_ci NOT NULL,
  `col_11` longtext collate utf8_unicode_ci NOT NULL,
  `col_12` longtext collate utf8_unicode_ci NOT NULL,
  `col_13` longblob NOT NULL,
  `col_14` longblob NOT NULL,
  `col_15` longblob NOT NULL,
  `col_16` longblob NOT NULL,
  `col_17` text collate utf8_unicode_ci NOT NULL,
  `col_18` longblob NOT NULL,
  PRIMARY KEY (`id`,`version`)
)

Comments

  • januzi Advisor Inactive User Role Beginner
    3.
    a. create the new table with the new pk and index
    b. fetch a batch of rows (100/1,000/10,000) from the old table
    c. no rows left? exit
    d. insert them into the new table
    e. jump to b

    Steps b-d could be in a script executed by cron
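
    One pass of that loop might look roughly like this (table name and batch size are assumptions; the driving script substitutes the offset, advancing it by the batch size on each run):

```sql
-- One batch of the copy; when the INSERT reports 0 rows affected, the copy is done.
-- LIMIT <offset>, <batch size> -- the driver rewrites the offset each run,
-- since plain MySQL statements cannot use variables in LIMIT.
INSERT INTO newTable (`id`, `version`, `col_1` /* , ... remaining columns */)
SELECT `id`, `version`, `col_1` /* , ... remaining columns */
FROM largeTable
ORDER BY `id`, `version`
LIMIT 100000, 1000;
```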
  • stephenhenderson Entrant Inactive User Role Beginner
    Thanks! Copying rows in batches sped it up a bit, but unfortunately it still took 3.5 hours in total :(

    It looks like the bottleneck is reading the rows from disk. It takes 40 seconds on average to copy a batch of 1,000 rows (we tried batch sizes from 10 to 10,000, but it didn't make much difference).

    Our best guess is that the table has become badly fragmented due to the poor primary key which means table scans involve a lot of jumping around on disc.

    Looks like we'll have to resort to something more fiddly while the service is running.

    current plan:
    - get the ids of all records at time X (there's a small secondary table with a 1-to-1 relationship we can use to do this quickly)
    - in the background, copy records with these ids to a new table with the new pk and index, probably in small batches with sleep time in between to limit service impact
    - once finished, lock the table and copy over any new/updated records which have come in since time X (the fiddly bit)
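
    The copy and catch-up steps of that plan might translate to something like this (the snapshot table `idsAtTimeX` and its `batch_no` column are hypothetical names, not from the original post):

```sql
-- Background step: copy one small batch of snapshotted rows;
-- the driving script sleeps between batches to limit service impact
INSERT INTO newTable (`id`, `version`, `col_1` /* , ... remaining columns */)
SELECT l.`id`, l.`version`, l.`col_1` /* , ... remaining columns */
FROM largeTable l
JOIN idsAtTimeX s ON s.`id` = l.`id` AND s.`version` = l.`version`
WHERE s.`batch_no` = 42;

-- The fiddly bit: lock, copy anything that arrived after the snapshot, swap
LOCK TABLES largeTable READ, newTable WRITE, idsAtTimeX READ;
INSERT INTO newTable (`id`, `version`, `col_1` /* , ... remaining columns */)
SELECT l.`id`, l.`version`, l.`col_1` /* , ... remaining columns */
FROM largeTable l
LEFT JOIN idsAtTimeX s ON s.`id` = l.`id` AND s.`version` = l.`version`
WHERE s.`id` IS NULL;   -- (id, version) pairs not present in the snapshot
UNLOCK TABLES;
```

    This assumes updates always produce a new (id, version) pair; if rows can change in place, the catch-up also needs an updated-since-X check.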
  • januzi Advisor Inactive User Role Beginner
    What was the query that fetched X rows?
    Could you post it together with its EXPLAIN result?
  • stephenhenderson Entrant Inactive User Role Beginner
    Hi,

    Thanks for your help with this.

    I tried two ways:
    a) Select as part of insert: "INSERT INTO ... (SELECT ... LIMIT offset, X);"
    b) Separate select and insert queries:
    "SELECT ... LIMIT offset,X"
    <process result set>
    "INSERT ..."

    With b) the select was taking 99% of the total time per batch. Here's the per-batch select with explain output:


    EXPLAIN SELECT `id`, `version`, `col_1`, `col_2`, `col_3`, `col_4`, `col_5`, `col_6`, `col_7`, `col_8`, `col_9`, `col_10`, `col_11`, `col_12`, `col_13`, `col_14`, `col_15`, `col_16`, `col_17`, `col_18` FROM largeTable LIMIT 1000, 3000;

    +----+-------------+------------+------+---------------+------+---------+------+--------+-------+
    | id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows   | Extra |
    +----+-------------+------------+------+---------------+------+---------+------+--------+-------+
    |  1 | SIMPLE      | largeTable | ALL  | NULL          | NULL | NULL    | NULL | 288412 |       |
    +----+-------------+------------+------+---------------+------+---------+------+--------+-------+


    I tried adding an explicit "ORDER BY id, version" clause to the query, but it didn't give any improvement in query time. Here's the explain:


    +----+-------------+------------+-------+---------------+---------+---------+------+--------+-------+
    | id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows   | Extra |
    +----+-------------+------------+-------+---------------+---------+---------+------+--------+-------+
    |  1 | SIMPLE      | largeTable | index | NULL          | PRIMARY | 56      | NULL | 288412 |       |
    +----+-------------+------------+-------+---------------+---------+---------+------+--------+-------+



    NB. the size of the records doesn't seem to be a major factor: if the records are in the query cache, the time drops from 40 seconds to 0.4 seconds per batch (we can see this when we forget to reset the cache in between test runs).
  • januzi Advisor Inactive User Role Beginner
    InnoDB has an internal primary key (as I remember); maybe you could use it?

    Or you could use ORDER BY together with a WHERE clause. The first run would be very slow, but the second, third, etc. should scan fewer rows than the previous runs.
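
    The ORDER BY + WHERE idea is essentially keyset ("seek") pagination: remember the last (id, version) copied and resume from it, so each batch seeks directly into the primary key instead of re-reading all the rows skipped by a LIMIT offset. A sketch (the LAST_ID/LAST_VERSION placeholders are substituted by the driving script):

```sql
-- Fetch the next batch after the last (id, version) already copied;
-- the compound WHERE walks the (id, version) PK in order.
SELECT `id`, `version`, `col_1` /* , ... remaining columns */
FROM largeTable
WHERE `id` > 'LAST_ID'
   OR (`id` = 'LAST_ID' AND `version` > LAST_VERSION)
ORDER BY `id`, `version`
LIMIT 1000;
```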
  • systems28347 Entrant Inactive User Role Beginner
    So far we've tried two approaches:
    1. Change existing table
    - drop pk,
    - alter table add new auto_inc pk
    - add unique index

    As far as I understand, you're doing all those steps separately; this is far from optimal. Since InnoDB uses a clustered index, when "drop pk" is executed, the resulting table has its data stored on an internally generated PK, and all the indexes are rebuilt as well, as they point to the PK. On the second step, the table is rebuilt again, with data stored on the specified PK, and the indexes are regenerated once more.

    Try executing all in a single ALTER TABLE, and see how much time it's going to take. MySQL should be smart enough to use the shortest path.
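
    A sketch of the combined statement (the `new_id` column and `uk_id_version` index names are placeholders):

```sql
-- One ALTER: a single table rebuild instead of three
ALTER TABLE largeTable
    DROP PRIMARY KEY,
    ADD COLUMN `new_id` INT NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY (`new_id`),
    ADD UNIQUE KEY `uk_id_version` (`id`, `version`);
```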

Copyright ©2005 - 2020 Percona LLC. All rights reserved.