pt-online-schema-change doesn't support updating PRIMARY KEY or unique index

In the process of altering a big table with pt-online-schema-change, I updated the PRIMARY KEY of one record. When pt-online-schema-change reported it has successfully altered the table, I found the record that I have just updated still existed. It is because the update trigger pt-online-schema-change create replaces the new record into the new table. The test procedure is as follows:

  1. Create a table.

CREATE TABLE ord (
id int(11) NOT NULL AUTO_INCREMENT,
customer_id int(11) DEFAULT NULL,
good_id int(11) DEFAULT NULL,
create_time datetime DEFAULT NULL,
datetime_lastchanged datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (id)
) ENGINE=InnoDB;

  1. Insert 17000000+ rows into the table.

The first 20 records are as follows:

mysql> select * from ord limit 20;
±—±------------±--------±--------------------±---------------------+
| id | customer_id | good_id | create_time | datetime_lastchanged |
±—±------------±--------±--------------------±---------------------+
| 1 | 6 | 1 | 2015-06-15 14:46:03 | 2015-06-26 15:52:47 |
| 10 | 7 | 2 | 2015-06-15 14:46:14 | 2015-06-26 15:52:52 |
| 11 | 3 | 5 | 2015-06-15 14:14:05 | 2015-06-26 15:53:47 |
| 12 | 1 | 2 | 2015-06-26 16:40:54 | 2015-06-26 16:40:54 |
| 13 | 6 | 1 | 2015-06-15 14:46:03 | 2015-06-26 15:52:47 |
| 14 | 7 | 2 | 2015-06-15 14:46:14 | 2015-06-26 15:52:52 |
| 15 | 3 | 5 | 2015-06-15 14:14:05 | 2015-06-26 15:53:47 |
| 16 | 1 | 2 | 2015-06-26 16:40:54 | 2015-06-26 16:40:54 |
| 20 | 6 | 1 | 2015-06-15 14:46:03 | 2015-06-26 15:52:47 |
| 21 | 7 | 2 | 2015-06-15 14:46:14 | 2015-06-26 15:52:52 |
| 22 | 3 | 5 | 2015-06-15 14:14:05 | 2015-06-26 15:53:47 |
| 23 | 1 | 2 | 2015-06-26 16:40:54 | 2015-06-26 16:40:54 |
| 24 | 6 | 1 | 2015-06-15 14:46:03 | 2015-06-26 15:52:47 |
| 25 | 7 | 2 | 2015-06-15 14:46:14 | 2015-06-26 15:52:52 |
| 26 | 3 | 5 | 2015-06-15 14:14:05 | 2015-06-26 15:53:47 |
| 27 | 1 | 2 | 2015-06-26 16:40:54 | 2015-06-26 16:40:54 |
| 35 | 6 | 1 | 2015-06-15 14:46:03 | 2015-06-26 15:52:47 |
| 36 | 7 | 2 | 2015-06-15 14:46:14 | 2015-06-26 15:52:52 |
| 37 | 3 | 5 | 2015-06-15 14:14:05 | 2015-06-26 15:53:47 |
| 38 | 1 | 2 | 2015-06-26 16:40:54 | 2015-06-26 16:40:54 |
±—±------------±--------±--------------------±---------------------+
20 rows in set (0.00 sec)

  1. Altering the table with pt-online-schema-change.

[admin@CentOS6 ~]$ pt-online-schema-change u=root,p=pass,D=db1,t=ord --alter=“add name varchar(20)” --statistics --execute
No slaves found. See --recursion-method if host CentOS6.4study has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering db1.ord
Creating new table…
Created new table db1._ord_new OK.
Altering new table…
Altered db1._ord_new OK.
2015-08-10T10:35:59 Creating triggers…
2015-08-10T10:35:59 Created triggers OK.
2015-08-10T10:35:59 Copying approximately 16312164 rows…
Copying db1.ord: 8% 05:45 remain
##########################################################
#######pt-online-schema-change is still running!!!########
##########################################################

  1. When pt-online-schema-change is still running, update the PRIMARY KEY of one record.

mysql> update ord set id=5 where id=17235885;
Query OK, 1 row affected (0.80 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> select * from ord where id=17235885;
Empty set (0.07 sec)

mysql> select * from ord where id=5;
±—±------------±--------±--------------------±---------------------+
| id | customer_id | good_id | create_time | datetime_lastchanged |
±—±------------±--------±--------------------±---------------------+
| 5 | 1 | 2 | 2015-06-26 16:40:54 | 2015-08-10 10:36:38 |
±—±------------±--------±--------------------±---------------------+

  1. When pt-online-schema-change reports it has successfully altered the table, I find the record id=17235885 still exists, and the record id=5 exists too.

mysql> select * from ord where id=17235885;
±---------±------------±--------±--------------------±---------------------±-----+
| id | customer_id | good_id | create_time | datetime_lastchanged | name |
±---------±------------±--------±--------------------±---------------------±-----+
| 17235885 | 1 | 2 | 2015-06-26 16:40:54 | 2015-08-10 10:36:38 | NULL |
±---------±------------±--------±--------------------±---------------------±-----+
1 row in set (0.03 sec)

mysql> select * from ord where id=5;
±—±------------±--------±--------------------±---------------------±-----+
| id | customer_id | good_id | create_time | datetime_lastchanged | name |
±—±------------±--------±--------------------±---------------------±-----+
| 5 | 1 | 2 | 2015-06-26 16:40:54 | 2015-08-10 10:36:38 | NULL |
±—±------------±--------±--------------------±---------------------±-----+
1 row in set (0.03 sec)

Because of this problem, I dare NOT use pt-online-schema-change because I cannot gurantee the application never updates PRIMARY KEY. Can anybody solve the problem and give me any help? Thank you very much, and I hope percona team can solve this problem completely.

Hi trikker;

What you described is expected since the UPDATE trigger does a REPLACE INTO as you found, and all actions are based around the primary key. So when you update the primary key, it basically will treat it as a new record.

So my first question would be, why are you updating the primary key, especially with a surrogate key like ID?

That aside, the the tool is mostly reliant on there being a primary key (as are many tools), and if you are updating the primary key, you defeat the purpose, so I would not use the tool at that point for safety reasons. So my advice to fix the issue would be to not update the primary key (not what you want to hear, but that would be best practice).

-Scott

Hi Scott,

Thank you for your response and advice. The reason why I so eagerly find a way to solve the problem(updating the PRIMARY KEY during the process of pt-online-schema-change) is that I am a DBA and I cannot gurantee the application developer never updates the PRIMARY KEY. If I use the tool and unfortunately cause data errors, I will take all the responsibility. So I am in such a quandary, for one thing, I don’t want to lock the table when altering a big table, for another, I cannot run risks to use the tool(and my leader will not allow me to do this). Last, forcing the developers not to update the PRIMARY KEY is

Hi Trikker;

I think we all know the pain of trying to get developers to do what is best. My advice would be to tell them (and the relevant managers) that they can either improve their coding standards by using best practices like not updating the surrogate primary key, or you will be forced to perform blocking maintenance. Explain to them the problem, and why the blocking maintenance is your only choice unless they change their ways or are willing to risk data integrity (though I’m willing to bet your data integrity is already up in the air at this point).

As for forcing the developers to do things a certain way, if they have full write access to the database, then that’s pretty difficult to stop. The whole point of a surrogate primary key (i.e. “id”) is to have a unique value that does not change. So your devs are doing something wrong and updating the id as a hack to make their questionable code work. You could add another auto increment column and make it the primary key and not tell them, but they will probably find it and update that too. You could create stored procedures that get called to update the rows in that table so the developers must use those and not update the table itself, but that will have performance implications if you have a reasonable amount of traffic. You could use a BEFORE UPDATE trigger to set the id back to the old value, but the tools like pt-online-schema-change would still not work in that case because it needs to create it’s own triggers. You could also revoke their access and re-grant it on a more granular level (i.e. down to the column) to prevent them from updating the id, but that is a management nightmare.

So basically it all points back to getting the devs to stop updating the surrogate primary key, whatever it takes. If they refuse to change, then honestly I might start looking for other employment, as that sounds like a ticking time bomb. And you know when it finally blows, you’ll get the blame, not the devs. :wink:

-Scott

Hi Scott,

Thank you for your patiently answering my question. I think the only way I can use the tool is to tell them not to update the primary key or I will be forced to perform blocking maintenance. To separate the coding logistic from database and improve our maintenance efficiency, we never use stored procedures or allow developers to do this. My last question is that, if I can guarantee the developers never update the primary key(all our tables have primary keys, so updating all the unique keys on a table without a primary key is not possible), Are there other DMLs that may cause extra records in new table when using pt-online-schema-change? I have tested “replace into”, when it replace a record with a new unique key value, the primary key is also updated(when binlog_format=row, the replace is logged as UPDATE in binlog when it replace an old unique key). When this UPDATE log replicates to slave, it will not cause extra records either. Would you please have a deep think of the question for me? I just want to guarantee using this tool is completely safe when the developers don’t update the primary key. If I can confirm this, I will give advice to my leader and prepare to use it. Thank you very much!

trikker

Hi trikker;

Unfortunately we can never say “never” when it comes to problems occurring, but if the developers stop updating the primary key, and you do not have foreign keys referencing the table, then it normally is safe. The best way to give yourself (and your managers) some assurance would be to test this in a test/development environment in order to duplicate your production data / procedures and verify for yourself that pt-online-schema-change will work for you. With that said, I personally use the tool often, on very large and very busy tables, and have yet to have an unexpected problem (i.e. if the table is very busy, you can get into blocking issues when the triggers are added and during the final rename, but that is to be expected).

If you have not already done so, I would definitely read through the manual to go over the few risks it mentions and cases where the tool is not recommended:
[url]https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html[/url]

-Scott

Thanks, Scott. I’ll test it in test environment and prepare to use it on production when I am ready. I registered on Linkedin and sent an invitation to you. Let’s make friends and talk more about MySQL and our lives.