pt-online-schema-change is not working right

Table t1 is the table has 4289656 rows. No any other people use this table.

CREATE TABLE t1 (
id int(11) NOT NULL AUTO_INCREMENT,
val1 tinyint ,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

mysql> select count() from t1;
±---------+
| count(
) |
±---------+
| 4289656 |
±---------+
1 row in set (0.94 sec)

  1. In session 1, add one column val2 using pt-online-schema-change. it will take some time.

./pt-online-schema-change --user=root --alter “ADD COLUMN val2 tinyint not null default 0” D=test,t=t1 --print --execute

  1. During pt-online-schema-change running, open session 2, modify vaule of primary key. SQLs are below:

mysql> update t1 set id=10000000 where id=3000000;
Query OK, 1 row affected (0.28 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> select * from t1 where id=10000000;
±---------±-----+
| id | val1 |
±---------±-----+
| 10000000 | 127 |
±---------±-----+
1 row in set (0.00 sec)

mysql> select * from t1 where id=3000000;
Empty set (0.00 sec)

mysql> update t1 set id=3000000 where id=10000000;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

mysql> select * from t1 where id=10000000;
Empty set (0.00 sec)

mysql> select * from t1 where id=3000000;
±--------±-----+
| id | val1 |
±--------±-----+
| 3000000 | 127 |
±--------±-----+
1 row in set (0.00 sec)

Things looks right.

  1. After pt-online-schema-change completed, query again in session 2:

mysql> select * from t1 where id=10000000;
±---------±-----±-----+
| id | val1 | val2 |
±---------±-----±-----+
| 10000000 | 127 | 0 |
±---------±-----±-----+
1 row in set (0.00 sec)

mysql> select count() from t1;
±---------+
| count(
) |
±---------+
| 4289657 |— before add column, count(*)=4289656
±---------+
1 row in set (1.00 sec)

Why the row with id=10000000 exists in table? Anybody can figure it out?

========================================
Output of pt-online-schema-change:

./pt-online-schema-change --user=root --alter “ADD COLUMN val2 tinyint not null default 0” D=test,t=t1 --print --execute

No slaves found. See --recursion-method if host has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering test.t1
Creating new table…
CREATE TABLE test._t1_new (
id int(11) NOT NULL AUTO_INCREMENT,
val1 tinyint(4) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=4289657 DEFAULT CHARSET=utf8
Created new table test._t1_new OK.
Altering new table…
ALTER TABLE test._t1_new ADD COLUMN val2 tinyint not null default 0
Altered test._t1_new OK.
2014-07-04T10:41:29 Creating triggers…
CREATE TRIGGER pt_osc_test_t1_del AFTER DELETE ON test.t1 FOR EACH ROW DELETE IGNORE FROM test._t1_new WHERE test._t1_new.id <=> OLD.id
CREATE TRIGGER pt_osc_test_t1_upd AFTER UPDATE ON test.t1 FOR EACH ROW REPLACE INTO test._t1_new (id, val1) VALUES (NEW.id, NEW.val1)
CREATE TRIGGER pt_osc_test_t1_ins AFTER INSERT ON test.t1 FOR EACH ROW REPLACE INTO test._t1_new (id, val1) VALUES (NEW.id, NEW.val1)
2014-07-04T10:41:29 Created triggers OK.
2014-07-04T10:41:29 Copying approximately 4290498 rows…
INSERT LOW_PRIORITY IGNORE INTO test._t1_new (id, val1) SELECT id, val1 FROM test.t1 FORCE INDEX(PRIMARY) WHERE ((id >= ?)) AND ((id <= ?)) LOCK IN SHARE MODE /pt-online-schema-change 2127 copy nibble/
SELECT /*!40001 SQL_NO_CACHE */ id FROM test.t1 FORCE INDEX(PRIMARY) WHERE ((id >= ?)) ORDER BY id LIMIT ?, 2 /next chunk boundary/
Copying test.t1: 96% 00:01 remain
2014-07-04T10:42:00 Copied rows OK.
2014-07-04T10:42:00 Swapping tables…
RENAME TABLE test.t1 TO test._t1_old, test._t1_new TO test.t1
2014-07-04T10:42:00 Swapped original and new tables OK.
2014-07-04T10:42:00 Dropping old table…
DROP TABLE IF EXISTS test._t1_old
2014-07-04T10:42:01 Dropped old table test._t1_old OK.
2014-07-04T10:42:01 Dropping triggers…
DROP TRIGGER IF EXISTS test.pt_osc_test_t1_del;
DROP TRIGGER IF EXISTS test.pt_osc_test_t1_upd;
DROP TRIGGER IF EXISTS test.pt_osc_test_t1_ins;
2014-07-04T10:42:01 Dropped triggers OK.
Successfully altered test.t1.

Hi Gavin,

pt-online-schema-change uses triggers to apply the changes on the table being altered to the new table. You can see the UPDATE trigger was defined as such:

CREATE TRIGGER pt_osc_test_t1_upd AFTER UPDATE ON test.t1 FOR EACH ROW REPLACE INTO test._t1_new (id, val1) VALUES (NEW.id, NEW.val1)

When you ran:
update t1 set id=10000000 where id=3000000;

The triggered event is:

REPLACE INTO test._t1_new(id,val1) VALUES (10000000, 127);

Since id 1000000 didn’t exists, this query acted as an INSERT. And note, the id 30000000 record remained unchanged.

Then you ran:
update t1 set id=3000000 where id=10000000;

The triggered event is:
REPLACE INTO test._t1_new(id,val1) VALUES (30000000, 127);

Because id 30000000 existed, in this case, the existing row first got deleted, then REPLACE acted like an INSERT.

So that end result is that you had one more row of id 10000000 in the post-altered table.

In short, it is a limitation of pt-online-schema-change. It doesn’t support concurrent update on primary key value(or unique key if no primary key).

Thank you, Song.