Deadlocks when running pt-online-schema-change on XtraDB cluster

Hi,

We have fairly large databases (100GB+) and when running pt-online-schema-change on a table that is 50GB+ I see deadlocks on regular transactions against the table I’m running the migration against.

For example:
pt-online-schema-change --user $user --password $pw --recursion-method none --execute --progress time,30 --alter 'MODIFY COLUMN state VARCHAR(10) NOT NULL, DROP INDEX bleh' D=name_production,t=invoices

The app pointing to the invoices table gets deadlocks but it looks like the underlying error comes from the invoices_new (temp) table:

140221 14:13:13
*** (1) TRANSACTION:
TRANSACTION E943A14E, ACTIVE 0 sec inserting
mysql tables in use 2, locked 2
LOCK WAIT 7 lock struct(s), heap size 1248, 5 row lock(s), undo log entries 3
MySQL thread id 6231227, OS thread handle 0x7eff8cc8c700, query id 158584959 worker.hostname.com ip.add.ress name update
REPLACE INTO name_production._invoices_new schema VALUES (NEW.
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 685 page no 279991 n bits 120 index index_invoices_on_site_id_and_invoice_number of table database_production._invoices_new trx id E943A14E lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION E943A151, ACTIVE 0 sec inserting
mysql tables in use 2, locked 2
6 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 3
MySQL thread id 6231228, OS thread handle 0x7eff8e169700, query id 158584964 worker.hostname.com 10.128.2.42 recurly update
REPLACE INTO database_production._invoices_new (schema) VALUES (NEW.
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 685 page no 279991 n bits 120 index index_invoices_on_site_id_and_invoice_number of table database_production._invoices_new trx id E943A151 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 685 page no 279991 n bits 120 index index_invoices_on_site_id_and_invoice_number of table database_production._invoices_new trx id E943A151 lock_mode X waiting
*** WE ROLL BACK TRANSACTION (2)

Any help would be appreciated! Thanks

Deadlocks between the trigger updates and the table copy can happen. These conflicts are more likely on a busy server, and even more likely when writes go to more than one node. In PXC, every application must already be able to retry after a deadlock in order to handle write conflicts, so these deadlocks should be a nuisance but not a problem.
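
To illustrate what "dealing with deadlocks" can look like, here is a minimal retry sketch in POSIX shell. It assumes the command reports a deadlock with the text "ERROR 1213" in its output, which is how the mysql command-line client prints error 1213 (deadlock found). The function name retry_on_deadlock and the RETRY_DELAY variable are hypothetical names made up for this example; a real application would normally do the same thing in its own database layer.

```shell
# Hypothetical helper: run a command and retry it when it fails with a deadlock.
# Usage: retry_on_deadlock MAX_ATTEMPTS command [args...]
# Note: POSIX sh has no local variables, so these names are global.
retry_on_deadlock() {
  max_attempts="$1"
  shift
  attempt=1
  while :; do
    # Capture stdout and stderr together so the error text can be inspected.
    if output=$("$@" 2>&1); then
      printf '%s\n' "$output"
      return 0
    fi
    case "$output" in
      *"ERROR 1213"*)
        # Deadlock: give up once the attempt budget is spent, otherwise retry.
        if [ "$attempt" -ge "$max_attempts" ]; then
          printf '%s\n' "$output" >&2
          return 1
        fi
        attempt=$((attempt + 1))
        # RETRY_DELAY (seconds) is an assumed knob for this sketch.
        sleep "${RETRY_DELAY:-1}"
        ;;
      *)
        # Any other failure is not retried.
        printf '%s\n' "$output" >&2
        return 1
        ;;
    esac
  done
}
```

It could be called as, for example, retry_on_deadlock 3 mysql -e "..." name_production. Only a whole transaction should be retried this way, since a deadlock rolls the victim transaction back entirely.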

I’m running a 3 node cluster, all writes and migrations are done on the first node. The other two nodes are used for read-only traffic and failover.

This didn’t seem to happen with LHM. Is this a problem specific to the percona online schema change tool?

What version of pt-osc are you using? You may be affected by this bug: [url]https://bugs.launchpad.net/percona-toolkit/+bug/988036[/url]
I would suggest trying the latest version. Also, pt-osc copies rows in chunks, so it may be using a chunk size so large that it cannot get locks for all of those rows. I would suggest passing a smaller value for --chunk-size so the tool does not select too many rows at once. Details are in the pt-online-schema-change section of the Percona Toolkit documentation.
Another option is to run the migration during off-peak hours.
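
As a rough sketch of the chunk-size suggestion, the command from the original post can be wrapped so the chunk size and the run mode are easy to vary. The function name run_osc and the OSC_BIN override are hypothetical, added only so the command can be inspected without running it, and the value 200 used below is an arbitrary illustration rather than a recommended setting.

```shell
# Hypothetical wrapper around the migration from the original post.
# Usage: run_osc CHUNK_SIZE MODE   (MODE is --dry-run or --execute)
# $user and $pw are the same placeholders as in the original command.
run_osc() {
  chunk_size="$1"
  mode="$2"
  # OSC_BIN is an assumed override for this sketch; it defaults to the real tool.
  "${OSC_BIN:-pt-online-schema-change}" \
    --user "$user" --password "$pw" \
    --recursion-method none \
    --chunk-size "$chunk_size" \
    --progress time,30 \
    --alter 'MODIFY COLUMN state VARCHAR(10) NOT NULL, DROP INDEX bleh' \
    "$mode" \
    D=name_production,t=invoices
}
```

For example, run_osc 200 --dry-run checks the options first, and run_osc 200 --execute performs the migration. Running pt-online-schema-change --version beforehand shows which release is installed.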

Hope it helps.