Error Code: 1213. WSREP with concurrent DML when DDL is executed

We have Percona-XtraDB-Cluster-5.7.16 Galera 3 node cluster on a Linux machine . The application is 24/7 online . Hence for doing schema upgrades, I was looking for an option to avoid locking/blocking and use RSU. For us , pt-online-schema-change is not an option as table sizes are very big and will take a long time.

This is what I tried
Node 0
1.Change the Schema Upgrade method to Rolling Schema Upgrade.(SET wsrep_OSU_method=‘RSU’:wink:
2.Run the ALTER statement.
3.Reset the Schema Upgrade method back to Total Order Isolation.(SET wsrep_OSU_method=‘TOI’:wink:

Followed by the same set of steps in Node 1 and the Node 2

While the replication worked and we did not see any issues. But concurrent DML running on node 0 was giving error
[COLOR=#1F497D]Error Code: 1213. WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction

Please let me know if there is a way to overcome this issue and have concurrent DML running on the node while we do Alter with RSU

Yes, this is known behavior. Even in RSU method, when there is any local transaction ongoing that holds the MDL lock, an ALTER will just abort it.
There is a workaround - instead of using RSU, you can do a “manual RSU” instead, like this:

set global wsrep_desync=1;
set wsrep_on=0;
ALTER ...
set wsrep_on=1;
set global wsrep_desync=0; 

But beware, this method, as well as regular RSU, is safe ONLY for backwards schema compatible changes! So, while it’s a good method for adding or removing secondary indexes, or for optimizing table, but I would NOT recommend doing it for adding/removing columns, changing datatypes, etc.

Thank you . I have 3 questions regarding the workaround -

  1. Will it allow parallel DML on same table and node where the ‘Manual RSU’ is being run?
    2 Is there a workaround for (set global wsrep_desync=1; ) as application wont have super user privileges to run the command .
    3 After the alter is run on all 3 nodes, will the nodes try to be in sync or any manual intervention is needed

Thanks

Yes, in my tests, during ongoing ALTER started in that way, I am able to write to the table from other session on the same node. Of course, it may be limited to certain alter types, but for compatible ones, like noop alter or secondary index operations, it should not block anything.

No, you can’t change the node to desync state without super privilege.
Also, I don’t think such ALTERs should be part of normal client application. Rather it should be a maintenance script executed when needed, like during software deployments.

When you decide to alter in RSU or in “manual RSU” mode, it is your responsibility to make sure that all nodes are in sync in terms of table schema.
In terms of data though, replication for other sessions works as usual, no additional intervention needed.