ALTER TABLE without downtime in a cluster

I’m struggling to work out how to perform ALTER TABLES without down time in a cluster, even with the nice pt-online-schema-change tool.

If there are many other tables which have foreign key references to the table that you are altering, then my experience is that you’ll get a lot of downtime at the end (presumably even if it selects drop_swap). This also doesn’t seem to play nicely when there are multiple nodes running.

So for large tables with many key references to them, I’ve ended up reducing the cluster to a single node, doing an ALTER TABLE and not using pt-online-schema-change, then forcing SSTs to the other nodes to bring them back in by deleting grastate.dat.

This isn’t great - significant down time. Am I missing something? Yes, maybe I’m missing rolling schema upgrades (RSU).

I’ve not tried this yet - but it looks like the process would be:

  • take a node out of the load balancer that sends requests to it
  • set global wsrep_OSU_method=RSU;
  • do the ALTER TABLE
  • set global wsrep_OSU_method=TOI;
  • put the node back into the load balancer
  • repeat for the other nodes

Is that broadly what people do with RSU?

One special case here is the use of ALTER TABLE to reclaim InnoDB space. Is what I outline above a way to reclaim space without down time?

General hints and tips welcome.

Well, I tried RSU but it didn’t work very well.

  • set global wsrep_OSU_method=RSU; on one node
  • OPTIMIZE table on that node
  • the other nodes also start performing the OPTIMIZE, so the cluster locks up

Am I missing something?

(Edit) I later came across this, which suggests that set global won’t work:

Prior to Percona XtraDB Cluster 5.6.24-25.11, wsrep_OSU_method was only a global variable. Current behavior is now consistent with MySQL behavior for variables that have both global and session scope. This means if you want to change the variable in current session, you need to do it with: SET wsrep_OSU_method (without the GLOBALkeyword). Setting the variable with SET GLOBAL wsrep_OSU_method will change the variable globally but it won’t have effect on the current session.

Following up, just in case anyone else google and finds this - using SET wsrep_OSU_method did the trick, i.e. omitting the GLOBAL. This allows me to alter a table on one, and presumably then the others in turn.

So basically use RSU for any backwards compatible changes affecting large tables rather than pt-online-schema-change.