I am having issues with backup locking the cluster. I have to continually use DDL statements in production. I have a 5 node cluster I would like to remove 1 node from the cluster run the backup then add the node back to the cluster.
Do I go to each node and update
wsrep_cluster_address? If so how do I update
set wsrep_cluster_address = gcomm://192.168.2.59,192.168.2.60,192.168.2.61,192.168.2.62 leaving out the node i want to remove?
On the node i want to remove do I
For this use case what I would suggest doing is:
- Set up an async replica from the cluster and take the backup from there to avoid disrupting normal cluster operation
- Check how DDL creation needs to be executed so frequently. DDL locking is a limitation and characteristic of how PXC works and will be disruptive for operations.
As operations grows (and possibly amount of DDLs grows as well) amount of disruptions will take longer and longer, disrupting normal cluster operations .
I would suggest any of the above approaches, but if you still want to take a node out of the cluster for taking a backup.
- shutdown the node from where the backup will be taken
- backup grastate.dat file with current cluster status
- start the node in standalone (remove wsrep lines from config file)
- take backup
- undo changes (add back wsrep lines, restore grastate.dat file)
- restart the node to join the cluster
You might find this useful to avoid IST: Want IST Not SST for Node Rejoins? We Have a Solution! - Percona Database Performance Blog
Typo. Last line should have said “You might find this useful to avoid SST”
The ddl is on a temp table in a stored procedure and before I upgraded to xtradb cluster 8.0.26 i never had the issues. I was on 5.7. If
SET GLOBAL wsrep_desync=ON; do the backup then
SET GLOBAL wsrep_desync=OFF; would that exclude would that be the same as removing it from the cluster then adding it back to the cluster?
If I go with removing the node, do I have to update the other nodes gcomm line?
There is no setting that can remove a node from the cluster.
wsrep_desync only tells the node to NOT broadcast flow control messages if it becomes overwhelmed with incoming transactions. The node remains connected and continues to participate in general.
You NEVER need to do this.
wsrep_cluster_address does NOT define membership of your cluster. Membership/quorum of your cluster is defined by the cluster itself. The
wsrep_cluster_address parameter only requires at least 1 other member. Once connected, that node will inform of ever other node also connected.
If you want to remove a node completely, do as Carlos suggests which is to stop the node, comment out all wsrep_ params, and start mysql.
Since the temporary table engine was completely overhauled in MySQL 8, I can understand why you might now have such an issue with this DDL.
Ok thanks. I will go with that option stopping node altering gcomm and renaming wsrep_cluster_name. do backup, then undo changes.
But still: if there is an option to setup separate replica from cluster, this will look more organic than such a complex solution to exclude-reinclude node in cluster.
I looked at setting up a replica but I am lost. I don’t need auto fail over and I have a multi master setup. I didn’t quite understand the directions listed Async
Well, for starters, that blog post you linked is for Group Replication, which you are most certainly NOT using. (ie: you’re using PXC/Galera)
You need to follow any of the hundreds of posts out there for setting up simple async replication between two mysql servers. The fact that the source is also a PXC node doesn’t matter that much.
If you need a quick fix, you might want to consider changing internal_tmp_disk_storage_engine to revert to the previous value which is to use myisam for temp tables, although be aware of the implications.