SST (State Snapshot Transfer) Process Time

Dear Percona Support Team,
I am currently managing a Percona XtraDB Cluster, and I’ve encountered a significant issue where one of our nodes is lagging far behind. The SST (State Snapshot Transfer) process to bring this node up to date is taking approximately two days to complete. This duration is proving to be highly disruptive to our operations.

Here are some details about our setup:
Cluster Version: 5.7
MySQL Version: 5.7.40
SST Method Used: xtrabackup

Could you please advise on potential optimizations or alternative approaches we could employ to reduce the time required for the SST process? Specifically, we are interested in any best practices or configuration tweaks that could help us achieve faster synchronization.

Hello @Omer_Faruk_Icen,
The SST process should run at about 100 GB per hour on a typical server with gigabit or faster networking. How large is your dataset? If you are using any type of virtualization, such as EC2 with EBS, VMware, or Kubernetes, you should be able to clone the storage quickly and attach the cloned storage to a new instance. That would get you online in a fraction of the time.

Yes, we are using virtualization. If we clone the disk of Node2 to Node1 and adjust the settings on Node1, would it quickly integrate and get online faster?

Yes. Make sure you have wsrep_provider_options='gcache.recover=yes' set; otherwise IST will not be used to recover.
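
For reference, here is a minimal sketch of how that setting might look in my.cnf. It assumes the option goes in the [mysqld] section of each node's configuration and that no other provider options are set yet; adapt it to your existing file rather than copying it as-is.

```ini
[mysqld]
# Ask Galera to recover the existing gcache on restart so the node
# can rejoin through IST instead of falling back to a full SST.
wsrep_provider_options="gcache.recover=yes"

# If you already set other provider options, keep them all in the same
# string, separated by semicolons. The gcache.size value below is only
# an illustrative assumption, not a recommendation from this thread:
# wsrep_provider_options="gcache.size=2G; gcache.recover=yes"
```

Note that IST can only be used if the donor's gcache still holds all the write-sets the joining node is missing, so the cloned node should be brought back into the cluster soon after the clone is taken.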