State transfer falling back to SST

donkeykong · April 6, 2024, 12:30pm

I am busy testing an upgrade of a cluster from 5.7 to 8.0.

After cloning the three DB nodes and bootstrapping the first node (I selected galera-db2, as it had the highest seqno), I try to join galera-db1 but get the error below:

2024-04-06T11:55:39.108196Z 0 [Note] WSREP: Member 0.0 (galera-db01) requested state transfer from '*any*'. Selected 1.0 (galera-db02)(SYNCED) as donor.
2024-04-06T11:55:39.108240Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 564321670)
2024-04-06T11:55:39.108400Z 1 [Note] WSREP: IST request: 6fc9cf1e-96f0-11ed-a540-4b17e5dad47f:564317656-564321670|tcp://172.16.75.21:4568
2024-04-06T11:55:39.108499Z 1 [Note] WSREP: IST first seqno 564317657 not found from cache, falling back to SST
2024-04-06T11:55:39.108511Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-04-06T11:55:39.109617Z 0 [Note] WSREP: Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.16.75.21:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --mysqld-version '5.7.26-29-57'   '' --gtid '6fc9cf1e-96f0-11ed-a540-4b17e5dad47f:564321670' )

How do I avoid this? This has been happening for a long time, and I have never once seen a node join via IST. Do I need to configure something differently to retain more data so that the binlogs can serve the IST?

In case it matters, the nodes were online and running when I cloned the VMs to test the upgrade procedure.

Something else I have noticed: on the third node, the seqno in grastate.dat is -1. Why is this?

P.S. I also know from past experience that the SST process will fail because the data partition is already using 920 GB of its 1000 GB. Since this is a test environment, I will just drop one of the databases and redo it, but that is only viable in this case.

matthewb · April 7, 2024, 2:58am

You need to increase your Galera cache and enable gcache recovery:

wsrep_provider_options="gcache.size=1G;gcache.recover=yes"
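To make the IST/SST decision concrete, here is a minimal Python sketch. It parses the `IST request` line from the log in the question, assuming the `<uuid>:<last_applied>-<group_seqno>|<peer>` layout that line appears to use, and applies the rule the "not found from cache" message implies: IST is only possible if the donor's gcache still holds the joiner's next write-set (its last applied seqno + 1). This is a simplified model, not Galera's actual code; it ignores UUID matching, and `donor_first_cached` is a hypothetical input that in practice depends on gcache.size, write volume, and whether the cache survived a restart (gcache.recover).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout, inferred from the log line in this thread:
#   IST request: <uuid>:<last_applied>-<group_seqno>|<peer address>
IST_REQUEST_RE = re.compile(
    r"IST request: (?P<uuid>[0-9a-f-]+):(?P<last>-?\d+)-(?P<group>\d+)\|(?P<peer>\S+)"
)


@dataclass
class IstRequest:
    uuid: str
    last_applied: int  # last seqno the joiner has applied
    group_seqno: int   # seqno the joiner must catch up to
    peer: str          # address the joiner listens on for IST

    @property
    def first_needed(self) -> int:
        # The joiner needs every write-set after the last one it applied.
        return self.last_applied + 1

    @property
    def missing(self) -> int:
        # Number of write-sets the donor would have to send.
        return self.group_seqno - self.last_applied


def parse_ist_request(line: str) -> IstRequest:
    m = IST_REQUEST_RE.search(line)
    if m is None:
        raise ValueError("not an IST request line")
    return IstRequest(m["uuid"], int(m["last"]), int(m["group"]), m["peer"])


def choose_transfer(req: IstRequest, donor_first_cached: Optional[int]) -> str:
    """Return "IST" if the donor's gcache still holds req.first_needed, else "SST".

    donor_first_cached is the lowest seqno left in the donor's gcache; None
    models an empty gcache (for example after a restart without gcache.recover).
    """
    if donor_first_cached is None or donor_first_cached > req.first_needed:
        return "SST"
    return "IST"


line = ("2024-04-06T11:55:39.108400Z 1 [Note] WSREP: IST request: "
        "6fc9cf1e-96f0-11ed-a540-4b17e5dad47f:564317656-564321670"
        "|tcp://172.16.75.21:4568")
req = parse_ist_request(line)
print(req.first_needed, req.missing)    # 564317657 4014
print(choose_transfer(req, None))       # SST
print(choose_transfer(req, 564300000))  # IST (hypothetical cache start)
```

The first needed seqno, 564317657, matches the "IST first seqno ... not found from cache" line. Only about 4,000 write-sets were missing here, so a gcache that was emptied on restart is a more likely culprit than raw size, which is what gcache.recover=yes addresses.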

The binary logs have nothing to do with Galera IST or SST; IST is served from the donor's gcache. Binary logs are for asynchronous replication (and point-in-time recovery), not for Galera. You can run PXC with binlogs completely disabled.
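Because IST comes from the gcache, the thing to size is the gcache, based on how much write-set data the cluster produces during the longest period a node might be away. A common rough approach is to sample `wsrep_replicated_bytes + wsrep_received_bytes` twice and scale the rate to that window. The sketch below assumes this approach; the function names and sample numbers are hypothetical, and the result ignores per-write-set overhead, so treat it as a lower bound.

```python
import math


def gcache_bytes_needed(bytes_t0: int, bytes_t1: int, seconds: float,
                        window_seconds: float) -> float:
    """Rough gcache size estimate from write-set volume.

    bytes_t0 / bytes_t1: wsrep_replicated_bytes + wsrep_received_bytes sampled
    `seconds` apart; window_seconds: how long a node may be away and still IST.
    """
    if seconds <= 0:
        raise ValueError("sample interval must be positive")
    rate = (bytes_t1 - bytes_t0) / seconds  # write-set bytes per second
    return rate * window_seconds


def as_gcache_size(nbytes: float) -> str:
    """Round up to whole GiB in the style used for gcache.size (e.g. '7G')."""
    return f"{math.ceil(nbytes / 2**30)}G"


# Hypothetical sample: 60 MB of write-sets in 60 s, 2-hour window.
need = gcache_bytes_needed(1_000_000_000, 1_060_000_000, 60, 2 * 3600)
print(as_gcache_size(need))  # 7G
```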

A seqno of -1 is always expected while a node is running. The real seqno is not written/flushed to grastate.dat until the node shuts down.
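A minimal Python sketch for reading that file, assuming the usual grastate.dat layout of `key: value` lines with `#` comments. It interprets seqno -1 as "position unknown": the node is running, or it stopped without a clean shutdown (a crash, or a VM disk cloned while the node was running), in which case the real position would have to be recovered, e.g. with `mysqld --wsrep-recover`. The helper names and the sample contents are illustrative only.

```python
from typing import Dict


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse grastate.dat-style 'key: value' lines, skipping '#' comments."""
    state: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def describe_seqno(state: Dict[str, str]) -> str:
    seqno = int(state["seqno"])
    if seqno == -1:
        # -1 while running; the real value is only written on clean shutdown.
        return "unknown (node running or not shut down cleanly); use --wsrep-recover"
    return f"last committed seqno {seqno}"


# Illustrative contents of a running node's grastate.dat.
SAMPLE = """# GALERA saved state
version: 2.1
uuid:    6fc9cf1e-96f0-11ed-a540-4b17e5dad47f
seqno:   -1
safe_to_bootstrap: 0
"""

print(describe_seqno(parse_grastate(SAMPLE)))
```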

donkeykong · April 7, 2024, 7:14am

Thank you, this is exactly the information I needed.
