State transfer falling back to SST

I am busy testing an upgrade of a cluster from 5.7 to 8.0.

After cloning the 3x DB nodes and bootstrapping the first node (I selected galera-db2, as it had the highest seqno), I tried to join one of the other nodes, but when joining galera-db1 I get the messages below:

2024-04-06T11:55:39.108196Z 0 [Note] WSREP: Member 0.0 (galera-db01) requested state transfer from '*any*'. Selected 1.0 (galera-db02)(SYNCED) as donor.
2024-04-06T11:55:39.108240Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 564321670)
2024-04-06T11:55:39.108400Z 1 [Note] WSREP: IST request: 6fc9cf1e-96f0-11ed-a540-4b17e5dad47f:564317656-564321670|tcp://
2024-04-06T11:55:39.108499Z 1 [Note] WSREP: IST first seqno 564317657 not found from cache, falling back to SST
2024-04-06T11:55:39.108511Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-04-06T11:55:39.109617Z 0 [Note] WSREP: Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --mysqld-version '5.7.26-29-57'   '' --gtid '6fc9cf1e-96f0-11ed-a540-4b17e5dad47f:564321670' )

How do I avoid this? This has been happening for a long time and I have never seen it succeed with IST. Do I need to configure something differently to retain more data, so that the binlogs can service the IST?

In case it matters, the nodes were online and running when I cloned the VMs to test the upgrade procedure.

Something else I have noticed is that on the third node the seqno in grastate.dat is -1. Why is this?

P.S. I also know from past experience that the SST process will fail because the data partition is using 920 GB out of 1000 GB. Since this is a test environment, I will just drop one of the DBs and re-do it, but that workaround is only viable because this is a test setup.

You need to increase your Galera cache (gcache) and enable gcache recovery. An IST donor can only serve writesets that are still in its gcache ring buffer; your log shows the first requested seqno (564317657) had already been evicted from the donor's cache, which is why the joiner fell back to SST.
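A minimal sketch of the relevant my.cnf settings (the 2G size is an assumption; size the gcache to cover the write volume during the longest node outage you expect to recover from via IST):

```
[mysqld]
# wsrep_provider_options is a single semicolon-separated string;
# merge these with any options you already set there.
# gcache.size:    ring buffer of writesets available for IST (default 128M)
# gcache.recover: recover the gcache contents on restart so a restarted
#                 node can still act as an IST donor (Galera 3.19+)
wsrep_provider_options = "gcache.size=2G;gcache.recover=yes"
```

Changing wsrep_provider_options requires a node restart, so roll it through the cluster one node at a time.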


The binary logs have nothing to do with Galera/IST/SST. Binary logs are only for async replication. You can run PXC with binlogs completely disabled.
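For illustration, a PXC node can run with a configuration along these lines (hedged sketch; note that binlog_format must remain ROW because Galera replicates row-based events internally whether or not binlog files are written to disk):

```
[mysqld]
# No binlog files on disk: Galera replicates writesets over the group
# channel, independently of the binary log. On 8.0 you can disable it
# explicitly with skip-log-bin; on 5.7 simply omit log_bin.
skip-log-bin
# Galera still requires row-based events internally.
binlog_format = ROW
```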

A seqno of -1 is normal and is always what you will see while a node is running: the grastate.dat file is only written/flushed with the real seqno on a clean shutdown.
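For illustration, grastate.dat on a running node looks something like this (the uuid here is the cluster state UUID from your log; the other values are examples):

```
# GALERA saved state
version: 2.4
uuid:    6fc9cf1e-96f0-11ed-a540-4b17e5dad47f
seqno:   -1
safe_to_bootstrap: 0
```

After a clean shutdown, seqno holds the last committed writeset; after a crash it stays -1 and the position can be recovered from InnoDB with the server's --wsrep-recover option.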

Thank you, this is exactly the information I needed.