I have a 3 nodes MySQL (8.0) Galera Cluster.
After a disaster on my server, my cluster now has only two nodes online (DB01 and DB03), one node (DB02) now lost all data files in the data folder. I tried to start the node (with specified wsrep_sst_donor), and it triggers the SST process, but it failed.
This is the log of down node (DB02):
2023-12-30T22:53:27.191267Z 0 [Note] [MY-000000] [Galera] Quorum results:
version = 6,
component = PRIMARY,
conf_id = 23,
members = 2/3 (primary/total),
act_id = 74756843,
last_appl. = 74756765,
protocols = 2/10/4 (gcs/repl/appl),
vote policy= 0,
group UUID = df948d8d-5994-11ec-91b5-ff5ae1e8c4ba
2023-12-30T22:53:27.191610Z 0 [Note] [MY-000000] [Galera] Flow-control interval: [173, 173]
2023-12-30T22:53:27.191886Z 0 [Note] [MY-000000] [Galera] Shifting OPEN -> PRIMARY (TO: 74756844)
2023-12-30T22:53:27.192242Z 2 [Note] [MY-000000] [Galera] ####### processing CC 74756844, local, ordered
2023-12-30T22:53:27.192577Z 2 [Note] [MY-000000] [Galera] Maybe drain monitors from -1 upto current CC event 74756844 upto:-1
2023-12-30T22:53:27.192862Z 2 [Note] [MY-000000] [Galera] Drain monitors from -1 up to -1
2023-12-30T22:53:27.193155Z 2 [Note] [MY-000000] [Galera] Process first view: df948d8d-5994-11ec-91b5-ff5ae1e8c4ba my uuid: 3de2d256-a766-11ee-851e-0327836c7f5c
2023-12-30T22:53:27.193461Z 2 [Note] [MY-000000] [Galera] Server 10.44.161.186 connected to cluster at position df948d8d-5994-11ec-91b5-ff5ae1e8c4ba:74756844 with ID 3de2d256-a766-11ee-851e-0327836c7f5c
2023-12-30T22:53:27.193820Z 2 [Note] [MY-000000] [WSREP] Server status change disconnected -> connected
2023-12-30T22:53:27.194171Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-12-30T22:53:27.194502Z 2 [Note] [MY-000000] [Galera] ####### My UUID: 3de2d256-a766-11ee-851e-0327836c7f5c
2023-12-30T22:53:27.194852Z 2 [Note] [MY-000000] [Galera] Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2023-12-30T22:53:27.195228Z 0 [Note] [MY-000000] [Galera] Service thread queue flushed.
2023-12-30T22:53:27.195640Z 2 [Note] [MY-000000] [Galera] ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
2023-12-30T22:53:27.195960Z 2 [Note] [MY-000000] [Galera] State transfer required:
Group state: df948d8d-5994-11ec-91b5-ff5ae1e8c4ba:74756844
Local state: 00000000-0000-0000-0000-000000000000:-1
2023-12-30T22:53:27.196291Z 2 [Note] [MY-000000] [WSREP] Server status change connected -> joiner
2023-12-30T22:53:27.196625Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-12-30T22:53:27.197078Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.44.161.186' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '28944' --mysqld-version '8.0.25-15.1' --binlog 'binlog' )
2023-12-30T22:53:27.955900Z 2 [Note] [MY-000000] [WSREP] Prepared SST request: xtrabackup-v2|10.44.161.186:4444/xtrabackup_sst//1
2023-12-30T22:53:27.956382Z 2 [Note] [MY-000000] [Galera] Check if state gap can be serviced using IST
2023-12-30T22:53:27.956747Z 2 [Note] [MY-000000] [Galera] Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: df948d8d-5994-11ec-91b5-ff5ae1e8c4ba
2023-12-30T22:53:27.957081Z 2 [Note] [MY-000000] [Galera] ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 74756844, STRv: 3
2023-12-30T22:53:27.957514Z 2 [Note] [MY-000000] [Galera] IST receiver addr using ssl://10.44.161.186:4568
2023-12-30T22:53:27.957930Z 2 [Note] [MY-000000] [Galera] IST receiver using ssl
2023-12-30T22:53:27.958733Z 2 [Note] [MY-000000] [Galera] Prepared IST receiver for 0-74756844, listening at: ssl://10.44.161.186:4568
2023-12-30T22:53:27.959895Z 0 [Note] [MY-000000] [Galera] Member 0.0 (10.44.161.186) requested state transfer from '10.44.161.63'. Selected 1.0 (10.44.161.63)(SYNCED) as donor.
2023-12-30T22:53:27.960299Z 0 [Note] [MY-000000] [Galera] Shifting PRIMARY -> JOINER (TO: 74756844)
2023-12-30T22:53:27.960749Z 2 [Note] [MY-000000] [Galera] Requesting state transfer: success, donor: 1
2023-12-30T22:53:27.961130Z 2 [Note] [MY-000000] [Galera] Resetting GCache seqno map due to different histories.
2023-12-30T22:53:27.961512Z 2 [Note] [MY-000000] [Galera] GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> df948d8d-5994-11ec-91b5-ff5ae1e8c4ba:74756844
2023-12-30T22:53:28.700385Z 0 [Note] [MY-000000] [WSREP-SST] Proceeding with SST.........
2023-12-30T22:53:28.725787Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
2023-12-30T22:53:29.687830Z 0 [Note] [MY-000000] [Galera] (3de2d256-851e, 'ssl://0.0.0.0:4567') turning message relay requesting off
2023-12-30T23:12:51.151208Z 0 [ERROR] [MY-000000] [WSREP-SST] Killing SST (29492) with SIGKILL after stalling for 120 seconds
2023-12-30T23:12:51.175953Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 185: 29494 Killed socat -u openssl-listen:4444,reuseaddr,cert=/etc/mysql/certs/server.crt,key=/etc/mysql/certs/server.key,cafile=/etc/mysql/certs/server_ca.crt,verify=1,retry=30 stdio
2023-12-30T23:12:51.176015Z 0 [Note] [MY-000000] [WSREP-SST] 29495 | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x
2023-12-30T23:12:51.176289Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2023-12-30T23:12:51.176359Z 0 [ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node: exit codes: 137 137
2023-12-30T23:12:51.176411Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1296
2023-12-30T23:12:51.176461Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2023-12-30T23:12:51.176922Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2023-12-30T23:12:51.247447Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.44.161.186' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '28944' --mysqld-version '8.0.25-15.1' --binlog 'binlog' : 32 (Broken pipe)
2023-12-30T23:12:51.247559Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2023-12-30T23:12:51.247632Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2023-12-30T23:12:51.247938Z 3 [Note] [MY-000000] [Galera] Processing SST received
2023-12-30T23:12:51.248002Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2023-12-30T23:12:51.248062Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2023-12-30T23:12:51.248106Z 3 [Note] [MY-000000] [Galera] ReplicatorSMM::abort()
2023-12-30T23:12:51.248140Z 3 [Note] [MY-000000] [Galera] Closing send monitor...
2023-12-30T23:12:51.248179Z 3 [Note] [MY-000000] [Galera] Closed send monitor.
2023-12-30T23:12:51.248210Z 3 [Note] [MY-000000] [Galera] gcomm: terminating thread
2023-12-30T23:12:51.248249Z 3 [Note] [MY-000000] [Galera] gcomm: joining thread
2023-12-30T23:12:51.248463Z 3 [Note] [MY-000000] [Galera] gcomm: closing backend
2023-12-30T23:12:51.249436Z 3 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
And this is the log of DONOR node:
231231 07:12:46 >> log scanned up to (348894019774)
231231 07:12:47 >> log scanned up to (348894020705)
231231 07:12:48 >> log scanned up to (348894021839)
231231 07:12:49 >> log scanned up to (348894022973)
231231 07:12:50 >> log scanned up to (348894029347)
231231 07:12:51 >> log scanned up to (348894030604)
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error writing file '<unopen fd>' (OS errno 32 - Broken pipe)
[03] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[03] xtrabackup: Error: failed to copy datafile.
2023-12-30T23:12:51.296464Z 0 [ERROR] [MY-000000] [WSREP-SST] ------------ innobackup.backup.log (END) ------------
2023-12-30T23:12:51.296473Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2023-12-30T23:12:51.296507Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:22
2023-12-30T23:12:51.325538Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.44.161.186:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1' --binlog 'binlog' --gtid 'df948d8d-5994-11ec-91b5-ff5ae1e8c4ba:74756844' : 22 (Invalid argument)
2023-12-30T23:12:51.333491Z 0 [Note] [MY-000000] [Galera] SST sending failed: -22
2023-12-30T23:12:51.333528Z 0 [Note] [MY-000000] [WSREP] Server status change donor -> joined
2023-12-30T23:12:51.333550Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-12-30T23:12:51.333607Z 0 [ERROR] [MY-000000] [WSREP] Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.44.161.186:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.25-15.1' --binlog 'binlog' --gtid 'df948d8d-5994-11ec-91b5-ff5ae1e8c4ba:74756844'
2023-12-30T23:12:51.333968Z 0 [Warning] [MY-000000] [Galera] Could not find peer: 3de2d256-a766-11ee-851e-0327836c7f5c
2023-12-30T23:12:51.333996Z 0 [Warning] [MY-000000] [Galera] 0.0 (10.44.161.63): State transfer to -1.-1 (left the group) failed: -22 (Invalid argument)
2023-12-30T23:12:51.334018Z 0 [Note] [MY-000000] [Galera] Shifting DONOR/DESYNCED -> JOINED (TO: 74758908)
2023-12-30T23:12:51.334404Z 0 [Note] [MY-000000] [Galera] Member 0.0 (10.44.161.63) synced with group.
2023-12-30T23:12:51.334421Z 0 [Note] [MY-000000] [Galera] Shifting JOINED -> SYNCED (TO: 74758908)
2023-12-30T23:12:51.334469Z 14 [Note] [MY-000000] [Galera] Server 10.44.161.63 synced with group
2023-12-30T23:12:51.334514Z 14 [Note] [MY-000000] [WSREP] Server status change joined -> synced
2023-12-30T23:12:51.334527Z 14 [Note] [MY-000000] [WSREP] Synchronized with group, ready for connections
2023-12-30T23:12:51.334557Z 14 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2023-12-30T23:12:56.339184Z 0 [Note] [MY-000000] [Galera] cleaning up 3de2d256-851e (ssl://10.44.161.186:4567)
This is the mysqld status in DB01 and DB03 still working.
[root@db01 ~]# systemctl status mysqld
mysql.service - Percona XtraDB Cluster
Loaded: loaded (/usr/lib/systemd/system/mysql.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mysql.service.d
└─override.conf
Active: active (running) since Wed 2024-01-03 15:53:06 +08; 16min ago
Process: 18927 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
Process: 21685 ExecStop=/usr/bin/mysql-systemd stop (code=exited, status=0/SUCCESS)
Process: 32300 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=0/SUCCESS)
Process: 32297 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 29922 ExecStartPre=/bin/sh -c VAR=`bash /usr/bin/mysql-systemd galera-recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 29919 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 29891 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 29960 (mysqld)
Status: "Server is operational"
CGroup: /system.slice/mysql.service
└─29960 /usr/sbin/mysqld --wsrep_start_position=df948d8d-5994-11ec-91b5-ff5ae1e8c4ba:75099198
I tried many times, but it keeps the same error.