Hi there. My monitoring told me this morning that my whole cluster is down. I could bootstrap one node, but the second and third nodes won't join the cluster anymore. Here's what the log on the joining node (srv-db02) tells me:
2022-09-07T09:26:17.675579Z 0 [Note] WSREP: discarding pending addr without UUID: tcp://172.30.0.65:4567
2022-09-07T09:26:18.027461Z 0 [Note] WSREP: gcomm: connected
2022-09-07T09:26:18.027598Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2022-09-07T09:26:18.027770Z 0 [Note] WSREP: Waiting for SST/IST to complete.
2022-09-07T09:26:18.027777Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2022-09-07T09:26:18.027935Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 20965914-2e8f-11ed-961c-bbc22ab886f3
2022-09-07T09:26:18.029104Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 20965914-2e8f-11ed-961c-bbc22ab886f3
2022-09-07T09:26:18.029470Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 20965914-2e8f-11ed-961c-bbc22ab886f3 from 0 (srv-db02)
2022-09-07T09:26:18.029779Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 20965914-2e8f-11ed-961c-bbc22ab886f3 from 1 (srv-db01)
2022-09-07T09:26:18.029807Z 0 [Note] WSREP: Quorum results:
version = 6,
component = PRIMARY,
conf_id = 11,
members = 1/2 (primary/total),
act_id = 52210161,
last_appl. = -1,
protocols = 0/9/3 (gcs/repl/appl),
group UUID = 8054bb0c-c25d-11e9-8c73-ab46b0e3a8d9
2022-09-07T09:26:18.029836Z 0 [Note] WSREP: Flow-control interval: [141, 141]
2022-09-07T09:26:18.029857Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 52210161)
2022-09-07T09:26:18.029945Z 1 [Note] WSREP: State transfer required:
Group state: 8054bb0c-c25d-11e9-8c73-ab46b0e3a8d9:52210161
Local state: 00000000-0000-0000-0000-000000000000:-1
2022-09-07T09:26:18.029979Z 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2022-09-07T09:26:18.029986Z 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2022-09-07T09:26:18.029999Z 1 [Note] WSREP: New cluster view: global state: 8054bb0c-c25d-11e9-8c73-ab46b0e3a8d9:52210161, view# 12: Primary, number of nodes: 2, my index: 0, protocol version 3
2022-09-07T09:26:18.030009Z 1 [Note] WSREP: Setting wsrep_ready to true
2022-09-07T09:26:18.030014Z 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
2022-09-07T09:26:18.030019Z 1 [Note] WSREP: Setting wsrep_ready to false
2022-09-07T09:26:18.030116Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.0.64' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '7368' --mysqld-version '5.7.38-41-57' '' )
2022-09-07T09:26:18.367800Z WSREP_SST: [INFO] Streaming with xbstream
2022-09-07T09:26:18.376212Z WSREP_SST: [WARNING] Found a stale sst_in_progress file: /var/lib/mysql//sst_in_progress
2022-09-07T09:26:18.702153Z 1 [Note] WSREP: Prepared SST/IST request: xtrabackup-v2|172.30.0.64:4444/xtrabackup_sst//1
2022-09-07T09:26:18.702234Z 1 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 1 -> 1) (Increment: 1 -> 2)
2022-09-07T09:26:18.702272Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-09-07T09:26:18.702304Z 1 [Note] WSREP: Assign initial position for certification: 52210161, protocol version: 4
2022-09-07T09:26:18.702342Z 0 [Note] WSREP: Service thread queue flushed.
2022-09-07T09:26:18.702364Z 1 [Note] WSREP: Check if state gap can be serviced using IST
2022-09-07T09:26:18.702374Z 1 [Note] WSREP: Local UUID: 00000000-0000-0000-0000-000000000000 != Group UUID: 8054bb0c-c25d-11e9-8c73-ab46b0e3a8d9
2022-09-07T09:26:18.702424Z 1 [Note] WSREP: State gap can't be serviced using IST. Switching to SST
2022-09-07T09:26:18.702433Z 1 [Note] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (8054bb0c-c25d-11e9-8c73-ab46b0e3a8d9): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():518. IST will be unavailable.
2022-09-07T09:26:18.703175Z 0 [Note] WSREP: Member 0.0 (srv-db02) requested state transfer from '*any*'. Selected 1.0 (srv-db01)(SYNCED) as donor.
2022-09-07T09:26:18.703212Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 52210161)
2022-09-07T09:26:18.703254Z 1 [Note] WSREP: Requesting state transfer: success, donor: 1
2022-09-07T09:26:18.703271Z 1 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 8054bb0c-c25d-11e9-8c73-ab46b0e3a8d9:52210161
2022-09-07T09:26:19.454000Z WSREP_SST: [INFO] Streaming with xbstream
2022-09-07T09:26:19.460987Z WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing
2022-09-07T09:26:19.465540Z WSREP_SST: [INFO] Proceeding with SST.........
2022-09-07T09:26:19.490339Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2022-09-07T09:26:21.027786Z 0 [Note] WSREP: (2049cf39, 'tcp://0.0.0.0:4567') turning message relay requesting off
2022-09-07T09:26:30.144070Z WSREP_SST: [ERROR] ******************* FATAL ERROR **********************
2022-09-07T09:26:30.145611Z WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log
2022-09-07T09:26:30.147132Z WSREP_SST: [ERROR] ******************************************************
2022-09-07T09:26:30.148738Z WSREP_SST: [ERROR] Cleanup after exit with status:2
2022-09-07T09:26:30.160282Z 0 [Warning] WSREP: 1.0 (srv-db01): State transfer to 0.0 (srv-db02) failed: -22 (Invalid argument)
2022-09-07T09:26:30.160368Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():811: Will never receive state. Need to abort.
2022-09-07T09:26:30.160428Z 0 [Note] WSREP: gcomm: terminating thread
2022-09-07T09:26:30.160481Z 0 [Note] WSREP: gcomm: joining thread
2022-09-07T09:26:30.160598Z 0 [Note] WSREP: gcomm: closing backend
2022-09-07T09:26:30.524573Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.30.0.64' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '7368' --mysqld-version '5.7.38-41-57' '' : 2 (No such file or directory)
2022-09-07T09:26:30.524671Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2022-09-07T09:26:30.524699Z 0 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2022-09-07T09:26:30.524750Z 0 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2022-09-07T09:26:30.524773Z 0 [ERROR] Aborting
2022-09-07T09:26:30.524792Z 0 [Note] WSREP: Signalling cancellation of the SST request.
2022-09-07T09:26:30.524825Z 0 [Note] WSREP: SST request was cancelled
2022-09-07T09:26:30.524854Z 0 [Note] Giving 2 client threads a chance to die gracefully
2022-09-07T09:26:30.524889Z 1 [Note] WSREP: Closing send monitor...
2022-09-07T09:26:30.524958Z 1 [Note] WSREP: Closed send monitor.
2022-09-07T09:26:31.164254Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,2049cf39,1311)
memb {
2049cf39,0
}
joined {
}
left {
}
partitioned {
b4cfd8a6,0
}
)
2022-09-07T09:26:31.164322Z 0 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2022-09-07T09:26:31.164455Z 0 [Note] WSREP: gcomm: closed
2022-09-07T09:26:31.164472Z 0 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Can someone please help me and tell me how to recover the second and third nodes?
Thanks in advance.