I have 3 servers running MariaDB 10.4.17 and Galera 4 (26.4.6) on CentOS 7 (identified here as galera-4 through galera-6). They are the migration target for another running, synchronized set of MariaDB/Galera servers (galera-1 through galera-3). I added the IP addresses of all 3 new servers to the configuration of the existing servers and restarted them to pick up the changes. The new servers galera-4 and galera-5 then joined the cluster with no issues. However, galera-6 is unable to join the cluster, and I cannot find any firewall connectivity issues.
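When ruling out the firewall with the rsync SST method, direction matters. The joiner log below shows galera-6 starting an rsync daemon listening on 172.30.33.190:4444, and the donor is the side that connects to it. A test run only from galera-6 outward would therefore not catch a blocked inbound 4444. The following is a minimal Python sketch that tries a plain TCP connect to each port. It assumes the default ports (3306, 4444, 4567, 4568) have not been changed in the configuration. The function names are hypothetical helpers, not part of any MariaDB or Galera tooling.

```python
import socket

# Default MariaDB/Galera ports (assumption: none were overridden in the config).
GALERA_PORTS = {
    3306: "MySQL client",
    4444: "SST (rsync daemon on the joiner)",
    4567: "Galera group communication",
    4568: "IST",
}


def check_ports(targets, timeout=3.0):
    """Try a TCP connect to each (host, port) pair; return {(host, port): bool}."""
    results = {}
    for host, port in targets:
        try:
            # create_connection raises OSError (refused, timeout, DNS failure) on error
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            results[(host, port)] = False
    return results


def galera_targets(host):
    """All default Galera ports on one host, e.g. the joiner's address."""
    return [(host, port) for port in sorted(GALERA_PORTS)]


# Usage (run on the donor, galera-5, while galera-6 is waiting as a joiner):
#   for (host, port), ok in check_ports(galera_targets("172.30.33.190")).items():
#       print(host, port, GALERA_PORTS[port], "open" if ok else "closed/blocked")
```

Some of these ports only have a listener during specific phases. For example, 4444 is only open while the joiner is waiting for SST. A "closed" result outside that window does not by itself indicate a firewall block.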
Everything looks normal in the galera-6 (joiner) log until this stage:
2021-02-10 19:20:27 2 [Note] WSREP: State transfer required:
Group state: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3780274
Local state: 00000000-0000-0000-0000-000000000000:-1
2021-02-10 19:20:27 2 [Note] [Debug] WSREP: server galera-6 state change: connected -> joiner
2021-02-10 19:20:27 2 [Note] WSREP: Server status change connected -> joiner
2021-02-10 19:20:27 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-02-10 19:20:27 2 [Note] WSREP: wsrep_guess_ip() : 172.30.33.190
2021-02-10 19:20:27 0 [Note] WSREP: Joiner monitor thread started to monitor
2021-02-10 19:20:27 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.30.33.190' --datadir '/var/lib/mysql/' --parent '29986' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1'
WSREP_SST: [INFO] Role: joiner (20210210 19:20:27.890)
WSREP_SST: [INFO] Starting rsync process as joiner (20210210 19:20:27.893)
WSREP_SST: [INFO] rsync listening on 172.30.33.190:4444 (20210210 19:20:27.899)
2021-02-10 19:20:27 2 [Note] WSREP: Prepared SST request: rsync|172.30.33.190:4444/rsync_sst
2021-02-10 19:20:27 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ab5dc5b8-5414-11ea-8dc3-c6f77e135d98): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():616. IST will be unavailable.
2021-02-10 19:20:27 0 [Note] WSREP: Member 0.0 (galera-6) requested state transfer from '*any*'. Selected 1.0 (galera-5)(SYNCED) as donor.
2021-02-10 19:20:27 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3780275)
2021-02-10 19:20:27 2 [Note] WSREP: Requesting state transfer: success, donor: 1
2021-02-10 19:20:27 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
2021-02-10 19:20:27 2 [Note] WSREP: GCache history reset: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:0 -> ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3780274
2021-02-10 19:20:27 0 [Note] WSREP: 1.0 (galera-5): State transfer to 0.0 (galera-6) complete.
2021-02-10 19:20:27 0 [Note] WSREP: Member 1.0 (galera-5) synced with group.
2021-02-10 19:20:30 0 [Note] WSREP: (0821022c-8605, 'ssl://0.0.0.0:4567') turning message relay requesting off
2021-02-10 19:20:57 0 [Note] WSREP: Joiner waited 30 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:21:27 0 [Note] WSREP: Joiner waited 60 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:21:57 0 [Note] WSREP: Joiner waited 90 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:22:27 0 [Note] WSREP: Joiner waited 120 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:22:57 0 [Note] WSREP: Joiner waited 150 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:23:27 0 [Note] WSREP: Joiner waited 180 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:23:57 0 [Note] WSREP: Joiner waited 210 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:24:27 0 [Note] WSREP: Joiner waited 240 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:24:57 0 [Note] WSREP: Joiner waited 270 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:25:27 0 [Note] WSREP: Joiner waited 300 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:25:57 0 [Note] WSREP: Joiner waited 330 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:26:27 0 [Note] WSREP: Joiner waited 360 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:26:57 0 [Note] WSREP: Joiner waited 390 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:27:27 0 [Note] WSREP: Joiner waited 420 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:27:57 0 [Note] WSREP: Joiner waited 450 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:28:27 0 [Note] WSREP: Joiner waited 480 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:28:57 0 [Note] WSREP: Joiner waited 510 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:29:27 0 [Note] WSREP: Joiner waited 540 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:29:57 0 [Note] WSREP: Joiner waited 570 sec, extending systemd startup timeout as SSTis not completed
Terminated
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 30058 (20210210 19:30:27.521)
2021-02-10 19:30:27 0 [Note] WSREP: Joiner waited 600 sec, extending systemd startup timeout as SSTis not completed
WSREP_SST: [INFO] Joiner cleanup done. (20210210 19:30:28.027)
2021-02-10 19:30:28 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.30.33.190' --datadir '/var/lib/mysql/' --parent '29986' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1: 3 (No such process)
2021-02-10 19:30:28 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2021-02-10 19:30:28 3 [Note] WSREP: SST received
2021-02-10 19:30:28 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
2021-02-10 19:30:28 0 [Note] WSREP: Joiner monitor thread ended with total time 601 sec
2021-02-10 19:30:28 2 [ERROR] WSREP: Application received wrong state:
Received: 00000000-0000-0000-0000-000000000000
Required: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98
2021-02-10 19:30:28 2 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
2021-02-10 19:30:28 2 [Note] WSREP: ReplicatorSMM::abort()
2021-02-10 19:30:28 2 [Note] WSREP: Closing send monitor...
2021-02-10 19:30:28 2 [Note] WSREP: Closed send monitor.
2021-02-10 19:30:28 2 [Note] WSREP: gcomm: terminating thread
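The joiner log above comes down to two `uuid:seqno` comparisons. First, the local state is the all-zero UUID with seqno -1, so it cannot match the group UUID. That makes IST unavailable and forces a full SST. Second, after the SST script exits, the state it reports is still the all-zero UUID instead of the group UUID, so the node aborts. The following Python sketch models that reasoning in a simplified way. The names are hypothetical, and it ignores details of Galera's real logic, such as whether the donor's gcache still holds the missing writesets.

```python
from typing import NamedTuple

# Galera's "undefined" history UUID, as seen in the Local state line of the log.
UNDEFINED_UUID = "00000000-0000-0000-0000-000000000000"


class WsrepGtid(NamedTuple):
    uuid: str
    seqno: int


def parse_gtid(text: str) -> WsrepGtid:
    """Parse a Galera state string of the form '<uuid>:<seqno>' (seqno may be -1)."""
    uuid, sep, seqno = text.strip().rpartition(":")
    if not sep or not uuid:
        raise ValueError(f"not a uuid:seqno pair: {text!r}")
    return WsrepGtid(uuid, int(seqno))


def transfer_kind(local: WsrepGtid, group: WsrepGtid) -> str:
    """Simplified join decision; assumes local.seqno never exceeds group.seqno."""
    if local.uuid == UNDEFINED_UUID or local.uuid != group.uuid:
        return "SST"  # different or unknown history: IST is impossible
    if local.seqno == group.seqno:
        return "none"  # already up to date
    return "IST"  # same history, only missing writesets (gcache permitting)


def sst_result_ok(received: WsrepGtid, group: WsrepGtid) -> bool:
    """After SST, the joiner must report the group's UUID with a valid seqno."""
    return received.uuid == group.uuid and received.seqno >= 0
```

Using the values from the log, `transfer_kind` yields "SST" for the initial states, and `sst_result_ok` is false for the `00000000-...:-1` state the joiner script handed back.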
Donor log (galera-5):
2021-02-10 19:30:36 0 [Note] WSREP: Member 2.0 (galera-6) requested state transfer from '*any*'. Selected 0.0 (galera-5)(SYNCED) as donor.
2021-02-10 19:30:36 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3780880)
2021-02-10 19:30:36 12 [Note] WSREP: Detected STR version: 1, req_len: 49, req: STRv1
2021-02-10 19:30:36 0 [Note] WSREP: 0.0 (galera-5): State transfer to 2.0 (galera-6) complete.
2021-02-10 19:30:36 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3780880)
2021-02-10 19:30:36 0 [Note] WSREP: Member 0.0 (galera-5) synced with group.
2021-02-10 19:30:36 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3780880)
2021-02-10 19:30:36 2 [Note] WSREP: Server galera-5 synced with group
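In both excerpts, the donor reports the transfer to galera-6 as complete in the same second it was requested (19:20:27 in the joiner's log, 19:30:36 in the donor's). Meanwhile, the joiner keeps waiting until it is terminated after about 600 seconds. That gap may be worth tracking across attempts. The following is a small Python sketch for extracting both numbers from log excerpts. The function names are hypothetical, and it assumes the log line formats shown above.

```python
import re
from datetime import datetime

# Leading timestamp of a MariaDB error-log line, e.g. "2021-02-10 19:30:36".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
WAIT_RE = re.compile(r"Joiner waited (\d+) sec")


def log_time(line):
    """Parse the timestamp at the start of a log line."""
    m = TS_RE.match(line)
    if m is None:
        raise ValueError(f"no timestamp in: {line!r}")
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")


def sst_complete_seconds(lines):
    """Seconds from the first 'requested state transfer' line to the next
    'State transfer to ... complete.' line (joiner or donor excerpts)."""
    start = None
    for line in lines:
        if start is None and "requested state transfer" in line:
            start = log_time(line)
        elif (start is not None and "State transfer to" in line
              and line.rstrip().endswith("complete.")):
            return (log_time(line) - start).total_seconds()
    raise ValueError("no request/complete pair found")


def max_joiner_wait(lines):
    """Largest 'Joiner waited N sec' value, i.e. how long the joiner kept waiting."""
    waits = [int(m.group(1)) for line in lines if (m := WAIT_RE.search(line))]
    return max(waits, default=0)
```

Lines without a leading timestamp, such as the `WSREP_SST: [INFO]` lines or `Terminated`, are skipped by both helpers. The exception is when one of those lines happens to contain the searched phrase.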
The configuration is identical across galera-4, galera-5, and galera-6, so I'm reasonably confident it's not a configuration issue (although I may be proven wrong). The other nodes used the default startup timeouts and did not encounter any issues. I have extended the timeout to 1800 seconds, and it still fails.
This specific error seems to be uncommon. Where should I be looking?