Galera SST failure on additional node

I have 3 servers running MariaDB 1.4.17 and Galera 4-26.4.6 on CentOS 7 (to be identified as galera-4 thru galera-6) being migrated to from another running, synchronized set of MariaDB / Galera servers (to be identified as galera-1 through galera-3). I added the IP addresses of all 3 new servers to the existing and restarted to pickup the changes. New servers galera-4 and galera-5 then joined the cluster with no issues. However server galera-6 is unable to join the cluster. I cannot find any issues with firewall connectivity.

Everything looks normal in the logs until this stage:

2021-02-10 19:20:27 2 [Note] WSREP: State transfer required:
        Group state: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3780274
        Local state: 00000000-0000-0000-0000-000000000000:-1
2021-02-10 19:20:27 2 [Note] [Debug] WSREP: server galera-6 state change: connected -> joiner
2021-02-10 19:20:27 2 [Note] WSREP: Server status change connected -> joiner
2021-02-10 19:20:27 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-02-10 19:20:27 2 [Note] WSREP: wsrep_guess_ip() : 172.30.33.190
2021-02-10 19:20:27 0 [Note] WSREP: Joiner monitor thread started to monitor
2021-02-10 19:20:27 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.30.33.190' --datadir '/var/
lib/mysql/' --parent '29986' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1'
WSREP_SST: [INFO] Role: joiner (20210210 19:20:27.890)
WSREP_SST: [INFO] Starting rsync process as joiner (20210210 19:20:27.893)
WSREP_SST: [INFO] rsync listening on 172.30.33.190:4444 (20210210 19:20:27.899)
2021-02-10 19:20:27 2 [Note] WSREP: Prepared SST request: rsync|172.30.33.190:4444/rsync_sst
2021-02-10 19:20:27 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000
-0000-0000-000000000000) does not match group state UUID (ab5dc5b8-5414-11ea-8dc3-c6f77e135d98): 1 (Operation not permit
ted)
         at galera/src/replicator_str.cpp:prepare_for_IST():616. IST will be unavailable.
2021-02-10 19:20:27 0 [Note] WSREP: Member 0.0 (galera-6) requested state transfer from '*any*'. Selected 1.0 (galera-5)(SYNCED) as donor.
2021-02-10 19:20:27 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3780275)
2021-02-10 19:20:27 2 [Note] WSREP: Requesting state transfer: success, donor: 1
2021-02-10 19:20:27 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
2021-02-10 19:20:27 2 [Note] WSREP: GCache history reset: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:0 -> ab5dc5b8-5414-11ea-8
dc3-c6f77e135d98:3780274
2021-02-10 19:20:27 0 [Note] WSREP: 1.0 (galera-5): State transfer to 0.0 (galera-6) complete.
2021-02-10 19:20:27 0 [Note] WSREP: Member 1.0 (galera-5) synced with group.
2021-02-10 19:20:30 0 [Note] WSREP: (0821022c-8605, 'ssl://0.0.0.0:4567') turning message relay requesting off
2021-02-10 19:20:57 0 [Note] WSREP: Joiner waited 30 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:21:27 0 [Note] WSREP: Joiner waited 60 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:21:57 0 [Note] WSREP: Joiner waited 90 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:22:27 0 [Note] WSREP: Joiner waited 120 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:22:57 0 [Note] WSREP: Joiner waited 150 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:23:27 0 [Note] WSREP: Joiner waited 180 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:23:57 0 [Note] WSREP: Joiner waited 210 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:24:27 0 [Note] WSREP: Joiner waited 240 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:24:57 0 [Note] WSREP: Joiner waited 270 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:25:27 0 [Note] WSREP: Joiner waited 300 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:25:57 0 [Note] WSREP: Joiner waited 330 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:26:27 0 [Note] WSREP: Joiner waited 360 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:26:57 0 [Note] WSREP: Joiner waited 390 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:27:27 0 [Note] WSREP: Joiner waited 420 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:27:57 0 [Note] WSREP: Joiner waited 450 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:28:27 0 [Note] WSREP: Joiner waited 480 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:28:57 0 [Note] WSREP: Joiner waited 510 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:29:27 0 [Note] WSREP: Joiner waited 540 sec, extending systemd startup timeout as SSTis not completed
2021-02-10 19:29:57 0 [Note] WSREP: Joiner waited 570 sec, extending systemd startup timeout as SSTis not completed
Terminated
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 30058 (20210210 19:30:27.521)
2021-02-10 19:30:27 0 [Note] WSREP: Joiner waited 600 sec, extending systemd startup timeout as SSTis not completed
WSREP_SST: [INFO] Joiner cleanup done. (20210210 19:30:28.027)
2021-02-10 19:30:28 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.30.33.
190' --datadir '/var/lib/mysql/' --parent '29986' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000000000
000:-1: 3 (No such process)
2021-02-10 19:30:28 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2021-02-10 19:30:28 3 [Note] WSREP: SST received
2021-02-10 19:30:28 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
2021-02-10 19:30:28 0 [Note] WSREP: Joiner monitor thread ended with total time 601 sec
2021-02-10 19:30:28 2 [ERROR] WSREP: Application received wrong state:
        Received: 00000000-0000-0000-0000-000000000000
        Required: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98
2021-02-10 19:30:28 2 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart require
d.
2021-02-10 19:30:28 2 [Note] WSREP: ReplicatorSMM::abort()
2021-02-10 19:30:28 2 [Note] WSREP: Closing send monitor...
2021-02-10 19:30:28 2 [Note] WSREP: Closed send monitor.
2021-02-10 19:30:28 2 [Note] WSREP: gcomm: terminating thread

Donor:

2021-02-10 19:30:36 0 [Note] WSREP: Member 2.0 (galera-6) requested state transfer from '*any*'. Selected 0.0 (galera-5)(SYNCED) as donor.
2021-02-10 19:30:36 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3780880)
2021-02-10 19:30:36 12 [Note] WSREP: Detected STR version: 1, req_len: 49, req: STRv1
2021-02-10 19:30:36 0 [Note] WSREP: 0.0 (galera-5): State transfer to 2.0 (galera-6) complete.
2021-02-10 19:30:36 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3780880)
2021-02-10 19:30:36 0 [Note] WSREP: Member 0.0 (galera-5) synced with group.
2021-02-10 19:30:36 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3780880)
2021-02-10 19:30:36 2 [Note] WSREP: Server galera-5 synced with group

The configuration across galera-4,galera-5, and galera-6 are identical so I’m not reasonably confident it’s not a configuration issue (although I may be proven wrong). The other nodes had default start timeouts and did not encounter any issues. I have extended the timeout to 1800 seconds and it still fails.

This specific error seems to be uncommon. Where should be looking?

1 Like

Hi @Solarbell - welcome to the Percona Forums!

Looking at the JOINER log on galera-6 , can you confirm you have an empty datadir, specifically that there are no grastate.dat files laying around?

Have you tried with SST using Percona XtraBackup?

Have you confirmed that the SST rsync port is open 4444/tcp between all nodes?

1 Like

Yes, I have tried it with (and without) the entire /var/lib/mysql/* cleared.

I have not tried with XtraBackup as I am not running the Percona packages.

I have confirmed with tcp connections and monitoring tcpdump that each server is able to reach port 4444.

1 Like

Running percona packages is not required to use the xtrabackup-v2 SST method. Percona’s software works with all flavors of MySQL, including MariaDB. rsync SST method is pretty much deprecated due to the massive impact of table locking required in order to achieve a consistent snapshot.

Can you also confirm that selinux is set to disabled or permissive mode on donor/joiner?

1 Like

Selinux is disabled on all nodes. I will look into xtrabackup, but it’s a little odd since it worked for every other node.

1 Like

Hi @Solarbell
One thing I notice is that your timestamps aren’t lining up, for example the JOINER is failing SST at 19:30:28 but the DONOR isn’t reporting galera-6 state until 19:30:36. Do you have ntp or chrony running?

1 Like

Chrony is configured and running on all servers. Manually checking the date on each shows them to be within a second or less.

1 Like

Hi @Solarbell thank you for confirming. Then this is an issue that your timestamp of events don’t line up - you’re showing a slew of start and end activity on the JOINER (ending at 19:30:28) and your DONOR log only starts at 19:30:36. The timestamps need to line up otherwise we’re looking at different events here.

So my ask is to gather logs from three hosts in the Galera cluster for the same join event (all timestamps ought to / must line up):

  1. JOINER
  2. DONOR
  3. Another member of the cluster

Having the data from three nodes for the same event ought to describe the reason for the failure case.

Finally can you please enable wsrep_debug option in the [mysqld] section of your configuration file, on at least the JOINER node (and ideally on the DONOR - look at using wsrep_sst_donor option to ensure you hit the right member).

[mysqld]
wsrep_debug=CLIENT
wsrep_sst_donor=galera-5
1 Like

Here are all of the logs from a test run tonight. I set wsrep_debug=1 on both JOINER and DONOR. My apologies for the long logs. Start timeout was set to 600 seconds on the JOINER. The DONOR I snipped some of the normal galera activity to keep the logs from being too crazy long.

Sorry for the long posts, but I’m not allowed to upload yet.

JOINER:

2021-02-12  3:36:10 0 [Note] WSREP: wsrep_guess_ip() : 172.30.33.190
2021-02-12  3:36:10 0 [Note] WSREP: wsrep_guess_ip() : 172.30.33.190
2021-02-12  3:36:10 0 [Note] WSREP: Server initial position: 00000000-0000-0000-0000-000000000000:-1
2021-02-12  3:36:10 0 [Note] WSREP: Loading provider /usr/lib64/galera/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2021-02-12  3:36:10 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2021-02-12  3:36:10 0 [Note] WSREP: wsrep_load(): Galera 26.4.6(r1d8d67c) by Codership Oy <info@codership.com> loaded successfully.
2021-02-12  3:36:10 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-02-12  3:36:10 0 [Warning] WSREP: Could not open state file for reading: '/var/lib/mysql//grastate.dat'
2021-02-12  3:36:10 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2021-02-12  3:36:10 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 0
UUID: 00000000-0000-0000-0000-000000000000
Seqno: -1 - -1
Offset: -1
Synced: 0
2021-02-12  3:36:10 0 [Note] WSREP: Skipped GCache ring buffer recovery: could not determine history UUID.
2021-02-12  3:36:10 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.30.33.190; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum =
2021-02-12  3:36:10 0 [Note] WSREP: SR storage init for: table
2021-02-12  3:36:10 0 [Note] WSREP: wsrep_start_replication
2021-02-12  3:36:10 0 [Note] WSREP: Start replication
2021-02-12  3:36:10 0 [Note] WSREP: Connecting with bootstrap option: 0
2021-02-12  3:36:10 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2021-02-12  3:36:10 0 [Note] WSREP: protonet asio version 0
2021-02-12  3:36:10 0 [Note] WSREP: Using CRC-32C for message checksums.
2021-02-12  3:36:10 0 [Note] WSREP: initializing ssl context
2021-02-12  3:36:10 0 [Note] WSREP: backend: asio
2021-02-12  3:36:10 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2021-02-12  3:36:10 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2021-02-12  3:36:10 0 [Note] WSREP: restore pc from disk failed
2021-02-12  3:36:10 0 [Note] WSREP: GMCast version 0
2021-02-12  3:36:10 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
2021-02-12  3:36:10 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
2021-02-12  3:36:10 0 [Note] WSREP: EVS version 1
2021-02-12  3:36:10 0 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '172.32.32.6:,172.32.34.166:,172.32.37.242:,172.30.33.77:,172.30.35.61:,172.30.33.190:'
2021-02-12  3:36:10 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.32.32.6:4567 local endpoint ssl://172.30.33.190:41216 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2021-02-12  3:36:10 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.30.33.77:4567 local endpoint ssl://172.30.33.190:39104 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2021-02-12  3:36:10 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.30.33.190:48180 local endpoint ssl://172.30.33.190:4567 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2021-02-12  3:36:10 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.30.33.190:4567 local endpoint ssl://172.30.33.190:48180 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2021-02-12  3:36:10 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.30.35.61:4567 local endpoint ssl://172.30.33.190:38322 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2021-02-12  3:36:10 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.32.34.166:4567 local endpoint ssl://172.30.33.190:48830 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2021-02-12  3:36:10 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address ssl://172.30.33.190:4567
2021-02-12  3:36:10 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') connection established to 936cf414-b980 ssl://172.30.33.77:4567
2021-02-12  3:36:10 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2021-02-12  3:36:10 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') connection established to aac1418b-8e23 ssl://172.32.32.6:4567
2021-02-12  3:36:10 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.32.37.242:4567 local endpoint ssl://172.30.33.190:38508 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2021-02-12  3:36:10 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') connection established to f395a896-9f5f ssl://172.32.34.166:4567
2021-02-12  3:36:10 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') connection established to ed0197e1-8036 ssl://172.30.35.61:4567
2021-02-12  3:36:10 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') connection established to 6984ad01-ba5f ssl://172.32.37.242:4567
2021-02-12  3:36:10 0 [Note] WSREP: declaring 6984ad01-ba5f at ssl://172.32.37.242:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring 936cf414-b980 at ssl://172.30.33.77:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring aac1418b-8e23 at ssl://172.32.32.6:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring ed0197e1-8036 at ssl://172.30.35.61:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring f395a896-9f5f at ssl://172.32.34.166:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: Node 6984ad01-ba5f state prim
2021-02-12  3:36:10 0 [Note] WSREP: view(view_id(PRIM,6984ad01-ba5f,247) memb {
	6984ad01-ba5f,0
	72d4871e-bc74,0
	936cf414-b980,0
	aac1418b-8e23,0
	ed0197e1-8036,0
	f395a896-9f5f,0
} joined {
} left {
} partitioned {
})
2021-02-12  3:36:10 0 [Note] WSREP: save pc into disk
2021-02-12  3:36:11 0 [Note] WSREP: gcomm: connected
2021-02-12  3:36:11 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2021-02-12  3:36:11 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2021-02-12  3:36:11 0 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2021-02-12  3:36:11 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 6
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 7322775d-6ce3-11eb-b462-1254a6b23145
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 0 (galera-1)
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 2 (galera-4)
2021-02-12  3:36:11 2 [Note] WSREP: wsrep system thread 2, 0x7f44000009a8 starting
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 3 (galera-1)
2021-02-12  3:36:11 2 [Note] WSREP: Starting applier thread 2
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 4 (galera-5)
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 5 (galera-1)
2021-02-12  3:36:11 1 [Note] WSREP: wsrep system thread 1, 0x7f440c0009a8 starting
2021-02-12  3:36:11 1 [Note] WSREP: Starting rollbacker thread 1
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 1 (galera-6)
2021-02-12  3:36:11 0 [Note] WSREP: Quorum results:
	version    = 4,
	component  = PRIMARY,
	conf_id    = 207,
	members    = 5/6 (joined/total),
	act_id     = 3887411,
	last_appl. = 0,
	protocols  = 0/9/3 (gcs/repl/appl),
	vote policy= 1,
	group UUID = ab5dc5b8-5414-11ea-8dc3-c6f77e135d98
2021-02-12  3:36:11 0 [Note] WSREP: Flow-control interval: [39, 39]
2021-02-12  3:36:11 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 3887411)
2021-02-12  3:36:11 2 [Note] WSREP: ####### processing CC 3887411, local, unordered
2021-02-12  3:36:11 2 [Note] WSREP: Process first view: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98 my uuid: 72d4871e-6ce3-11eb-bc74-862688521489
2021-02-12  3:36:11 2 [Note] WSREP: Server galera-6 connected to cluster at position ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411 with ID 72d4871e-6ce3-11eb-bc74-862688521489
2021-02-12  3:36:11 2 [Note] [Debug] WSREP: server galera-6 state change: disconnected -> connected
2021-02-12  3:36:11 2 [Note] WSREP: Server status change disconnected -> connected
2021-02-12  3:36:11 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-02-12  3:36:11 2 [Note] WSREP: ####### My UUID: 72d4871e-6ce3-11eb-bc74-862688521489
2021-02-12  3:36:11 2 [Note] WSREP: Cert index reset to ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411 (proto: 9), state transfer needed: yes
2021-02-12  3:36:11 0 [Note] WSREP: Service thread queue flushed.
2021-02-12  3:36:11 2 [Note] WSREP: ####### Assign initial position for certification: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411, protocol version: 4
2021-02-12  3:36:11 2 [Note] WSREP: State transfer required:
	Group state: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411
	Local state: 00000000-0000-0000-0000-000000000000:-1
2021-02-12  3:36:11 2 [Note] [Debug] WSREP: server galera-6 state change: connected -> joiner
2021-02-12  3:36:11 2 [Note] WSREP: Server status change connected -> joiner
2021-02-12  3:36:11 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-02-12  3:36:11 2 [Note] WSREP: wsrep_guess_ip() : 172.30.33.190
2021-02-12  3:36:11 0 [Note] WSREP: Joiner monitor thread started to monitor
2021-02-12  3:36:11 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '172.30.33.190' --datadir '/var/lib/mysql/' --parent '816' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1'
2021-02-12  3:36:11 2 [Note] WSREP: Prepared SST request: rsync|172.30.33.190:4444/rsync_sst
2021-02-12  3:36:11 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ab5dc5b8-5414-11ea-8dc3-c6f77e135d98): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():616. IST will be unavailable.
2021-02-12  3:36:11 0 [Note] WSREP: Member 1.0 (galera-6) requested state transfer from 'galera-5'. Selected 4.0 (galera-5)(SYNCED) as donor.
2021-02-12  3:36:11 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3887411)
2021-02-12  3:36:11 2 [Note] WSREP: Requesting state transfer: success, donor: 4
2021-02-12  3:36:11 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
2021-02-12  3:36:11 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411
2021-02-12  3:36:11 0 [Note] WSREP: 4.0 (galera-5): State transfer to 1.0 (galera-6) complete.
2021-02-12  3:36:11 0 [Note] WSREP: Member 4.0 (galera-5) synced with group.
2021-02-12  3:36:13 0 [Note] WSREP: (72d4871e-bc74, 'ssl://0.0.0.0:4567') turning message relay requesting off
2021-02-12  3:36:41 0 [Note] WSREP: Joiner waited 30 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:37:11 0 [Note] WSREP: Joiner waited 60 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:37:41 0 [Note] WSREP: Joiner waited 90 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:38:11 0 [Note] WSREP: Joiner waited 120 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:38:41 0 [Note] WSREP: Joiner waited 150 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:39:11 0 [Note] WSREP: Joiner waited 180 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:39:41 0 [Note] WSREP: Joiner waited 210 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:40:11 0 [Note] WSREP: Joiner waited 240 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:40:41 0 [Note] WSREP: Joiner waited 270 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:41:11 0 [Note] WSREP: Joiner waited 300 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:41:41 0 [Note] WSREP: Joiner waited 330 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:42:11 0 [Note] WSREP: Joiner waited 360 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:42:41 0 [Note] WSREP: Joiner waited 390 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:43:11 0 [Note] WSREP: Joiner waited 420 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:43:41 0 [Note] WSREP: Joiner waited 450 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:44:11 0 [Note] WSREP: Joiner waited 480 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:44:41 0 [Note] WSREP: Joiner waited 510 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:45:11 0 [Note] WSREP: Joiner waited 540 sec, extending systemd startup timeout as SSTis not completed
2021-02-12  3:45:41 0 [Note] WSREP: Joiner waited 570 sec, extending systemd startup timeout as SSTis not completed
Terminated
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 878 (20210212 03:46:10.521)
WSREP_SST: [INFO] Joiner cleanup done. (20210212 03:46:11.027)
2021-02-12  3:46:11 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '172.30.33.190' --datadir '/var/lib/mysql/' --parent '816' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1: 3 (No such process)
2021-02-12  3:46:11 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2021-02-12  3:46:11 3 [Note] WSREP: SST received
2021-02-12  3:46:11 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
2021-02-12  3:46:11 2 [ERROR] WSREP: Application received wrong state:
	Received: 00000000-0000-0000-0000-000000000000
	Required: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98
2021-02-12  3:46:11 2 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
2021-02-12  3:46:11 2 [Note] WSREP: ReplicatorSMM::abort()
2021-02-12  3:46:11 2 [Note] WSREP: Closing send monitor...
2021-02-12  3:46:11 2 [Note] WSREP: Closed send monitor.
2021-02-12  3:46:11 0 [Note] WSREP: Joiner monitor thread ended with total time 600 sec
2021-02-12  3:46:11 2 [Note] WSREP: gcomm: terminating thread
2021-02-12  3:46:11 2 [Note] WSREP: gcomm: joining thread
2021-02-12  3:46:11 2 [Note] WSREP: gcomm: closing backend
2021-02-12  3:46:11 2 [Note] WSREP: view(view_id(NON_PRIM,6984ad01-ba5f,247) memb {
	72d4871e-bc74,0
} joined {
} left {
} partitioned {
	6984ad01-ba5f,0
	936cf414-b980,0
	aac1418b-8e23,0
	ed0197e1-8036,0
	f395a896-9f5f,0
})
2021-02-12  3:46:11 2 [Note] WSREP: view((empty))
2021-02-12  3:46:11 2 [Note] WSREP: gcomm: closed
2021-02-12  3:46:11 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2021-02-12  3:46:11 0 [Note] WSREP: Flow-control interval: [16, 16]
2021-02-12  3:46:11 0 [Note] WSREP: Received NON-PRIMARY.
2021-02-12  3:46:11 0 [Note] WSREP: Shifting JOINER -> OPEN (TO: 3887642)
2021-02-12  3:46:11 0 [Note] WSREP: New SELF-LEAVE.
2021-02-12  3:46:11 0 [Note] WSREP: Flow-control interval: [0, 0]
2021-02-12  3:46:11 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2021-02-12  3:46:11 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 3887642)
2021-02-12  3:46:11 0 [Note] WSREP: RECV thread exiting 0: Success
2021-02-12  3:46:11 2 [Note] WSREP: recv_thread() joined.
2021-02-12  3:46:11 2 [Note] WSREP: Closing replication queue.
2021-02-12  3:46:11 2 [Note] WSREP: Closing slave action queue.
2021-02-12  3:46:11 2 [Note] WSREP: /usr/sbin/mysqld: Terminated.
2 Likes

DONOR:

2021-02-12  3:36:10 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.30.33.190:38322 local endpoint s
sl://172.30.35.61:4567 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2021-02-12  3:36:10 0 [Note] WSREP: (ed0197e1-8036, 'ssl://0.0.0.0:4567') connection established to 72d4871e-bc74 ssl://172.30.33.190:4567
2021-02-12  3:36:10 0 [Note] WSREP: (ed0197e1-8036, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2021-02-12  3:36:10 0 [Note] WSREP: declaring 6984ad01-ba5f at ssl://172.32.37.242:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring 72d4871e-bc74 at ssl://172.30.33.190:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring 936cf414-b980 at ssl://172.30.33.77:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring aac1418b-8e23 at ssl://172.32.32.6:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring f395a896-9f5f at ssl://172.32.34.166:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: Node 6984ad01-ba5f state prim
2021-02-12  3:36:10 0 [Note] WSREP: view(view_id(PRIM,6984ad01-ba5f,247) memb {
        6984ad01-ba5f,0
        72d4871e-bc74,0
        936cf414-b980,0
        aac1418b-8e23,0
        ed0197e1-8036,0
        f395a896-9f5f,0
} joined {
} left {
} partitioned {
})
2021-02-12  3:36:10 0 [Note] WSREP: save pc into disk
2021-02-12  3:36:10 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 4, memb_num = 6
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 7322775d-6ce3-11eb-b462-1254a6b23145
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 0 (galera-1)
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 2 (gal
era-4)
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 3 (gal
era-1)
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 4 (gal
era-5)
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 5 (gal
era-1)
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 1 (gal
era-6)
2021-02-12  3:36:11 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 207,
        members    = 5/6 (joined/total),
        act_id     = 3887411,
        last_appl. = 0,
        protocols  = 0/9/3 (gcs/repl/appl),
        vote policy= 1,
        group UUID = ab5dc5b8-5414-11ea-8dc3-c6f77e135d98
2021-02-12  3:36:11 0 [Note] WSREP: Flow-control interval: [39, 39]
2021-02-12  3:36:11 11 [Note] WSREP: ####### processing CC 3887411, local, unordered
2021-02-12  3:36:11 11 [Note] WSREP: ####### My UUID: ed0197e1-6ce2-11eb-8036-762de9cf7f42
2021-02-12  3:36:11 11 [Note] WSREP: Cert index reset to ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411 (proto: 9), state
transfer needed: no
2021-02-12  3:36:11 0 [Note] WSREP: Service thread queue flushed.
2021-02-12  3:36:11 11 [Note] WSREP: ####### Assign initial position for certification: ab5dc5b8-5414-11ea-8dc3-c6f77e13
5d98:3887411, protocol version: 4
2021-02-12  3:36:11 11 [Note] WSREP: REPL Protocols: 9 (4)
2021-02-12  3:36:11 11 [Note] WSREP: ####### Adjusting cert position: 3887411 -> 3887411
2021-02-12  3:36:11 0 [Note] WSREP: Service thread queue flushed.
2021-02-12  3:36:11 11 [Note] WSREP: ================================================

View:
  id: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411
  status: primary
  protocol_version: 3
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, U
NORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 4
  members(6):
        0: 6984ad01-600c-11eb-ba5f-1aad3ac840e1, galera-1
        1: 72d4871e-6ce3-11eb-bc74-862688521489, galera-6
        2: 936cf414-6b20-11eb-b980-77e6f18abe33, galera-4
        3: aac1418b-5f8e-11eb-8e23-ea35715e447b, galera-1
        4: ed0197e1-6ce2-11eb-8036-762de9cf7f42, galera-5
        5: f395a896-600a-11eb-9f5f-ce0b9a1af273, galera-1
=================================================
2021-02-12  3:36:11 11 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-02-12  3:36:11 11 [Note] WSREP: Restored cluster view:
  id: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887306
  status: primary
  protocol_version: 3
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, U
NORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: -1
  members(5):
        0: 6984ad01-600c-11eb-ba5f-1aad3ac840e1, galera-1
        1: 936cf414-6b20-11eb-b980-77e6f18abe33, galera-4
        2: aac1418b-5f8e-11eb-8e23-ea35715e447b, galera-1
        3: ed0197e1-6ce2-11eb-8036-762de9cf7f42, galera-5
        4: f395a896-600a-11eb-9f5f-ce0b9a1af273, galera-1

2021-02-12  3:36:11 11 [Note] WSREP: Storing cluster view:
  id: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411
  status: primary
  protocol_version: 3
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, U
NORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 4
  members(6):
        0: 6984ad01-600c-11eb-ba5f-1aad3ac840e1, galera-1
        1: 72d4871e-6ce3-11eb-bc74-862688521489, galera-6
        2: 936cf414-6b20-11eb-b980-77e6f18abe33, galera-4
        3: aac1418b-5f8e-11eb-8e23-ea35715e447b, galera-1
        4: ed0197e1-6ce2-11eb-8036-762de9cf7f42, galera-5
        5: f395a896-600a-11eb-9f5f-ce0b9a1af273, galera-1

2021-02-12  3:36:11 11 [Note] WSREP: SQL: 31 DELETE FROM mysql.wsrep_cluster thd: 11
2021-02-12  3:36:11 11 [Note] WSREP: SQL: 39 DELETE FROM mysql.wsrep_cluster_members thd: 11
2021-02-12  3:36:11 11 [Note] WSREP: Lowest cert index boundary for CC from group: 3887411
2021-02-12  3:36:11 11 [Note] WSREP: Min available from gcache for CC from group: 3798334
2021-02-12  3:36:11 0 [Note] WSREP: Member 1.0 (galera-6) requested state transfer from 'galera-5'. Selected 4
.0 (galera-5)(SYNCED) as donor.
2021-02-12  3:36:11 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3887411)
2021-02-12  3:36:11 11 [Note] WSREP: Detected STR version: 1, req_len: 49, req: STRv1
2021-02-12  3:36:11 0 [Note] WSREP: 4.0 (galera-5): State transfer to 1.0 (galera-6) complete.
2021-02-12  3:36:11 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3887411)
2021-02-12  3:36:11 0 [Note] WSREP: Member 4.0 (galera-5) synced with group.
2021-02-12  3:36:11 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3887411)
2021-02-12  3:36:11 12 [Note] WSREP: Server galera-5 synced with group
2021-02-12  3:36:12 15 [Note] WSREP: assigned new next trx id: 126
2021-02-12  3:36:13 0 [Note] WSREP: (ed0197e1-8036, 'ssl://0.0.0.0:4567') turning message relay requesting off
2021-02-12  3:36:15 17 [Note] WSREP: assigned new next trx id: 126
2021-02-12  3:36:17 18 [Note] WSREP: assigned new next trx id: 126
2021-02-12  3:36:20 28 [Note] WSREP: assigned new next trx id: 126
2021-02-12  3:36:21 15 [Note] WSREP: assigned new next trx id: 127
2021-02-12  3:36:25 29 [Note] WSREP: assigned new next trx id: 127
2021-02-12  3:36:25 17 [Note] WSREP: assigned new next trx id: 128
—
2021-02-12  3:37:30 2 [Note] WSREP: wsrep_commit_empty(2)
2021-02-12  3:37:30 2 [Note] WSREP: assigned new next trx id: 151
2021-02-12  3:37:30 2 [Note] WSREP: wsrep_before_commit: 1, 3887432
2021-02-12  3:37:30 2 [Note] WSREP: wsrep_after_commit: 1, 1, 3887432, 0
2021-02-12  3:37:30 2 [Note] WSREP: wsrep_after_apply 2
2021-02-12  3:37:30 2 [Note] WSREP: wsrep_thd_binlog_reset
—
2021-02-12  3:46:11 0 [Note] WSREP: (ed0197e1-8036, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://172.30.33.190:4567
2021-02-12  3:46:11 0 [Note] WSREP: declaring 6984ad01-ba5f at ssl://172.32.37.242:4567 stable
2021-02-12  3:46:11 0 [Note] WSREP: declaring 936cf414-b980 at ssl://172.30.33.77:4567 stable
2021-02-12  3:46:11 0 [Note] WSREP: declaring aac1418b-8e23 at ssl://172.32.32.6:4567 stable
2021-02-12  3:46:11 0 [Note] WSREP: declaring f395a896-9f5f at ssl://172.32.34.166:4567 stable
2021-02-12  3:46:11 0 [Note] WSREP: forgetting 72d4871e-bc74 (ssl://172.30.33.190:4567)
2021-02-12  3:46:11 0 [Note] WSREP: (ed0197e1-8036, 'ssl://0.0.0.0:4567') turning message relay requesting off
2021-02-12  3:46:11 0 [Note] WSREP: Node 6984ad01-ba5f state prim
2021-02-12  3:46:11 0 [Note] WSREP: view(view_id(PRIM,6984ad01-ba5f,248) memb {
        6984ad01-ba5f,0
        936cf414-b980,0
        aac1418b-8e23,0
        ed0197e1-8036,0
        f395a896-9f5f,0
} joined {
} left {
} partitioned {
        72d4871e-bc74,0
})
2021-02-12  3:46:11 0 [Note] WSREP: save pc into disk
2021-02-12  3:46:11 0 [Note] WSREP: forgetting 72d4871e-bc74 (ssl://172.30.33.190:4567)
2021-02-12  3:46:11 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 3, memb_num = 5
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: sent state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 0 (galera-1)
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 1 (galera-4)
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 2 (galera-1)
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 3 (galera-5)
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 4 (galera-1)
2021-02-12  3:46:11 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 208,
        members    = 5/5 (joined/total),
        act_id     = 3887642,
        last_appl. = 3887561,
        protocols  = 0/9/3 (gcs/repl/appl),
        vote policy= 1,
        group UUID = ab5dc5b8-5414-11ea-8dc3-c6f77e135d98
2021-02-12  3:46:11 0 [Note] WSREP: Flow-control interval: [36, 36]
2021-02-12  3:46:11 11 [Note] WSREP: ####### processing CC 3887642, local, unordered
2021-02-12  3:46:11 11 [Note] WSREP: ####### My UUID: ed0197e1-6ce2-11eb-8036-762de9cf7f42
2021-02-12  3:46:11 11 [Note] WSREP: Cert index reset to ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887642 (proto: 9), state
transfer needed: no
2021-02-12  3:46:11 0 [Note] WSREP: Service thread queue flushed.
2021-02-12  3:46:11 11 [Note] WSREP: ####### Assign initial position for certification: ab5dc5b8-5414-11ea-8dc3-c6f77e13
5d98:3887642, protocol version: 4
2021-02-12  3:46:11 11 [Note] WSREP: REPL Protocols: 9 (4)
2021-02-12  3:46:11 11 [Note] WSREP: ####### Adjusting cert position: 3887642 -> 3887642
2021-02-12  3:46:11 0 [Note] WSREP: Service thread queue flushed.
2021-02-12  3:46:11 11 [Note] WSREP: ================================================
View:
  id: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887642
  status: primary
  protocol_version: 3
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, U
NORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 3
  members(5):
        0: 6984ad01-600c-11eb-ba5f-1aad3ac840e1, galera-1
        1: 936cf414-6b20-11eb-b980-77e6f18abe33, galera-4
        2: aac1418b-5f8e-11eb-8e23-ea35715e447b, galera-1
        3: ed0197e1-6ce2-11eb-8036-762de9cf7f42, galera-5
        4: f395a896-600a-11eb-9f5f-ce0b9a1af273, galera-1
=================================================
2021-02-12  3:46:11 11 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-02-12  3:46:11 11 [Note] WSREP: Restored cluster view:
  id: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411
  status: primary
  protocol_version: 3
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, U
NORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: -1
  members(6):
        0: 6984ad01-600c-11eb-ba5f-1aad3ac840e1, galera-1
        1: 72d4871e-6ce3-11eb-bc74-862688521489, galera-6
        2: 936cf414-6b20-11eb-b980-77e6f18abe33, galera-4
        3: aac1418b-5f8e-11eb-8e23-ea35715e447b, galera-1
        4: ed0197e1-6ce2-11eb-8036-762de9cf7f42, galera-5
        5: f395a896-600a-11eb-9f5f-ce0b9a1af273, galera-1

2021-02-12  3:46:11 11 [Note] WSREP: Storing cluster view:
  id: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887642
  status: primary
  protocol_version: 3
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, U
NORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 3
  members(5):
        0: 6984ad01-600c-11eb-ba5f-1aad3ac840e1, galera-1
        1: 936cf414-6b20-11eb-b980-77e6f18abe33, galera-4
        2: aac1418b-5f8e-11eb-8e23-ea35715e447b, galera-1
        3: ed0197e1-6ce2-11eb-8036-762de9cf7f42, galera-5
        4: f395a896-600a-11eb-9f5f-ce0b9a1af273, galera-1

2021-02-12  3:46:11 11 [Note] WSREP: SQL: 31 DELETE FROM mysql.wsrep_cluster thd: 11
2021-02-12  3:46:11 11 [Note] WSREP: SQL: 39 DELETE FROM mysql.wsrep_cluster_members thd: 11
2021-02-12  3:46:11 11 [Note] WSREP: Lowest cert index boundary for CC from group: 3887642
2021-02-12  3:46:11 11 [Note] WSREP: Min available from gcache for CC from group: 3798603
2021-02-12  3:46:12 11 [Note] WSREP: wsrep_commit_empty(11)
2021-02-12  3:46:12 11 [Note] WSREP: assigned new next trx id: 391
2021-02-12  3:46:12 11 [Note] WSREP: wsrep_before_commit: 1, 3887643
2021-02-12  3:46:12 11 [Note] WSREP: wsrep_after_commit: 1, 1, 3887643, 0
2021-02-12  3:46:12 11 [Note] WSREP: wsrep_after_apply 11
2021-02-12  3:46:12 11 [Note] WSREP: wsrep_thd_binlog_reset
2021-02-12  3:46:12 15 [Note] WSREP: assigned new next trx id: 392
2021-02-12  3:46:15 10 [Note] WSREP: wsrep_commit_empty(10)
2021-02-12  3:46:15 10 [Note] WSREP: assigned new next trx id: 392
2021-02-12  3:46:15 10 [Note] WSREP: wsrep_before_commit: 1, 3887644
2021-02-12  3:46:15 10 [Note] WSREP: wsrep_after_commit: 1, 1, 3887644, 0
2021-02-12  3:46:15 10 [Note] WSREP: wsrep_after_apply 10
2021-02-12  3:46:15 10 [Note] WSREP: wsrep_thd_binlog_reset
2021-02-12  3:46:15 17 [Note] WSREP: assigned new next trx id: 393
2021-02-12  3:46:16 0 [Note] WSREP:  cleaning up 72d4871e-bc74 (ssl://172.30.33.190:4567)
2021-02-12  3:46:16 12 [Note] WSREP: wsrep_commit_empty(12)
2021-02-12  3:46:16 12 [Note] WSREP: assigned new next trx id: 393
2021-02-12  3:46:16 12 [Note] WSREP: wsrep_before_commit: 1, 3887645
2021-02-12  3:46:16 12 [Note] WSREP: wsrep_after_commit: 1, 1, 3887645, 0
2021-02-12  3:46:16 12 [Note] WSREP: wsrep_after_apply 12
2021-02-12  3:46:16 12 [Note] WSREP: wsrep_thd_binlog_reset
2021-02-12  3:46:17 18 [Note] WSREP: assigned new next trx id: 394
2 Likes

MEMBER:

2021-02-12  3:36:10 0 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.30.33.190:39104 local endpoint ssl://172.30.33.77:4567 cipher: ECDHE-RSA-AES256-GCM-SHA384 compression: none
2021-02-12  3:36:10 0 [Note] WSREP: (936cf414-b980, 'ssl://0.0.0.0:4567') connection established to 72d4871e-bc74 ssl://172.30.33.190:4567
2021-02-12  3:36:10 0 [Note] WSREP: (936cf414-b980, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2021-02-12  3:36:10 0 [Note] WSREP: declaring 6984ad01-ba5f at ssl://172.32.37.242:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring 72d4871e-bc74 at ssl://172.30.33.190:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring aac1418b-8e23 at ssl://172.32.32.6:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring ed0197e1-8036 at ssl://172.30.35.61:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: declaring f395a896-9f5f at ssl://172.32.34.166:4567 stable
2021-02-12  3:36:10 0 [Note] WSREP: Node 6984ad01-ba5f state prim
2021-02-12  3:36:10 0 [Note] WSREP: view(view_id(PRIM,6984ad01-ba5f,247) memb {
        6984ad01-ba5f,0
        72d4871e-bc74,0
        936cf414-b980,0
        aac1418b-8e23,0
        ed0197e1-8036,0
        f395a896-9f5f,0
} joined {
} left {
} partitioned {
})
2021-02-12  3:36:10 0 [Note] WSREP: save pc into disk
2021-02-12  3:36:10 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 6
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 7322775d-6ce3-11eb-b462-1254a6b23145
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 0 (gal
era-1)
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 2 (gal
era-4)
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 3 (gal
era-1)
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 4 (gal
era-5)
2021-02-12  3:36:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 5 (gal
era-1)
2021-02-12  3:36:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: 7322775d-6ce3-11eb-b462-1254a6b23145 from 1 (gal
era-6)
2021-02-12  3:36:11 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 207,
        members    = 5/6 (joined/total),
        act_id     = 3887411,
        last_appl. = 0,
        protocols  = 0/9/3 (gcs/repl/appl),
        vote policy= 1,
        group UUID = ab5dc5b8-5414-11ea-8dc3-c6f77e135d98
2021-02-12  3:36:11 0 [Note] WSREP: Flow-control interval: [39, 39]
2021-02-12  3:36:11 2 [Note] WSREP: ####### processing CC 3887411, local, unordered
2021-02-12  3:36:11 2 [Note] WSREP: ####### My UUID: 936cf414-6b20-11eb-b980-77e6f18abe33
2021-02-12  3:36:11 2 [Note] WSREP: Cert index reset to ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411 (proto: 9), state t
ransfer needed: no
2021-02-12  3:36:11 0 [Note] WSREP: Service thread queue flushed.
2021-02-12  3:36:11 2 [Note] WSREP: ####### Assign initial position for certification: ab5dc5b8-5414-11ea-8dc3-c6f77e135
d98:3887411, protocol version: 4
2021-02-12  3:36:11 2 [Note] WSREP: REPL Protocols: 9 (4)
2021-02-12  3:36:11 2 [Note] WSREP: ####### Adjusting cert position: 3887411 -> 3887411
2021-02-12  3:36:11 0 [Note] WSREP: Service thread queue flushed.
2021-02-12  3:36:11 2 [Note] WSREP: ================================================
View:
  id: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887411
  status: primary
  protocol_version: 3
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, U
NORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 2
  members(6):
        0: 6984ad01-600c-11eb-ba5f-1aad3ac840e1, galera-1
        1: 72d4871e-6ce3-11eb-bc74-862688521489, galera-6
        2: 936cf414-6b20-11eb-b980-77e6f18abe33, galera-4
        3: aac1418b-5f8e-11eb-8e23-ea35715e447b, galera-1
        4: ed0197e1-6ce2-11eb-8036-762de9cf7f42, galera-5
        5: f395a896-600a-11eb-9f5f-ce0b9a1af273, galera-1
=================================================
2021-02-12  3:36:11 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-02-12  3:36:11 2 [Note] WSREP: Lowest cert index boundary for CC from group: 3887411
2021-02-12  3:36:11 2 [Note] WSREP: Min available from gcache for CC from group: 3798334
2021-02-12  3:36:11 0 [Note] WSREP: Member 1.0 (galera-6) requested state transfer from 'galera-5'. Selected 4
.0 (galera-5)(SYNCED) as donor.
2021-02-12  3:36:11 0 [Note] WSREP: 4.0 (galera-5): State transfer to 1.0 (galera-6) complete.
2021-02-12  3:36:11 0 [Note] WSREP: Member 4.0 (galera-5) synced with group.
2021-02-12  3:36:13 0 [Note] WSREP: (936cf414-b980, 'ssl://0.0.0.0:4567') turning message relay requesting off
2021-02-12  3:46:11 0 [Note] WSREP: (936cf414-b980, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive p
eers: ssl://172.30.33.190:4567
2021-02-12  3:46:11 0 [Note] WSREP: declaring 6984ad01-ba5f at ssl://172.32.37.242:4567 stable
2021-02-12  3:46:11 0 [Note] WSREP: declaring aac1418b-8e23 at ssl://172.32.32.6:4567 stable
2021-02-12  3:46:11 0 [Note] WSREP: declaring ed0197e1-8036 at ssl://172.30.35.61:4567 stable
2021-02-12  3:46:11 0 [Note] WSREP: declaring f395a896-9f5f at ssl://172.32.34.166:4567 stable
2021-02-12  3:46:11 0 [Note] WSREP: forgetting 72d4871e-bc74 (ssl://172.30.33.190:4567)
2021-02-12  3:46:11 0 [Note] WSREP: (936cf414-b980, 'ssl://0.0.0.0:4567') turning message relay requesting off
2021-02-12  3:46:11 0 [Note] WSREP: Node 6984ad01-ba5f state prim
2021-02-12  3:46:11 0 [Note] WSREP: view(view_id(PRIM,6984ad01-ba5f,248) memb {
        6984ad01-ba5f,0
        936cf414-b980,0
        aac1418b-8e23,0
        ed0197e1-8036,0
        f395a896-9f5f,0
} joined {
} left {
} partitioned {
        72d4871e-bc74,0
})
2021-02-12  3:46:11 0 [Note] WSREP: save pc into disk
2021-02-12  3:46:11 0 [Note] WSREP: forgetting 72d4871e-bc74 (ssl://172.30.33.190:4567)
2021-02-12  3:46:11 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 5
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: sent state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 0 (gal
era-1)
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 1 (gal
era-4)
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 2 (gal
era-1)
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 3 (gal
era-5)
2021-02-12  3:46:11 0 [Note] WSREP: STATE EXCHANGE: got state msg: d8c9d69e-6ce4-11eb-aab7-6b08dc62fc0c from 4 (gal
era-1)
2021-02-12  3:46:11 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 208,
        members    = 5/5 (joined/total),
        act_id     = 3887642,
        last_appl. = 3887561,
        protocols  = 0/9/3 (gcs/repl/appl),
        vote policy= 1,
        group UUID = ab5dc5b8-5414-11ea-8dc3-c6f77e135d98
2021-02-12  3:46:11 0 [Note] WSREP: Flow-control interval: [36, 36]
2021-02-12  3:46:11 12 [Note] WSREP: ####### processing CC 3887642, local, unordered
2021-02-12  3:46:11 12 [Note] WSREP: ####### My UUID: 936cf414-6b20-11eb-b980-77e6f18abe33
2021-02-12  3:46:11 12 [Note] WSREP: Cert index reset to ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887642 (proto: 9), state
transfer needed: no
2021-02-12  3:46:11 0 [Note] WSREP: Service thread queue flushed.
2021-02-12  3:46:11 12 [Note] WSREP: ####### Assign initial position for certification: ab5dc5b8-5414-11ea-8dc3-c6f77e13
5d98:3887642, protocol version: 4
2021-02-12  3:46:11 12 [Note] WSREP: REPL Protocols: 9 (4)
2021-02-12  3:46:11 12 [Note] WSREP: ####### Adjusting cert position: 3887642 -> 3887642
2021-02-12  3:46:11 0 [Note] WSREP: Service thread queue flushed.
2021-02-12  3:46:11 12 [Note] WSREP: ================================================
View:
  id: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:3887642
  status: primary
  protocol_version: 3
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, U
NORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 1
  members(5):
        0: 6984ad01-600c-11eb-ba5f-1aad3ac840e1, galera-1
        1: 936cf414-6b20-11eb-b980-77e6f18abe33, galera-4
        2: aac1418b-5f8e-11eb-8e23-ea35715e447b, galera-1
        3: ed0197e1-6ce2-11eb-8036-762de9cf7f42, galera-5
        4: f395a896-600a-11eb-9f5f-ce0b9a1af273, galera-1
=================================================
2021-02-12  3:46:11 12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-02-12  3:46:11 12 [Note] WSREP: Lowest cert index boundary for CC from group: 3887642
2021-02-12  3:46:11 12 [Note] WSREP: Min available from gcache for CC from group: 3798603
2021-02-12  3:46:16 0 [Note] WSREP:  cleaning up 72d4871e-bc74 (ssl://172.30.33.190:4567)
2 Likes

Sorry if something is missing. I’m having trouble posting the logs.

2 Likes

Your logs went to spam for some reason, I just canceled them in the spam filter and they are published.

1 Like

I see this in the joiner. I do not see anything similar in your donor logs. This tells me that your two nodes cannot communicate. I’m also seeing the same name ‘galera-1’ with multiple UUIDs. Do you have a misconfiguration somewhere?

2 Likes

The naming is a misconfiguration of the first 3 nodes. I have not addressed them at this time since they are scheduled to be decommissioned after the migration is complete. If that might be a cause of the issue then I can address it.

I’m not sure what else to check for communication, as all of the nodes are configured identically for permissions and I have verified ports 3306/4444/4567.

2 Likes

I don’t have the ability to bring the cluster down but I changed galera-6 to use mariabackup instead of rsync as it’s already installed. The messages changed a little, but the donor still seems to come back immediately so I think I may need to dig into that more.

galera-6 (joiner):

2021-02-16 21:10:59 0 [Note] WSREP: Joiner waited 300 sec, extending systemd startup timeout as SSTis not completed
WSREP_SST: [ERROR] Possible timeout in receiving first data from  (20210216 21:10:59.700)
/usr//bin/wsrep_sst_mariabackup: line 631: donor in gtid stage: exit codes: 124: command not found
WSREP_SST: [ERROR] Cleanup after exit with status:127 (20210216 21:10:59.702)
2021-02-16 21:10:59 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '172.30.33.190' --datadir '/var/lib/mysql/' --parent '20957' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1: 2 (No such file or directory)
2021-02-16 21:10:59 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.

galera-5 (donor):

2021-02-16 21:11:09 0 [Note] WSREP: Member 1.0 (use2-galera-6) requested state transfer from 'use2-galera-5'. Selected 4
.0 (use2-galera-5)(SYNCED) as donor.
2021-02-16 21:11:09 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 4145157)
2021-02-16 21:11:09 11 [Note] WSREP: Detected STR version: 1, req_len: 63, req: STRv1
2021-02-16 21:11:09 0 [Note] WSREP: 4.0 (use2-galera-5): State transfer to 1.0 (use2-galera-6) complete.
2021-02-16 21:11:09 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 4145157)
2021-02-16 21:11:09 0 [Note] WSREP: Member 4.0 (use2-galera-5) synced with group.
2021-02-16 21:11:09 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 4145157)
2021-02-16 21:11:09 2 [Note] WSREP: Server use2-galera-5 synced with group
2 Likes

Hi @Solarbell

It looks like the binary isn’t where the sst script is expecting it - do a which wsrep_sst_mariabackup.sh or a find / -name wsrep_sst_mariabackup* to verify where the script is on your filesystem. The error reads that it can’t find this at the named location /usr/bin/ .
2. It still looks like your logs are still off by 10 seconds

Could you try using setting wsrep_sst_donor on galera-6 and force it to use one of the original nodes galera-1 through -3 ?

1 Like

Line 631 is the middle of the following code segment. It looks like the script is inadvertently trying to run a command of the return code 124:

if [[ ${RC[0]} -eq 124 ]];then
    wsrep_log_error "Possible timeout in receiving first data from "
                    "donor in gtid stage: exit codes: ${RC[@]}"
    exit 32
fi

I set the donor to be galera-1 and found two different error conditions (since it automatically retries):

2021-02-18  3:37:59 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000
-0000-0000-000000000000) does not match group state UUID (ab5dc5b8-5414-11ea-8dc3-c6f77e135d98): 1 (Operation not permit
ted)
         at galera/src/replicator_str.cpp:prepare_for_IST():616. IST will be unavailable.
2021-02-18  3:37:59 0 [Note] WSREP: Member 3.0 (galera-6) requested state transfer from 'galera-1'. Selected 0
.0 (use2-galera-1)(SYNCED) as donor.
2021-02-18  3:37:59 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 4213777)
2021-02-18  3:37:59 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2021-02-18  3:37:59 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
2021-02-18  3:37:59 2 [Note] WSREP: GCache history reset: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:0 -> ab5dc5b8-5414-11ea-8
dc3-c6f77e135d98:4213777
2021-02-18  3:38:00 0 [Note] WSREP: 0.0 (galera-1): State transfer to 3.0 (galera-6) complete.
2021-02-18  3:38:00 0 [Note] WSREP: Member 0.0 (galera-1) synced with group.
2021-02-18  3:38:01 0 [Note] WSREP: (b11ea3e2-9cb9, 'ssl://0.0.0.0:4567') turning message relay requesting off
2021-02-18  3:38:29 0 [Note] WSREP: Joiner waited 30 sec, extending systemd startup timeout as SSTis not completed
2021-02-18  3:38:59 0 [Note] WSREP: Joiner waited 60 sec, extending systemd startup timeout as SSTis not completed
2021-02-18  3:39:29 0 [Note] WSREP: Joiner waited 90 sec, extending systemd startup timeout as SSTis not completed
2021-02-18  3:39:59 0 [Note] WSREP: Joiner waited 120 sec, extending systemd startup timeout as SSTis not completed
2021-02-18  3:40:29 0 [Note] WSREP: Joiner waited 150 sec, extending systemd startup timeout as SSTis not completed
2021-02-18  3:40:59 0 [Note] WSREP: Joiner waited 180 sec, extending systemd startup timeout as SSTis not completed
2021-02-18  3:41:29 0 [Note] WSREP: Joiner waited 210 sec, extending systemd startup timeout as SSTis not completed
2021-02-18  3:41:59 0 [Note] WSREP: Joiner waited 240 sec, extending systemd startup timeout as SSTis not completed
2021-02-18  3:42:29 0 [Note] WSREP: Joiner waited 270 sec, extending systemd startup timeout as SSTis not completed
2021-02-18  3:42:59 0 [Note] WSREP: Joiner waited 300 sec, extending systemd startup timeout as SSTis not completed
WSREP_SST: [ERROR] Possible timeout in receiving first data from  (20210218 03:42:59.708)
/usr//bin/wsrep_sst_mariabackup: line 631: donor in gtid stage: exit codes: 124: command not found
WSREP_SST: [ERROR] Cleanup after exit with status:127 (20210218 03:42:59.710)
2021-02-18  3:42:59 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '172.
30.33.190' --datadir '/var/lib/mysql/' --parent '17648' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000
000000000:-1: 2 (No such file or directory)
2021-02-18  3:42:59 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2021-02-18  3:42:59 3 [Note] WSREP: SST received
2021-02-18  3:42:59 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
2021-02-18  3:42:59 0 [Note] WSREP: Joiner monitor thread ended with total time 300 sec

Second:

2021-02-18  3:43:09 2 [Note] WSREP: State transfer required:
        Group state: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:4213931
        Local state: 00000000-0000-0000-0000-000000000000:-1
2021-02-18  3:43:09 2 [Note] [Debug] WSREP: server galera-6 state change: connected -> joiner
2021-02-18  3:43:09 2 [Note] WSREP: Server status change connected -> joiner
2021-02-18  3:43:09 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-02-18  3:43:09 2 [Note] WSREP: wsrep_guess_ip() : 172.30.33.190
2021-02-18  3:43:09 0 [Note] WSREP: Joiner monitor thread started to monitor
2021-02-18  3:43:09 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '172.30.33.190' --datadir
'/var/lib/mysql/' --parent '18405' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1'
WSREP_SST: [INFO] Streaming with xbstream (20210218 03:43:09.527)
WSREP_SST: [INFO] Using socat as streamer (20210218 03:43:09.529)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20210218 03:43:09.532)
WSREP_SST: [INFO] Evaluating timeout -k 310 300 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATU
S[@]} ) (20210218 03:43:09.555)
2021-02-18  3:43:09 2 [Note] WSREP: Prepared SST request: mariabackup|172.30.33.190:4444/xtrabackup_sst//1
2021-02-18  3:43:09 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000
-0000-0000-000000000000) does not match group state UUID (ab5dc5b8-5414-11ea-8dc3-c6f77e135d98): 1 (Operation not permit
ted)
         at galera/src/replicator_str.cpp:prepare_for_IST():616. IST will be unavailable.
2021-02-18  3:43:09 0 [Note] WSREP: Member 1.0 (galera-6) requested state transfer from 'galera-1'. Selected 0
.0 (use2-galera-1)(SYNCED) as donor.
2021-02-18  3:43:09 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 4213933)
2021-02-18  3:43:09 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2021-02-18  3:43:09 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
2021-02-18  3:43:09 2 [Note] WSREP: GCache history reset: ab5dc5b8-5414-11ea-8dc3-c6f77e135d98:0 -> ab5dc5b8-5414-11ea-8
dc3-c6f77e135d98:4213931
2021-02-18  3:43:09 0 [Note] WSREP: 0.0 (galera-1): State transfer to 1.0 (galera-6) complete.
2021-02-18  3:43:09 0 [Note] WSREP: Member 0.0 (galera-1) synced with group.
2021-02-18  3:43:10 0 [Note] WSREP: (69cdc906-9983, 'ssl://0.0.0.0:4567') turning message relay requesting off
Terminated
WSREP_SST: [ERROR] Removing /tmp/tmp.zICNXlyTfm/xtrabackup_galera_info file due to signal (20210218 03:43:18.743)
WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 143 143 (20210218 03:43:18.745)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20210218 03:43:18.747)
2021-02-18  3:43:18 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '172.
30.33.190' --datadir '/var/lib/mysql/' --parent '18405' --mysqld-args --wsrep_start_position=00000000-0000-0000-0000-000
000000000:-1: 32 (Broken pipe)
2021-02-18  3:43:18 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.

Now, I have the start timeout set to 600 seconds and the sync seems to be failing out after 300 seconds. Is there another timeout setting somewhere? I also tried the same thing with another node, same result.

2 Likes

Sorry, forgot about one of the questions.

[root@galera-6 my.cnf.d]# which wsrep_sst_mariabackup
/bin/wsrep_sst_mariabackup
2 Likes

On my CentOS 8 system /bin is a symlink to /usr/bin so the script should be found - can you confirm the same on your distribution?

# ls -l / | grep bin | head -1
lrwxrwxrwx.   1 root root    7 may 11  2019 bin -> usr/bin

Also we need the logs from the DONOR because that’s where the SST happens. Both logs shown are from JOINER perspective only.

1 Like