Problem when node goes down

I was running 3 nodes. I shut down node 1, and while node 1 was still down I also shut down node 2 and node 3. Then I bootstrapped the cluster from node 3 and started node 2 again, but when starting node 1 again this error occurs:

2022-06-27 15:54:19 12660 [Note] WSREP: Member 1.0 (localhost.localdomain) requested state transfer from 'any'. Selected 0.0 (localhost.localdomain)(SYNCED) as donor.
2022-06-27 15:54:19 12660 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 4)
2022-06-27 15:54:19 12660 [Note] WSREP: Requesting state transfer: success, donor: 0
2022-06-27 15:54:19 12660 [Warning] WSREP: 0.0 (localhost.localdomain): State transfer to 1.0 (localhost.localdomain) failed: -255 (Unknown error 255)
2022-06-27 15:54:19 12660 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():736: Will never receive state. Need to abort.
2022-06-27 15:54:19 12660 [Note] WSREP: gcomm: terminating thread
2022-06-27 15:54:19 12660 [Note] WSREP: gcomm: joining thread
2022-06-27 15:54:19 12660 [Note] WSREP: gcomm: closing backend
2022-06-27 15:54:20 12660 [Note] WSREP: view(view_id(NON_PRIM,aab8cec1,2) memb {
ba9c56fc,0
} joined {
} left {
} partitioned {
aab8cec1,0
})
2022-06-27 15:54:20 12660 [Note] WSREP: view((empty))
2022-06-27 15:54:20 12660 [Note] WSREP: gcomm: closed
2022-06-27 15:54:20 12660 [Note] WSREP: /usr/sbin/mysqld: Terminated.
220627 15:54:20 mysqld_safe mysqld from pid file /var/lib/mysql/localhost.localdomain.pid ended
WSREP_SST: [ERROR] Parent mysqld process (PID:12660) terminated unexpectedly. (20220627 15:54:21.423)
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 12701 (20220627 15:54:21.425)
WSREP_SST: [INFO] Joiner cleanup done. (20220627 15:54:21.929)


I want to provide more of the log:

2022-06-27 15:54:18 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2022-06-27 15:54:18 0 [Note] /usr/sbin/mysqld (mysqld 5.6.30-76.3-56) starting as process 12660 …
2022-06-27 15:54:18 12660 [Note] WSREP: Read nil XID from storage engines, skipping position init
2022-06-27 15:54:18 12660 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
2022-06-27 15:54:18 12660 [Note] WSREP: wsrep_load(): Galera 3.16(r5c765eb) by Codership Oy info@codership.com loaded successfully.
2022-06-27 15:54:18 12660 [Note] WSREP: CRC-32C: using hardware acceleration.
2022-06-27 15:54:18 12660 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2022-06-27 15:54:18 12660 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.254.128; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum =
2022-06-27 15:54:18 12660 [Note] WSREP: Service thread queue flushed.
2022-06-27 15:54:18 12660 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2022-06-27 15:54:18 12660 [Note] WSREP: wsrep_sst_grab()
2022-06-27 15:54:18 12660 [Note] WSREP: Start replication
2022-06-27 15:54:18 12660 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2022-06-27 15:54:18 12660 [Note] WSREP: protonet asio version 0
2022-06-27 15:54:18 12660 [Note] WSREP: Using CRC-32C for message checksums.
2022-06-27 15:54:18 12660 [Note] WSREP: backend: asio
2022-06-27 15:54:18 12660 [Note] WSREP: gcomm thread scheduling priority set to other:0
2022-06-27 15:54:18 12660 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2022-06-27 15:54:18 12660 [Note] WSREP: restore pc from disk failed
2022-06-27 15:54:18 12660 [Note] WSREP: GMCast version 0
2022-06-27 15:54:18 12660 [Note] WSREP: (ba9c56fc, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2022-06-27 15:54:18 12660 [Note] WSREP: (ba9c56fc, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2022-06-27 15:54:18 12660 [Note] WSREP: EVS version 0
2022-06-27 15:54:18 12660 [Note] WSREP: gcomm: connecting to group 'my_centos_cluster', peer '192.168.254.128:,192.168.254.102:,192.168.254.150:'
2022-06-27 15:54:18 12660 [Warning] WSREP: (ba9c56fc, 'tcp://0.0.0.0:4567') address 'tcp://192.168.254.128:4567' points to own listening address, blacklisting
2022-06-27 15:54:18 12660 [Note] WSREP: (ba9c56fc, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2022-06-27 15:54:18 12660 [Note] WSREP: declaring aab8cec1 at tcp://192.168.254.150:4567 stable
2022-06-27 15:54:18 12660 [Note] WSREP: Node aab8cec1 state prim
2022-06-27 15:54:18 12660 [Note] WSREP: view(view_id(PRIM,aab8cec1,2) memb {
aab8cec1,0
ba9c56fc,0
} joined {
} left {
} partitioned {
})
2022-06-27 15:54:18 12660 [Note] WSREP: save pc into disk
2022-06-27 15:54:18 12660 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.254.102:4567
2022-06-27 15:54:19 12660 [Note] WSREP: gcomm: connected
2022-06-27 15:54:19 12660 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2022-06-27 15:54:19 12660 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2022-06-27 15:54:19 12660 [Note] WSREP: Opened channel 'my_centos_cluster'
2022-06-27 15:54:19 12660 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2022-06-27 15:54:19 12660 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2022-06-27 15:54:19 12660 [Note] WSREP: Waiting for SST to complete.
2022-06-27 15:54:19 12660 [Note] WSREP: STATE EXCHANGE: sent state msg: b9ac93c0-f5f6-11ec-b706-7ab2a73219b6
2022-06-27 15:54:19 12660 [Note] WSREP: STATE EXCHANGE: got state msg: b9ac93c0-f5f6-11ec-b706-7ab2a73219b6 from 0 (localhost.localdomain)
2022-06-27 15:54:19 12660 [Note] WSREP: STATE EXCHANGE: got state msg: b9ac93c0-f5f6-11ec-b706-7ab2a73219b6 from 1 (localhost.localdomain)
2022-06-27 15:54:19 12660 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 1,
members = 1/2 (joined/total),
act_id = 4,
last_appl. = -1,
protocols = 0/7/3 (gcs/repl/appl),
group UUID = f6c8c6d1-e0b3-11ec-82be-cfbd0b61dbd0
2022-06-27 15:54:19 12660 [Note] WSREP: Flow-control interval: [23, 23]
2022-06-27 15:54:19 12660 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 4)
2022-06-27 15:54:19 12660 [Note] WSREP: State transfer required:
Group state: f6c8c6d1-e0b3-11ec-82be-cfbd0b61dbd0:4
Local state: 00000000-0000-0000-0000-000000000000:-1
2022-06-27 15:54:19 12660 [Note] WSREP: New cluster view: global state: f6c8c6d1-e0b3-11ec-82be-cfbd0b61dbd0:4, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
2022-06-27 15:54:19 12660 [Warning] WSREP: Gap in state sequence. Need state transfer.
2022-06-27 15:54:19 12660 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.254.128' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '12660' '' '
2022-06-27 15:54:19 12660 [Note] WSREP: Prepared SST request: rsync|192.168.254.128:4444/rsync_sst
2022-06-27 15:54:19 12660 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-06-27 15:54:19 12660 [Note] WSREP: REPL Protocols: 7 (3, 2)
2022-06-27 15:54:19 12660 [Note] WSREP: Service thread queue flushed.
2022-06-27 15:54:19 12660 [Note] WSREP: Assign initial position for certification: 4, protocol version: 3
2022-06-27 15:54:19 12660 [Note] WSREP: Service thread queue flushed.
2022-06-27 15:54:19 12660 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (f6c8c6d1-e0b3-11ec-82be-cfbd0b61dbd0): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():507. IST will be unavailable.
2022-06-27 15:54:19 12660 [Note] WSREP: Member 1.0 (localhost.localdomain) requested state transfer from 'any'. Selected 0.0 (localhost.localdomain)(SYNCED) as donor.
2022-06-27 15:54:19 12660 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 4)
2022-06-27 15:54:19 12660 [Note] WSREP: Requesting state transfer: success, donor: 0
2022-06-27 15:54:19 12660 [Warning] WSREP: 0.0 (localhost.localdomain): State transfer to 1.0 (localhost.localdomain) failed: -255 (Unknown error 255)
2022-06-27 15:54:19 12660 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():736: Will never receive state. Need to abort.
2022-06-27 15:54:19 12660 [Note] WSREP: gcomm: terminating thread
2022-06-27 15:54:19 12660 [Note] WSREP: gcomm: joining thread
2022-06-27 15:54:19 12660 [Note] WSREP: gcomm: closing backend
2022-06-27 15:54:20 12660 [Note] WSREP: view(view_id(NON_PRIM,aab8cec1,2) memb {
ba9c56fc,0
} joined {
} left {
} partitioned {
aab8cec1,0
})
2022-06-27 15:54:20 12660 [Note] WSREP: view((empty))
2022-06-27 15:54:20 12660 [Note] WSREP: gcomm: closed
2022-06-27 15:54:20 12660 [Note] WSREP: /usr/sbin/mysqld: Terminated.
220627 15:54:20 mysqld_safe mysqld from pid file /var/lib/mysql/localhost.localdomain.pid ended
WSREP_SST: [ERROR] Parent mysqld process (PID:12660) terminated unexpectedly. (20220627 15:54:21.423)
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 12701 (20220627 15:54:21.425)
WSREP_SST: [INFO] Joiner cleanup done. (20220627 15:54:21.929)


Hi hieu_nguyen,

What versions are all the nodes running?
Note that the node failing to join is on 5.6.30, which is ~6 years old and has already reached end of life.
Possible reasons for the failure include: 1) a bug, 2) the cluster being bootstrapped by a node running 5.7, which makes it impossible for a 5.6 node to join, or 3) some of the communication ports being blocked.
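For reason 3, each node must be able to reach its peers on ports 3306 (MySQL clients), 4567 (group communication), 4444 (SST), and 4568 (IST). A minimal probe sketch using bash's /dev/tcp pseudo-device, so no extra tools are needed; the 127.0.0.1 fallback and the IPs in the usage note are placeholders, substitute your own node addresses:

```shell
#!/usr/bin/env bash
# Hedged sketch: probe the ports a Galera node needs on its peers.
check_port() {
  local host=$1 port=$2
  # The redirect to /dev/tcp/<host>/<port> succeeds only if a TCP
  # connection can actually be opened to that host:port.
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "OK   ${host}:${port}"
  else
    echo "FAIL ${host}:${port}"
  fi
}

# 3306 = MySQL clients, 4567 = group communication, 4444 = SST, 4568 = IST
for host in "${@:-127.0.0.1}"; do
  for port in 3306 4567 4444 4568; do
    check_port "$host" "$port"
  done
done
```

Run it from each node against the other two, e.g. `bash check_ports.sh 192.168.254.102 192.168.254.150` (IPs taken from the posted log).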

I suggest you double-check versions and ports and, as a last resort, remove the grastate.dat file from the 5.6 node's data directory. Doing the latter will force a clean SST for the node.
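If it comes to that, the sequence is: stop mysqld on the joiner, move the state file aside, then start it again. A hedged sketch, not an official procedure; the /var/lib/mysql path in the example is taken from the posted log, so confirm yours against `datadir` in my.cnf:

```shell
#!/usr/bin/env bash
# Hedged sketch: clear the joiner's saved Galera state so that the next
# start requests a full SST. Stop mysqld first (e.g. `systemctl stop mysql`).
set -u

force_sst() {
  local datadir=$1
  # grastate.dat holds the node's last known cluster UUID/seqno; with it
  # gone the node reports 00000000-...:-1 and the donor sends a full SST.
  if [ -f "${datadir}/grastate.dat" ]; then
    mv "${datadir}/grastate.dat" "${datadir}/grastate.dat.bak"  # keep a backup
  fi
  # gvwstate.dat caches the primary-component view; drop it as well.
  rm -f "${datadir}/gvwstate.dat"
}

# Example (datadir from the log):
# force_sst /var/lib/mysql
```

After starting the service again, the node should show the zeroed local state in its log and receive a complete state transfer from the donor.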

Also keep in mind that all nodes should run the same major version (or the cluster should be bootstrapped by the node with the oldest version). Ideally all nodes should run the exact same version to avoid conflicts.
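One way to check that is a small comparison helper on the major.minor part of the version string; the mysql invocations in the comments are assumptions (host IPs from the posted log, credentials omitted):

```shell
#!/usr/bin/env bash
# Hedged sketch: compare two MySQL version strings on their major.minor part.
same_major() {   # e.g. same_major 5.6.30-76.3-56 5.6.51
  [ "$(printf '%s' "$1" | cut -d. -f1-2)" = \
    "$(printf '%s' "$2" | cut -d. -f1-2)" ]
}

# On each node, read the server and provider versions with something like:
#   mysql -Nse "SELECT @@version"
#   mysql -Nse "SHOW STATUS LIKE 'wsrep_provider_version'"
#
# Example comparison between two nodes (IPs from the posted log):
# same_major "$(mysql -h 192.168.254.128 -Nse 'SELECT @@version')" \
#            "$(mysql -h 192.168.254.150 -Nse 'SELECT @@version')" \
#   || echo "major versions differ: bootstrap from the oldest node"
```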

Regards
