Hi Percona Team,
I am currently running a 4-node Percona XtraDB Cluster (nodes A, B, C and D). Recently, Node A went down with the error messages provided below. Could you please help me identify whether this issue is related to a network error or a database failure during the IST transfer?
Please note the MySQL versions on all 4 nodes:
root@o6node87 ~ # date; mysql --version
Tue 21 Jan 2025 10:04:11 AM CET
mysql Ver 14.14 Distrib 5.7.43-47, for debian-linux-gnu (x86_64) using 8.0
root@o6node84 ~ # date; mysql --version
Tue 21 Jan 2025 10:04:40 AM CET
mysql Ver 14.14 Distrib 5.7.35-38, for debian-linux-gnu (x86_64) using 8.0
root@o6node85 ~ # date; mysql --version
Tue 21 Jan 2025 10:04:45 AM CET
mysql Ver 14.14 Distrib 5.7.35-38, for debian-linux-gnu (x86_64) using 8.0
root@o6node86 ~ # date; mysql --version
Tue 21 Jan 2025 10:04:52 AM CET
mysql Ver 14.14 Distrib 5.7.44-48, for debian-linux-gnu (x86_64) using 8.0
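The four nodes report three different builds (5.7.35-38 on two nodes, 5.7.43-47 and 5.7.44-48 on the others). In case it helps when reviewing this, here is a minimal Python sketch that groups hosts by the version string in the `mysql --version` output above. The function name `group_by_version` and the regular expression are only illustrative assumptions, not part of any Percona tooling.

```python
import re
from collections import defaultdict

# Pulls the "5.7.43-47" style build string out of `mysql --version` output.
VERSION_RE = re.compile(r"Distrib\s+(\d+\.\d+\.\d+-\d+)")

def group_by_version(outputs):
    """Map each version string to the list of hosts reporting it."""
    groups = defaultdict(list)
    for host, text in outputs.items():
        match = VERSION_RE.search(text)
        version = match.group(1) if match else "unknown"
        groups[version].append(host)
    return dict(groups)

# Version lines copied from the command output above.
outputs = {
    "o6node87": "mysql Ver 14.14 Distrib 5.7.43-47, for debian-linux-gnu (x86_64) using 8.0",
    "o6node84": "mysql Ver 14.14 Distrib 5.7.35-38, for debian-linux-gnu (x86_64) using 8.0",
    "o6node85": "mysql Ver 14.14 Distrib 5.7.35-38, for debian-linux-gnu (x86_64) using 8.0",
    "o6node86": "mysql Ver 14.14 Distrib 5.7.44-48, for debian-linux-gnu (x86_64) using 8.0",
}

groups = group_by_version(outputs)
for version, hosts in sorted(groups.items()):
    print(version, hosts)
```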
I have attached the log files for all 4 nodes for your reference. Kindly review them and assist me in determining the root cause of Node A’s failure.
Node A (o6node87, 10.0.1.87):
All cluster nodes were stable till 7:46:07 AM
2025-01-20T07:46:07.954759Z 0 [Note] WSREP: declaring 393253ce at tcp://10.0.1.84:4567 stable
2025-01-20T07:46:07.954775Z 0 [Note] WSREP: declaring a88b2d35 at tcp://10.0.1.85:4567 stable
2025-01-20T07:46:07.954782Z 0 [Note] WSREP: declaring f36e38d5 at tcp://10.0.1.86:4567 stable
2025-01-20T07:46:07.955258Z 0 [Note] WSREP: re-bootstrapping prim from partitioned components
2025-01-20T07:46:07.955917Z 0 [Note] WSREP: Current view of cluster as seen by this node
Node A Failure time: 07:46:08 AM
2025-01-20T07:46:08.965930Z 0 [Note] WSREP: Member 0.0 (o6node84) requested state transfer from ‘any’. Selected 1.0 (o6node85)(SYNCED) as donor.
2025-01-20T07:46:08.965939Z 0 [Note] WSREP: Member 2.0 (o6node87) requested state transfer from ‘any’. Selected 3.0 (o6node86)(SYNCED) as donor.
2025-01-20T07:46:08.965942Z 0 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 3098538301)
2025-01-20T07:46:08.965986Z 9 [Note] WSREP: Requesting state transfer: success after 2 tries, donor: 3
2025-01-20T07:46:08.965998Z 9 [Note] WSREP: GCache history reset: d668a121-632f-11ec-a0b8-0b5f2060536a:3098538174 → d668a121-632f-11ec-a0b8-0b5f2060536a:3098538301
2025-01-20T07:46:08.967106Z 0 [Warning] WSREP: 3.0 (o6node86): State transfer to 2.0 (o6node87) failed: -61 (No data available)
2025-01-20T07:46:08.967113Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():800: State transfer request failed unrecoverably because the donor seqno had gone forward during IST, but SST request was not prepared from our side due to selected state transfer method (which do not supports SST during node operation). Restart required.
2025-01-20T07:46:08.967118Z 0 [Note] WSREP: gcomm: terminating thread
2025-01-20T07:46:08.967121Z 0 [Note] WSREP: gcomm: joining thread
2025-01-20T07:46:08.967428Z 0 [Note] WSREP: gcomm: closing backend
2025-01-20T07:46:08.973333Z 9 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset
2025-01-20T07:46:08.974104Z 9 [Note] WSREP: Receiving IST: 127 writesets, seqnos 3098538174-3098538301
2025-01-20T07:46:09.968583Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,393253ce,152)
Node B (o6node85, 10.0.1.85):
All cluster nodes were stable till 7:46:07 AM
2025-01-20T07:46:07.338853Z 22172278 [Note] Got timeout reading communication packets
2025-01-20T07:46:07.954682Z 0 [Note] WSREP: declaring 393253ce at tcp://10.0.1.84:4567 stable
2025-01-20T07:46:07.954694Z 0 [Note] WSREP: declaring ab6d8263 at tcp://10.0.1.87:4567 stable
2025-01-20T07:46:07.954697Z 0 [Note] WSREP: declaring f36e38d5 at tcp://10.0.1.86:4567 stable
2025-01-20T07:46:07.955199Z 0 [Note] WSREP: re-bootstrapping prim from partitioned components
2025-01-20T07:46:07.955903Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,393253ce,152)
Node A failure discovered at 07:46:09 AM
2025-01-20T07:46:08.967100Z 0 [Warning] WSREP: 3.0 (o6node86): State transfer to 2.0 (o6node87) failed: -61 (No data available)
2025-01-20T07:46:08.967656Z 0 [Note] WSREP: 1.0 (o6node85): State transfer to 0.0 (o6node84) complete.
2025-01-20T07:46:08.967665Z 0 [Note] WSREP: Shifting DONOR/DESYNCED → JOINED (TO: 3098538301)
2025-01-20T07:46:08.984633Z 0 [Note] WSREP: async IST sender served
2025-01-20T07:46:09.452248Z 0 [Note] WSREP: (a88b2d35, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
2025-01-20T07:46:09.968972Z 0 [Note] WSREP: (a88b2d35, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers: tcp://10.0.1.87:4567
2025-01-20T07:46:09.970000Z 0 [Note] WSREP: declaring 393253ce at tcp://10.0.1.84:4567 stable
2025-01-20T07:46:09.970009Z 0 [Note] WSREP: declaring f36e38d5 at tcp://10.0.1.86:4567 stable
2025-01-20T07:46:09.970013Z 0 [Note] WSREP: forgetting ab6d8263 (tcp://10.0.1.87:4567)
2025-01-20T07:46:09.970028Z 0 [Note] WSREP: (a88b2d35, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
Node C (o6node84, 10.0.1.84):
All cluster nodes were stable till 7:46:07 AM
2025-01-20T07:46:07.954773Z 0 [Note] WSREP: declaring a88b2d35 at tcp://10.0.1.85:4567 stable
2025-01-20T07:46:07.954784Z 0 [Note] WSREP: declaring ab6d8263 at tcp://10.0.1.87:4567 stable
2025-01-20T07:46:07.954787Z 0 [Note] WSREP: declaring f36e38d5 at tcp://10.0.1.86:4567 stable
2025-01-20T07:46:07.955320Z 0 [Note] WSREP: re-bootstrapping prim from partitioned components
2025-01-20T07:46:07.955914Z 0 [Note] WSREP: Current view of cluster as seen by this node
Node A failure discovered at 07:46:09 AM
2025-01-20T07:46:08.967083Z 0 [Warning] WSREP: 3.0 (o6node86): State transfer to 2.0 (o6node87) failed: -61 (No data available)
2025-01-20T07:46:08.967665Z 0 [Note] WSREP: 1.0 (o6node85): State transfer to 0.0 (o6node84) complete.
2025-01-20T07:46:08.980816Z 6 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset
2025-01-20T07:46:08.981040Z 6 [Note] WSREP: Receiving IST: 127 writesets, seqnos 3098538174-3098538301
2025-01-20T07:46:08.981109Z 0 [Note] WSREP: Receiving IST… 0.0% ( 0/127 events) complete.
2025-01-20T07:46:08.984395Z 0 [Note] WSREP: Receiving IST…100.0% (127/127 events) complete.
2025-01-20T07:46:08.984510Z 6 [Note] WSREP: IST received: d668a121-632f-11ec-a0b8-0b5f2060536a:3098538301
2025-01-20T07:46:09.083411Z 0 [Note] WSREP: (393253ce, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
2025-01-20T07:46:09.969284Z 0 [Note] WSREP: (393253ce, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers: tcp://10.0.1.87:4567
2025-01-20T07:46:09.969571Z 0 [Note] WSREP: declaring a88b2d35 at tcp://10.0.1.85:4567 stable
2025-01-20T07:46:09.969601Z 0 [Note] WSREP: declaring f36e38d5 at tcp://10.0.1.86:4567 stable
2025-01-20T07:46:09.969616Z 0 [Note] WSREP: forgetting ab6d8263 (tcp://10.0.1.87:4567)
2025-01-20T07:46:09.969614Z 0 [Note] WSREP: Member 3.0 (o6node86) synced with group.
2025-01-20T07:46:09.969664Z 0 [Note] WSREP: (393253ce, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
Node D (o6node86, 10.0.1.86):
All cluster nodes were stable till 7:46:07 AM
2025-01-20T07:46:07.954732Z 0 [Note] WSREP: declaring 393253ce at tcp://10.0.1.84:4567 stable
2025-01-20T07:46:07.954744Z 0 [Note] WSREP: declaring a88b2d35 at tcp://10.0.1.85:4567 stable
2025-01-20T07:46:07.954747Z 0 [Note] WSREP: declaring ab6d8263 at tcp://10.0.1.87:4567 stable
2025-01-20T07:46:07.955195Z 0 [Note] WSREP: re-bootstrapping prim from partitioned components
2025-01-20T07:46:07.955858Z 0 [Note] WSREP: Current view of cluster as seen by this node
Node A failure discovered at 07:46:09 AM
2025-01-20T07:46:08.967176Z 0 [Note] WSREP: Shifting DONOR/DESYNCED → JOINED (TO: 3098538301)
2025-01-20T07:46:08.967723Z 0 [Note] WSREP: 1.0 (o6node85): State transfer to 0.0 (o6node84) complete.
2025-01-20T07:46:09.605775Z 0 [Note] WSREP: (f36e38d5, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
2025-01-20T07:46:09.968786Z 0 [Note] WSREP: (f36e38d5, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers: tcp://10.0.1.87:4567
2025-01-20T07:46:09.970004Z 0 [Note] WSREP: declaring 393253ce at tcp://10.0.1.84:4567 stable
2025-01-20T07:46:09.970013Z 0 [Note] WSREP: declaring a88b2d35 at tcp://10.0.1.85:4567 stable
2025-01-20T07:46:09.970017Z 0 [Note] WSREP: forgetting ab6d8263 (tcp://10.0.1.87:4567)
2025-01-20T07:46:09.970030Z 0 [Note] WSREP: Member 3.0 (o6node86) synced with group.
2025-01-20T07:46:09.970041Z 0 [Note] WSREP: Shifting JOINED → SYNCED (TO: 3098538301)
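Because the relevant events on the four nodes fall within a few milliseconds of each other, it may be easier to read the excerpts above as one merged timeline. The following is a minimal Python sketch of how that could be done. The names `parse_line` and `merge_timeline` and the regular expression are my own assumptions about the 5.7 error-log line format shown above, and the sample lines are copied from the Node A and Node B excerpts.

```python
import re
from datetime import datetime

# Assumed line layout: "<UTC timestamp>Z <thread id> [<level>] <message>".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z\s+"
    r"(?P<thread>\d+)\s+\[(?P<level>\w+)\]\s+(?P<msg>.*)$"
)

def parse_line(node, line):
    """Return (timestamp, node, level, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
    return ts, node, match.group("level"), match.group("msg")

def merge_timeline(logs):
    """logs maps a node label to its raw log lines; result is sorted by timestamp."""
    events = []
    for node, lines in logs.items():
        for line in lines:
            parsed = parse_line(node, line)
            if parsed is not None:
                events.append(parsed)
    return sorted(events, key=lambda event: event[0])

# A few lines copied from the Node A and Node B excerpts above.
sample = {
    "A": [
        "2025-01-20T07:46:08.967106Z 0 [Warning] WSREP: 3.0 (o6node86): State transfer to 2.0 (o6node87) failed: -61 (No data available)",
        "2025-01-20T07:46:08.967118Z 0 [Note] WSREP: gcomm: terminating thread",
    ],
    "B": [
        "2025-01-20T07:46:08.967100Z 0 [Warning] WSREP: 3.0 (o6node86): State transfer to 2.0 (o6node87) failed: -61 (No data available)",
        "2025-01-20T07:46:08.967656Z 0 [Note] WSREP: 1.0 (o6node85): State transfer to 0.0 (o6node84) complete.",
    ],
}

for ts, node, level, msg in merge_timeline(sample):
    print(ts.time(), node, level, msg)
```

Lines that do not start with a timestamp (for example the `view (view_id(...` continuation lines) are skipped by this sketch rather than attached to the previous event.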
Thank you in advance for your assistance!
Thanks and Regards,
Rahul Ambekar