Hi Percona Team,
I am running a 3-node Percona XtraDB Cluster (nodes A, B, and C). Recently, Node A went down with the error messages shown below. Could you please help me determine whether this was caused by a network problem or by a database failure during the IST transfer?
I have attached the log files for all three nodes for your reference. Kindly review them and assist me in determining the root cause of Node A’s failure.
Node A failure time: 7 January 2025, 03:11 (as shown in the log timestamps)
Error log for node A:
2025-01-07T03:11:37.179727Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2025-01-07T03:11:37.179735Z 0 [Note] WSREP: Received NON-PRIMARY.
2025-01-07T03:11:38.618881Z 0 [Note] WSREP: (5332b2a9, 'tcp://0.0.0.0:4567') connection established to 657392be tcp://10.0.1.23:4567
2025-01-07T03:11:39.981239Z 0 [ERROR] WSREP: got error while reading ist stream: asio.system:104 ec: 104 what: read: Connection reset by peer
2025-01-07T03:11:39.981262Z 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 245041306 last received: 245041304
2025-01-07T03:11:39.981344Z 10 [ERROR] WSREP: receiving IST failed, node restart required: IST receiver reported error: 71 (Protocol error)
2025-01-07T03:11:39.981390Z 4 [ERROR] WSREP: receiving IST failed, node restart required: IST receiver reported error: 71 (Protocol error)
2025-01-07T03:11:39.981401Z 4 [Note] WSREP: Closing send monitor...
2025-01-07T03:11:39.981403Z 7 [ERROR] WSREP: receiving IST failed, node restart required: IST receiver reported error: 71 (Protocol error)
2025-01-07T03:11:39.981404Z 4 [Note] WSREP: Closed send monitor.
2025-01-07T03:11:39.981414Z 7 [Note] WSREP: /usr/sbin/mysqld: Terminated.
2025-01-07T03:11:39.981414Z 4 [Note] WSREP: gcomm: terminating thread
2025-01-07T03:11:39.981453Z 3 [ERROR] WSREP: receiving IST failed, node restart required: IST receiver reported error: 71 (Protocol error)
2025-01-07T06:13:50.727344Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2025-01-07T06:13:50.727719Z 0 [Warning] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2025-01-07T06:13:50.728866Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.41-44-57-log) starting as process 2586052 ...
2025-01-07T06:13:50.737298Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=o6node21-bin' to avoid this problem.
2025-01-07T06:13:50.739089Z 0 [Note] InnoDB: PUNCH HOLE support available
2025-01-07T06:13:50.739095Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2025-01-07T06:13:50.739097Z 0 [Note] InnoDB: Uses event mutexes
2025-01-07T06:13:50.739099Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
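For reference, the size of the IST gap can be read straight off the "IST didn't contain all write sets" line in the Node A log. Below is a minimal sketch, assuming Python 3. The helper name `ist_gap` is my own for illustration and is not part of any Percona or Galera tooling.

```python
import re

# Line copied from the Node A error log (apostrophe written as plain ASCII).
IST_ERROR = (
    "2025-01-07T03:11:39.981262Z 0 [ERROR] WSREP: IST didn't contain all write sets, "
    "expected last: 245041306 last received: 245041304"
)

# Capture the two sequence numbers reported by the IST receiver.
IST_GAP_RE = re.compile(r"expected last: (\d+) last received: (\d+)")


def ist_gap(line):
    """Return (expected, received, missing) or None if the line does not match."""
    match = IST_GAP_RE.search(line)
    if match is None:
        return None
    expected = int(match.group(1))
    received = int(match.group(2))
    return expected, received, expected - received


print(ist_gap(IST_ERROR))  # (245041306, 245041304, 2)
```

So the stream was cut off two write sets before the expected last seqno (245041306), which is also the seqno in the "New cluster view" lines logged by nodes B and C.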
Error log for node B:
2025-01-07T03:10:53.259723Z 0 [Note] WSREP: declaring node with index 0 inactive (evs.inactive_timeout)
2025-01-07T03:10:54.260879Z 0 [Note] WSREP: declaring 657392be at tcp://10.0.1.23:4567 stable
2025-01-07T03:10:54.261534Z 0 [Note] WSREP: Node 657392be state primary
2025-01-07T03:10:54.264727Z 0 [Note] WSREP: Current view of cluster as seen by this node
2025-01-07T03:10:54.264737Z 0 [Note] WSREP: Save the discovered primary-component to disk
2025-01-07T03:10:54.265724Z 0 [Note] WSREP: forgetting 5332b2a9 (tcp://10.0.1.21:4567)
2025-01-07T03:10:54.265731Z 0 [Note] WSREP: deleting entry tcp://10.0.1.21:4567
2025-01-07T03:10:54.265733Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2025-01-07T03:10:54.265745Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2025-01-07T03:10:54.265754Z 0 [Note] WSREP: (7ba17c8c, 'tcp://0.0.0.0:4567') turning message relay requesting off
2025-01-07T03:10:54.266233Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 01bd9dba-cca5-11ef-9302-82abdc4b1687
2025-01-07T03:10:54.266657Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 01bd9dba-cca5-11ef-9302-82abdc4b1687 from 0 (o6node23)
2025-01-07T03:10:54.266662Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 01bd9dba-cca5-11ef-9302-82abdc4b1687 from 1 (o6node22)
2025-01-07T03:10:54.266666Z 0 [Note] WSREP: Quorum results:
2025-01-07T03:10:54.266682Z 0 [Note] WSREP: Flow-control interval: [141, 141]
2025-01-07T03:10:54.266765Z 10 [Note] WSREP: REPL Protocols: 9 (4, 2)
2025-01-07T03:10:54.266791Z 10 [Note] WSREP: REPL Protocols: 9 (4, 2)
2025-01-07T03:10:54.266797Z 10 [Note] WSREP: New cluster view: global state: 5332e02d-4f0c-11ef-b257-cfef470d118c:245041306, view# 31: Primary, number of nodes: 2, my index: 1, protocol version 3
Error log for node C:
2025-01-07T03:10:48.725139Z 0 [Note] WSREP: (657392be, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.1.21:4567
2025-01-07T03:10:49.726437Z 0 [Note] WSREP: (657392be, 'tcp://0.0.0.0:4567') reconnecting to 5332b2a9 (tcp://10.0.1.21:4567), attempt 0
2025-01-07T03:10:50.759495Z 0 [Note] WSREP: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2025-01-07T03:10:50.759548Z 0 [Note] WSREP: declaring node with index 0 inactive (evs.inactive_timeout)
2025-01-07T03:10:51.611819Z 0 [Note] WSREP: (657392be, 'tcp://0.0.0.0:4567') connection established to 5332b2a9 tcp://10.0.1.21:4567
2025-01-07T03:10:52.731455Z 0 [Note] WSREP: (657392be, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.1.21:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout), socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 2603341568 cwnd: 1 last_queued_since: 67028151243469275 last_delivered_since: 67028151243469275 send_queue_length: 0 send_queue_bytes: 0
2025-01-07T03:10:54.261131Z 0 [Note] WSREP: declaring 7ba17c8c at tcp://10.0.1.22:4567 stable
2025-01-07T03:10:54.261792Z 0 [Note] WSREP: Node 657392be state primary
2025-01-07T03:10:54.262247Z 0 [Note] WSREP: Current view of cluster as seen by this node
2025-01-07T03:10:54.262294Z 0 [Note] WSREP: Save the discovered primary-component to disk
2025-01-07T03:10:54.264415Z 0 [Note] WSREP: forgetting 5332b2a9 (tcp://10.0.1.21:4567)
2025-01-07T03:10:54.264828Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2025-01-07T03:10:54.265367Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 01bd9dba-cca5-11ef-9302-82abdc4b1687
2025-01-07T03:10:54.266252Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 01bd9dba-cca5-11ef-9302-82abdc4b1687
2025-01-07T03:10:54.266720Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 01bd9dba-cca5-11ef-9302-82abdc4b1687 from 0 (o6node23)
2025-01-07T03:10:54.266770Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 01bd9dba-cca5-11ef-9302-82abdc4b1687 from 1 (o6node22)
2025-01-07T03:10:54.266795Z 0 [Note] WSREP: Quorum results:
2025-01-07T03:10:54.266812Z 0 [Note] WSREP: Flow-control interval: [141, 141]
2025-01-07T03:10:54.267712Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2025-01-07T03:10:54.267784Z 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2025-01-07T03:10:54.267812Z 2 [Note] WSREP: New cluster view: global state: 5332e02d-4f0c-11ef-b257-cfef470d118c:245041306, view# 31: Primary, number of nodes: 2, my index: 0, protocol version 3
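To make the order of events across the three nodes easier to follow, here is a rough sketch that merges a few of the lines above into one timeline, sorted by timestamp. It assumes Python 3 and that every line starts with the timestamp format shown in these logs. The `EXCERPTS` selection and the function names are my own, chosen only for illustration.

```python
from datetime import datetime

# A handful of lines copied from the excerpts above, keyed by node.
EXCERPTS = {
    "A": [
        "2025-01-07T03:11:37.179735Z 0 [Note] WSREP: Received NON-PRIMARY.",
        "2025-01-07T03:11:39.981239Z 0 [ERROR] WSREP: got error while reading ist stream: "
        "asio.system:104 ec: 104 what: read: Connection reset by peer",
    ],
    "B": [
        "2025-01-07T03:10:53.259723Z 0 [Note] WSREP: declaring node with index 0 inactive (evs.inactive_timeout)",
        "2025-01-07T03:10:54.265724Z 0 [Note] WSREP: forgetting 5332b2a9 (tcp://10.0.1.21:4567)",
    ],
    "C": [
        "2025-01-07T03:10:48.725139Z 0 [Note] WSREP: (657392be, 'tcp://0.0.0.0:4567') "
        "turning message relay requesting on, nonlive peers: tcp://10.0.1.21:4567",
        "2025-01-07T03:10:54.264415Z 0 [Note] WSREP: forgetting 5332b2a9 (tcp://10.0.1.21:4567)",
    ],
}


def parse_ts(line):
    """Parse the leading timestamp field, e.g. 2025-01-07T03:11:39.981239Z."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def merged_timeline(excerpts):
    """Return (timestamp, node, line) tuples from all nodes in time order."""
    rows = [
        (parse_ts(line), node, line)
        for node, lines in excerpts.items()
        for line in lines
    ]
    rows.sort(key=lambda row: row[0])
    return rows


for _, node, line in merged_timeline(EXCERPTS):
    print(f"[{node}] {line}")
```

Read this way, nodes B and C had already dropped 5332b2a9 (10.0.1.21, Node A) from the cluster at about 03:10:54, before Node A logged the IST read error at 03:11:39.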
Thank you in advance for your assistance!
Regards,
Rahul Ambekar.