Percona Cluster : Issue with node recovery

Hi,

PXC cluster with 5 nodes (pod names: mysql-pxc-0, mysql-pxc-1, mysql-pxc-2, mysql-pxc-3, mysql-pxc-4)

Operator version: 1.13.0

percona-xtradb-cluster image: percona/percona-xtradb-cluster:8.0.35

Kubernetes version: 1.31.3 (MicroK8s)

As part of DR activities, 2 nodes were taken down from the cluster for 10 minutes. The pods mysql-pxc-1 and mysql-pxc-3 were running on these two nodes. After the 10 minutes of downtime, the nodes came back up, but the pods mysql-pxc-1 and mysql-pxc-3 didn’t come up as expected. The mysql-pxc-1 logs show that the message “2025-08-27T12:44:58.626640Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user . Shutting down mysqld (Version: 8.0.35-27.1).” was logged and a shutdown was initiated. Kubernetes tried to start the pod again, but it ended up in a cycle of startup and shutdown. Please refer to the attached logs. Apologies, I couldn’t download the log for mysql-pxc-1, but I retained screenshots. The log from the mysql-pxc-2 pod is also attached.

Can you please help us understand why mysql-pxc-1 received a shutdown signal ~6 minutes into the startup?

Thanks - Vithal

mysql-pxc-1.docx (391.1 KB)

mysql-pxc-2 - toshare.log (37.7 KB)

@VithalAkunuri

Thanks for reaching out to us.

The cluster view looks fine around the below time frame.

2025-08-27T12:42:09.140921Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,3d462ae9-881f,46)
memb {
3d462ae9-881f,0
854f9151-8846,0
950a443d-bafd,0
ac990be3-b8b5,0
}
joined {
}
left {
}
partitioned {
}
)
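If you want to check this on other nodes as well, a view block like the one above can be summarized with a small Python sketch. It assumes the view text is pasted verbatim from the error log, and `parse_view` is a hypothetical helper for illustration, not part of any Galera or PXC tooling.

```python
import re

def parse_view(text):
    """Return {section: [member ids]} for the memb/joined/left/partitioned blocks."""
    sections = {}
    pattern = r"(memb|joined|left|partitioned)\s*\{([^}]*)\}"
    for name, body in re.findall(pattern, text):
        # Each entry looks like "<short uuid>,<segment>"; keep only the id part.
        sections[name] = [
            line.strip().split(",")[0]
            for line in body.splitlines()
            if line.strip()
        ]
    return sections

# View text copied from the log excerpt in this thread.
VIEW = """view (view_id(PRIM,3d462ae9-881f,46)
memb {
3d462ae9-881f,0
854f9151-8846,0
950a443d-bafd,0
ac990be3-b8b5,0
}
joined {
}
left {
}
partitioned {
}
)"""

view = parse_view(VIEW)
is_primary = "view_id(PRIM" in VIEW
print("members:", len(view["memb"]), "primary:", is_primary)
print("left:", view["left"], "partitioned:", view["partitioned"])
```

For this excerpt the sketch finds four entries under `memb` and nothing under `left` or `partitioned`, in a primary (PRIM) view.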

mysql-pxc-2 is chosen as a donor.

2025-08-27T12:42:20.356502Z 0 [Note] [MY-000000] [Galera] Member 0.0 (mysql-pxc-1) requested state transfer from ‘mysql-pxc-1,mysql-pxc-2,’. Selected 1.0 (mysql-pxc-2)(SYNCED) as donor.

There were some SST errors detected around the period below.

2025-08-27T12:43:16.582360Z 2 [Note] [MY-000000] [Galera] Detected STR version: 1, req_len: 137, req: STRv1
2025-08-27T12:43:16.823651Z 2 [Note] [MY-000000] [Galera] IST request: 3f948b74-7829-11f0-b70b-bec13e31b8f9:73649-84804|ssl://10.1.145.132:4568
2025-08-27T12:43:16.823684Z 2 [Note] [MY-000000] [WSREP] Server status change synced -> donor
2025-08-27T12:43:16.823695Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2025-08-27T12:43:16.823871Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role ‘donor’ --address ‘10.1.145.132:4444/xtrabackup_sst//1’ --socket ‘/tmp/mysql.sock’ --datadir ‘/var/lib/mysql/’ --basedir ‘/usr/’ --plugindir ‘/usr/lib64/mysql/plugin/’ --defaults-file ‘/etc/my.cnf’ --defaults-group-suffix ‘’ --mysqld-version ‘8.0.35-27.1’ --binlog ‘binlog’ --gtid ‘3f948b74-7829-11f0-b70b-bec13e31b8f9:73649’ --bypass)
2025-08-27T12:43:16.826813Z 2 [Note] [MY-000000] [WSREP] DONOR thread signaled with 0
2025-08-27T12:43:16.930762Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* ERROR **********************
2025-08-27T12:43:16.930808Z 0 [ERROR] [MY-000000] [WSREP-SST] Missing version string in comparison
2025-08-27T12:43:16.930819Z 0 [ERROR] [MY-000000] [WSREP-SST] left-side: operation:< right-side:2.4.29
2025-08-27T12:43:16.930846Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* ERROR **********************
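To illustrate what this message suggests, here is a minimal Python sketch of a version comparison that fails when one side is empty, as the left side is in the log above. This is only an illustration: `compare_versions` and `parse_version` are hypothetical names, and the sketch is not the actual code of the wsrep_sst_xtrabackup-v2 script.

```python
import operator

# Supported comparison operators, keyed by the symbol shown in the log line.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

def parse_version(text):
    """Turn a dotted numeric version such as '2.4.29' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def compare_versions(left, op, right):
    """Compare two dotted versions; refuse to compare if either one is empty."""
    if not left or not right:
        raise ValueError(
            "Missing version string in comparison: "
            f"left-side:{left} operation:{op} right-side:{right}"
        )
    return OPS[op](parse_version(left), parse_version(right))

print(compare_versions("2.4.28", "<", "2.4.29"))

try:
    # Same shape as the logged failure: nothing on the left-hand side.
    compare_versions("", "<", "2.4.29")
except ValueError as exc:
    print(exc)
```

In this sketch the error is raised before any comparison happens, which matches the shape of the logged message (empty left side, operation `<`, right side 2.4.29). Why the left-hand version was empty on the donor cannot be determined from this excerpt alone.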

However, it seems the node mysql-pxc-1 joins via IST.

2025-08-27T12:43:17.639732Z 0 [Note] [MY-000000] [WSREP-SST] Bypassing SST. Can work it through IST
2025-08-27T12:43:18.985840Z 0 [Note] [MY-000000] [Galera] SST sent: 3f948b74-7829-11f0-b70b-bec13e31b8f9:73649
2025-08-27T12:43:18.985875Z 0 [Note] [MY-000000] [WSREP] Server status change donor -> joined
2025-08-27T12:43:18.985902Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2025-08-27T12:43:19.074159Z 0 [Note] [MY-000000] [Galera] 1.0 (mysql-pxc-2): State transfer to 0.0 (mysql-pxc-1) complete.
2025-08-27T12:43:19.074191Z 0 [Note] [MY-000000] [Galera] Shifting DONOR/DESYNCED -> JOINED (TO: 85338)

Before the shutdown event you reported (“2025-08-27T12:44:58.626640Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user . Shutting down mysqld (Version: 8.0.35-27.1).”), all I can see are messages related to replicated events:

2025-08-27T12:44:37.999967Z 0 [Warning] [MY-000000] [Galera] Failed to report last committed 3f948b74-7829-11f0-b70b-bec13e31b8f9:85072, -110 (Connection timed out)
2025-08-27T12:44:48.288427Z 10 [Note] [MY-000000] [Galera] Processing event queue:... 37.0% (272/735 events) complete.
2025-08-27T12:45:01.453230Z 10 [Note] [MY-000000] [Galera] Processing event queue:... 40.5% (320/791 events) complete.
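To put the timeline in numbers, here is a small Python sketch that computes how long after donor selection each of the quoted events occurred. It uses only timestamps quoted in this thread, and it assumes the clocks of the two pods are in sync, since the lines come from different logs. The event labels are my own shorthand.

```python
from datetime import datetime

def parse_ts(ts):
    """Parse a MySQL error-log timestamp such as 2025-08-27T12:42:20.356502Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

# Timestamps copied from the log lines quoted in this thread.
EVENTS = [
    ("donor selected", "2025-08-27T12:42:20.356502Z"),
    ("state transfer complete", "2025-08-27T12:43:19.074159Z"),
    ("SHUTDOWN received", "2025-08-27T12:44:58.626640Z"),
]

start = parse_ts(EVENTS[0][1])
offsets = {
    label: (parse_ts(ts) - start).total_seconds()
    for label, ts in EVENTS
}

for label, seconds in offsets.items():
    print(f"{label}: +{seconds:.1f}s")
```

By this calculation the SHUTDOWN message arrives a little under 160 seconds after the donor was selected and about 100 seconds after the state transfer was reported complete.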

Can you attach the output of the commands below so we can check the pod status?

kubectl describe pod mysql-pxc-1
kubectl top pod mysql-pxc-1

Inside the PXC data directory you will find the XtraBackup-related files below. Can you check whether they contain any specific errors around that period, or attach them for a quick review?

innobackup.backup.log
innobackup.prepare.log

It would also be worth checking the actual mysql-pxc-1 error log file. Did you see any interesting pattern there around the time of the issue?
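If the error log is large, a rough Python sketch like the one below can narrow it down to warnings and errors near the shutdown timestamp. It assumes the standard MySQL 8.0 error log line layout seen in the excerpts above (timestamp, thread id, then the level in brackets). `lines_near` is a hypothetical helper, and the sample lines are copied from this thread.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. "2025-08-27T12:43:16.930808Z 0 [ERROR] ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z \d+ \[(\w+)\]")

def lines_near(log_text, center, minutes=2, levels=("ERROR", "Warning")):
    """Return log lines of the given levels within +/- minutes of center."""
    center_dt = datetime.strptime(center, "%Y-%m-%dT%H:%M:%S.%fZ")
    window = timedelta(minutes=minutes)
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # Skip continuation lines and anything without a timestamp.
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if match.group(2) in levels and abs(stamp - center_dt) <= window:
            hits.append(line)
    return hits

SHUTDOWN = "2025-08-27T12:44:58.626640Z"

SAMPLE = """\
2025-08-27T12:43:16.930808Z 0 [ERROR] [MY-000000] [WSREP-SST] Missing version string in comparison
2025-08-27T12:44:37.999967Z 0 [Warning] [MY-000000] [Galera] Failed to report last committed 3f948b74-7829-11f0-b70b-bec13e31b8f9:85072, -110 (Connection timed out)
2025-08-27T12:44:48.288427Z 10 [Note] [MY-000000] [Galera] Processing event queue:... 37.0% (272/735 events) complete.
"""

for line in lines_near(SAMPLE, SHUTDOWN):
    print(line)
```

To run it against a real file, read the file contents into a string and pass that in place of `SAMPLE`. Widening or narrowing `minutes` controls how much context around the shutdown is kept.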

Thanks for your reply. Please find attached the requested information.

Since the issue was observed on 27th August, I hope the attached logs contain the information needed to investigate further.

Thanks,

Vithal

kubectl describe pod mysql-pxc-1.txt (15.3 KB)

kubectl top pod mysql-pxc-1.txt (104 Bytes)

innobackup.decompress.log (817 Bytes)

innobackup.prepare.log (19.5 KB)

innobackup.move.log (59.9 KB)

innobackup.backup.log (49.8 KB)