Percona Cluster : Issue with node recovery

Hi,

PXC cluster with 5 nodes (mysql-pxc-0, mysql-pxc-1, mysql-pxc-2, mysql-pxc-3, mysql-pxc-4 are the pod names)

Operator version: 1.13.0

percona-xtradb-cluster image: percona/percona-xtradb-cluster:8.0.35

kubernetes version: 1.31.3 microk8s

As part of DR activities, 2 nodes were taken down from the cluster for 10 minutes. The pods mysql-pxc-1 and mysql-pxc-3 were running on those two nodes. After 10 minutes of downtime, the nodes came back up. However, the pods mysql-pxc-1 and mysql-pxc-3 didn't come up as expected. From the logs of mysql-pxc-1, we noticed the message "2025-08-27T12:44:58.626640Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user . Shutting down mysqld (Version: 8.0.35-27.1)." and a shutdown was initiated. K8s tried to start the pod again, but it ended up in a cycle of startup and shutdown. Please refer to the attached logs. Apologies, I couldn't download the log for mysql-pxc-1, but I retained the screenshots. Also attached is the log from the mysql-pxc-2 pod.

Can you please help us understand why mysql-pxc-1 received a shutdown signal ~6 minutes into startup?

Thanks - Vithal

mysql-pxc-1.docx (391.1 KB)

mysql-pxc-2 - toshare.log (37.7 KB)

@VithalAkunuri

Thanks for reaching out to us.

The cluster view looks fine around the time frame below.

2025-08-27T12:42:09.140921Z 0 [Note] [MY-000000] [Galera] Current view of cluster as seen by this node
view (view_id(PRIM,3d462ae9-881f,46)
memb {
3d462ae9-881f,0
854f9151-8846,0
950a443d-bafd,0
ac990be3-b8b5,0
}
joined {
}
left {
}
partitioned {
}
)

mysql-pxc-2 is chosen as the donor.

2025-08-27T12:42:20.356502Z 0 [Note] [MY-000000] [Galera] Member 0.0 (mysql-pxc-1) requested state transfer from 'mysql-pxc-1,mysql-pxc-2,'. Selected 1.0 (mysql-pxc-2)(SYNCED) as donor.

There were some SST errors detected around the period below.

2025-08-27T12:43:16.582360Z 2 [Note] [MY-000000] [Galera] Detected STR version: 1, req_len: 137, req: STRv1
2025-08-27T12:43:16.823651Z 2 [Note] [MY-000000] [Galera] IST request: 3f948b74-7829-11f0-b70b-bec13e31b8f9:73649-84804|ssl://10.1.145.132:4568
2025-08-27T12:43:16.823684Z 2 [Note] [MY-000000] [WSREP] Server status change synced -> donor
2025-08-27T12:43:16.823695Z 2 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2025-08-27T12:43:16.823871Z 0 [Note] [MY-000000] [WSREP] Initiating SST/IST transfer on DONOR side (wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.1.145.132:4444/xtrabackup_sst//1' --socket '/tmp/mysql.sock' --datadir '/var/lib/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --mysqld-version '8.0.35-27.1' --binlog 'binlog' --gtid '3f948b74-7829-11f0-b70b-bec13e31b8f9:73649' --bypass)
2025-08-27T12:43:16.826813Z 2 [Note] [MY-000000] [WSREP] DONOR thread signaled with 0
2025-08-27T12:43:16.930762Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* ERROR **********************
2025-08-27T12:43:16.930808Z 0 [ERROR] [MY-000000] [WSREP-SST] Missing version string in comparison
2025-08-27T12:43:16.930819Z 0 [ERROR] [MY-000000] [WSREP-SST] left-side: operation:< right-side:2.4.29
2025-08-27T12:43:16.930846Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* ERROR **********************

However, it seems the node mysql-pxc-1 joined via IST.

2025-08-27T12:43:17.639732Z 0 [Note] [MY-000000] [WSREP-SST] Bypassing SST. Can work it through IST
2025-08-27T12:43:18.985840Z 0 [Note] [MY-000000] [Galera] SST sent: 3f948b74-7829-11f0-b70b-bec13e31b8f9:73649
2025-08-27T12:43:18.985875Z 0 [Note] [MY-000000] [WSREP] Server status change donor -> joined
2025-08-27T12:43:18.985902Z 0 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
2025-08-27T12:43:19.074159Z 0 [Note] [MY-000000] [Galera] 1.0 (mysql-pxc-2): State transfer to 0.0 (mysql-pxc-1) complete.
2025-08-27T12:43:19.074191Z 0 [Note] [MY-000000] [Galera] Shifting DONOR/DESYNCED -> JOINED (TO: 85338)

Before the shutdown event ("2025-08-27T12:44:58.626640Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user . Shutting down mysqld (Version: 8.0.35-27.1)."), all I can see are messages related to replicated events:

2025-08-27T12:44:37.999967Z 0 [Warning] [MY-000000] [Galera] Failed to report last committed 3f948b74-7829-11f0-b70b-bec13e31b8f9:85072, -110 (Connection timed out)
2025-08-27T12:44:48.288427Z 10 [Note] [MY-000000] [Galera] Processing event queue:... 37.0% (272/735 events) complete.
2025-08-27T12:45:01.453230Z 10 [Note] [MY-000000] [Galera] Processing event queue:... 40.5% (320/791 events) complete.

Can you attach the output of the commands below so we can check the pod status?

kubectl describe pod mysql-pxc-1
kubectl top pod mysql-pxc-1
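It may also help to check the recent Kubernetes events for the pod; if the kubelet killed the container (e.g. on a failing liveness probe), that normally shows up as "Unhealthy" followed by "Killing" events. A sketch (pod name taken from this thread; adjust the namespace as needed):

```shell
# List recent events referencing the pod, oldest first.
kubectl get events \
  --field-selector involvedObject.name=mysql-pxc-1 \
  --sort-by=.lastTimestamp
```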

Inside the PXC data directory you will notice the Xtrabackup-related files below. Can you check whether they contain any errors around that period, or attach them for a quick review as well?

innobackup.backup.log
innobackup.prepare.log
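A quick way to scan those files is to grep for error markers. A minimal sketch (the excerpt below is synthetic, for illustration; the real files live in the data directory, e.g. /var/lib/mysql/, inside the pxc container):

```shell
# Hypothetical helper: print only ERROR/fatal lines from an innobackup log.
scan_log() { grep -iE '\[ERROR\]|fatal' "$1" || true; }

# Demo against a tiny synthetic excerpt; in practice, point it at
# innobackup.backup.log / innobackup.prepare.log inside the pod.
cat > /tmp/innobackup.sample.log <<'EOF'
2025-08-27T12:43:17 [Note] [WSREP-SST] Bypassing SST. Can work it through IST
2025-08-27T12:43:16 [ERROR] [WSREP-SST] Missing version string in comparison
EOF
scan_log /tmp/innobackup.sample.log
```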

It would also be worth checking the full mysql-pxc-1 error log file. Did you see any interesting patterns there around the time of the issue?

Thanks for your reply. Please find attached the requested information.

Since the issue was observed on 27th August, I hope the logs I attached have the required information to investigate further.

Thanks,

Vithal

kubectl describe pod mysql-pxc-1.txt (15.3 KB)

kubectl top pod mysql-pxc-1.txt (104 Bytes)

innobackup.decompress.log (817 Bytes)

innobackup.prepare.log (19.5 KB)

innobackup.move.log (59.9 KB)

innobackup.backup.log (49.8 KB)

@VithalAkunuri

Thanks for sharing the logs.

The resource usage and other status look fine. There are no abnormal termination messages.

kubectl describe pod mysql-pxc-1
Name: mysql-pxc-1

pxc-init:
Container ID: containerd://f3f7aae27541e1adcdd385d079b6d61021faba99615c25c3ab27f333c9e66c3a
Image: percona/percona-xtradb-cluster-operator:1.13.0
Image ID: docker.io/percona/percona-xtradb-cluster-operator@sha256:c674d63242f1af521edfbaffae2ae02fb8d010c0557a67a9c42d2b4a50db5243
Port:
Host Port:
Command:
/pxc-init-entrypoint.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 27 Aug 2025 15:08:38 +0100
Finished: Wed, 27 Aug 2025 15:08:39 +0100
Ready: True
Restart Count: 0
Limits:
memory: 3G
Requests:
cpu: 500m
memory: 2G

linkerd-init:
Container ID: containerd://7f1881197afc095bbdf53dcf6d75633b976ac415b37f477ffc0847992b8d2c9c
Image: cr.l5d.io/linkerd/proxy-init:v2.4.1
Image ID: cr.l5d.io/linkerd/proxy-init@sha256:e4ef473f52c453ea7895e9258738909ded899d20a252744cc0b9459b36f987ca
Port:
Host Port:
Args:
--ipv6=false
--incoming-proxy-port
4143
--outgoing-proxy-port
4140
--proxy-uid
2102
--inbound-ports-to-ignore
4190,4191,4567,4568
--outbound-ports-to-ignore
4567,4568
State: Terminated
Reason: Completed

kubectl top pod mysql-pxc-1
NAME          CPU(cores)   MEMORY(bytes)
mysql-pxc-1   80m          1278Mi

The Xtrabackup-related logs cover the period after the issue, but I don't see any errors or suspicious patterns there.

2025-08-27T14:10:13.972270-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized server arguments: --innodb_checksum_algorithm=crc32 --innodb_log_checksums=1 --innodb_data_file_path=ibdata1:12M:autoextend --innodb_log_file_size=301989888 --innodb_page_size=16384 --innodb_undo_directory=./ --innodb_undo_tablespaces=2 --server-id=28984560 --innodb_log_checksums=ON --innodb_redo_log_encrypt=0 --innodb_undo_log_encrypt=0
2025-08-27T14:10:13.972470-00:00 0 [Note] [MY-011825] [Xtrabackup] recognized client arguments: --no-version-check=1 --use-memory=2013265920 --prepare=1 --rollback-prepared-trx=1 --xtrabackup-plugin-dir=/usr/bin/pxc_extra/pxb-8.0/lib/plugin --target-dir=/var/lib/mysql//sst-xb-tmpdir
/usr/bin/pxc_extra/pxb-8.0/bin/xtrabackup version 8.0.35-30 based on MySQL server 8.0.35 Linux (x86_64) (revision id: 6beb4b49)
2025-08-27T14:10:13.972501-00:00 0 [Note] [MY-011825] [Xtrabackup] cd to /var/lib/mysql/sst-xb-tmpdir/
2025-08-27T14:10:13.972578-00:00 0 [Note] [MY-011825] [Xtrabackup] This target seems to be not prepared yet.
2025-08-27T14:10:13.975497-00:00 0 [Note] [MY-011825] [Xtrabackup] xtrabackup_logfile detected: size=8585216, start_lsn=(595689630)
2025-08-27T14:10:13.976089-00:00 0 [Note] [MY-011825] [Xtrabackup] using the following InnoDB configuration for recovery:
2025-08-27T14:10:13.976107-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_data_home_dir = .
2025-08-27T14:10:13.976117-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_data_file_path = ibdata1:12M:autoextend
2025-08-27T14:10:13.976138-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_log_group_home_dir = .
2025-08-27T14:10:13.976147-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_log_files_in_group = 1
2025-08-27T14:10:13.976157-00:00 0 [Note] [MY-011825] [Xtrabackup] innodb_log_file_size = 8585216

2025-08-27T14:10:47.064196-00:00 0 [Note] [MY-011825] [Xtrabackup] Recovered WSREP position: 3f948b74-7829-11f0-b70b-bec13e31b8f9:109852
2025-08-27T14:10:47.064294-00:00 0 [Note] [MY-011825] [Xtrabackup] starting shutdown with innodb_fast_shutdown = 1
2025-08-27T14:10:47.064382-00:00 0 [Note] [MY-012330] [InnoDB] FTS optimize thread exiting.
2025-08-27T14:10:47.800327-00:00 0 [Note] [MY-013072] [InnoDB] Starting shutdown…
2025-08-27T14:10:48.753662-00:00 0 [Note] [MY-013084] [InnoDB] Log background threads are being closed…
2025-08-27T14:10:49.070894-00:00 0 [Note] [MY-012980] [InnoDB] Shutdown completed; log sequence number 603284138
2025-08-27T14:10:49.074424-00:00 0 [Note] [MY-011825] [Xtrabackup] completed OK!

Is the issue resolved now, or are you still facing the recurring start/stop cycle?

Yes, we had the recurring start/stop cycle. To overcome this situation, the following steps were performed.

Paused the cluster, removed the PVCs of the mysql-pxc-1 to mysql-pxc-4 instances (not mysql-pxc-0, to keep the data), resumed the cluster, and waited for the full StatefulSet to be up and running (instances 0-4 started up).
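For reference, the steps above roughly correspond to the commands below (sketch only; the cluster name "mysql" and the default `datadir-<cluster>-pxc-N` PVC naming are assumptions here; verify with `kubectl get pxc` and `kubectl get pvc` first):

```shell
# Pause the cluster via the custom resource.
kubectl patch pxc mysql --type=merge -p '{"spec":{"pause":true}}'

# Remove the PVCs for nodes 1-4, keeping mysql-pxc-0's data intact.
for i in 1 2 3 4; do kubectl delete pvc "datadir-mysql-pxc-$i"; done

# Resume and wait for the full StatefulSet to come back.
kubectl patch pxc mysql --type=merge -p '{"spec":{"pause":false}}'
kubectl get pods -w
```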

However, we suspect that the liveness probe initial delay of 300 seconds could be the reason for the "2025-08-27T12:44:58.626640Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user . Shutting down mysqld (Version: 8.0.35-27.1)." message. Could that be the case?
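If so, we are considering raising that delay via the custom resource, along the lines of the sketch below (the `livenessDelaySec` field is what the operator's cr.yaml exposes for this, if I read it correctly; "mysql" is an assumed cluster name, substitute your own CR name):

```shell
# Raise the PXC liveness probe initial delay from the default 300s.
kubectl patch pxc mysql --type=merge \
  -p '{"spec":{"pxc":{"livenessDelaySec":600}}}'
```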

Can you please share your feedback on this issue?

Thanks,

Vithal

Hi @anil.joshi can you please share your inputs on this issue?