Backup pods fail with "Donor no longer in donor state"

gmautner · August 4, 2023, 2:32pm

Backup pods intermittently fail. When looking at the logs, all of the failed ones have the message:
INFO: Donor no longer in donor state, interrupting script

Then, a couple of lines further down, we see: INFO: [SST script] ++ handle_sigterm

And, finally:

Terminating process
Process completed with error: /usr/bin/run_backup.sh: 4 (Interrupted system call)
2023-08-04 11:14:14 [ERROR] Backup was finished unsuccessfull

This happens sporadically and seemingly at random, roughly once in every 30 runs. The pods that terminated with errors linger in etcd (and their corresponding jobs as well).
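For anyone triaging these pods, here is a minimal Python sketch for checking whether a saved backup pod log matches this failure. It assumes the log has already been captured as text (for example with `kubectl logs`), and it only looks for the three markers quoted above. The function name `is_donor_state_failure` is hypothetical and not part of the operator.

```python
# Markers copied from the log lines quoted in this post.
DONOR_MARKERS = (
    "Donor no longer in donor state, interrupting script",
    "handle_sigterm",
    "Interrupted system call",
)


def is_donor_state_failure(log_text: str) -> bool:
    """Return True only if the log contains every marker of this failure."""
    return all(marker in log_text for marker in DONOR_MARKERS)


if __name__ == "__main__":
    # Sample built from the lines quoted in this post.
    sample = "\n".join([
        "INFO: Donor no longer in donor state, interrupting script",
        "INFO: [SST script] ++ handle_sigterm",
        "Process completed with error: /usr/bin/run_backup.sh: 4 (Interrupted system call)",
    ])
    print(is_donor_state_failure(sample))
```

Requiring all three markers is a deliberate choice: it avoids lumping this failure together with backups that were interrupted for other reasons.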

Not sure if it is related, but the PXC resource was deployed with Argo CD. No changes have been made to the manifest, so there is no reason to believe Argo CD has anything to do with the error. Just mentioning it.

mzi · September 3, 2023, 7:17pm

We are facing the same problem, and we also deployed PXC through Argo CD. The backup itself succeeds after a few tries, as the backups are retried by the CronJob operator of Kubernetes. We see a failure in roughly every 4th or 5th backup.

Sergey_Pronin · September 4, 2023, 7:21am

Hey folks.

Thanks for sharing.
Can you please share versions and steps to reproduce this? I have experience with Argo CD, so there is no need to go into great detail, but at least your setup details would help: k8s version, operator version, backup storage type (S3, GCS, other).

Is the backup that you are taking scheduled? Or is it on demand?

mzi · September 4, 2023, 7:36am

Hi! We are using the following versions:

K8S-Version: v1.24.13

PXC-Operator: 1.13.0 (sha256:c674d63242f1af521edfbaffae2ae02fb8d010c0557a67a9c42d2b4a50db5243)
[Installed through your helm chart version 1.13.1]

PXC-Version: percona/percona-xtradb-cluster:8.0.33-25.1
PXC-Backup-Version: percona/percona-xtradb-cluster-operator:1.13.0-pxc8.0-backup-pxb8.0.32

The backups are scheduled and uploaded to S3 (AWS). On-demand backups have succeeded every time so far, but we do not run many of them, so this may not be representative.

If you need any further details feel free to reach out.

gmautner · September 4, 2023, 5:30pm

I’m running Kubernetes version v1.23.6+k3s1 using Civo’s k8s service.

The problems occur during the scheduled backups and we’re using AWS S3 as the backend.

mkuzdzal · November 15, 2023, 10:54am

Got the same issue.
K8s-version: v1.27.1
PXC-operator: 1.12.0
PXC-version: 8.0.29-21.1
PXC-backup: 1.12.0-pxc8.0-backup

Scheduled backups, uploaded to S3 (MinIO).

Sergey_Pronin · November 16, 2023, 11:19am

Hello all.

Thank you all for sharing this issue.
I checked with our team, and it seems that this problem does exist. We understand that it leaves Pods in the Error state, but backups are still created.
It does not qualify as a critical issue, but we will definitely look into it (I am trying to locate the JIRA issue about it).

Please let me know if you do not agree on the criticality here.

Johannes_Wegener · January 2, 2025, 3:57pm

@Sergey_Pronin we face this issue every time; a second pod is then created, and that one does not fail. Did you find the respective Jira ticket, or should I open one?

We use the percona operator chart and db chart version 1.15.1 with the backup container 1.15.1-pxc8.0-backup-pxb8.0.35.

Best regards,
Johannes

RoJe · June 10, 2025, 8:40pm

Greetings. Is it possible to apply some kind of patch for this error to an existing PXC version, or do we have to do a full upgrade to version 1.16.0? I got this error on PXC version 1.13.0.
