Percona-xtradb-cluster-operator cycles on deleting backup from s3

Hello,

I am very pleased with how the operator has worked so far and how backups are handled; everything went fine until a second cluster was added.

Current setup:
OpenShift 3.11
image: percona/percona-xtradb-cluster-operator:1.8.0
pxc image: percona/percona-xtradb-cluster:8.0.22-13.1
CRD (PerconaXtraDBCluster) - admin-db and customer-db

What happens is that the operator is unable to update the backup CRD, so it performs the S3 file deletion again, and after that it is still unable to update the backup CRD.
That loop never stops, until all 10 workers are busy and the operator starts skipping backup deletion.

Logs from operator:

{"level":"info","ts":1637762709.6179812,"caller":"zapr/zapr.go:69","msg":"Created a new backup job","Namespace":"dev","Name":"xb-cron-dev-admin-db--s3-ionos-20211124140509-13hs9"}
{"level":"info","ts":1637762758.1214314,"caller":"zapr/zapr.go:69","msg":"deleting backup from s3","name":"cron-dev-admin-db--s3-ionos-20211121135515-364e4"}
{"level":"info","ts":1637762783.195445,"caller":"zapr/zapr.go:69","msg":"backup was removed from s3","name":"cron-dev-admin-db--s3-ionos-20211121135515-364e4"}
{"level":"error","ts":1637762783.2017963,"caller":"zapr/zapr.go:128","msg":"failed to update finalizers for backup","backup":"cron-dev-admin-db--s3-ionos-20211121135515-364e4","error":"Operation cannot be fulfilled on perconaxtradbclusterbackups.pxc.percona.com \"cron-dev-admin-db--s3-ionos-20211121135515-364e4\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcbackup.(*ReconcilePerconaXtraDBClusterBackup).runS3BackupFinalizer\n\t/go/src/github.com/percona/percona-xtradb-cluster-operator/pkg/controller/pxcbackup/controller.go:361"}
{"level":"info","ts":1637762784.1155515,"caller":"zapr/zapr.go:69","msg":"deleting backup from s3","name":"cron-dev-admin-db--s3-ionos-20211121135515-364e4"}

Before adding the second cluster in that namespace I sometimes saw such a log error about updating the backup, but it never got stuck in a loop.
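
The object the operator keeps retrying can be inspected directly; a minimal sketch, using the resource name from the error above and assuming the dev namespace:

# Show the finalizers and deletion timestamp of the backup object the
# operator keeps retrying (name taken from the error message above).
oc -n dev get perconaxtradbclusterbackups.pxc.percona.com \
  cron-dev-admin-db--s3-ionos-20211121135515-364e4 \
  -o jsonpath='{.metadata.finalizers}{"\n"}{.metadata.deletionTimestamp}{"\n"}'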

Hello @iadv ,

thank you for submitting this.
We dropped support for OpenShift 3.11 this year, as it is quite legacy. Could you please share the full steps to reproduce this issue, and we will try to do it on another Kubernetes platform?

Thanks @Sergey_Pronin
I am not sure exactly what happened, since the first cluster had been running fine for quite a while and the second cluster was added only recently.

As I said, I had previously noticed such errors, where the operator is unable to update the CRD (I presume to remove the S3 finalizer) because it does not have the latest version of the object.
But it was never an issue until the second cluster was added.

I have tried disabling the first or the second cluster, and tried removing one or the other backup, but the issue was never resolved.

The current plan is to try to upgrade (a rough helm sketch follows the list):

  1. the operator to percona/pxc-operator 1.9.1 (currently percona/pxc-operator v0.1.12, which is 1.8)
  2. pxc to percona/pxc-db 1.9.1 - 8.0.23-14.1 (currently percona/pxc-db v0.1.17 - 8.0.22-13.1)
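
A minimal sketch of that upgrade with helm, assuming the charts were installed from the Percona helm repo; the release names, namespace, and reuse of existing values are placeholders:

# Hypothetical helm upgrade sketch; release names and namespace are placeholders.
helm repo update
# Upgrade the operator chart to 1.9.1
helm upgrade pxc-operator percona/pxc-operator --version 1.9.1 -n dev --reuse-values
# Upgrade the database chart to 1.9.1 (PXC 8.0.23-14.1)
helm upgrade pxc-db percona/pxc-db --version 1.9.1 -n dev --reuse-values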

What I was looking for is some way to troubleshoot the operator and see what triggers that unnecessary repeated delete between the initial command and the successful deletion from S3.

Also, there is no way to increase the worker count, although I am not sure that it would change anything.
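
In the meantime, the loop can at least be watched by following the operator logs filtered by the backup name; a rough sketch, assuming the default operator deployment name:

# Follow the operator logs and keep only lines about the stuck backup
# (deployment name assumes a default install; adjust namespace/name as needed).
oc -n dev logs deployment/percona-xtradb-cluster-operator -f \
  | grep cron-dev-admin-db--s3-ionos-20211121135515-364e4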

Hi!

I’m having exactly the same problem with 1.12.0.

Did you solve that?

Thanks

@Thiago_Rodines is it on OpenShift 3.11 too?

@Sergey_Pronin, it is on Kubernetes.

@Thiago_Rodines got it. Do you also have two clusters?
Is there any easy way to reproduce the problem?

Hello

We did not solve it with the operator; we just created a workaround and implemented the procedures we needed.

Since the main problem was that the operator could not update the CRD, I created a cron job that just patches the CRD directly and removes that finalizer.
After that the operator is able to do the rest with that CRD and proceed with further operations.
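
Roughly, the cron job boils down to something like this sketch (the namespace, the use of jq, and clearing all finalizers at once are assumptions; the resource name comes from the error above):

#!/bin/bash
# Workaround sketch: for every backup object already marked for deletion,
# clear its finalizers so the deletion can complete. Requires jq.
NS=dev
for b in $(oc -n "$NS" get perconaxtradbclusterbackups.pxc.percona.com -o json \
    | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | .metadata.name'); do
  oc -n "$NS" patch perconaxtradbclusterbackups.pxc.percona.com "$b" \
    --type=merge -p '{"metadata":{"finalizers":[]}}'
done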

And since I did not have a process that deleted files from S3, a lifecycle policy was set up for that bucket.
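
For the S3 side, the lifecycle rule was roughly along these lines (bucket name, retention period, and the endpoint URL are placeholders; the endpoint flag is only needed for non-AWS S3 providers such as IONOS):

# Expire backup objects automatically after 30 days (values are placeholders).
aws s3api put-bucket-lifecycle-configuration \
  --endpoint-url https://s3.example.com \
  --bucket my-pxc-backups \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-old-backups",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "Expiration": { "Days": 30 }
      }
    ]
  }'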

Yes! It worked perfectly when I had just one cluster, and now that I have 3 clusters I have problems with the backup jobs.

To reproduce it, you just need to run 2 clusters and configure backups on both.

What I did to “fix” it was to deploy a separate operator for each cluster, and now everything is OK.