Backup on S3 not being removed

I have a regular backup of the database going to an S3-compatible bucket. However, when the keep number is exceeded, the backup is removed from the Kubernetes cluster but not from S3. Is this by design, meaning we need to do our own housekeeping, or should removing the backup also trigger removal of the same backup from the S3 bucket?

Thanks.

I've also noticed this in the operator logs, repeated over and over (all of the entries look like this):

    {"level":"info","ts":1621444514.6232271,"caller":"zapr/zapr.go:69","msg":"all workers are busy - skip backup deletion for now","backup":"cron-otk-db-otk-s3-20210504141504-32qpc","in progress":"cron-otk-db-otk-s3-20210517041507-32qpc, cron-otk-db-otk-s3-20210512191505-32qpc, cron-otk-db-otk-s3-20210517061503-32qpc, cron-otk-db-otk-s3-20210515001502-32qpc, cron-otk-db-otk-s3-20210514231506-32qpc, cron-otk-db-otk-s3-20210508211505-32qpc, cron-otk-db-otk-s3-20210515131508-32qpc, cron-otk-db-otk-s3-20210507011509-32qpc, cron-otk-db-otk-s3-20210506161509-32qpc, cron-otk-db-otk-s3-20210516191511-32qpc"}
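
The log pattern above fits a fixed-size pool of deletion workers in which a slot is released only when a deletion returns. The Go sketch below is a hypothetical model, not the operator's code: the pool size of 10 is an assumption taken from the ten names listed under "in progress", and deletionPool, tryStart and finish are illustrative names. It shows how ten deletions that never return would cause every later deletion to be skipped.

```go
package main

import (
	"fmt"
	"sync"
)

// deletionPool is a HYPOTHETICAL fixed-size worker pool. The size of 10
// used in main is an assumption based on the ten "in progress" names.
type deletionPool struct {
	slots    chan struct{}   // one token per busy worker
	mu       sync.Mutex      // guards inFlight
	inFlight map[string]bool // backups currently being deleted
}

func newDeletionPool(maxWorkers int) *deletionPool {
	return &deletionPool{
		slots:    make(chan struct{}, maxWorkers),
		inFlight: make(map[string]bool),
	}
}

// tryStart reserves a worker for the named backup without blocking.
// It returns false when every slot is taken (the "all workers are busy"
// case) or when that backup is already being deleted.
func (p *deletionPool) tryStart(name string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.inFlight[name] {
		return false
	}
	select {
	case p.slots <- struct{}{}:
		p.inFlight[name] = true
		return true
	default:
		return false
	}
}

// finish releases the worker. If a deletion hangs, finish is never
// called, so its slot stays occupied forever.
func (p *deletionPool) finish(name string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.inFlight[name] {
		delete(p.inFlight, name)
		<-p.slots
	}
}

func main() {
	pool := newDeletionPool(10)
	// Simulate ten deletions that hang: none of them ever calls finish.
	for i := 0; i < 10; i++ {
		pool.tryStart(fmt.Sprintf("backup-%d", i))
	}
	fmt.Println(pool.tryStart("backup-10")) // false: all workers are busy
	pool.finish("backup-3")
	fmt.Println(pool.tryStart("backup-10")) // true: a slot was freed
}
```

In this model nothing ever times out a stuck worker, which would match the symptom of the same ten names staying "in progress" indefinitely.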

Hello @rtanner, which version of the operator are you running?
Support for retention of backups on S3 was added in version 1.8.0, which was released just recently. Before that, retention worked only for PVCs.

See [K8SPXC-442] Add support for retention of backups stored on S3 - Percona JIRA for reference.

Hi @spronin, the error messages in my second post are from version 1.8.

Thanks.

Hi @spronin, I spun up a new 1.8 deployment and began taking hourly backups with a keep number of 6. The operator logs (attached) show it trying to delete from the S3 bucket each time it creates a new backup once we hit the keep limit. Then, after 10 delete jobs have started, it constantly outputs "zapr/zapr.go:69","msg":"all workers are busy - skip backup deletion for now".

Based on the logs, it looks like the deletes hang and the threads never time out. There are no errors explaining why the threads are stuck. The backups pile up because each one has an S3 delete finalizer, so I need to patch the backups to remove the finalizer and then manually clean up the backup on S3.

I am using an S3-compatible storage product (not Amazon S3) and have very few, if any, issues with the backups themselves.

Please let me know what other information you might need to help troubleshoot this.

operator.txt (992.3 KB)

Thanks!

@rtanner I created a cluster with backups taken every minute:

    schedule:
      - name: "daily-backup"
        schedule: "* * * * *"
        keep: 5
        storageName: s3-test

In the logs I see the following:

    {"level":"info","ts":1622202129.7677152,"caller":"zapr/zapr.go:69","msg":"Created a new backup job","Namespace":"pxc","Name":"xb-cron-cluster1-s3-test-20210528114209-q6fav"}
    {"level":"info","ts":1622202159.0833988,"caller":"zapr/zapr.go:69","msg":"deleting backup from s3","name":"cron-cluster1-s3-test-20210528113708-q6fav"}
    {"level":"info","ts":1622202168.5467582,"caller":"zapr/zapr.go:69","msg":"backup was removed from s3","name":"cron-cluster1-s3-test-20210528113708-q6fav"}
    {"level":"info","ts":1622202189.9975278,"caller":"zapr/zapr.go:69","msg":"Created a new backup job","Namespace":"pxc","Name":"xb-cron-cluster1-s3-test-20210528114309-q6fav"}
    {"level":"info","ts":1622202215.7567892,"caller":"zapr/zapr.go:69","msg":"deleting backup from s3","name":"cron-cluster1-s3-test-20210528113808-q6fav"}
    {"level":"info","ts":1622202225.0048485,"caller":"zapr/zapr.go:69","msg":"backup was removed from s3","name":"cron-cluster1-s3-test-20210528113808-q6fav"}
    {"level":"info","ts":1622202250.0557785,"caller":"zapr/zapr.go:69","msg":"Created a new backup job","Namespace":"pxc","Name":"xb-cron-cluster1-s3-test-20210528114409-q6fav"}
    {"level":"info","ts":1622202277.5778854,"caller":"zapr/zapr.go:69","msg":"deleting backup from s3","name":"cron-cluster1-s3-test-20210528113908-q6fav"}
    {"level":"info","ts":1622202285.4088395,"caller":"zapr/zapr.go:69","msg":"backup was removed from s3","name":"cron-cluster1-s3-test-20210528113908-q6fav"}
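
The retention in this log (a backup job created at 11:42, then the 11:37 backup deleted, with keep: 5) can be modelled as "order backups by time and delete everything except the newest N". The Go sketch below is a hypothetical illustration: it assumes the second-to-last dash-separated field of a backup name is a YYYYMMDDhhmmss timestamp and that ordering uses that timestamp, whereas the operator may order backups differently (for example by creation time). The input is the six names that appear in the log above, with the xb- job prefix dropped; it is not a real snapshot of the cluster.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// backupTime extracts the timestamp from names such as
// cron-cluster1-s3-test-20210528113708-q6fav. ASSUMPTION: the
// second-to-last dash-separated field is YYYYMMDDhhmmss.
func backupTime(name string) (time.Time, error) {
	parts := strings.Split(name, "-")
	if len(parts) < 2 {
		return time.Time{}, fmt.Errorf("unexpected backup name %q", name)
	}
	return time.Parse("20060102150405", parts[len(parts)-2])
}

// toDelete returns the backups outside the newest `keep`, oldest first.
// Names whose timestamp cannot be parsed are skipped.
func toDelete(names []string, keep int) []string {
	type entry struct {
		name string
		ts   time.Time
	}
	var entries []entry
	for _, n := range names {
		if ts, err := backupTime(n); err == nil {
			entries = append(entries, entry{n, ts})
		}
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].ts.Before(entries[j].ts) })
	if len(entries) <= keep {
		return nil
	}
	var out []string
	for _, e := range entries[:len(entries)-keep] {
		out = append(out, e.name)
	}
	return out
}

func main() {
	names := []string{
		"cron-cluster1-s3-test-20210528114409-q6fav",
		"cron-cluster1-s3-test-20210528113708-q6fav",
		"cron-cluster1-s3-test-20210528114209-q6fav",
		"cron-cluster1-s3-test-20210528113808-q6fav",
		"cron-cluster1-s3-test-20210528114309-q6fav",
		"cron-cluster1-s3-test-20210528113908-q6fav",
	}
	fmt.Println(toDelete(names, 5)) // [cron-cluster1-s3-test-20210528113708-q6fav]
}
```

Selecting which backups to delete is cheap; in the thread, the problem is that the deletion step that follows it never completes.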

I suspect the finalizer is getting stuck and cannot delete the backup from your S3 storage. If there were an error, you would see it in the operator log:

    logger.Error(err, "failed to delete backup", "name", cr.Name)

But I do not see any such failures in the log you shared, so my best guess is that it is getting stuck for some reason.
Do you have any issues manually deleting the files from S3 when using the same keys as the operator?

@spronin
I use the exact same keys to log in with an S3 client and delete the files, so I do not think a lack of delete permission is the issue. Is there a way to turn on debug logging in the operator to see what it is waiting on?