I have deployed the Mongo operator, version 1.13.0, and we are seeing intermittent failures in the backups stored in S3.
Some of the failures were caused by a problem with the S3 certificate, which I have since fixed. Most of them, however, are intermittent: sometimes the backup works fine and other times it fails.
The backup agent logs show no other errors, and all the files are stored correctly in S3, even though the CRD marks the backup as failed.
The only error that appears is the following:
2023-05-28T00:15:00.746+0000 done dumping open010.Request (2644976 documents)
2023-05-28T00:15:00.746+0000 dump phase III: the oplog
2023-05-28T00:15:00.746+0000 finishing dump
2023-05-28T00:15:00.746+0000 Mux close namespace open010.Request
2023-05-28T00:15:00.746+0000 Mux finish
2023-05-28T00:15:00.746+0000 mux completed successfully
2023-05-28T00:15:02.000+0000 I [backup/2023-05-28T00:00:21Z] mongodump finished, waiting for the oplog
2023-05-28T00:15:05.000+0000 I [backup/2023-05-28T00:00:21Z] dropping tmp collections
2023-05-28T00:15:08.000+0000 I [backup/2023-05-28T00:00:21Z] mark RS as error waiting for dump done: backup stuck, last beat ts: 1685232241:
2023-05-28T00:15:11.000+0000 D [backup/2023-05-28T00:00:21Z] set balancer on
2023-05-28T00:15:11.000+0000 E [backup/2023-05-28T00:00:21Z] backup: waiting for dump done: backup stuck, last beat ts: 1685232241
2023-05-28T00:15:11.000+0000 D [backup/2023-05-28T00:00:21Z] releasing lock
2023-05-28T00:15:14.000+0000 D [pitr] start_catchup
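In case it helps anyone reading the log above, here is a small Python sketch that decodes the "last beat ts" value and compares it with the time the error was logged. It assumes the value is a Unix timestamp in seconds (UTC), which is my guess from its magnitude and not something I have confirmed; the helper name `heartbeat_gap` is made up for this example.

```python
import re
from datetime import datetime, timezone

# Error line copied from the log above.
LOG_LINE = (
    "2023-05-28T00:15:11.000+0000 E [backup/2023-05-28T00:00:21Z] "
    "backup: waiting for dump done: backup stuck, last beat ts: 1685232241"
)


def heartbeat_gap(line):
    """Return (last_beat, logged_at, gap) for a 'backup stuck' line, else None.

    Assumption: 'last beat ts' is a Unix timestamp in seconds (UTC).
    """
    m = re.match(r"(\S+) .*backup stuck, last beat ts: (\d+)", line)
    if m is None:
        return None
    # Timestamp at the start of the log line, e.g. 2023-05-28T00:15:11.000+0000
    logged_at = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
    last_beat = datetime.fromtimestamp(int(m.group(2)), tz=timezone.utc)
    return last_beat, logged_at, logged_at - last_beat


last_beat, logged_at, gap = heartbeat_gap(LOG_LINE)
print(last_beat.isoformat())  # 2023-05-28T00:04:01+00:00
print(gap)                    # 0:11:10
```

If that assumption holds, the last recorded heartbeat was at 00:04:01, about 11 minutes before the error was logged, while the dump itself only finished at 00:15:00.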
Percona, software that doesn’t work some of the time™
I am really starting to regret choosing Percona. First, v1.13 generated so many logs that the entire thing was unusable. Now backups don’t work! PMM Server has to run as root, so it can’t run on OpenShift…
What is this, man…
Please test your software. I don’t want to be a beta tester… This is wasting my time.
As for the backups: there seems to be a race condition that leads to a misleading error status. The backups are safe, but the status is wrong. We will address this in the next release.
@Sergey_Pronin Hello!
We just got the same issue.
How can we safely delete old backups from S3 manually (without breaking the latest backups)?
Which folders/files are safe to delete?