Scheduled backups aren't deleted, pitr is not continuous

SlavaUtesinov · July 12, 2023, 8:06am

Hello!

I turned on backups’ stuff alongside pitr, than I created a full (on-demand) backup (2023-07-03T05:56:49Z) as a base backup for pitr, than I set scheduled backups with cron (2023-07-03T07:55:58Z - is the first scheduled backup).

Configuration:

backup:
  enabled: true
  tasks:
    - name: {{name}}
      storageName: main        
      enabled: true
      keep: 2        
      schedule: */5 * * * *      
      type: logical
  pitr:
    enabled: true
    oplogSpanMin: 3
  storages:
    main:
      s3:
        bucket: {{bucket}}
        credentialsSecret: {{credentialsSecret}}
        endpointUrl: {{endpointUrl}}
        prefix: {{prefix}}
        region: {{region}}
      type: s3

After some time I have the following actual result:

Backup snapshots:
  2023-07-03T05:56:12Z <logical> [restore_to_time: 2023-07-03T05:56:49Z]
  2023-07-03T07:55:21Z <logical> [restore_to_time: 2023-07-03T07:55:58Z]
  2023-07-03T08:25:21Z <logical> [restore_to_time: 2023-07-03T08:25:31Z]
  2023-07-03T09:00:41Z <logical> [restore_to_time: 2023-07-03T09:00:49Z]
  2023-07-03T09:15:41Z <logical> [restore_to_time: 2023-07-03T09:15:51Z]
  2023-07-03T09:25:47Z <logical> [restore_to_time: 2023-07-03T09:26:32Z]
  2023-07-03T09:30:56Z <logical> [restore_to_time: 2023-07-03T09:31:07Z]
  2023-07-03T09:40:55Z <logical> [restore_to_time: 2023-07-03T09:41:04Z]

As you can see keep: 2 doesn’t work, because at this case it would have been only three backups (expected result):

Backup snapshots:
  2023-07-03T05:56:12Z <logical> [restore_to_time: 2023-07-03T05:56:49Z]
  2023-07-03T09:30:56Z <logical> [restore_to_time: 2023-07-03T09:36:07Z]
  2023-07-03T09:40:55Z <logical> [restore_to_time: 2023-07-03T09:41:04Z]

At operator logs I see the following errors:
2023-07-03T09:50:40.305Z ERROR failed to run finalizer {"controller": "perconaservermongodbbackup-controller", "object": {"name":"cron-cloud-documentdb-20230703092500-xppp2","namespace":"documentdb40"}, "namespace": "documentdb40", "name": "cron-cloud-documentdb-20230703092500-xppp2", "reconcileID": "0c3ea5d8-0d7a-4930-a6b9-2e12b99da437", "finalizer": "delete-backup", "error": "unable to delete: backup is a base for '2023-07-03T09:26:32 - 2023-07-03T09:41:04'", "errorVerbose": "unable to delete: backup is a base for '2023-07-03T09:26:32 - 2023-07-03T09:41:04'\ngithub.com/percona/percona-backup-mongodb/pbm.(*PBM).probeDelete\n\t/go/pkg/mod/github.com/percona/percona-backup-mongodb@v0.0.0-20221024072933-3ec38a5fc670/pbm/delete.go:63\ngithub.com/percona/percona-backup-mongodb/pbm.(*PBM).DeleteBackup\n\t/go/pkg/mod/github.com/percona/percona-backup-mongodb@v0.0.0-20221024072933-3ec38a5fc670/pbm/delete.go:29\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup.(*ReconcilePerconaServerMongoDBBackup).checkFinalizers\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup/perconaservermongodbbackup_controller.go:366\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup.(*ReconcilePerconaServerMongoDBBackup).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup/perconaservermongodbbackup_controller.go:177\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"} github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup.(*ReconcilePerconaServerMongoDBBackup).checkFinalizers /go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup/perconaservermongodbbackup_controller.go:368 github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup.(*ReconcilePerconaServerMongoDBBackup).Reconcile /go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup/perconaservermongodbbackup_controller.go:177 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235

Moreover pitr ranges have gaps:

 PITR <on>:
  2023-07-03T05:56:50Z - 2023-07-03T07:50:28Z
  2023-07-03T07:55:59Z - 2023-07-03T08:10:37Z
  2023-07-03T08:25:32Z - 2023-07-03T08:35:31Z
  2023-07-03T09:00:50Z - 2023-07-03T09:05:57Z
  2023-07-03T09:15:52Z - 2023-07-03T09:20:48Z
  2023-07-03T09:26:33Z - 2023-07-03T09:41:04Z

I expected the only one continuous range:

 PITR <on>:
  2023-07-03T05:56:50Z - 2023-07-03T09:41:04Z

At a backup-agent pod I see logs:

2023-07-03T07:53:38.000+0000 E [pitr] streaming oplog: epoch mismatch. Got sleep in {1688370631 9}, woke up in {1688370639 1}. Too old for that stuff.
2023-07-03T07:55:54.000+0000 E [pitr] streaming oplog: get backup start TS: run out of tries

Environment:
percona-server-mongodb-operator:1.14.0
percona-server-mongodb:6.0.4-3
percona-backup-mongodb:2.0.4

P.S. One question aside - what is a proper way to choose specific backup as a base for pitr when I want to make restore?

Thanks!

Sergey_Pronin · July 14, 2023, 7:38am

Hello @SlavaUtesinov ,

thanks for sharing it. Well written and quite easy to understand.

I see that your schedule takes backups every 5 mins. I have a feeling that backups might not even complete when the new one starts and this causes the prob/ Do you have some data in the cluster? If so, how much (GBs)?

I will try to reproduce it this or next week.

SlavaUtesinov · July 14, 2023, 8:26am

Hi, @Sergey_Pronin!

My cluster is absolutely empty. Moreover, if I switch on only scheduled backups with the same rate(cron) but without pitr feature, scheduled backups work as expected and keep parameter is strictly honored.

P.S. I was monitoring the cluster for twenty-four hours to catch this behavior.

Sergey_Pronin · July 14, 2023, 12:30pm

Ah, makes sense. Got it now.

We have a feature request to support PITR retention: [K8SPSMDB-801] Add support for PITR retention - Percona JIRA

We will add it in 1.15 release.
But I’m still confused a bit with the chunk gaps.

Topic		Replies	Views
Pitr restore working different way in two ways Percona Operator for MongoDB	14	695	May 21, 2024
PSMDB 1.15.0 PITR Restore not working Percona Backup for MongoDB percona , mongodb	5	734	January 27, 2024
Unable to restore from pitr backup Percona Operator for MongoDB	2	735	September 26, 2023
Backup planning Percona Backup for MongoDB	3	614	July 11, 2024
Older backups are not getting deleted, with the keep argument specified Percona Operator for MongoDB mongodb , pbm	3	133	March 4, 2025

Scheduled backups aren't deleted, pitr is not continuous

Related topics