No support for Incremental Backup in 1.15.0 version?

Hello There,

I see that the newly released PSMDB version 1.15.0 does not support incremental backups. It only has an option to specify physical or logical in the CRD YAML file:

type:
  enum:
  - logical
  - physical
  type: string

So I am not able to take incremental backups using a task like the one below:

tasks:
  - name: "ten-min-backup"
    enabled: true
    schedule: "*/10 * * * *"
    keep: 24
    storageName: minio
    compressionType: zstd
    type: incremental

Can you help me understand how I can achieve this?

Looking forward to responses from the forum.

Regards,
Swwapnil Yadav

Hi @Swwapnil_Yadav ,

Currently the Percona Operator for MongoDB can take only logical or physical backups; it does not support incremental physical backups. This feature is on the roadmap and should be available in an upcoming release.

Here is the documentation link for the types of backups supported by the Operator for PSMDB.

In the meantime you can use PITR, which continuously uploads oplog slices to the backup storage at a defined time interval. Kindly go through the documentation for setting up PITR.
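
A minimal sketch of enabling it in the cluster CR (field names as in the PSMDB CRD; note that PITR needs at least one full backup to exist before oplog slicing starts):

backup:
  enabled: true
  pitr:
    enabled: true
    oplogSpanMin: 10   # upload an oplog slice every 10 minutes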

As of now, incremental backups can only be taken using the PBM CLI.
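
For reference, with a recent PBM version the CLI flow looks roughly like this (a sketch; run pbm from a host or container that is configured against your cluster):

pbm backup --type incremental --base   # take the full base backup for the incremental chain
pbm backup --type incremental          # take subsequent incremental backups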

Regards,
Parag

Thank you so much @Parag_Bhayani for clarifying this.

Any idea when (approximately) we can expect support for incremental backups to be released?

I might have a few more questions around PITR; I will post them here once I try the PITR implementation.

Thank you so much.

Regards,
Swwapnil Yadav

Hello @Parag_Bhayani ,

The PITR backups are getting stored in the same bucket where I am copying the physical backups. Now, when it comes to restoring using a timestamp, I am applying the restore manifest below:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore-pitr
spec:
  clusterName: mongodb-psmdb-db
  backupName: cron-mongodb-psmdb-db-20231012104500-p7xjr
  pitr:
    type: date
    date: 2023-10-12 09:55:39
#    date: YYYY-MM-DD hh:mm:ss

But I see its status stays in requested.
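
This is how I am checking it (psmdb-restore is the short name for the PerconaServerMongoDBRestore resource; the namespace is the one from my deployment), followed by part of the output:

kubectl describe psmdb-restore restore-pitr -n obp-cms-portal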

Spec:
  Backup Name:   cron-mongodb-psmdb-db-20231012104500-p7xjr
  Cluster Name:  mongodb-psmdb-db
  Pitr:
    Date:  2023-10-12 09:55:39
    Type:  date
Status:
  Pbm Name:     2023-10-12T11:34:21.13606856Z
  Pitr Target:  2023-10-12T09:55:39
  State:        requested
Events:         <none>

I also see the backup-agent container gets destroyed, and the DB becomes inaccessible.

Do we need to shut down / disable PITR before restoring from PITR?

Please help

You shouldn't need to do that with the Operator. Can you share some logs from both the operator and the DB while the restore stays in requested?
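
For example, something along these lines (a sketch; the pod and namespace names are taken from this thread, and the operator deployment name may differ in your setup):

kubectl logs deployment/psmdb-operator -n obp-cms-portal
kubectl logs mongodb-psmdb-db-rs0-0 -c mongod -n obp-cms-portal
kubectl logs mongodb-psmdb-db-rs0-0 -c backup-agent -n obp-cms-portal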

Hey @Ivan_Groenewold ,

Thanks. Below are the logs I see before the backup-agent container gets destroyed.


================ mongo-operator logs =============

ERROR 2023-10-13T06:21:10.312191705Z [resource.labels.containerName: psmdb-operator] 2023-10-13T06:21:10.312Z INFO Waiting for statefulsets to be ready before restore {"controller": "psmdbrestore-controller", "object": {"name":"restore-pitr","namespace":"obp-cms-portal"}, "namespace": "obp-cms-portal", "name": "restore-pitr", "reconcileID": "df451ea4-0a99-41b6-bb73-75fb533231a8", "ready": false}
ERROR 2023-10-13T06:21:15.313095024Z [resource.labels.containerName: psmdb-operator] 2023-10-13T06:21:15.312Z INFO Setting safe defaults, updating replset size to meet the minimum number of replicas {"controller": "psmdbrestore-controller", "object": {"name":"restore-pitr","namespace":"obp-cms-portal"}, "namespace": "obp-cms-portal", "name": "restore-pitr", "reconcileID": "b4d7a745-56d5-4d7d-a32a-cc83417402a3", "oldSize": 1, "newSize": 3}
ERROR 2023-10-13T06:21:15.314034464Z [resource.labels.containerName: psmdb-operator] 2023-10-13T06:21:15.313Z INFO Waiting for statefulsets to be ready before restore {"controller": "psmdbrestore-controller", "object": {"name":"restore-pitr","namespace":"obp-cms-portal"}, "namespace": "obp-cms-portal", "name": "restore-pitr", "reconcileID": "b4d7a745-56d5-4d7d-a32a-cc83417402a3", "ready": false}

ERROR 2023-10-13T06:21:18.621679745Z [resource.labels.containerName: psmdb-operator] 2023-10-13T06:21:18.621Z ERROR Reconciler error {"controller": "psmdb-controller", "object": {"name":"mongodb-psmdb-db","namespace":"obp-cms-portal"}, "namespace": "obp-cms-portal", "name": "mongodb-psmdb-db", "reconcileID": "a44dfc20-4111-4fc3-a0fa-3995d9f55e28", "error": "reconcile StatefulSet for rs0: failed to run smartUpdate: failed to apply changes: wait pod restart: reach pod wait limit", "errorVerbose": "reconcile StatefulSet for rs0: failed to run smartUpdate: failed to apply changes: wait pod restart: reach pod wait limit\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:423\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"}
ERROR 2023-10-13T06:21:18.621750325Z [resource.labels.containerName: psmdb-operator] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
ERROR 2023-10-13T06:21:18.621757035Z [resource.labels.containerName: psmdb-operator] /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:329
ERROR 2023-10-13T06:21:18.621761635Z [resource.labels.containerName: psmdb-operator] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
ERROR 2023-10-13T06:21:18.621765495Z [resource.labels.containerName: psmdb-operator] /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:266
ERROR 2023-10-13T06:21:18.621769455Z [resource.labels.containerName: psmdb-operator] sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
ERROR 2023-10-13T06:21:18.621773475Z [resource.labels.containerName: psmdb-operator] /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:227
ERROR 2023-10-13T06:21:18.622825145Z [resource.labels.containerName: psmdb-operator] 2023-10-13T06:21:18.622Z INFO Setting safe defaults, updating replset size to meet the minimum number of replicas {"controller": "psmdb-controller", "object": {"name":"mongodb-psmdb-db","namespace":"obp-cms-portal"}, "namespace": "obp-cms-portal", "name": "mongodb-psmdb-db", "reconcileID": "a5202387-b842-44f1-b323-62a30299e4f4", "oldSize": 1, "newSize": 3}
ERROR 2023-10-13T06:21:18.624365005Z [resource.labels.containerName: psmdb-operator] 2023-10-13T06:21:18.624Z INFO StatefulSet is changed, starting smart update {"controller": "psmdb-controller", "object": {"name":"mongodb-psmdb-db","namespace":"obp-cms-portal"}, "namespace": "obp-cms-portal", "name": "mongodb-psmdb-db", "reconcileID": "a5202387-b842-44f1-b323-62a30299e4f4", "name": "mongodb-psmdb-db-rs0"}


==================== backup-agent logs ========================

ERROR 2023-10-13T06:19:53.356172875Z [resource.labels.containerName: backup-agent] 2023-10-13T06:19:53.000+0000 I [pitr] got done signal, stopping
ERROR 2023-10-13T06:19:56.144466034Z [resource.labels.containerName: backup-agent] 2023-10-13T06:19:56.000+0000 I [pitr] created chunk 2023-10-13T06:15:22 - 2023-10-13T06:19:53
ERROR 2023-10-13T06:19:56.152562769Z [resource.labels.containerName: backup-agent] 2023-10-13T06:19:56.000+0000 I [pitr] pausing/stopping with last_ts 2023-10-13 06:19:53 +0000 UTC
ERROR 2023-10-13T06:21:56.185776225Z [resource.labels.containerName: backup-agent] 2023/10/13 06:21:56 [entrypoint] got terminated, shutting down
ERROR 2023-10-13T06:21:56.186187662Z [resource.labels.containerName: backup-agent] 2023/10/13 06:21:56 [entrypoint] kill `pbm-agent` (37): <nil>

====================== psmdb logs =====================================

ERROR 2023-10-13T06:22:27.464349550Z [resource.labels.containerName: mongod] + log=/tmp/pbm-agent.log
ERROR 2023-10-13T06:22:27.470213081Z [resource.labels.containerName: mongod] + /opt/percona/ps-entry.sh --bind_ip_all --auth --dbpath=/data/db --port=27017 --replSet=rs0 --storageEngine=wiredTiger --relaxPermChecks --sslAllowInvalidCertificates --clusterAuthMode=x509 --enableEncryption --encryptionKeyFile=/etc/mongodb-encryption/encryption-key --wiredTigerCacheSizeGB=0.43 --wiredTigerIndexPrefixCompression=true
ERROR 2023-10-13T06:22:27.470259514Z [resource.labels.containerName: mongod] + /opt/percona/pbm-agent
ERROR 2023-10-13T06:22:27.548080541Z [resource.labels.containerName: mongod] + exec mongod --bind_ip_all --auth --dbpath=/data/db --port=27017 --replSet=rs0 --storageEngine=wiredTiger --relaxPermChecks --clusterAuthMode=x509 --enableEncryption --encryptionKeyFile=/etc/mongodb-encryption/encryption-key --wiredTigerCacheSizeGB=0.43 --wiredTigerIndexPrefixCompression=true --tlsMode preferTLS --tlsCertificateKeyFile /tmp/tls.pem --tlsAllowInvalidCertificates --tlsClusterFile /tmp/tls-internal.pem --tlsCAFile /etc/mongodb-ssl/ca.crt --tlsClusterCAFile /etc/mongodb-ssl-internal/ca.crt
INFO 2023-10-13T06:22:27.613215873Z [resource.labels.containerName: mongod] {"attr":{…}, "c":"NETWORK", "ctx":"-", "id":4913010, "msg":"Certificate information", "s":"I", "t":{…}}

Below is the status of the PITR restore:

Name:         restore-pitr
Namespace:    obp-cms-portal
Labels:       <none>
Annotations:  <none>
API Version:  psmdb.percona.com/v1
Kind:         PerconaServerMongoDBRestore
Metadata:
  Creation Timestamp:  2023-10-13T06:19:44Z
  Generation:          1
  Managed Fields:
    API Version:  psmdb.percona.com/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:backupName:
        f:clusterName:
        f:pitr:
          .:
          f:date:
          f:type:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-10-13T06:19:44Z
    API Version:  psmdb.percona.com/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:pbmName:
        f:pitrTarget:
        f:state:
    Manager:         percona-server-mongodb-operator
    Operation:       Update
    Subresource:     status
    Time:            2023-10-13T06:23:26Z
  Resource Version:  95643369
  UID:               ad0bbcfd-e954-491d-a563-a8af40a42337
Spec:
  Backup Name:   cron-mongodb-psmdb-db-20231013061500-xnwlq
  Cluster Name:  mongodb-psmdb-db
  Pitr:
    Date:  2023-10-13 06:00:22
    Type:  date
Status:
  Pbm Name:     2023-10-13T06:23:26.694134951Z
  Pitr Target:  2023-10-13T06:00:22
  State:        requested
Events:         <none>

Appreciate your inputs.

Regards,
Swwapnil Yadav

Hello @Ivan_Groenewold ,

Please let me know if you have any inputs for me here.

Regards,
Swwapnil Yadav

Hey @Ivan_Groenewold ,

When I enabled the configuration below, it actually terminated all 3 pods of the MongoDB StatefulSet and created only one pod:

allowUnsafeConfigurations: true

I am not sure what is happening. Please help.

Hello There,

Any help on this would be very much appreciated.

Regards,
Swwapnil Yadav

@Swwapnil_Yadav thanks for raising this. Just to be clear: you are using PITR + physical backups, right? (It is unclear from your CRs.)

@Ege_Gunes any thoughts right away on it?

Hello @Sergey_Pronin ,

Yes, that's right: PITR + physical.

Hi @Swwapnil_Yadav,

When you start a physical restore, the operator updates the StatefulSets to prepare them for the physical restore (you can observe these changes on the StatefulSet itself, as sketched after the list). This involves:

  • Annotating the StatefulSet to prevent psmdb_controller reconciliation.
  • Adding an init container that installs necessary tools for backup and restore.
  • Removing the existing backup-agent container.
  • Appending a volume for backup configuration.
  • Adjusting the primary container’s command, environment variables, and volume mounts for the restore process.
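
While a restore is in progress you can inspect these modifications with something like this (a sketch, using the resource names from this thread):

kubectl get statefulset mongodb-psmdb-db-rs0 -n obp-cms-portal -o yaml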

From the logs you shared, I see the operator failed to update the StatefulSets (reconcile StatefulSet for rs0: failed to run smartUpdate: failed to apply changes: wait pod restart: reach pod wait limit).

If a pod fails to become ready, it can leave the whole restore stuck. Do you see any crashes or readiness probe failures in the pods?
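
For example (a sketch; pod and namespace names are taken from this thread):

kubectl get pods -n obp-cms-portal
kubectl describe pod mongodb-psmdb-db-rs0-0 -n obp-cms-portal
kubectl logs mongodb-psmdb-db-rs0-0 -c mongod -n obp-cms-portal --previous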