MongoDB Restore Failing

Description:

I’ve been trying to get the Percona Operator for MongoDB running in our production environment. So far, I’ve migrated all data from Atlas to our self-hosted DB using the Percona operator. I ran into a few issues setting up backups to S3, but that’s also working perfectly now; the main one was this one, which I resolved by disabling TLS.

The final step before I push this to production would be to ensure restoration of backups to another cluster also works correctly. This is where I’ve encountered issue after issue.

The first issue I encountered was {"Error": "Backup '2025-01-23T22:00:21Z' not found"}. I resolved this by manually inserting the contents of <backup-name>.pbm.json (with the appropriate S3 credentials and fixed timestamps) into the admin.pbmBackups collection.
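For reference, the manual workaround looked roughly like this. This is a sketch, not a recommended procedure: the pod name, credentials, and file name are from my setup, and the JSON document is the <backup-name>.pbm.json pulled from the S3 bucket with the storage credentials and timestamps edited to match the target.

```shell
# Hypothetical sketch: insert the (edited) PBM metadata document into
# admin.pbmBackups on the target cluster so PBM can find the backup.
# <password> is the clusterAdmin password from the users secret.
kubectl exec -n mongodb mongodb-2-rs0-0 -c mongod -- mongosh \
  -u clusterAdmin -p '<password>' --authenticationDatabase admin \
  --eval "db.getSiblingDB('admin').pbmBackups.insertOne($(cat 2025-01-23T22:00:21Z.pbm.json))"
```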

After this, the restore ran (running state) for a few minutes (almost always around 6m30s), after which it went into the error state, with this error:

prepare data: connect to mongo: mongo failed with [F] Fatal assertion / 2025-01-24T15:12:16.944+00:00, connect err: ping: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:28030, Type: Unknown, Last error: dial tcp [::1]:28030: connect: connection refused }, ] }
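While the restore sits in the running state like this, the underlying error tends to surface first in the pbm-agent sidecar’s logs on the target replset pod. A sketch of what I used to watch it (pod and container names follow the operator’s defaults and may differ in your setup):

```shell
# Follow the pbm-agent sidecar's logs on the target replset pod
kubectl logs -n mongodb mongodb-2-rs0-0 -c backup-agent --follow
```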

All this while, the rs0 pod sat at 100% memory usage and its PV’s used space kept growing. I let this run for a few hours; eventually the usage reached 1.1TB/1.5TB, after which it suddenly dropped to 10GB. It stayed like this for another 2 hours, after which I shut the pod down.

Steps to Reproduce:

The config of the cluster being restored into:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb-2
  namespace: mongodb
spec:
  crVersion: 1.19.0
  image: percona/percona-server-mongodb:8.0.4-1-multi
  tls:
    mode: disabled
  unsafeFlags:
    tls: true
    replsetSize: true
    mongosSize: true
  upgradeOptions:
    apply: disabled
    schedule: "0 2 * * *"
  secrets:
    users: mongodb-2
  replsets:
  - name: rs0
    size: 1
    affinity:
      antiAffinityTopologyKey: 'none'
    # if using even number of nodes, set arbiter to true
    arbiter:
      enabled: false
      size: 1
    configuration: |
      replication:
        # 25GB
        oplogSizeMB: 25800 
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: 8
        memory: 16Gi
      requests:
        cpu: 4
        memory: 16Gi
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 1500Gi
        storageClassName: gp2-xfs

  sharding:
    enabled: true

    configsvrReplSet:
      size: 1
      resources:
        limits:
          cpu: 2
          memory: 4Gi
        requests:
          cpu: 1
          memory: 2Gi
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 25Gi
          storageClassName: gp2-xfs
          
    mongos:
      size: 1
  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.8.0-multi
    pitr:
      enabled: false
      oplogOnly: false
    resources:
      limits:
        cpu: 2
        memory: 8Gi
      requests:
        cpu: 500m
        memory: 2Gi
    storages:
      s3-bkp:
        type: s3
        s3:
          bucket: mongodb-backup
          region: ap-east-1
          credentialsSecret: mongodb-backups

The restore:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: mongodb-2-restore
  namespace: mongodb
spec:
  clusterName: mongodb-2
  storageName: s3-bkp
  # specify S3 bucket name & source
  backupSource:
    type: physical
    # the file path is easy to find, either from the S3 bucket
    # or via the `describe psmdb-backup` command
    destination: s3://mongodb-backup/2025-01-23T22:00:21Z
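For completeness, this is roughly how I applied and monitored the restore (the manifest file name is whatever you saved it as; the `psmdb-restore` short name comes from the operator’s CRDs):

```shell
# Verify the backup artifacts actually exist at the destination
aws s3 ls s3://mongodb-backup/2025-01-23T22:00:21Z

# Apply the restore and watch its state transitions (waiting/running/error)
kubectl apply -f mongodb-2-restore.yaml
kubectl get psmdb-restore mongodb-2-restore -n mongodb -w
```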

Here’s the config of the cluster that produced the backup (it was a scheduled backup). The cluster was also upgraded from MongoDB 7 to 8, and the backup was made after the upgrade; I followed the instructions here. This cluster works perfectly.

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb
  namespace: mongodb
spec:
  crVersion: 1.19.0
  image: percona/percona-server-mongodb:8.0.4-1-multi
  tls:
    mode: disabled
  unsafeFlags:
    tls: true
    replsetSize: true
    mongosSize: true
  upgradeOptions:
    apply: disabled
    schedule: "0 2 * * *"
  secrets:
    users: mongodb
  replsets:
  - name: rs0
    size: 1
    affinity:
      antiAffinityTopologyKey: 'none'
    # if using even number of nodes, set arbiter to true
    arbiter:
      enabled: false
      size: 1
    configuration: |
      replication:
        # 25GB
        oplogSizeMB: 25800 
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: 8
        memory: 16Gi
      requests:
        cpu: 8
        memory: 16Gi
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 2000Gi
        storageClassName: gp2-xfs

  sharding:
    enabled: true

    configsvrReplSet:
      size: 1
      resources:
        limits:
          cpu: 2
          memory: 4Gi
        requests:
          cpu: 1
          memory: 2Gi
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 25Gi
          storageClassName: gp2-xfs
          
    mongos:
      size: 1
  users:
  - name: service-im
    db: instant-messaging
    passwordSecretRef: 
      name: service-im-password
      key: password
    roles:
      - name: dbOwner
        db: instant-messaging
      - name: clusterMonitor
        db: admin
  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.8.0-multi
    pitr:
      enabled: true
      oplogOnly: true
    storages:
      s3-bkp:
        type: s3
        s3:
          bucket: mongodb-backup
          region: ap-east-1
          credentialsSecret: mongodb-backups
    resources:
      limits:
        cpu: 2
        memory: 8Gi
      requests:
        cpu: 500m
        memory: 2Gi
    tasks:
    - name: "mongodb-backup"
      enabled: true
      # At 0 minutes past the hour, every 12 hours, starting at 10:00 AM
      # https://crontab.cronhub.io
      schedule: "0 10/12 * * *"
      keep: 8
      type: physical
      storageName: s3-bkp
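The backup I restored from came out of the scheduled task above. For anyone reproducing this, an equivalent on-demand backup manifest would look roughly like this (the name is arbitrary; once it completes, the operator records the actual S3 destination in the resource’s status, which is where the restore’s `destination` value comes from):

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: mongodb-backup-manual
  namespace: mongodb
spec:
  clusterName: mongodb
  storageName: s3-bkp
  type: physical
```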

Version:

Operator: 1.19.0
MongoDB: 8.0.4

Logs:

Full logs of the restore cluster from startup to the error message.
mongodb-2-rs0.log (154.3 KB)

Expected Result:

Well, I expect the restore to complete smoothly.

Actual Result:

The errors keep occurring as outlined in the description.

Additional Information:

Do note, the backup is sizeable: 1.1TB.

Okay, so I finally found what the issue was. I had to pass in the encryption key of the original cluster:

secrets:
  users: mongodb
  encryptionKey: mongodb-mongodb-encryption-key
  # "mongodb" being the name of the original cluster
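In my case both clusters live in the same namespace, so referencing the existing secret was enough. If the restore cluster ran in a different namespace or Kubernetes cluster, the secret would first have to be copied over; a rough sketch (context and namespace names are examples):

```shell
# Export the source cluster's encryption key secret and recreate it where
# the restore cluster runs. You may need to strip resourceVersion/uid/
# creationTimestamp from the exported YAML before applying it.
kubectl get secret mongodb-mongodb-encryption-key -n mongodb -o yaml \
  | kubectl apply --context target-cluster -n mongodb -f -
```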

I couldn’t find any information about this in the backup/restore documentation. It would’ve been nice to know that the original cluster’s encryption key was required here.

The first issue is still there, though. It would be great to have a resolution to this, as it’s quite inconvenient when restoring:

The first issue I encountered was {"Error": "Backup '2025-01-23T22:00:21Z' not found"}. I resolved this by manually inserting the contents of <backup-name>.pbm.json (with the appropriate S3 credentials and fixed timestamps) into the admin.pbmBackups collection.