Cross-cluster restore fails with "backup meta: not found" error despite metadata existing in S3

Description:

When attempting to restore an incremental backup with the Percona Server MongoDB Operator in a cross-cluster scenario, the restore operation runs for a while, its status then becomes empty, and it finally fails with the error: “failed to validate backup: validate backup in metadata: get backup meta: not found”.

Cross-cluster setup:

  • Original cluster: Backup was created automatically by scheduled tasks

  • New cluster: Fresh deployment attempting to restore from the backup

The backup files actually exist in S3 storage:

  • Backup data path: s3://test/2026-01-13T04:45:00Z/

  • Metadata file: s3://test/2026-01-13T04:45:00Z.pbm.json

However, the Operator in the new cluster appears unable to correctly locate and read the backup metadata created by the original cluster, causing the restore operation to fail.

Steps to Reproduce:

1. In the original cluster:

  • Deploy Percona Server MongoDB Operator with S3-compatible storage (Cloudflare R2)

  • Configure incremental backup tasks (schedule: "0 * * * ")

  • Wait for backups to complete successfully

  • Verify backup files and metadata exist in S3

2. In the new cluster:

  • Deploy Percona Server MongoDB Operator (fresh installation)

  • Configure the same S3 storage credentials and settings

  • Create PerconaServerMongoDBRestore resource with the following configuration:

   spec:
     clusterName: mongodb
     backupSource:
       type: incremental
       destination: s3://test/2026-01-13T04:45:00Z/
       s3:
         credentialsSecret: mongodb-backup-r2
         endpointUrl: https://{account_id}.r2.cloudflarestorage.com
         region: auto
         bucket: test

3. Apply the restore configuration

4. Observe the restore status, which eventually fails with the error

Version:

  • Percona Server MongoDB Image: percona/percona-server-mongodb:8.0.12-4

  • Percona Backup MongoDB Image: percona/percona-backup-mongodb:2.12.0

  • Kubernetes Operator: psmdb-operator 1.21.2 (installed via Helm chart)

  • Storage Backend: Cloudflare R2 (S3 compatible)

Logs:

Pbm Status

[mongodb@mongodb-test-0 /]$ pbm status
Cluster:
========
test:
  - mongodb-test-0.mongodb-test.data-system.svc.cluster.local:27017 [P]: pbm-agent [v2.12.0] OK
  - mongodb-test-1.mongodb-test.data-system.svc.cluster.local:27017 [S]: pbm-agent [v2.12.0] OK
  - mongodb-test-2.mongodb-test.data-system.svc.cluster.local:27017 [S]: pbm-agent [v2.12.0] OK


PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 auto https://{account_id}.r2.cloudflarestorage.com/test
  Snapshots:
    2026-01-13T05:00:00Z 32.26MB <incremental> success [restore_to_time: 2026-01-13T05:00:06]
    2026-01-13T04:45:00Z 32.35MB <incremental> success [restore_to_time: 2026-01-13T04:45:05]
    2026-01-13T04:30:16Z 32.22MB <incremental> success [restore_to_time: 2026-01-13T04:30:23]
    2026-01-13T04:15:00Z 32.30MB <incremental> success [restore_to_time: 2026-01-13T04:15:06]
    2026-01-13T04:00:00Z 32.23MB <incremental> success [restore_to_time: 2026-01-13T04:00:05]
    2026-01-13T03:45:00Z 76.19MB <incremental> success [restore_to_time: 2026-01-13T03:45:04]
    2026-01-13T03:30:00Z 76.14MB <incremental> success [restore_to_time: 2026-01-13T03:30:05]
    2026-01-13T03:15:00Z 94.36MB <incremental> success [restore_to_time: 2026-01-13T03:15:04]
    2026-01-13T03:00:00Z 78.91MB <incremental> success [restore_to_time: 2026-01-13T03:00:05]
    2026-01-13T02:45:00Z 93.57MB <incremental> success [restore_to_time: 2026-01-13T02:45:05]
    2026-01-13T02:30:00Z 78.21MB <incremental> success [restore_to_time: 2026-01-13T02:30:07]
    2026-01-13T02:15:00Z 31.92MB <incremental> success [restore_to_time: 2026-01-13T02:15:04]
    2026-01-13T02:00:06Z 134.11MB <incremental, base> success [restore_to_time: 2026-01-13T02:00:10]

Restore
Error: failed to validate backup: validate backup in metadata: get backup meta: not found

Expected Result:

The Operator should be able to:

  1. Read the backup metadata file (.pbm.json) from S3 storage, regardless of which cluster created the backup

  2. Validate the backup integrity

  3. Successfully execute the incremental backup restore operation in the new cluster

  4. Update the Restore CR status with restore progress until it eventually completes

Cross-cluster restore should be supported as this is a common disaster recovery and migration scenario.

Actual Result:

  1. The Restore CR status becomes empty after running for some time

  2. Subsequently fails with error: “failed to validate backup: validate backup in metadata: get backup meta: not found”

  3. The restore operation fails in the new cluster, even though both the backup files and metadata file are confirmed to exist in S3 storage

Additional Information:

  • Scenario: Cross-cluster restore - backup created in one Kubernetes cluster, restore attempted in a different (new) cluster

  • Using S3-compatible storage (Cloudflare R2), not AWS S3

  • Backup was created automatically by the Operator’s scheduled task in the original cluster

  • Both clusters use the same S3 storage configuration and credentials

  • Metadata file naming format: {backup-timestamp}.pbm.json

  • The destination path format in restore.yaml: s3://bucket-name/backup-timestamp/

  • Potential issue: The Operator may be storing cluster-specific information in metadata or using cluster-specific paths, preventing cross-cluster restores
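The naming convention above can be sketched in shell. Note this is an assumption inferred from the observed file layout, not the Operator's actual lookup code:

```shell
# Derive the expected metadata key from a backup destination: the .pbm.json
# key is the backup path (without a trailing slash) plus the ".pbm.json"
# suffix, per the observed layout.
dest="s3://test/2026-01-13T04:45:00Z"
meta="${dest}.pbm.json"
echo "$meta"   # → s3://test/2026-01-13T04:45:00Z.pbm.json
```

If the Operator builds this key from a destination that still carries a trailing slash, the resulting key would not match the object in S3.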

Suggested investigation:

  • Whether the metadata contains cluster-specific identifiers that prevent cross-cluster restore

  • The path logic used by the Operator when validating backups from different clusters

  • Whether cross-cluster restore is officially supported

Hi, have you followed the steps in the documentation for restoring on a new cluster? There are some extra things to do, like making sure the user passwords and the encryption key match.

Thank you for pointing that out. I need to clarify:

I did ensure the user passwords match by creating the secret with identical values in both clusters:

kubectl create secret generic mongodb-users \
  -n data-system \
  --from-literal=MONGODB_ADMIN_PASSWORD={same-password} \
  --from-literal=MONGODB_CLUSTER_ADMIN_PASSWORD={same-cluster-password} \
  --from-literal=MONGODB_BACKUP_PASSWORD={same-backup-password}

However, I did NOT use the same encryption key in the new cluster; I created a different encryption-key Secret. This is likely the cause of the issue, since I’m restoring from an incremental backup.

Before I recreate the cluster with the matching encryption key, could you confirm: is my user-password setup correct as shown above?

If you are creating the secrets manually, make sure to follow the instructions here.

I ran the following two commands before creating both the old and the new cluster, to ensure that the two clusters use the same user passwords and encryption key:

kubectl create secret generic mongodb-users \
  -n data-system \
  --from-literal=MONGODB_BACKUP_PASSWORD={backup password} \
  --from-literal=MONGODB_BACKUP_USER=backup \
  --from-literal=MONGODB_CLUSTER_ADMIN_PASSWORD={cluster admin password} \
  --from-literal=MONGODB_CLUSTER_ADMIN_USER=clusterAdmin \
  --from-literal=MONGODB_CLUSTER_MONITOR_PASSWORD={cluster monitor password} \
  --from-literal=MONGODB_CLUSTER_MONITOR_USER=clusterMonitor \
  --from-literal=MONGODB_DATABASE_ADMIN_PASSWORD={database admin password} \
  --from-literal=MONGODB_DATABASE_ADMIN_USER=databaseAdmin \
  --from-literal=MONGODB_USER_ADMIN_PASSWORD={user admin password} \
  --from-literal=MONGODB_USER_ADMIN_USER=userAdmin
kubectl create secret generic mongodb-encryption-key \
  -n data-system \
  --from-literal=encryption-key={key}

I also reference these two manually created Secrets in the cluster creation .yaml:

secrets:
  users: mongodb-users
  encryptionKey: mongodb-encryption-key

However, I still encounter the same error.

I tested backup and restore within the same cluster, and it works correctly.
Therefore, I suspect the issue might be related to what you mentioned earlier.
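For the same-cluster case that works, the Operator can also resolve a backup by the name of its PerconaServerMongoDBBackup object instead of a raw destination path, which sidesteps path-format mistakes entirely. A sketch, where the backup name is a placeholder:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore-same-cluster
  namespace: data-system
spec:
  clusterName: mongodb
  # backupName references an existing PerconaServerMongoDBBackup object in
  # this cluster; the name below is hypothetical.
  backupName: backup1
```

This only works where the backup object exists, so a cross-cluster restore still has to go through backupSource.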

However, even after setting the same user passwords and encryption key, the cross-cluster restore still fails in the same way.

Could you please help me understand what I might be missing?
I’ve been working on this for a full day and still haven’t been able to resolve it.

I also added the approach you mentioned:

kubectl create secret generic mongodb-users-password \
  -n data-system \
  --from-literal=password={password}

.yaml

passwordSecretRef:
  name: mongodb-users-password
  key: password

However, the same issue still persists.

I also performed a cross-cluster restore like you and hit many errors I couldn’t explain at first. Eventually I discovered that I was using two different encryption keys in the two clusters (because when secrets.encryptionKey is declared and the referenced Secret does not exist, the system creates one automatically). I created the encryptionKey as shown below and the cross-cluster restore then succeeded.

File encryption-key.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: mongodb-encryption-key
type: Opaque
data:
  # Name of the key file MongoDB will read. It must be 'encryption-key'.
  encryption-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Command to apply the encryption key in both clusters:

kubectl -n namespace apply -f ./encryption-key.yaml
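One way to generate a valid value for that Secret, assuming openssl is available: MongoDB accepts a base64-encoded 16-, 24-, or 32-byte key for data-at-rest encryption, and 32 bytes is common. A sketch:

```shell
# Generate a random 32-byte key, base64-encoded; a 32-byte key encodes to
# 44 base64 characters.
key="$(openssl rand -base64 32)"
echo "${#key}"
```

Generating the key once and applying the identical Secret to both clusters avoids the auto-created-key mismatch described above.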

Thank you for your response, but I did indeed use the same encryption key. I used the following command:

kubectl create secret generic mongodb-encryption-key \
  -n data-system \
  --from-literal=encryption-key={key}

And in the cluster .yaml, the secret is referenced as follows:

secrets:
  users: mongodb-users
  encryptionKey: mongodb-encryption-key

Hi, after the restore process, if the result is failed, you should exec into the cfg and rs pods (the mongod container) and check the log files /data/db/pbm.restore.log and /data/db/pbm-restore-logs/. You will get more detailed errors there, or you can upload them here and I will try to check them.

Thank you for your response. However, the paths you mentioned, /data/db/pbm.restore.log and /data/db/pbm-restore-logs/, do not exist. I also checked the relevant logs from mongod and the backup agent but did not find anything related. I can, however, provide some logs for your reference.

mongodb-test-0:backup-agent

2026-01-15T03:36:01.000+0000 I starting PITR routine                                                                                                        
2026-01-15T03:36:01.000+0000 I node: test/mongodb-test-0.mongodb-test.data-system.svc.cluster.local:27017                                                         
2026-01-15T03:36:01.000+0000 I conn level ReadConcern: majority; WriteConcern: majority                                                                     
2026-01-15T03:36:01.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result               
2026-01-15T03:36:01.000+0000 I listening for the commands                                                                                                   
2026-01-15T03:36:06.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result               
2026-01-15T03:36:11.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result               
2026-01-15T03:36:13.000+0000 I got command resync <ts: 1768448172>, opid: 696860ac029bd523d1cd7ea7                                                          
2026-01-15T03:36:13.000+0000 I got epoch {1768448172 2}                                                                                                     
2026-01-15T03:36:13.000+0000 D [resync] lock not acquired                                                                                                   
2026-01-15T03:36:18.000+0000 W [agentCheckup] storage is not initialized                                                                                    
2026-01-15T03:39:30.000+0000 I got command resync <ts: 1768448369>, opid: 69686171029bd523d1cd7f46                                                          
2026-01-15T03:39:30.000+0000 I got epoch {1768448198 8}                                                                                                     
2026-01-15T03:39:30.000+0000 D [resync] lock not acquired

mongodb-test-1:backup-agent

2026-01-15T03:36:07.000+0000 I starting PITR routine                                                                                                        
2026-01-15T03:36:07.000+0000 I node: test/mongodb-test-1.mongodb-test.data-system.svc.cluster.local:27017                                                         
2026-01-15T03:36:07.000+0000 I conn level ReadConcern: majority; WriteConcern: majority                                                                     
2026-01-15T03:36:07.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result               
2026-01-15T03:36:07.000+0000 I listening for the commands                                                                                                   
2026-01-15T03:36:12.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result               
2026-01-15T03:36:13.000+0000 I got command resync <ts: 1768448172>, opid: 696860ac029bd523d1cd7ea7                                                          
2026-01-15T03:36:13.000+0000 I got epoch {1768448172 2}                                                                                                     
2026-01-15T03:36:13.000+0000 I [resync] started                                                                                                             
2026-01-15T03:36:16.000+0000 D [resync] uploading ".pbm.init" [size hint: 6 (6.00B); part size: 10485760 (10.00MB)]                                         
2026-01-15T03:36:17.000+0000 W [agentCheckup] storage is not initialized                                                                                    
2026-01-15T03:36:37.000+0000 D [resync] got backups list: 2                                                                                                 
2026-01-15T03:36:38.000+0000 D [resync] bcp: 2026-01-13T17:11:49Z                                                                                           
2026-01-15T03:36:38.000+0000 D [resync] bcp: 2026-01-13T17:16:19Z                                                                                           
2026-01-15T03:36:38.000+0000 D [resync] got physical restores list: 0                                                                                       
2026-01-15T03:36:38.000+0000 D [resync] epoch set to {1768448198 8}                                                                                         
2026-01-15T03:36:38.000+0000 I [resync] succeed                                                                                                             
2026-01-15T03:39:30.000+0000 I got command resync <ts: 1768448369>, opid: 69686171029bd523d1cd7f46                                                          
2026-01-15T03:39:30.000+0000 I got epoch {1768448198 8}                                                                                                     
2026-01-15T03:39:30.000+0000 I [resync] started                                                                                                             
2026-01-15T03:39:48.000+0000 D [resync] got backups list: 2                                                                                                 
2026-01-15T03:39:48.000+0000 D [resync] bcp: 2026-01-13T17:11:49Z                                                                                           
2026-01-15T03:39:48.000+0000 D [resync] bcp: 2026-01-13T17:16:19Z                                                                                           
2026-01-15T03:39:48.000+0000 I [resync] succeed

mongodb-test-2:backup-agent

2026-01-15T03:36:14.000+0000 I starting PITR routine                                                                                                        
2026-01-15T03:36:14.000+0000 I node: test/mongodb-test-2.mongodb-test.data-system.svc.cluster.local:27017                                                         
2026-01-15T03:36:14.000+0000 I conn level ReadConcern: majority; WriteConcern: majority                                                                     
2026-01-15T03:36:14.000+0000 I listening for the commands                                                                                                   
2026-01-15T03:39:30.000+0000 I got command resync <ts: 1768448369>, opid: 69686171029bd523d1cd7f46                                                          
2026-01-15T03:39:30.000+0000 I got epoch {1768448198 8}                                                                                                     
2026-01-15T03:39:30.000+0000 D [resync] lock not acquired

mongodb-restore-description

Name:         mongodb-restore                                                                                                                               
Namespace:    data-system                                                                                                                                   
Labels:                                                                                                                                               
Annotations:                                                                                                                                          
API Version:  psmdb.percona.com/v1                                                                                                                         
Kind:         PerconaServerMongoDBRestore                                                                                                                   
Metadata:                                                                                                                                                   
 Creation Timestamp:  2026-01-15T03:39:29Z                                                                                                                 
 Generation:          1                                                                                                                                    
 Resource Version:    404117                                                                                                                               
 UID:                 35c66426-013b-4cbe-a360-f5ac15b01a15                                                                                                 
Spec:                                                                                                                                                       
 Backup Source:                                                                                                                                            
   Destination:  s3://test/2026-01-13T04:45:00Z/                                                                                                           
   s3:                                                                                                                                                     
     Bucket:              test                                                                                                                             
     Credentials Secret:  mongodb-backup-r2                                                                                                                
     Endpoint URL:        https://{account_id}.r2.cloudflarestorage.com                                                                
     Region:              auto                                                                                                                             
   Type:                  incremental                                                                                                                      
 Cluster Name:            mongodb                                                                                                                          
Status:                                                                                                                                                     
 Error:  failed to validate backup: validate backup in metadata: get backup meta: not found                                                                
 State:  error                                                                                                                                             
Events: <none>

Hi @Lim_Les ,

In PerconaServerMongoDBRestore resource, you must change:
from:
destination: s3://test/2026-01-13T04:45:00Z/
to:
destination: s3://test/2026-01-13T04:45:00Z

(remove the trailing slash)
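For anyone templating the Restore CR in a script, one way to guard against this pitfall is to strip a trailing slash with shell parameter expansion before writing the destination:

```shell
# Normalize the destination: drop a single trailing slash if present.
dest="s3://test/2026-01-13T04:45:00Z/"
dest="${dest%/}"
echo "$dest"   # → s3://test/2026-01-13T04:45:00Z
```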


Hi, I finally managed to recover it successfully. Thank you for your assistance with the recovery. The issue was caused by an extra slash. In fact, at the beginning I had three problems: an inconsistent encryption key, a version mismatch, and a trailing slash at the end of the destination. The whole process was like figuring out how to protect an egg so it wouldn’t break when dropped from the fifth floor. Interestingly, the destination in the documentation actually ends with a slash.

I hope PBM can improve its compatibility. After all, backups are the last line of defense and recovery should be straightforward and easy.

Hi @Lim_Les, we have the following information in the backup object. This information can be used to simplify the restoration process:

status:
  completed: "2026-01-19T18:15:53Z"
  destination: s3://operator-testing/2026-01-19T18:15:46Z
  lastTransition: "2026-01-19T18:15:53Z"
  lastWriteAt: "2026-01-19T18:15:52Z"
  pbmName: "2026-01-19T18:15:46Z"
  pbmPod: some-name-rs0-2.some-name-rs0.multi-storage-29939.svc.cluster.local:27017
  pbmPods:
    rs0: some-name-rs0-2.some-name-rs0.multi-storage-29939.svc.cluster.local:27017
  replsetNames:
  - rs0
  s3:
    bucket: operator-testing
    credentialsSecret: minio-secret
    endpointUrl: http://minio-service:9000/
    region: us-east-1
    serverSideEncryption: {}
  size: 57.64KB
  start: "2026-01-19T18:15:47Z"
  state: ready
  storageName: minio-1
  type: logical

Do you have any suggestions on what we should do/add to simplify the process?
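One concrete simplification, assuming the backup object (or its YAML) can be exported from the source cluster: build the backupSource block of the Restore CR directly from those status fields, so the destination is copied verbatim rather than retyped by hand (avoiding the trailing-slash mistake). A sketch using the example values above:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore-from-status
spec:
  clusterName: mongodb
  backupSource:
    # type and destination copied verbatim from the backup object's status
    type: logical
    destination: s3://operator-testing/2026-01-19T18:15:46Z
    s3:
      bucket: operator-testing
      credentialsSecret: minio-secret
      endpointUrl: http://minio-service:9000/
      region: us-east-1
```

Beyond that, a validation step in the Operator that reports which metadata key it looked for (rather than just "not found") would have made the trailing-slash problem obvious.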