When attempting to restore an incremental backup with the Percona Server for MongoDB Operator in a cross-cluster scenario, the restore operation runs for a while, the status then becomes empty, and finally this error appears: “failed to validate backup: validate backup in metadata: get backup meta: not found”.
Cross-cluster setup:
Original cluster: Backup was created automatically by scheduled tasks
New cluster: Fresh deployment attempting to restore from the backup
However, the Operator in the new cluster appears unable to correctly locate and read the backup metadata created by the original cluster, causing the restore operation to fail.
Steps to Reproduce:
1. In the original cluster:
Deploy Percona Server MongoDB Operator with S3-compatible storage (Cloudflare R2)
The destination path format in restore.yaml: s3://bucket-name/backup-timestamp/
Potential issue: The Operator may be storing cluster-specific information in metadata or using cluster-specific paths, preventing cross-cluster restores
Suggested investigation:
Whether the metadata contains cluster-specific identifiers that prevent cross-cluster restore
The path logic used by the Operator when validating backups from different clusters
Whether cross-cluster restore is officially supported
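For reference, a PerconaServerMongoDBRestore manifest matching this setup (field values taken from the resource description shown later in this thread; the R2 account ID is a placeholder) looks roughly like this:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: mongodb-restore
  namespace: data-system
spec:
  clusterName: mongodb        # name of the PerconaServerMongoDB CR in the new cluster
  backupSource:               # restore without a PerconaServerMongoDBBackup object
    type: incremental
    destination: s3://test/2026-01-13T04:45:00Z/
    s3:
      bucket: test
      credentialsSecret: mongodb-backup-r2
      endpointUrl: https://{account_id}.r2.cloudflarestorage.com
      region: auto
```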
Hi, have you followed the steps in the documentation for restoring on a new cluster? There are some extra things to do, like making sure the user passwords and the encryption key match.
However, I did NOT use the same encryption key in the new cluster; I created a different encryption key secret. This is likely the cause of the issue, since I’m restoring from an incremental backup.
Before I recreate the cluster with the matching encryption key, could you confirm: is my user-password setup correct, as shown above?
I ran the following two commands before creating both the old and new clusters, to ensure that the two clusters use the same user passwords and encryption key:
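(The poster’s exact commands were not attached. For readers following along, the equivalent of the first of them, expressed as a manifest applied to both clusters before deployment, would typically look like the sketch below; the secret name and the key names follow the Operator’s defaults and are illustrative, not taken from this thread. The second command would create the encryption-key secret in the same way.)

```yaml
# Hypothetical users secret for a cluster whose CR references
# spec.secrets.users: mongodb-secrets; key names follow the
# Operator's default users secret layout.
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-secrets
  namespace: data-system
type: Opaque
stringData:
  MONGODB_BACKUP_USER: backup
  MONGODB_BACKUP_PASSWORD: backupPass123
  MONGODB_CLUSTER_ADMIN_USER: clusterAdmin
  MONGODB_CLUSTER_ADMIN_PASSWORD: clusterAdminPass123
  MONGODB_CLUSTER_MONITOR_USER: clusterMonitor
  MONGODB_CLUSTER_MONITOR_PASSWORD: clusterMonitorPass123
  MONGODB_USER_ADMIN_USER: userAdmin
  MONGODB_USER_ADMIN_PASSWORD: userAdminPass123
```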
I tested backup and restore within the same cluster, and it works correctly.
Therefore, I suspect the issue might be related to what you mentioned earlier.
However, even after setting the same users-password and encryption-key, the behavior is still different.
Could you please help me understand what I might be missing?
I’ve been working on this for a full day and still haven’t been able to resolve the issue.
I also performed a cross-cluster restore like you did and ran into many errors I couldn’t explain at the time. Eventually I discovered that I was using two different encryption keys in the two clusters (when secrets.encryptionKey is declared but the referenced secret doesn’t exist, the Operator creates one automatically). I created an encryption key as shown below and then performed the cross-cluster restore successfully.
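(The original snippet was not attached to this post; the shape of the secret meant here is sketched below. The secret name must match spec.secrets.encryptionKey in the cluster CR, the same secret must exist in both clusters before either cluster is created, and — per the Percona documentation — the value is a base64-encoded key of 16, 24, or 32 bytes. The name and placeholder value are illustrative.)

```yaml
# Create this identically in both clusters BEFORE deploying, so the
# Operator does not auto-generate a different key in each cluster.
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-encryption-key   # referenced by spec.secrets.encryptionKey
  namespace: data-system
type: Opaque
stringData:
  encryption-key: "<same base64-encoded 32-byte key in both clusters>"
```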
Hi, if the restore process fails, you should exec into the cfg and rs pods (the mongod container) and check the log files /data/db/pbm.restore.log and /data/db/pbm-restore-logs/. You will find more detailed errors there, or you can upload them here and I will try to check them.
Thank you for your response. The paths you mentioned, /data/db/pbm.restore.log and /data/db/pbm-restore-logs/, do not exist, and I also checked the relevant logs from mongod and the backup agent without finding anything related. I can, however, provide some logs for your reference.
mongodb-test-0:backup-agent
2026-01-15T03:36:01.000+0000 I starting PITR routine
2026-01-15T03:36:01.000+0000 I node: test/mongodb-test-0.mongodb-test.data-system.svc.cluster.local:27017
2026-01-15T03:36:01.000+0000 I conn level ReadConcern: majority; WriteConcern: majority
2026-01-15T03:36:01.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2026-01-15T03:36:01.000+0000 I listening for the commands
2026-01-15T03:36:06.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2026-01-15T03:36:11.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2026-01-15T03:36:13.000+0000 I got command resync <ts: 1768448172>, opid: 696860ac029bd523d1cd7ea7
2026-01-15T03:36:13.000+0000 I got epoch {1768448172 2}
2026-01-15T03:36:13.000+0000 D [resync] lock not acquired
2026-01-15T03:36:18.000+0000 W [agentCheckup] storage is not initialized
2026-01-15T03:39:30.000+0000 I got command resync <ts: 1768448369>, opid: 69686171029bd523d1cd7f46
2026-01-15T03:39:30.000+0000 I got epoch {1768448198 8}
2026-01-15T03:39:30.000+0000 D [resync] lock not acquired
mongodb-test-1:backup-agent
2026-01-15T03:36:07.000+0000 I starting PITR routine
2026-01-15T03:36:07.000+0000 I node: test/mongodb-test-1.mongodb-test.data-system.svc.cluster.local:27017
2026-01-15T03:36:07.000+0000 I conn level ReadConcern: majority; WriteConcern: majority
2026-01-15T03:36:07.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2026-01-15T03:36:07.000+0000 I listening for the commands
2026-01-15T03:36:12.000+0000 E [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2026-01-15T03:36:13.000+0000 I got command resync <ts: 1768448172>, opid: 696860ac029bd523d1cd7ea7
2026-01-15T03:36:13.000+0000 I got epoch {1768448172 2}
2026-01-15T03:36:13.000+0000 I [resync] started
2026-01-15T03:36:16.000+0000 D [resync] uploading ".pbm.init" [size hint: 6 (6.00B); part size: 10485760 (10.00MB)]
2026-01-15T03:36:17.000+0000 W [agentCheckup] storage is not initialized
2026-01-15T03:36:37.000+0000 D [resync] got backups list: 2
2026-01-15T03:36:38.000+0000 D [resync] bcp: 2026-01-13T17:11:49Z
2026-01-15T03:36:38.000+0000 D [resync] bcp: 2026-01-13T17:16:19Z
2026-01-15T03:36:38.000+0000 D [resync] got physical restores list: 0
2026-01-15T03:36:38.000+0000 D [resync] epoch set to {1768448198 8}
2026-01-15T03:36:38.000+0000 I [resync] succeed
2026-01-15T03:39:30.000+0000 I got command resync <ts: 1768448369>, opid: 69686171029bd523d1cd7f46
2026-01-15T03:39:30.000+0000 I got epoch {1768448198 8}
2026-01-15T03:39:30.000+0000 I [resync] started
2026-01-15T03:39:48.000+0000 D [resync] got backups list: 2
2026-01-15T03:39:48.000+0000 D [resync] bcp: 2026-01-13T17:11:49Z
2026-01-15T03:39:48.000+0000 D [resync] bcp: 2026-01-13T17:16:19Z
2026-01-15T03:39:48.000+0000 I [resync] succeed
mongodb-test-2:backup-agent
2026-01-15T03:36:14.000+0000 I starting PITR routine
2026-01-15T03:36:14.000+0000 I node: test/mongodb-test-2.mongodb-test.data-system.svc.cluster.local:27017
2026-01-15T03:36:14.000+0000 I conn level ReadConcern: majority; WriteConcern: majority
2026-01-15T03:36:14.000+0000 I listening for the commands
2026-01-15T03:39:30.000+0000 I got command resync <ts: 1768448369>, opid: 69686171029bd523d1cd7f46
2026-01-15T03:39:30.000+0000 I got epoch {1768448198 8}
2026-01-15T03:39:30.000+0000 D [resync] lock not acquired
mongodb-restore-description
Name: mongodb-restore
Namespace: data-system
Labels:
Annotations:
API Version: psmdb.percona.com/v1
Kind: PerconaServerMongoDBRestore
Metadata:
Creation Timestamp: 2026-01-15T03:39:29Z
Generation: 1
Resource Version: 404117
UID: 35c66426-013b-4cbe-a360-f5ac15b01a15
Spec:
Backup Source:
Destination: s3://test/2026-01-13T04:45:00Z/
s3:
Bucket: test
Credentials Secret: mongodb-backup-r2
Endpoint URL: https://{account_id}.r2.cloudflarestorage.com
Region: auto
Type: incremental
Cluster Name: mongodb
Status:
Error: failed to validate backup: validate backup in metadata: get backup meta: not found
State: error
Events: <none>
In the PerconaServerMongoDBRestore resource, you must change:
from: destination: s3://test/2026-01-13T04:45:00Z/
to: destination: s3://test/2026-01-13T04:45:00Z
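In context, only the destination field of the restore spec changes:

```yaml
spec:
  backupSource:
    # The Operator resolves the backup metadata from this path; with a
    # trailing slash the lookup fails with "get backup meta: not found".
    destination: s3://test/2026-01-13T04:45:00Z   # no trailing slash
```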
Hi, I finally managed to recover it successfully. Thank you for your assistance with the recovery. The immediate issue was the extra slash, but in fact I had three problems at the start: an inconsistent encryption key, a version mismatch, and the trailing slash at the end of the destination. The whole process felt like figuring out how to protect an egg so it wouldn’t break when dropped from the fifth floor. Interestingly, the destination in the documentation actually ends with a slash.