tl;dr: restoring a sharded MongoDB backup created with Percona Backup for MongoDB (PBM) does not work as expected.
Description
Attempts to restore a MongoDB backup that was downloaded from S3 to the local filesystem are failing.
The following error is printed:
..Error: waiting for start: cluster failed: prepare snapshot: failed to ensure snapshot file 2021-08-04T11:50:33Z_shardrs01.dump.s2: no such file
However, the file 2021-08-04T11:50:33Z_shardrs01.dump.s2 exists in the backup path:
$ ls -la
total 14388300
drwxr-xr-x 2 pbm pbm 4096 Aug 4 18:48 .
drwxr-xr-x 4 pbm pbm 4096 Aug 4 18:01 ..
-rw-r--r-- 1 pbm pbm 5 Aug 4 16:47 .pbm.init
-rw-r--r-- 1 root root 3158 Aug 4 18:48 2021-08-04T11:50:33Z.pbm.json
-rw-r--r-- 1 root root 3289 Aug 4 18:47 2021-08-04T11:50:33Z.pbm.json.backup
-rw-r--r-- 1 pbm pbm 1756198 Aug 4 17:20 2021-08-04T11:50:33Z_configrs.dump.s2
-rw-r--r-- 1 pbm pbm 43855 Aug 4 17:25 2021-08-04T11:50:33Z_configrs.oplog.s2
-rw-r--r-- 1 pbm pbm 7300133194 Aug 4 17:20 2021-08-04T11:50:33Z_shardrs01.dump.s2
-rw-r--r-- 1 pbm pbm 203819 Aug 4 17:25 2021-08-04T11:50:33Z_shardrs01.oplog.s2
-rw-r--r-- 1 pbm pbm 7431279565 Aug 4 17:20 2021-08-04T11:50:33Z_shardrs02.dump.s2
-rw-r--r-- 1 pbm pbm 159863 Aug 4 17:25 2021-08-04T11:50:33Z_shardrs02.oplog.s2
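One assumption worth ruling out: restores are carried out by the pbm-agent process running alongside each mongod, and with filesystem storage each agent reads the backup path on its own host. A file that ls shows on one machine may therefore still be invisible to the agents on the shardrs01 nodes. The minimal Python sketch below is not part of PBM; it checks that every dump and oplog file from the listing above exists and is readable at the configured path. The constants are assumptions taken from this post.

```python
import os

# Assumptions taken from this post: the filesystem storage path and the backup name.
BACKUP_PATH = "/var/lib/pbm/backups/"
BACKUP_NAME = "2021-08-04T11:50:33Z"
# Replica set names as they appear in the file names of the listing.
REPLSETS = ["configrs", "shardrs01", "shardrs02"]


def expected_files(name, replsets):
    """Build the dump and oplog file names that appear in the listing for each replica set."""
    files = []
    for rs in replsets:
        files.append(f"{name}_{rs}.dump.s2")
        files.append(f"{name}_{rs}.oplog.s2")
    return files


def missing_files(path, names):
    """Return the names that are absent, not regular files, or unreadable for the current user."""
    missing = []
    for name in names:
        full = os.path.join(path, name)
        if not (os.path.isfile(full) and os.access(full, os.R_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_files(BACKUP_PATH, expected_files(BACKUP_NAME, REPLSETS))
    if problems:
        for name in problems:
            print(f"missing or unreadable: {os.path.join(BACKUP_PATH, name)}")
    else:
        print("all expected files are present and readable")
```

Run it as the pbm user on every host that runs pbm-agent, for example sudo -u pbm python3 check_backup_files.py (the script name is hypothetical). os.access checks permissions for the real user id, which is why running it as pbm matters.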
In short, pbm reports that a required file does not exist, but it does.
What was done
- We configured Percona Backup for MongoDB to back up our production sharded MongoDB cluster to S3 with the following configuration:
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: $region
    bucket: $bucket
    prefix: $prefix
    credentials:
      access-key-id: '#SECRET'
      secret-access-key: '#SECRET'
Our end goal was to restore this backup to an identical sharded MongoDB cluster in a non-production environment. For that, we used this configuration:
pitr:
  enabled: false
storage:
  type: filesystem
  filesystem:
    path: /var/lib/pbm/backups/
restore:
  batchSize: 350
  numInsertionWorkers: 1
For the above config, we made sure that the /var/lib/pbm directory is owned by the pbm user.
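The ownership claim can be checked mechanically. A minimal Python sketch, not part of PBM, that lists every path under a directory (including the directory itself) whose owner differs from a given user id; the function name not_owned_by is hypothetical.

```python
import os


def not_owned_by(root, uid):
    """Return every path under root (including root itself) whose owner uid differs from uid."""
    offenders = []
    if os.lstat(root).st_uid != uid:
        offenders.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            # lstat so that symlinks are judged by their own owner, not their target's.
            if os.lstat(full).st_uid != uid:
                offenders.append(full)
    return sorted(offenders)
```

Called as not_owned_by("/var/lib/pbm", pwd.getpwnam("pbm").pw_uid), and assuming the service account is named pbm, it would be expected to flag the two root-owned .pbm.json files shown in the listing above. Those files are world-readable (rw-r--r--), so whether their ownership affects the restore is unclear.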
Once both configurations were set up, we took a backup of the production cluster.
On the staging side, we downloaded the backup files from S3 into the pbm backup path on the filesystem, /var/lib/pbm/backups.
We then edited the store key in that backup's .pbm.json file as follows, to tell pbm that the backup resides on the filesystem:
{
"type": "filesystem",
"s3": {},
"azure": {},
"filesystem": {
"path": "/var/lib/pbm/backups/"
}
}
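The edit described above can be scripted. A minimal Python sketch, assuming (as this post describes) that the backup's .pbm.json has a top-level store key; it is not a PBM tool, and the function name set_filesystem_store is hypothetical.

```python
import json
import shutil

# Store block from this post; the path must match the filesystem storage config.
FILESYSTEM_STORE = {
    "type": "filesystem",
    "s3": {},
    "azure": {},
    "filesystem": {"path": "/var/lib/pbm/backups/"},
}


def set_filesystem_store(meta_path, store=FILESYSTEM_STORE):
    """Point a backup's metadata at filesystem storage, keeping a copy of the original file."""
    # Byte-for-byte copy of the original, similar to the .pbm.json.backup in the listing.
    shutil.copy2(meta_path, meta_path + ".backup")
    with open(meta_path) as f:
        meta = json.load(f)
    # Assumption: "store" is a top-level key of the metadata, as described in this post.
    meta["store"] = store
    with open(meta_path, "w") as f:
        json.dump(meta, f, indent=2)
    return meta
```

For example, set_filesystem_store("/var/lib/pbm/backups/2021-08-04T11:50:33Z.pbm.json"), followed by pbm config --force-resync. All other keys of the metadata are written back unchanged.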
Once that was done, we ran:
pbm config --force-resync
The backup then showed up in the output of pbm list:
$ pbm list
Backup snapshots:
2021-08-04T11:50:33Z [complete: 2021-08-04T11:55:30]
What happened
Running the restore resulted in the following error:
$ pbm restore '2021-08-04T11:50:33Z'
..Error: waiting for start: cluster failed: prepare snapshot: failed to ensure snapshot file 2021-08-04T11:50:33Z_shardrs01.dump.s2: no such file
We’re completely baffled by this issue; any pointers would be immensely helpful.