PBM restores not working

tl;dr - restoring a sharded MongoDB backup created via Percona Backup for MongoDB does not work as expected.

Description

Attempts to restore a MongoDB backup that was downloaded from S3 to the filesystem are failing.

The error message that is printed is:

..Error: waiting for start: cluster failed: prepare snapshot: failed to ensure snapshot file 2021-08-04T11:50:33Z_shardrs01.dump.s2: no such file

However, the file 2021-08-04T11:50:33Z_shardrs01.dump.s2 exists within the backup path:

$ ls -la
total 14388300
drwxr-xr-x 2 pbm  pbm        4096 Aug  4 18:48 .
drwxr-xr-x 4 pbm  pbm        4096 Aug  4 18:01 ..
-rw-r--r-- 1 pbm  pbm           5 Aug  4 16:47 .pbm.init
-rw-r--r-- 1 root root       3158 Aug  4 18:48 2021-08-04T11:50:33Z.pbm.json
-rw-r--r-- 1 root root       3289 Aug  4 18:47 2021-08-04T11:50:33Z.pbm.json.backup
-rw-r--r-- 1 pbm  pbm     1756198 Aug  4 17:20 2021-08-04T11:50:33Z_configrs.dump.s2
-rw-r--r-- 1 pbm  pbm       43855 Aug  4 17:25 2021-08-04T11:50:33Z_configrs.oplog.s2
-rw-r--r-- 1 pbm  pbm  7300133194 Aug  4 17:20 2021-08-04T11:50:33Z_shardrs01.dump.s2
-rw-r--r-- 1 pbm  pbm      203819 Aug  4 17:25 2021-08-04T11:50:33Z_shardrs01.oplog.s2
-rw-r--r-- 1 pbm  pbm  7431279565 Aug  4 17:20 2021-08-04T11:50:33Z_shardrs02.dump.s2
-rw-r--r-- 1 pbm  pbm      159863 Aug  4 17:25 2021-08-04T11:50:33Z_shardrs02.oplog.s2

Basically, pbm says that a required file doesn’t exist, but the file is clearly present in the backup path.

What was done

  • We had configured Percona Backup for MongoDB to back up our production sharded MongoDB database to S3. The configuration for that is as follows:
pitr:
  enabled: false
storage:
  type: s3
  s3:
    provider: aws
    region: $region
    bucket: $bucket
    prefix: $prefix
    credentials:
      access-key-id: '#SECRET'
      secret-access-key: '#SECRET'
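
For reference, a config like the one above can be applied with pbm config --file (the file path and connection URI below are placeholders, not our actual values):

# point the pbm CLI at the cluster; for a sharded cluster this is the config server replica set
export PBM_MONGODB_URI='mongodb://pbmuser:***@cfg-node:27019/?replicaSet=configrs'
# apply the storage configuration shown above
pbm config --file=/etc/pbm/pbm-config.yaml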

Our end goal was to restore this backup to an identical sharded MongoDB cluster in a non-production environment. For that, we have this configuration:

pitr:
  enabled: false
storage:
  type: filesystem
  filesystem:
    path: /var/lib/pbm/backups/
restore:
  batchSize: 350
  numInsertionWorkers: 1

For the above config, we have ensured that the /var/lib/pbm directory is owned by the pbm user.
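
Concretely, that ownership setup boils down to something like this on each node (a generic shell sketch):

# make sure the pbm user owns the backup directory, then verify
sudo chown -R pbm:pbm /var/lib/pbm
ls -ld /var/lib/pbm/backups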

Once both configurations were set up, we made a backup of the production cluster.

On the staging side, we downloaded those backups from S3 to the pbm backup path on the filesystem, which was /var/lib/pbm/backups.
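
A download like that can be done, for example, with the AWS CLI (bucket/prefix placeholders as in the config above; the exact command here is illustrative):

# copy the backup artifacts from S3 into the local pbm filesystem path
aws s3 sync s3://$bucket/$prefix/ /var/lib/pbm/backups/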

We altered the store key of that backup’s .pbm.json file as follows, to let pbm know that the backup is now stored on the filesystem:

{
  "type": "filesystem",
  "s3": {},
  "azure": {},
  "filesystem": {
    "path": "/var/lib/pbm/backups/"
  }
}
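
One way to make an edit like that is with jq, keeping the original metadata as a .backup copy (an illustrative sketch, not necessarily the exact steps used):

# preserve the original metadata, then rewrite its store key
cp 2021-08-04T11:50:33Z.pbm.json 2021-08-04T11:50:33Z.pbm.json.backup
jq '.store = {"type": "filesystem", "s3": {}, "azure": {}, "filesystem": {"path": "/var/lib/pbm/backups/"}}' \
  2021-08-04T11:50:33Z.pbm.json.backup > 2021-08-04T11:50:33Z.pbm.json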

Once that was done, we issued

pbm config --force-resync

This made the backup show up in pbm list:

$ pbm list
Backup snapshots:
  2021-08-04T11:50:33Z [complete: 2021-08-04T11:55:30]
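
As an additional sanity check, pbm status can also be used to confirm the configured storage and which agents are up:

# shows the cluster's agents, the configured storage, and the snapshots it can see
pbm status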

What happened

We tried to issue

pbm restore '2021-08-04T11:50:33Z'

This resulted in the following error:

$ pbm restore '2021-08-04T11:50:33Z'                                                  
..Error: waiting for start: cluster failed: prepare snapshot: failed to ensure snapshot file 2021-08-04T11:50:33Z_shardrs01.dump.s2: no such file

We’re completely baffled by this issue; any pointers would be of immense help.

Hi @Prashant_Warrier

Do all agents on all shards have access to the very same /var/lib/pbm/backups/ with all files? Is it NFS?

All agents on all shards have access to the /var/lib/pbm/backups directory, in the sense that the directory exists on each node of each replica set in each shard.

It is not NFS.

Oh, I see. All agents have to have access to the very same directory with the same files. That means it should be either S3-like storage or some kind of network file system in the case of storage.type: filesystem. The idea is that you never know which node will make a backup. Even with the backup priority option there is no 100% guarantee, as the node may be down, etc. Moreover, the node that performs the restore will most probably be a different node than the one that made the backup: the restore has to run on the primary node, whereas for backups preference is given to the secondary nodes. So all agents should have access to all of the backup files.
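
For instance (hostnames and export paths below are purely illustrative), with storage.type: filesystem every node running pbm-agent would mount the same shared location at the configured path:

# on every mongod node that runs pbm-agent
sudo mount -t nfs nfs-server:/exports/pbm-backups /var/lib/pbm/backups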
