PITR restore of a backup to a new cluster

Hi,

I followed the steps in the blog post "Restore a Replica Set to a New Environment with Percona Backup for MongoDB" on the Percona Database Performance Blog to restore a backup to a new cluster. It works when I restore by backup name:
pbm restore 2022-08-17T14:09:26Z

But when I try to restore to a point in time, it fails with the error below:
error: no chunks found
The oplog chunks are present in the storage, though.
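
For reference, the point-in-time restore was run with a command of this form (the exact target time here is just an example):
pbm restore --time="2022-08-19T07:00:00"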

Please suggest.

Any inputs would be helpful!

Hi @aranjith0,

Could you please run pbm config --force-resync and then post the output of pbm list --full --unbacked here?

@aranjith0 can you also provide the output of pbm logs -sD -t0 after the failed restore?

Hello @Dmytro_Zghoba,

[mongod@ip-10-80-11-213 ~]$ pbm config --force-resync
Storage resync started
[mongod@ip-10-80-11-213 ~]$ 2022-09-06T08:48:00.000+0000 I got command resync <ts: 1662454080>
2022-09-06T08:48:00.000+0000 I got epoch {1662130640 44}
2022-09-06T08:48:00.000+0000 I [resync] not a member of the leader rs

[mongod@ip-10-80-11-213 ~]$ pbm list --full --unbacked
Backup snapshots:
2022-08-17T14:09:26Z [complete: 2022-08-18T06:08:20Z]
2022-08-29T05:30:01Z [complete: 2022-08-29T19:36:09Z]
2022-09-05T07:39:00Z [complete: 2022-09-05T22:36:53Z]

PITR :
2022-08-18T06:08:21Z - 2022-08-21T07:53:47Z
2022-08-29T19:36:10Z - 2022-09-01T15:25:17Z

shard2ReplSet: [{ 2022-08-14T03:40:37 - 2022-09-05T22:36:53 }]
shard3ReplSet: [{ 2022-08-18T06:08:20 - 2022-08-21T07:53:47 } { 2022-08-29T19:36:09 - 2022-09-01T15:25:17 }]
configReplSet: [{ 2022-08-14T03:40:37 - 2022-09-05T22:36:53 }]

@Andrew_Pogrebnoi, here is the output of pbm logs -sD -t0 after the failed restore:

2022-09-02T14:57:19Z I [configReplSet/10.80.11.176:27029] got command resync <ts: 1662130639>
2022-09-02T14:57:19Z I [configReplSet/10.80.11.176:27029] got epoch {1662130637 3}
2022-09-02T14:57:19Z I [configReplSet/10.80.11.176:27029] [resync] started
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] got physical restores list: 0
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] got backups list: 2
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] bcp: 2022-08-17T14:09:26Z.pbm.json
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] bcp: 2022-08-29T05:30:01Z.pbm.json
2022-09-02T14:57:19Z I [shard1ReplSet/10.80.11.176:27028] got command resync <ts: 1662130639>
2022-09-02T14:57:19Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130637 3}
2022-09-02T14:57:19Z I [shard1ReplSet/10.80.11.176:27028] [resync] not a member of the leader rs
2022-09-02T14:57:20Z I [shard3ReplSet/10.80.11.213:27028] got command resync <ts: 1662130639>
2022-09-02T14:57:20Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130637 3}
2022-09-02T14:57:20Z I [shard3ReplSet/10.80.11.213:27028] [resync] not a member of the leader rs
2022-09-02T14:57:20Z I [configReplSet/10.80.11.176:27029] [resync] succeed
2022-09-02T14:57:20Z D [configReplSet/10.80.11.176:27029] [resync] epoch set to {1662130640 44}
2022-09-02T14:57:21Z I [shard2ReplSet/10.80.11.55:27028] got command resync <ts: 1662130639>
2022-09-02T14:57:21Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130637 3}
2022-09-02T14:57:21Z I [shard2ReplSet/10.80.11.55:27028] [resync] not a member of the leader rs
2022-09-02T14:58:18Z I [configReplSet/10.80.11.176:27029] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:18Z I [configReplSet/10.80.11.176:27029] got epoch {1662130640 44}
2022-09-02T14:58:18Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:18Z D [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:18Z I [shard1ReplSet/10.80.11.176:27028] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:18Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130640 44}
2022-09-02T14:58:18Z I [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:18Z D [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-19T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:19Z I [shard3ReplSet/10.80.11.213:27028] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:19Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130640 44}
2022-09-02T14:58:19Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:19Z D [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:19Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] moving to state running
2022-09-02T14:58:19Z E [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-19T07:00:00Z] restore: no chunks found
2022-09-02T14:58:20Z I [shard2ReplSet/10.80.11.55:27028] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:20Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130640 44}
2022-09-02T14:58:20Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:20Z D [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:20Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] moving to state running
2022-09-02T14:58:20Z E [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] restore: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:58:21Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] moving to state running
2022-09-02T14:58:21Z E [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] restore: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:58:22Z E [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] restore: waiting for running: cluster failed: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:58:50Z I [shard1ReplSet/10.80.11.176:27028] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:50Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130640 44}
2022-09-02T14:58:50Z I [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:50Z D [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-18T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:51Z I [configReplSet/10.80.11.176:27029] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:51Z I [configReplSet/10.80.11.176:27029] got epoch {1662130640 44}
2022-09-02T14:58:51Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:51Z D [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-18T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:51Z E [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-18T07:00:00Z] restore: no chunks found
2022-09-02T14:58:52Z I [shard3ReplSet/10.80.11.213:27028] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:52Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130640 44}
2022-09-02T14:58:52Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:52Z D [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-18T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:52Z E [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-18T07:00:00Z] restore: waiting for start: cluster failed: no chunks found
2022-09-02T14:58:53Z I [shard2ReplSet/10.80.11.55:27028] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:53Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130640 44}
2022-09-02T14:58:53Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:53Z D [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-18T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:53Z E [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-18T07:00:00Z] restore: waiting for start: cluster failed: waiting for start: cluster failed: no chunks found
2022-09-02T14:58:54Z E [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-18T07:00:00Z] restore: waiting for start: cluster failed: waiting for start: cluster failed: waiting for start: cluster failed: no chunks found
2022-09-02T14:59:15Z I [configReplSet/10.80.11.176:27029] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:15Z I [configReplSet/10.80.11.176:27029] got epoch {1662130640 44}
2022-09-02T14:59:15Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:15Z D [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:59:15Z I [shard1ReplSet/10.80.11.176:27028] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:15Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130640 44}
2022-09-02T14:59:15Z I [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:15Z D [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-20T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:59:16Z I [shard3ReplSet/10.80.11.213:27028] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:16Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130640 44}
2022-09-02T14:59:16Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:16Z D [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:59:16Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] moving to state running
2022-09-02T14:59:16Z E [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-20T07:00:00Z] restore: no chunks found
2022-09-02T14:59:17Z I [shard2ReplSet/10.80.11.55:27028] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:17Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130640 44}
2022-09-02T14:59:17Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:17Z D [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:59:17Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] moving to state running
2022-09-02T14:59:17Z E [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] restore: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:59:18Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] moving to state running
2022-09-02T14:59:18Z E [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] restore: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:59:19Z E [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] restore: waiting for running: cluster failed: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found

@aranjith0
There are enough oplog chunks for the PITR for configReplSet, shard2ReplSet, and shard3ReplSet.
I don't see oplog chunks for shard1ReplSet, but that by itself is fine for the restore: PBM should simply skip the restore on that shard.
However, it could be that the 2022-08-29T05:30:01Z backup contains data for shard1ReplSet. In that case, PBM does not skip the shard, tries to find oplog chunks for it, and therefore fails.
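
You could also cross-check which replica sets have oplog chunks recorded in PBM's PITR metadata. A minimal sketch, assuming the default admin.pbmPITRChunks metadata collection and its rs field (adjust the names if your PBM version stores them differently):
mongosh "$PBM_MONGODB_URI" --quiet --eval 'db.getSiblingDB("admin").pbmPITRChunks.distinct("rs")'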

Could you please send the result of:
mongosh "$PBM_MONGODB_URI" --quiet --eval 'db.pbmBackups.findOne({ name: "2022-08-29T05:30:01Z" })'

or, to list just the replica set names:
mongosh "$PBM_MONGODB_URI" --quiet --eval 'db.pbmBackups.findOne({ name: "2022-08-29T05:30:01Z" }).replsets.map(({ name }) => name)'

No result. It just returns to the terminal prompt.

Sorry, use the following:
mongosh "$PBM_MONGODB_URI" --quiet --eval 'db.getSiblingDB("admin").pbmBackups.findOne({ name: "2022-08-29T05:30:01Z" })'

Same thing.

This will take a while 🙂
I'll explain what I'm trying to do.

When you run pbm restore --time $TS, PBM looks up the most recent full backup taken before $TS, restores it, and then replays the oplog (PITR chunks) up to $TS.

Before restoring the backup, PBM checks that the target cluster topology is compatible with the backup topology and fails if it is not. If the backup has no snapshot for a given shard, that shard is skipped; otherwise, PBM restores the shard and replays the oplog for it.

I think you have a backup that contains a snapshot for shard1ReplSet but no PITR chunks for that shard (no idea yet how that might happen).

I’m trying to confirm this.

Could you please take a look at your most recent backup and check its replsets?
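
For example, something like this should print which replica sets each snapshot contains (it relies on the replsets array in the backup metadata, the same field as in the query above):
mongosh "$PBM_MONGODB_URI" --quiet --eval 'db.getSiblingDB("admin").pbmBackups.find({}, { name: 1, "replsets.name": 1 }).toArray()'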
