PITR restore a backup to a new cluster

aranjith0 · September 1, 2022, 3:37pm

Hi,

I followed the steps Restore a Replica Set to a New Environment with Percona Backup for MongoDB - Percona Database Performance Blog to restore a backup to a new cluster. It works when i restore the backup file name.
pbm restore 2022-08-17T14:09:26Z

But when i try to restore point in time, it failed with below error.
error: no chunks found
But the chunks are already there as well.

Please suggest.

aranjith0 · September 5, 2022, 3:56pm

Any inputs would be helpful…!

Dmytro_Zghoba · September 5, 2022, 4:09pm

Hi @aranjith0,

Could you please run pbm config --force-resync and then send output of pbm list --full --unbacked here?

Andrew_Pogrebnoi · September 5, 2022, 4:38pm

@aranjith0 can you also provide the output of pbm logs -sD -t0 after the failed restore?

aranjith0 · September 6, 2022, 8:49am

Hello @ Dmytro_Zghoba,

[mongod@ip-10-80-11-213 ~]$ pbm config --force-resync
Storage resync started
[mongod@ip-10-80-11-213 ~]$ 2022-09-06T08:48:00.000+0000 I got command resync <ts: 1662454080>
2022-09-06T08:48:00.000+0000 I got epoch {1662130640 44}
2022-09-06T08:48:00.000+0000 I [resync] not a member of the leader rs

[mongod@ip-10-80-11-213 ~]$ pbm list --full --unbacked
Backup snapshots:
2022-08-17T14:09:26Z [complete: 2022-08-18T06:08:20Z]
2022-08-29T05:30:01Z [complete: 2022-08-29T19:36:09Z]
2022-09-05T07:39:00Z [complete: 2022-09-05T22:36:53Z]

PITR :
2022-08-18T06:08:21Z - 2022-08-21T07:53:47Z
2022-08-29T19:36:10Z - 2022-09-01T15:25:17Z

shard2ReplSet: [{ 2022-08-14T03:40:37 - 2022-09-05T22:36:53 }]
shard3ReplSet: [{ 2022-08-18T06:08:20 - 2022-08-21T07:53:47 } { 2022-08-29T19:36:09 - 2022-09-01T15:25:17 }]
configReplSet: [{ 2022-08-14T03:40:37 - 2022-09-05T22:36:53 }]

aranjith0 · September 6, 2022, 8:52am

@ Andrew_Pogrebnoi, Here is the output of pbm logs -sD -t0 after the failed restore:

2022-09-02T14:57:19Z I [configReplSet/10.80.11.176:27029] got command resync <ts: 1662130639>
2022-09-02T14:57:19Z I [configReplSet/10.80.11.176:27029] got epoch {1662130637 3}
2022-09-02T14:57:19Z I [configReplSet/10.80.11.176:27029] [resync] started
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] got physical restores list: 0
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] got backups list: 2
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] bcp: 2022-08-17T14:09:26Z.pbm.json
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] bcp: 2022-08-29T05:30:01Z.pbm.json
2022-09-02T14:57:19Z I [shard1ReplSet/10.80.11.176:27028] got command resync <ts: 1662130639>
2022-09-02T14:57:19Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130637 3}
2022-09-02T14:57:19Z I [shard1ReplSet/10.80.11.176:27028] [resync] not a member of the leader rs
2022-09-02T14:57:20Z I [shard3ReplSet/10.80.11.213:27028] got command resync <ts: 1662130639>
2022-09-02T14:57:20Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130637 3}
2022-09-02T14:57:20Z I [shard3ReplSet/10.80.11.213:27028] [resync] not a member of the leader rs
2022-09-02T14:57:20Z I [configReplSet/10.80.11.176:27029] [resync] succeed
2022-09-02T14:57:20Z D [configReplSet/10.80.11.176:27029] [resync] epoch set to {1662130640 44}
2022-09-02T14:57:21Z I [shard2ReplSet/10.80.11.55:27028] got command resync <ts: 1662130639>
2022-09-02T14:57:21Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130637 3}
2022-09-02T14:57:21Z I [shard2ReplSet/10.80.11.55:27028] [resync] not a member of the leader rs
2022-09-02T14:58:18Z I [configReplSet/10.80.11.176:27029] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:18Z I [configReplSet/10.80.11.176:27029] got epoch {1662130640 44}
2022-09-02T14:58:18Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:18Z D [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:18Z I [shard1ReplSet/10.80.11.176:27028] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:18Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130640 44}
2022-09-02T14:58:18Z I [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:18Z D [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-19T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:19Z I [shard3ReplSet/10.80.11.213:27028] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:19Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130640 44}
2022-09-02T14:58:19Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:19Z D [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:19Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] moving to state running
2022-09-02T14:58:19Z E [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-19T07:00:00Z] restore: no chunks found
2022-09-02T14:58:20Z I [shard2ReplSet/10.80.11.55:27028] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:20Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130640 44}
2022-09-02T14:58:20Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:20Z D [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:20Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] moving to state running
2022-09-02T14:58:20Z E [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] restore: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:58:21Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] moving to state running
2022-09-02T14:58:21Z E [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] restore: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:58:22Z E [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] restore: waiting for running: cluster failed: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:58:50Z I [shard1ReplSet/10.80.11.176:27028] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:50Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130640 44}
2022-09-02T14:58:50Z I [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:50Z D [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-18T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:51Z I [configReplSet/10.80.11.176:27029] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:51Z I [configReplSet/10.80.11.176:27029] got epoch {1662130640 44}
2022-09-02T14:58:51Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:51Z D [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-18T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:51Z E [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-18T07:00:00Z] restore: no chunks found
2022-09-02T14:58:52Z I [shard3ReplSet/10.80.11.213:27028] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:52Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130640 44}
2022-09-02T14:58:52Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:52Z D [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-18T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:52Z E [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-18T07:00:00Z] restore: waiting for start: cluster failed: no chunks found
2022-09-02T14:58:53Z I [shard2ReplSet/10.80.11.55:27028] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:53Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130640 44}
2022-09-02T14:58:53Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:53Z D [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-18T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:58:53Z E [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-18T07:00:00Z] restore: waiting for start: cluster failed: waiting for start: cluster failed: no chunks found
2022-09-02T14:58:54Z E [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-18T07:00:00Z] restore: waiting for start: cluster failed: waiting for start: cluster failed: waiting for start: cluster failed: no chunks found
2022-09-02T14:59:15Z I [configReplSet/10.80.11.176:27029] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:15Z I [configReplSet/10.80.11.176:27029] got epoch {1662130640 44}
2022-09-02T14:59:15Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:15Z D [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:59:15Z I [shard1ReplSet/10.80.11.176:27028] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:15Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130640 44}
2022-09-02T14:59:15Z I [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:15Z D [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-20T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:59:16Z I [shard3ReplSet/10.80.11.213:27028] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:16Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130640 44}
2022-09-02T14:59:16Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:16Z D [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:59:16Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] moving to state running
2022-09-02T14:59:16Z E [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-20T07:00:00Z] restore: no chunks found
2022-09-02T14:59:17Z I [shard2ReplSet/10.80.11.55:27028] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:17Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130640 44}
2022-09-02T14:59:17Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:17Z D [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] waiting for ‘starting’ status
2022-09-02T14:59:17Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] moving to state running
2022-09-02T14:59:17Z E [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] restore: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:59:18Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] moving to state running
2022-09-02T14:59:18Z E [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] restore: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:59:19Z E [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] restore: waiting for running: cluster failed: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found

Dmytro_Zghoba · September 6, 2022, 10:40am

@aranjith0
there are enough oplog chunks for the PITR for “configReplSet”, “shard2ReplSet”, “shard3ReplSet”.
I don’t see oplog chunks for “shard1ReplSet” but it’s ok for restore anyway. pbm should skip restore on the shard.
however, it could be that the “2022-08-29T05:30:01Z” backup contains data for the “shard1ReplSet”. in this way, pbm does not skip restore and tries to find oplog chunks for the shard and therefore fails.

could you please send result of
mongosh "$PBM_MONGODB_URI" --quiet --eval 'db.pbmBackups.findOne({ name: "2022-08-29T05:30:01Z" }) ?

[or mongosh "$PBM_MONGODB_URI" --quiet --eval 'db.pbmBackups.findOne({ name: "2022-08-29T05:30:01Z" }).replsets.map(({ name }) => name)

aranjith0 · September 7, 2022, 9:34am

No result. it just enter to a terminal.

Dmytro_Zghoba · September 7, 2022, 9:41am

sorry, use the following:
mongosh "$PBM_MONGODB_URI" --quiet --eval 'db.getSiblingDB("admin").pbmBackups.findOne({ name: "2022-08-29T05:30:01Z" })

aranjith0 · September 7, 2022, 9:59am

Same thing.

Dmytro_Zghoba · September 9, 2022, 11:37am

It will take long
I’ll explain, what I’m trying to do.

when you run pbm restore --time $TS, it lookups for the most recent full backup, restores it, and then replays oplog (pitr chunks) up to the $TS.

before doing the backup restore, it checks target cluster compatibility with backup topology. fails when it’s incompatible. if the backup does not has a snapshot for its shard, it skips, otherwise, restores the shard and replays oplog for the shard.

I think you have a backup that contains a snapshot for shard1ReplSet but does not have PITR chunks for the shard (no idea yet how it might happen).

I’m trying to confirm this.

could you please take a look at your most recent backup and check replsets?

Topic		Replies	Views
Error Percona Restore Percona Backup for MongoDB	14	2079	April 15, 2021
Unexpected backup failure - convergeCluster lost shard? Percona Backup for MongoDB	12	2353	July 26, 2021
PBM Restore failed Percona Backup for MongoDB	4	926	May 18, 2023
Mongodb PIT backup and Restore Percona Server for MongoDB mongodb	1	502	March 25, 2021
Percona backup: unable to do PITR restore Percona Backup for MongoDB percona	3	1430	June 8, 2021

PITR restore a backup to a new cluster

Related topics