[mongod@ip-10-80-11-213 ~]$ pbm config --force-resync
Storage resync started
[mongod@ip-10-80-11-213 ~]$ 2022-09-06T08:48:00.000+0000 I got command resync <ts: 1662454080>
2022-09-06T08:48:00.000+0000 I got epoch {1662130640 44}
2022-09-06T08:48:00.000+0000 I [resync] not a member of the leader rs
@ Andrew_Pogrebnoi, Here is the output of pbm logs -sD -t0 after the failed restore:
2022-09-02T14:57:19Z I [configReplSet/10.80.11.176:27029] got command resync <ts: 1662130639>
2022-09-02T14:57:19Z I [configReplSet/10.80.11.176:27029] got epoch {1662130637 3}
2022-09-02T14:57:19Z I [configReplSet/10.80.11.176:27029] [resync] started
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] got physical restores list: 0
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] got backups list: 2
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] bcp: 2022-08-17T14:09:26Z.pbm.json
2022-09-02T14:57:19Z D [configReplSet/10.80.11.176:27029] [resync] bcp: 2022-08-29T05:30:01Z.pbm.json
2022-09-02T14:57:19Z I [shard1ReplSet/10.80.11.176:27028] got command resync <ts: 1662130639>
2022-09-02T14:57:19Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130637 3}
2022-09-02T14:57:19Z I [shard1ReplSet/10.80.11.176:27028] [resync] not a member of the leader rs
2022-09-02T14:57:20Z I [shard3ReplSet/10.80.11.213:27028] got command resync <ts: 1662130639>
2022-09-02T14:57:20Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130637 3}
2022-09-02T14:57:20Z I [shard3ReplSet/10.80.11.213:27028] [resync] not a member of the leader rs
2022-09-02T14:57:20Z I [configReplSet/10.80.11.176:27029] [resync] succeed
2022-09-02T14:57:20Z D [configReplSet/10.80.11.176:27029] [resync] epoch set to {1662130640 44}
2022-09-02T14:57:21Z I [shard2ReplSet/10.80.11.55:27028] got command resync <ts: 1662130639>
2022-09-02T14:57:21Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130637 3}
2022-09-02T14:57:21Z I [shard2ReplSet/10.80.11.55:27028] [resync] not a member of the leader rs
2022-09-02T14:58:18Z I [configReplSet/10.80.11.176:27029] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:18Z I [configReplSet/10.80.11.176:27029] got epoch {1662130640 44}
2022-09-02T14:58:18Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:18Z D [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:58:18Z I [shard1ReplSet/10.80.11.176:27028] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:18Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130640 44}
2022-09-02T14:58:18Z I [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:18Z D [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-19T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:58:19Z I [shard3ReplSet/10.80.11.213:27028] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:19Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130640 44}
2022-09-02T14:58:19Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:19Z D [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:58:19Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] moving to state running
2022-09-02T14:58:19Z E [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-19T07:00:00Z] restore: no chunks found
2022-09-02T14:58:20Z I [shard2ReplSet/10.80.11.55:27028] got command pitrestore [name: 2022-09-02T14:58:19.091570724Z, point-in-time ts: 1660892400] <ts: 1662130699>
2022-09-02T14:58:20Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130640 44}
2022-09-02T14:58:20Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] recovery started
2022-09-02T14:58:20Z D [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:58:20Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] moving to state running
2022-09-02T14:58:20Z E [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-19T07:00:00Z] restore: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:58:21Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] moving to state running
2022-09-02T14:58:21Z E [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-19T07:00:00Z] restore: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:58:22Z E [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-19T07:00:00Z] restore: waiting for running: cluster failed: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:58:50Z I [shard1ReplSet/10.80.11.176:27028] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:50Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130640 44}
2022-09-02T14:58:50Z I [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:50Z D [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-18T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:58:51Z I [configReplSet/10.80.11.176:27029] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:51Z I [configReplSet/10.80.11.176:27029] got epoch {1662130640 44}
2022-09-02T14:58:51Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:51Z D [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-18T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:58:51Z E [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-18T07:00:00Z] restore: no chunks found
2022-09-02T14:58:52Z I [shard3ReplSet/10.80.11.213:27028] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:52Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130640 44}
2022-09-02T14:58:52Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:52Z D [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-18T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:58:52Z E [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-18T07:00:00Z] restore: waiting for start: cluster failed: no chunks found
2022-09-02T14:58:53Z I [shard2ReplSet/10.80.11.55:27028] got command pitrestore [name: 2022-09-02T14:58:51.651018877Z, point-in-time ts: 1660806000] <ts: 1662130731>
2022-09-02T14:58:53Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130640 44}
2022-09-02T14:58:53Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-18T07:00:00Z] recovery started
2022-09-02T14:58:53Z D [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-18T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:58:53Z E [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-18T07:00:00Z] restore: waiting for start: cluster failed: waiting for start: cluster failed: no chunks found
2022-09-02T14:58:54Z E [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-18T07:00:00Z] restore: waiting for start: cluster failed: waiting for start: cluster failed: waiting for start: cluster failed: no chunks found
2022-09-02T14:59:15Z I [configReplSet/10.80.11.176:27029] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:15Z I [configReplSet/10.80.11.176:27029] got epoch {1662130640 44}
2022-09-02T14:59:15Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:15Z D [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:59:15Z I [shard1ReplSet/10.80.11.176:27028] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:15Z I [shard1ReplSet/10.80.11.176:27028] got epoch {1662130640 44}
2022-09-02T14:59:15Z I [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:15Z D [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-20T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:59:16Z I [shard3ReplSet/10.80.11.213:27028] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:16Z I [shard3ReplSet/10.80.11.213:27028] got epoch {1662130640 44}
2022-09-02T14:59:16Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:16Z D [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:59:16Z I [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] moving to state running
2022-09-02T14:59:16Z E [shard1ReplSet/10.80.11.176:27028] [pitrestore/2022-08-20T07:00:00Z] restore: no chunks found
2022-09-02T14:59:17Z I [shard2ReplSet/10.80.11.55:27028] got command pitrestore [name: 2022-09-02T14:59:15.737608813Z, point-in-time ts: 1660978800] <ts: 1662130755>
2022-09-02T14:59:17Z I [shard2ReplSet/10.80.11.55:27028] got epoch {1662130640 44}
2022-09-02T14:59:17Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] recovery started
2022-09-02T14:59:17Z D [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] waiting for âstartingâ status
2022-09-02T14:59:17Z I [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] moving to state running
2022-09-02T14:59:17Z E [configReplSet/10.80.11.176:27029] [pitrestore/2022-08-20T07:00:00Z] restore: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:59:18Z I [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] moving to state running
2022-09-02T14:59:18Z E [shard3ReplSet/10.80.11.213:27028] [pitrestore/2022-08-20T07:00:00Z] restore: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
2022-09-02T14:59:19Z E [shard2ReplSet/10.80.11.55:27028] [pitrestore/2022-08-20T07:00:00Z] restore: waiting for running: cluster failed: waiting for running: cluster failed: check cluster for restore started: convergeClusterWithTimeout: restore on the shard shard1ReplSet failed with: no chunks found
@aranjith0
there are enough oplog chunks for the PITR for âconfigReplSetâ, âshard2ReplSetâ, âshard3ReplSetâ.
I donât see oplog chunks for âshard1ReplSetâ but itâs ok for restore anyway. pbm should skip restore on the shard.
however, it could be that the â2022-08-29T05:30:01Zâ backup contains data for the âshard1ReplSetâ. in this way, pbm does not skip restore and tries to find oplog chunks for the shard and therefore fails.
could you please send result of mongosh "$PBM_MONGODB_URI" --quiet --eval 'db.pbmBackups.findOne({ name: "2022-08-29T05:30:01Z" }) ?
It will take long
Iâll explain, what Iâm trying to do.
when you run pbm restore --time $TS, it lookups for the most recent full backup, restores it, and then replays oplog (pitr chunks) up to the $TS.
before doing the backup restore, it checks target cluster compatibility with backup topology. fails when itâs incompatible. if the backup does not has a snapshot for its shard, it skips, otherwise, restores the shard and replays oplog for the shard.
I think you have a backup that contains a snapshot for shard1ReplSet but does not have PITR chunks for the shard (no idea yet how it might happen).
Iâm trying to confirm this.
could you please take a look at your most recent backup and check replsets?