Hi,
I have one MongoDB instance (a replica set with 3 members) and I set up the latest pbm-agent on it. (My MongoDB runs in VMs, self-managed, community edition.)
Below are the versions on the source MongoDB:
- pbm-agent version: 2.12.0
- pbm version (pbm CLI): 2.12.0
- mongod --version: db version v7.0.11-6
- mongosh --version: 2.2.10
The target MongoDB where I want to restore the backup has the below config:
- pbm-agent version: 2.12.0
- pbm version (pbm CLI): 2.12.0
- mongod --version: db version v7.0.11-1
- mongosh --version: 2.2.10
I ran a full physical backup with PITR enabled on the source, around 160 GB with GZIP compression, using the below config (the backup commands are sketched after the config):
[vikas@XXXtest ~]$ pbm config
storage:
  type: gcs
  gcs:
    bucket: XXXXX_bkp
    prefix: pbm/stg_v2
    credentials:
      hmacAccessKey: '***'
      hmacSecret: '***'
    chunkSize: 10485760
    retryer:
      backoffInitial: 1s
      backoffMax: 30s
      backoffMultiplier: 2
      maxAttempts: 5
      chunkRetryDeadline: 32s
pitr:
  enabled: false
  compression: gzip
backup:
  oplogSpanMin: 0
  compression: gzip
restore: {}
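For context, the backup itself was taken roughly like this (commands reconstructed from memory, the exact flags may have differed slightly):

# enable PITR oplog slicing on the source
pbm config --set pitr.enabled=true
# take the full physical backup with gzip compression
pbm backup --type=physical --compression=gzip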
After the backup completed, pbm status showed the below:
pbm status
Cluster:
========
rs2:
  - XXX03.asiaXXc.c.prod.internal:27017 [P]: pbm-agent [v2.12.0] OK
  - XXX02.asiaXXb.c.prod.internal:27017 [S]: pbm-agent [v2.12.0] OK
  - XXX01.asiaXXa.c.prod.internal:27017 [S]: pbm-agent [v2.12.0] OK

PITR incremental backup:
========================
Status [ON]
Running members: rs2/XXX01.asiaXXa.c.prod.internal:27017;

Currently running:
==================
(none)

Backups:
========
GCS gs://XXXX_stg_bkp/pbm/stg_v2
  Snapshots:
    2025-12-11T07:04:59Z 160.63GB <physical> success [restore_to_time: 2025-12-11T07:05:02]
  PITR chunks [160.83MB]:
    2025-12-11T07:05:03 - 2025-12-15T06:30:15
On my new MongoDB instance, I configured the same GCS storage with the same credentials, then ran pbm config --force-resync, and after that pbm status gave me the same output as above.
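The target-side setup was roughly the following (the config file path is just an example):

# apply the same storage config on the target cluster
pbm config --file /etc/pbm/pbm_config.yaml
# make PBM re-read the backup metadata from the bucket
pbm config --force-resync
pbm status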
Then I ran and tested restores with the below 2 scenarios:
- restore to the exact physical snapshot name:
  pbm restore 2025-12-11T07:04:59Z
- restore to a point within the PITR chunk range using --time:
  pbm restore --time 2025-12-15T06:30:15
With the 1st option I was able to restore up to 11 Dec, but when I then tried to replay the oplog with pbm oplog-replay --start=2025-12-11T07:05:03 --end=2025-12-15T06:30:15 (full sequence sketched below), it did not work.
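For the 1st scenario, the sequence was roughly the following; the post-restore steps are how I understand the documented physical restore procedure, and the service names are just examples:

# physical restore to the base snapshot
pbm restore 2025-12-11T07:04:59Z
# after the physical restore finishes, restart mongod and pbm-agent on every node
sudo systemctl restart mongod
sudo systemctl restart pbm-agent
# refresh PBM metadata, then replay the oplog on top of the restored data
pbm config --force-resync
pbm oplog-replay --start=2025-12-11T07:05:03 --end=2025-12-15T06:30:15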
Logs:
2025-12-11T06:41:35Z E [rs2/ZZZ03.asia-XXXc.c.prod.internal:27017] [agentCheckup] check storage connection: storage check failed with: file stat: list with parts for mw file stat op: fetching pbm file parts base for .pbm.init: get S3 object header: operation error S3: HeadObject, https response error StatusCode: 403, RequestID: , HostID: , api error Forbidden: Forbidden
...
There are lots of log lines similar to the one above.
I used the same credentials for the backup and the restore, and they worked for the backup, so it is definitely not a permissions issue.
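To double-check outside of PBM, the HeadObject call that fails can be reproduced with the aws CLI against the GCS S3-interoperability endpoint using the same HMAC key pair; the object key below is my guess based on the configured prefix and the .pbm.init name in the log:

# hypothetical check: reproduce the HeadObject request PBM fails on
AWS_ACCESS_KEY_ID='<hmacAccessKey>' AWS_SECRET_ACCESS_KEY='<hmacSecret>' \
aws s3api head-object \
  --endpoint-url https://storage.googleapis.com \
  --bucket XXXXX_bkp \
  --key pbm/stg_v2/.pbm.init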
My bucket structure at XXX_stg_bkp/pbm/stg_v2 looks as below; the XXX_stg_bkp/pbm/stg_v2/pbmPitr/rs2/20251215 path has the following content:
20251215000015-1.20251215001015-1.oplog.gz   (Dec 15, 2025, 5:40:15 AM)
20251215001015-1.20251215002015-1.oplog.gz   (Dec 15, 2025, 5:50:15 AM)
20251215002015-1.20251215003015-1.oplog.gz   (Dec 15, 2025, 6:00:15 AM)
20251215003015-1.20251215004015-1.oplog.gz   (Dec 15, 2025, 6:10:15 AM)
…
I tried different scenarios. For example, with the 2nd option (PITR restore using --time: pbm restore --time 2025-12-15T06:30:15) I got similar errors, plus the below error that seems new:
2025-12-15T06:26:02Z W [rs2/XXX06-test.asiaXXXc.c.prod.internal:27017] [replay/2025-12-15T06:23:31.57268086Z] failed to download chunk 8388608-16777215
2025-12-15T06:26:04Z E [rs2/XXX06-test.asia-XXXc.c.prod.internal:27017] [replay/2025-12-15T06:23:31.57268086Z] oplog replay: reply oplog: replay chunk 1765436702.1765449737: apply oplog for chunk: applying an entry: op: {"Timestamp":{"T":1765440413,"I":5},"Term":10,"Hash":null,"Version":2,"Operation":"i","Namespace":"stg_common_acs.gatemappings","Object":[{"Key":"_id","Value":"693a7b9df9ee0cb4b05156cc"},{"Key":"createdBy","Value":"eventsync"},{"Key":"updatedBy","Value":"eventsync"},{"Key":"gateId","Value":"693a7797f9ee0cb4b0514fee"},{"Key":"gateGroupId","Value":"693969eac9049198065e6310"},{"Key":"createdDate","Value":"2025-12-11T08:06:53.331Z"},{"Key":"updatedDate","Value":"2025-12-11T08:06:53.331Z"}],"Query":[{"Key":"_id","Value":"693a7b9df9ee0cb4b05156cc"}],"UI":{"Subtype":4,"Data":"jvdFFGH0flMjMQ=="},"LSID":"SXxwsdwdwbizsP+KDqbJUFdWsASXUl/Zvdfdsf0sdwdwdwfHdfsdfoXPEA","TxnNumber":1194,"PrevOpTime":"HAAwfsdfdsfAAwdwddAABJ0AP//////////AA==","MultiOpType":null} | merr <nil>: applyOps: (DuplicateKey) E11000 duplicate key error collection: stg_common_acs.gatemappings index: gateId_1_gateGroupId_1 dup key: { gateId: ObjectId('693a7sdsdsdsdb4b0514fee'), gateGroupId: ObjectId('693961212126310') }
{"ts":1765779812,"s":3,"rs":"rs2","node":"XXXX05-test.asia-XXXb.c.prod.internal:27017","e":"replay","eobj":"2025-12-15T06:23:31.57268086Z","ep":{"T":1765779777,"I":1},"opid":"693fa9634dd170ef13363b31","msg":"time: 2025-12-11T07:05:03Z-2025-12-12T05:40:00Z"}
{"ts":1765779812,"s":3,"rs":"rs2","node":"XXX05-test.asiaXXXb.c.prod.internal:27017","e":"replay","eobj":"2025-12-15T06:23:31.57268086Z","ep":{"T":1765779777,"I":1},"opid":"693fa9634dd170ef13363b31","msg":"node in not suitable for restore"}
I have explored various options and scenarios with no luck.
I can't restore using PITR, either directly with --time or by restoring the base snapshot first and then replaying the oplog.