PBM `pbm oplog-replay` skips the entire endpoint second; `--end` timestamp behaves inconsistently (PBM 2.12.0) | failing reconciliation tests

Some more context on our setup…

We do external-snapshot-based PITR recovery from a consistent snapshot captured while the database was held in an open backup-cursor ("backupReady") state. We use the enterprise snapshot features of the storage hypervisor, which capture a snapshot that is guaranteed consistent to a point in time.

Afterwards, when the databases (X-1, the target DB, and X, the reference DB for reconciliation) are started, the endpoint timestamps resulting from WiredTiger recovery are captured and used in a later phase for the PITR oplog-replay recovery.

So we use this combination: an external-snapshot-based backup plus oplog-only PITR slices cut at 1-minute intervals, which are large in size.

To speed up the oplog-replay process, I set storage.journal.commitIntervalMs to its maximum value of 500. This had a practical impact on applyOps processing time, bringing it down to a more acceptable recovery time objective.
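For reference, this is where the setting lives in mongod.conf; a minimal fragment (surrounding options and file path are deployment-specific):

```yaml
# mongod.conf (fragment): widen the journal commit interval.
# 500 ms is the documented upper bound for commitIntervalMs.
storage:
  journal:
    commitIntervalMs: 500
```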

However, since the average chunk size in this particular case (an oplog-heavy business instance) is roughly 45-50 MB, it still takes a long time to process.

In the final reconciliation test, where I compare side-by-side metadata stats for all non-system MongoDB collections (non-system namespaces only), I saw logical mismatches in document counts and data sizes, which the reconciliation test treats as an unacceptable, incomplete state.
By further analysis, decompressing the xxx.snappy 1-minute oplog chunk, we found the EXACT missing documents: every operation in the last epoch second, the $END timestamp, was skipped.
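The behaviour we observed is consistent with the replay treating the `--end` bound as exclusive at one-second granularity. A minimal sketch of the two interpretations (pure Python on synthetic oplog timestamps; none of these names come from the PBM code base):

```python
# Oplog timestamps modeled as (epoch_second, increment) pairs.
ops = [(100, 1), (100, 2), (101, 1), (101, 2), (102, 1)]
end_sec = 101  # the requested $END second

# What we expected: every op up to and including the endpoint second.
expected = [ts for ts in ops if ts[0] <= end_sec]

# What we observed: the whole endpoint second is skipped, as if the
# bound were exclusive at one-second granularity.
observed = [ts for ts in ops if ts[0] < end_sec]

print(expected)  # [(100, 1), (100, 2), (101, 1), (101, 2)]
print(observed)  # [(100, 1), (100, 2)]
```

The two missing entries in `observed` correspond exactly to the documents we found in the decompressed chunk but not in the recovered database.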

I can provide more details on this case if needed, as well as exact evidence for reproduction.
I hope someone notices this thread, because this issue matters when real production systems depend on PBM utilities and expect recovery endpoints to be reliable.

Side note: for now we cope with it by always setting the END time to +1 second as insurance, so that operations are applied exactly through the last operation matching the required endpoint second. I'm not sure this is actually best practice, but we need guaranteed operation-level consistency.
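For completeness, the +1-second insurance we apply before invoking `pbm oplog-replay` can be sketched like this (the timestamp format matches what we pass to `--end`; the helper name is ours, not PBM's):

```python
from datetime import datetime, timedelta

def padded_end(end: str) -> str:
    """Return the --end value shifted forward by one second, so the
    replay provably covers the whole original endpoint second."""
    ts = datetime.strptime(end, "%Y-%m-%dT%H:%M:%S")
    return (ts + timedelta(seconds=1)).strftime("%Y-%m-%dT%H:%M:%S")

# pbm oplog-replay --start=... --end=<padded value>
print(padded_end("2024-05-01T12:30:59"))  # 2024-05-01T12:31:00
```

Using datetime arithmetic (rather than string surgery on the seconds field) keeps the minute/hour/day rollover correct at boundaries like 23:59:59.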

Thank you in advance.