PBM external snapshot restore: oplogTruncateAfterPoint, stableTimestamp clamp, and TTL handling

Hi Percona team,

I would like to kindly ask for clarification about the correct restore procedure for PBM external snapshots + PITR replay.

Environment:

PBM: 2.12.0
PSMDB: 8.0.4-2
Topology: single-node replica sets
Storage: WiredTiger
Backup model: PBM external backup metadata + LVM snapshot + PBM PITR chunks

Our workflow:

  1. Run PBM external backup.

  2. Take LVM snapshot.

  3. Store PBM metadata, including last_write_ts.

  4. Restore snapshot X-1 to an isolated host.

  5. Start temporary mongod.

  6. Insert:

db.getSiblingDB("local")["replset.oplogTruncateAfterPoint"].insertOne({
  _id: "oplogTruncateAfterPoint",
  oplogTruncateAfterPoint: Timestamp(lastWriteT, lastWriteI)
})
  7. Restart with:
--setParameter recoverFromOplogAsStandalone=true
--setParameter takeUnstableCheckpointOnShutdown=true
  8. Apply PBM PITR oplogs up to the target timestamp.

  9. Compare against a reference DB restored from the later snapshot X.

I have two questions.

1. oplogTruncateAfterPoint vs stableTimestamp

PBM source appears to set oplogTruncateAfterPoint from restoreTS, usually LastWriteTS.

MongoDB recovery can then log something like:

The oplog truncation point is equal to or earlier than the stable timestamp,
so truncating after the stable timestamp instead

So MongoDB effectively clamps the requested truncate point to stableTimestamp if last_write_ts <= stableTimestamp.

Questions:

  • Is this expected and safe in PBM external snapshot restores?

  • Should our scripts explicitly calculate:

effectiveRestoreTS = max(PBM last_write_ts, WiredTiger stable/recoveryTimestamp)

before inserting oplogTruncateAfterPoint?

  • Or should we always use PBM metadata last_write_ts and let MongoDB clamp internally?

  • What timestamp should be treated as the real base endpoint for later PITR replay: PBM last_write_ts, MongoDB endPoint, or applyThroughOpTime from the recovery log?

This is important because for external snapshots PBM does not copy the files itself, so the actual snapshot files can sometimes have a WiredTiger stable timestamp later than PBM metadata last_write_ts.
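
For reference, a minimal sketch of the max() comparison asked about above, in plain JavaScript so it runs in mongosh or Node. It assumes both timestamps are available as {t, i} pairs (seconds and increment). The names compareTs and effectiveRestoreTs are hypothetical, and whether scripts should do this at all is exactly the open question.

```javascript
// Hypothetical helpers; timestamps are plain {t, i} objects
// (t = seconds since epoch, i = increment within that second).

// Orders two timestamps by t first, then by i.
// Returns a negative number, 0, or a positive number.
function compareTs(a, b) {
  if (a.t !== b.t) return a.t - b.t;
  return a.i - b.i;
}

// Picks the later of PBM's last_write_ts and the snapshot's stable timestamp.
function effectiveRestoreTs(lastWriteTs, stableTs) {
  return compareTs(lastWriteTs, stableTs) >= 0 ? lastWriteTs : stableTs;
}

// Example values are invented for illustration only.
const lastWriteTs = { t: 1700000000, i: 5 };
const stableTs = { t: 1700000000, i: 9 };
const chosenTs = effectiveRestoreTs(lastWriteTs, stableTs);
console.log(chosenTs);
```

Comparing t before i matters here: two timestamps in the same second differ only by increment, which is the case where the clamp is easiest to miss.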

2. TTL monitor during PITR replay

I could not find explicit TTL handling in PBM 2.12.0 restore / oplog replay source.

MongoDB Ops Manager restore docs start temporary restore mongod with:

--setParameter ttlMonitorEnabled=false

But PBM restore / replay code does not seem to pass this flag.

Questions:

  • During PBM PITR / oplog replay, is TTL monitor expected to be disabled manually?

  • In a single-node replica set, the replay target becomes primary. Can TTL deletes run during PBM replay?

  • If yes, is this considered safe?

  • Could TTL act as an extra writer outside the intended oplog replay stream?

  • Does PBM ignore or tolerate duplicate-key / missing-document errors during oplog replay in a way that could hide TTL-related divergence?

  • What is the supported recommendation for single-node replica sets where replaying on a secondary is not possible?

Our concern is that PITR replay should be deterministic: base snapshot + oplog stream should be the only source of changes. TTL monitor uses current wall-clock time, not the PITR target time, so it looks like a possible source of nondeterministic deletes during recovery.
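
To make the concern concrete, here is a small sketch that picks out TTL indexes from index specs shaped like db.collection.getIndexes() output. The helper name ttlIndexes and the sample data are hypothetical, and it assumes expireAfterSeconds arrives as a plain number. The trailing comments show how it might be used in mongosh, including the standard ttlMonitorEnabled server parameter. Nothing here is taken from PBM itself.

```javascript
// Returns the index specs that are TTL indexes,
// i.e. those carrying a numeric expireAfterSeconds field.
function ttlIndexes(indexSpecs) {
  return indexSpecs.filter((ix) => typeof ix.expireAfterSeconds === "number");
}

// Sample specs shaped like getIndexes() output (values invented).
const sampleIndexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, key: { createdAt: 1 }, name: "createdAt_1", expireAfterSeconds: 3600 },
];
const ttlNames = ttlIndexes(sampleIndexes).map((ix) => ix.name);
console.log(ttlNames);

// In mongosh against the temporary restore instance, usage might look like
// (database and collection names are placeholders):
//   ttlIndexes(db.getSiblingDB("mydb").mycoll.getIndexes())
// and the TTL monitor can be switched off at runtime with:
//   db.adminCommand({ setParameter: 1, ttlMonitorEnabled: false })
```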

We run single-node replica-sets.

Backups are made using fast, low-level LVM snapshots (PBM is used here to open and close the backupCursor so that we get a consistent backup plus metadata). At the same time, we take oplog slices every minute independently (oplogOnly: true).
Physical backups are not suitable because of the high load on the PROD hosts, and because reading and copying the files over the network takes significantly longer than the low-level LVM tooling that handles the snapshots.

That’s the reason why we are dependent on:

  • the oplogOnly: true option (to create oplog slices independently, without relying on physical backups)
  • handling the restore process manually using custom scripts on top of PBM

We run reconciliation tests after each point-in-time recovery test, comparing collection statistics of the recovered database (snapshot X-1 + oplogs) against the reference database (snapshot X).

Unfortunately there are mismatches, even though I tried to reconstruct what to implement from the PBM source code (setting oplogTruncateAfterPoint to limit WiredTiger recovery, using the accurate epoch timestamp with its ordinal increment, and so on).

Currently we don't explicitly start the mongod instance with the TTL monitor disabled when applying oplogs.

However, mismatches also happen on collections which don't have TTL indexes.

This is an interesting case, as we would expect the databases to match 1:1, with any logical mismatches in data sizes being small (if any) or limited to collections that have TTL indexes.
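
The reconciliation step can be sketched roughly as follows. This is a simplified illustration, not our actual script: the function name diffStats and the input shape (a map from namespace to { count, size }) are assumptions, and the sample numbers are invented.

```javascript
// Compares per-collection stats from two databases and lists the differences.
// Each input maps "db.collection" to { count, size } (hypothetical shape).
function diffStats(recovered, reference) {
  const names = new Set([...Object.keys(recovered), ...Object.keys(reference)]);
  const mismatches = [];
  for (const ns of [...names].sort()) {
    const a = recovered[ns];
    const b = reference[ns];
    if (!a || !b) {
      mismatches.push({ ns, reason: "missing on one side" });
    } else if (a.count !== b.count) {
      mismatches.push({ ns, reason: "count", recovered: a.count, reference: b.count });
    } else if (a.size !== b.size) {
      mismatches.push({ ns, reason: "size", recovered: a.size, reference: b.size });
    }
  }
  return mismatches;
}

// Invented sample data for illustration.
const recoveredStats = {
  "app.orders": { count: 100, size: 51200 },
  "app.sessions": { count: 42, size: 8400 },
};
const referenceStats = {
  "app.orders": { count: 100, size: 51200 },
  "app.sessions": { count: 40, size: 8000 },
};
const mismatches = diffStats(recoveredStats, referenceStats);
console.log(mismatches);
```

One caveat on the inputs: MongoDB documents that the count and size reported by collStats can be inaccurate after an unclean shutdown, so feeding this from countDocuments() may be a more reliable basis than metadata-based statistics.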

Thank you very much in advance.

Hi, thanks for taking the time to post this. First of all, for snapshot backups, why are you doing manual work in steps 6 and 7? PBM should figure out the right way to trim the oplog and so on during restore. Did you see the instructions in "Restore from a snapshot-based backup - Percona Backup for MongoDB", which contain the necessary steps? I don't see pbm restore-finish in your workflow, and it is mandatory.

Hello Ivan,

many thanks for your response. Interesting. I thought that in our case of snapshots + independent oplogs, the preparation of the instance before oplog replay had to be done manually.

Do I understand correctly that these steps from here:

should be performed before I manually trigger oplog-replay (from the desired start time to the stop-apply time)?

I would also like to follow up on my previous question regarding TTL indexes. Does PBM oplog-replay take into account TTL indexes interfering with the state of the database data? Could this lead to inconsistencies?
We plan to drop all TTL indexes on collections with business-related data and delete documents with a separate external application instead. The main reason is consistency: TTL deletes actively remove documents while oplogs are being applied during recovery, and we have doubts about the resulting consistency.

Hi, that is correct. After restoring the snapshot you can also have PBM apply the oplog for you by following "Replay oplog from arbitrary start time - Percona Backup for MongoDB".

Regarding TTL indexes, PBM does not do anything with the TTL thread, so it is possible that you get inconsistencies in that regard. Typically this is not a problem (you just get some extra documents that will eventually be deleted), but if it is important for you, I encourage you to open a ticket at jira.percona.com about it.