PITR restore performance

A follow-up from "Restore on a local docker deployment - #7 by Adraen". I’m opening a new thread because this is a different discussion (hope that’s okay).

So I have a large-ish DB (~20GB compressed) that makes heavy use of time-series collections. I’ve enabled daily logical snapshots and PITR.

However, I’m running into absolutely terrible restore performance for PITR: my computer is mostly idle while the oplog restore crawls. I’m doing everything locally to remove as many bottlenecks as possible (no S3, no network, no encryption).

I got Codex to give me some metrics on the restore:

  • The snapshot recovery took 9 minutes and 9 seconds (that’s nice and fast).
  • The oplog replay is dead slow, at about 1.2x realtime: replaying 18 seconds of oplog takes 15 seconds, so the current estimated restore time is about 17 hours for 20 hours of PITR slices.
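
For anyone who wants to check the estimate, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and it assumes the replay speed stays constant for the whole restore, which may not hold.

```python
def estimate_restore_hours(oplog_seconds_replayed, wall_seconds_taken, pitr_hours):
    """Return (replay speed relative to realtime, estimated wall-clock hours).

    Assumes the observed replay speed stays constant for the whole restore.
    """
    speed = oplog_seconds_replayed / wall_seconds_taken
    return speed, pitr_hours / speed


# Numbers from the post: 18 s of oplog replayed in 15 s, 20 h of PITR slices.
speed, hours = estimate_restore_hours(18, 15, 20)
print(f"replay speed: {speed:.1f}x realtime, estimated restore: {hours:.1f} h")
# prints "replay speed: 1.2x realtime, estimated restore: 16.7 h"
```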

My host machine is really not saturated:

  • Mongo CPU: ~7-9%
  • PBM CPU: ~3-4%
  • RAM available: ~23 GiB
  • Swap: full but not actively swapping
  • I/O wait: ~6%

Codex’s conclusion is:

MongoDB logs show the main delay is write latency, not compute. During restore, Mongo logged slow time-series bucket writes using majority write concern, with
waitForWriteConcernDurationMillis commonly around 180-408ms. The workload is applying many small applyOps operations against time-series bucket collections such as
*.system.buckets.data.
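
To double-check those latency figures independently, something like the sketch below could summarize the field from the mongod log. It assumes the log is in MongoDB's structured JSON format (one JSON document per line) and simply searches each entry recursively for the key, so it does not depend on where exactly the field is nested. The helper names are my own.

```python
import json
import statistics
import sys

KEY = "waitForWriteConcernDurationMillis"


def find_values(node, key):
    """Yield every numeric value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, (int, float)):
                yield v
            else:
                yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def summarize(lines, key=KEY):
    """Return count/min/median/max of `key` over JSON log lines, or None."""
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        values.extend(find_values(entry, key))
    if not values:
        return None
    return {
        "count": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wc_latency.py /path/to/mongod.log
    with open(sys.argv[1], encoding="utf-8") as log_file:
        print(summarize(log_file))
```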

The restore is therefore latency-bound and replay-order-bound. PBM/Mongo are not able to use all CPU cores or disk bandwidth during oplog replay, so aggregate system
metrics look quiet even though wall-clock progress is slow.
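
To put the latency-bound argument in numbers, here is a rough sketch. It assumes, as a simplification that may not match what PBM actually does, that writes are applied strictly one at a time and that each one waits the full majority-acknowledgement latency. The function name is made up for illustration.

```python
def max_serial_writes_per_second(wait_ms):
    """Upper bound on writes/s if each write blocks for `wait_ms` before the next starts."""
    return 1000.0 / wait_ms


# Latency range reported in the mongod log: roughly 180-408 ms per write.
for wait_ms in (180, 408):
    rate = max_serial_writes_per_second(wait_ms)
    print(f"{wait_ms} ms per write -> at most {rate:.2f} writes/s")
```

Under that assumption the replay would be capped at a handful of acknowledged writes per second regardless of how many cores or how much disk bandwidth is free, which would match the quiet system metrics.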

Additional contributing factors:

  • Time-series bucket updates cause write amplification.
  • WiredTiger is checkpointing regularly, which is normal but adds background write work.
  • The oplog cap maintainer is trimming a ~9 GB oplog, though logs do not show this as a major delay.
  • MongoDB Compass was running full collection scans/counts during the restore; this has now been closed.

PBM tuning options are limited for this phase. --num-parallel-collections and --num-insertion-workers-per-collection can help snapshot import, but the snapshot phase
already completed quickly. PBM does not expose an obvious oplog replay parallelism knob in version 2.12.0.


From this, it looks like the issue might simply be the latency of each op. What I find odd, though, is that I’ve used mongorestore --oplogReplay before and I believe it was much faster than this (my oplog is long enough that I might A/B test PBM and mongorestore over the weekend).

Am I missing something by any chance? I doubt everybody is seeing such poor restore performance, since it makes PITR pretty much unusable.