Question: oplogOnly removal in PBM 3.0? Is that for real or a joke?

Hello there,

I suddenly noticed the following ticket: Jira

where it says:

Solution proposal

  • We agreed to deprecate oplogOnly . To remove that in PBM 3.0

  • This also needs to be deprecated on the operator side

It shocked me at first sight. I hope I misunderstood the proposal that oplogOnly will be deprecated. Our backup strategy relies on PITR with external snapshot-based backups:
a) our script uses PBM to open/close a backup cursor at the moment we capture the SSD state into a full LVM snapshot
b) independently, we slice the oplog into chunks every minute using PBM agents and oplogOnly (minimal config sketch below).
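For context, our slicing configuration is essentially this (a minimal sketch; the values are illustrative, not our production config):

    # enable oplog slicing without requiring a PBM-made base snapshot
    pbm config --set pitr.enabled=true
    pbm config --set pitr.oplogOnly=true
    # slice a new oplog chunk every minute
    pbm config --set pitr.oplogSpanMin=1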

For a restore, we have a custom script built to orchestrate everything. PBM is used only for:
a) running --force-resync after the oplog chunks are staged back from the backup tape archive for recovery
b) running oplog-replay to apply the operations (see the command sketch below)
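Concretely, the PBM part of that restore boils down to something like this (a simplified sketch; the real timestamps come from our own metadata):

    # after the oplog chunks are copied back from tape into PBM's storage path:
    pbm config --force-resync    # make PBM re-read the storage contents
    pbm status                   # verify the chunks are visible
    # replay the oplog on top of the data restored from the LVM snapshot
    pbm oplog-replay --start="2024-05-01T10:00:00" --end="2024-05-01T11:30:00"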

That way we achieve an enterprise-grade, consistent, and fast backup/restore strategy.

However, THAT would not be possible if you disallow/deprecate the oplogOnly option, because it would be a real PAIN to switch from external snapshot-based backups to physical ones (expensive, slow, and done at a higher level than an external LVM snapshot, which we can take even on a 20 TB instance within a second!).

We were very glad when you introduced external snapshot-based backups, as it is the most logical and best possible approach regardless of DB size (logical and physical backups add more complexity and are more expensive, slower, etc.).

Can you confirm that you really plan to deprecate that option?

What will then happen with external snapshot-based backups? Will you deprecate those too?
My apologies, but I'm really worried, as this would potentially be a real downgrade, so I hope I just misunderstood.

Thanks in advance for any response,
Martin

Hello Martin,

Thank you for reaching out and sharing your concerns so transparently. As the Product Manager for Percona Backup for MongoDB (PBM), I want to address your anxiety immediately: we do not intend to leave users with high-volume environments like yours behind. I completely understand why seeing that Jira ticket was a shock, especially given the sophistication of your current workflow. Your setup—leveraging LVM snapshots for instant capture of 20 TB datasets combined with PBM for oplog slicing—is exactly the kind of “enterprise-grade” engineering we want to enable, not hinder.

To give you some context, the proposal to deprecate oplogOnly in PBM 3.0 stems from our initiative to streamline PBM’s architecture. Historically, oplogOnly has been somewhat of a detached feature, and our goal is to bring a fully integrated Physical Backup experience (using PSMDB backupCursor capability) that handles the consistency, file copying, and metadata management natively, without requiring the complex external scripting you currently maintain. However, your feedback highlights a critical gap. We recognize that “standard” physical backups (copying data files to object storage) cannot currently compete with the speed of a storage-level LVM snapshot (taking 1 second for 20 TB).

Help Us Close the Gap

Before we finalize any deprecation plans, I need to understand exactly where our native PBM Physical Backups fall short compared to your custom LVM + oplogOnly strategy.

If we were to ask you to switch to PBM’s native physical backup and restore, what are the specific blockers?

  • Snapshot Speed: Is the primary blocker strictly the “time to capture”? (i.e., Waiting for PBM to stream 20 TB to S3 vs. the instant LVM freeze).

  • Incremental Capabilities: PBM supports incremental physical backups (uploading only the data blocks changed since the last backup). Would that satisfy your requirements, or is the local LVM snapshot still superior for your RTO (Recovery Time Objective)? (A command sketch follows this list.)

  • Storage API Integration: Do you need PBM to natively orchestrate the storage-level snapshot (calling the LVM/EBS/GCP snapshot API itself) rather than streaming files?

  • Restoration Flow: You mentioned a custom restore script. Is there specific logic in your restore process that PBM’s pbm restore command is currently missing?
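For reference, the incremental flow mentioned above looks roughly like this (a sketch only; availability depends on your PBM version and storage configuration):

    # take the initial base backup that starts the incremental chain
    pbm backup --type=incremental --base
    # later runs upload only the blocks that changed since the previous one
    pbm backup --type=incremental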

We are currently in the planning phase for PBM 3.0. Your input is incredibly timely. If native physical backups are too slow or expensive for your scale, we need to know so we can either:

  1. Retain the hooks (like oplogOnly) that allow for external snapshot integrations.

  2. Build native support for “External Snapshots” so PBM manages the LVM/Storage calls directly, removing the need for your custom scripts.

We want to make your life easier, not harder.


Would you be open to sharing a sanitized version of your backup/restore timings or a brief call to discuss your architecture? This would help us write the requirements to ensure PBM 3.0 supports the “20 TB in 1 second” standard you’ve set.

Hello Radoslaw,

Thanks for the fast response, I really appreciate it.

First of all, I'll describe our situation and why we decided to use PBM.

We have an enterprise HW architecture and decided not to rely on MongoDB Enterprise pricing. Our decision was to keep all DBs unsharded (less complexity and lower hardware requirements, which reduces operational costs), and not only from the HW point of view: also for backup (ensuring consistency, the best possible RTO/RPO, operations management, etc.) we currently can't imagine sticking to a sharded-cluster solution. HA is maintained by the HW architecture below the OS level; in case of a disaster in one geolocation, we are able to immediately switch the MongoDB runtime. Following this reasoning, to keep sub-20 TB instances in operation without needing to operate multi-member replica sets or sharded clusters, we have been running single-node replica sets from the beginning.

Previously, our business-critical databases were running in the mentioned setup on MongoDB Community. We switched to Percona to address missing enterprise requirements, mainly related to LDAP, audit logging, etc. Our backup at that time, however, relied on the following:

SSD-array-based LVM snapshots + oplog chunks dumped with mongodump (querying oplog.rs), orchestrated by a (now deprecated) custom script.
Fortunately/unfortunately, we had a disaster during which we discovered the horribleness and amateurishness of the available MongoDB community solutions. I could talk about the issues with this approach forever, but mainly: mongodump does not effectively (and cheaply) use a tailable cursor over oplog.rs (an index-less collection) and relies on a query instead; the resulting file is oplog.rs and can't be renamed or concatenated with other chunks; there are long interval gaps between oplog increments; and many, many other little "easter eggs" we noticed only when something bad happened.
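To illustrate, the deprecated script produced oplog slices roughly like this (a simplified sketch; the real query window came from the script's own bookkeeping):

    # dump one oplog slice by querying local.oplog.rs directly
    mongodump --db=local --collection=oplog.rs \
      --query='{"ts": {"$gte": {"$timestamp": {"t": 1714552800, "i": 1}},
                       "$lt":  {"$timestamp": {"t": 1714552860, "i": 1}}}}' \
      --out=/backup/oplog-slice-1714552800
    # the output is always .../local/oplog.rs.bson, which is exactly what makes
    # renaming and concatenating the chunks so awkward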
The worst of all is mongorestore:

  • when you have a collection with a TTL index, MongoDB's TTL thread, once the DB is started from a snapshot recovered from X time ago, will delete docs that are already expired at the current point in time; mongorestore's oplog replay will then fail when an update operation targets such an expired document
  • mongorestore's oplog replay can only be told at what time to stop, not from which epoch timestamp to start applying (see the sketch after this list); I honestly don't understand who designed these tools and how they got the job in the first place
  • mongorestore's oplog replay wastes precious time logging only irrelevant madness to STDOUT: missing docs? No, just attempts to apply operations on the system sessions collections
  • and many, many other bottlenecks.
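As mentioned above, you can cap the end of the replay but not its start (a sketch; the timestamp is made up):

    # replay a dump directory that contains an oplog.bson;
    # --oplogLimit only says where to STOP, as <epoch>:<increment>
    mongorestore --oplogReplay --oplogLimit=1714552860:0 /backup/restore-dir
    # there is no symmetric option to say where to START applying from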

Afterwards, as part of the lessons learned, we realised we can't rely on the MongoDB tools, as they are absolutely useless for any production with business-critical data. So the plan was:

  • program our own tool that uses a tailable cursor and generates oplog chunks much more efficiently (and more often) than mongodump
  • write a custom program/script that addresses basically all the madness present in mongorestore: skipping irrelevant ops on system sessions and similar non-relevant stuff in applyOps, ignoring/skipping TTL-indexed collections to prevent the recovery from failing, and being able to START from the exact endpoint epoch (after WiredTiger's recovery) up to the desired last required epoch (with operation-time precision: epoch T + increment I)

You guys described how PITR should look in this video:

Then I looked into PBM's code on GitHub and noticed that you actually made an enterprise-ready, working, solid solution addressing most of what we wanted, in the exact (or very similar) way we thought of implementing it. Plus it's open source, and it also uses backupCursor (for the most paranoid ones like us: we love to open and close it as consistency insurance, though it's worth mentioning that our expensive SSD architecture takes second-precise snapshots really fast, so it's probably not strictly needed). You basically built it right.

Now to our current setup:

  • we can't use data-file copies. For our single-node replica sets and our premium SSD infrastructure, we are good to go with a custom backup shell script which only opens a backup cursor, calls the SSD array manager's API to create a snapshot, closes the backup cursor, and saves metadata (apart from PBM's metadata in its collections, we keep our own, in which we map snapshots to their respective names in PBM)
  • PBM agents are workers with only one simple purpose: independently of the above-mentioned snapshots, to continuously capture and generate 1-minute oplog chunks, later to be saved on backup archive tapes (a rough sketch of one backup run follows below).
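In pseudo-shell, one backup run looks roughly like this (a heavily simplified sketch; ssd_array_snapshot stands in for our internal array-manager API call and is not a real tool):

    # 1) open a backup cursor so the WiredTiger files stay consistent;
    #    the real script keeps this session alive until step 3
    mongosh --quiet --eval 'db.getSiblingDB("admin").aggregate([{ $backupCursor: {} }]).next()'
    # 2) trigger the storage-level snapshot (placeholder for the array manager's API)
    ssd_array_snapshot --volume mongodb-data --label "mongo-$(date +%s)"
    # 3) close the backup cursor and record our own snapshot-name -> PBM-name mapping
    # 4) pbm-agents keep slicing 1-minute oplog chunks in parallel, untouched by this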

Physical restores: more complexity. For us there's nothing simpler and more reliable than the above-mentioned way. Your code is fine, but doing incrementals, physicals, etc. would take us into a completely different sea of possible failures, bottlenecks, bugs, and so on.

But the main reason is that it's heavy on the network adapter, SSD reads, etc. I don't think I need to expand on that, as you are good engineers and understand that nothing above the actual hardware layer underneath the operating system can match SSD snapshots in speed: effectively zero impact on production (nobody notices even a 1% spike in I/O, CPU, or memory), it takes less than a second to make a snapshot (as fast as my iPhone's camera in daylight), and the backup cursor is not open for long, so journals can be safely committed to data files with almost zero risk. Very fast, and very cheap.

If you deprecate oplog-only, we will basically be forced to program our own solution which handles all of that, without the unwanted code base (physicals and logicals, which we actually don't need).

We would therefore be very glad if you did not deprecate this option. It actually provides more flexibility and makes PBM stand out by offering what the MongoDB database tools cannot.

And one more thing, if I might suggest: I also documented our observation on oplog-replay (made during reconciliation tests to validate snapshot + oplog consistency in our recovery strategy) regarding the current issue with how PBM ensures consistency/precision with respect to timestamps.

From our point of view it would be not just "nice to have" but inherently logical to treat oplog-replay surgically. By this I mean that --start/--end should not rely on seconds only and truncate the "increment" positions. In our case, with oplog-heavy instances, there can be thousands of operations within a single second (epoch timestamp). Hence we observed inconsistent results, because we could not tell oplog-replay at which exact operation position to stop while processing the oplog chunks.
Oracle ensures that level of consistency by using its own sequencing mechanism for all operations. Unfortunately, MongoDB operates with unix epoch + increment timestamps.
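To illustrate the precision we mean: an oplog position is a full BSON Timestamp, i.e. (epoch seconds, increment), as in this mongosh sketch with made-up values:

    mongosh --quiet --eval '
      db.getSiblingDB("local").oplog.rs.find({
        ts: { $gt:  Timestamp(1714552800, 4123),   // start AFTER this exact op
              $lte: Timestamp(1714552860, 17) }    // stop AT this exact op
      }).hint({ $natural: 1 })'
    # --start/--end values that carry only whole seconds cannot express the
    # ",4123" and ",17" parts, so all operations within that second are all-or-nothing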