Controlling mongod shutdown during PBM incremental restore

Hello everyone,

I’m working with Percona Backup for MongoDB (PBM) on a Percona Server for MongoDB (PSMDB) cluster and I’d like to use incremental backups and restores. In one of the forum threads I saw that the restore process triggers a full shutdown of all mongod instances in order to apply the backups:
https://forums.percona.com/t/cannot-restore-from-incremental-backup-pbm-and-psmdb-in-the-same-pod/28041/2?u=nir-elk

My questions are:

  1. Can I control the shutdown behavior?
    For example, is it possible to perform a staggered shutdown of my replica set members—shutting down one secondary at a time, restoring it, and then moving on—instead of taking down the entire cluster at once?
  2. Is there a way to restore without shutting down any mongod instances?
    Or to minimize downtime by keeping parts of the cluster online while another part is being restored?

Any advice on PBM configuration options, flags, or best practices to achieve this would be greatly appreciated. Thank you in advance for your help!

— Nir

Hi, a staggered shutdown would be a bad idea. During restore you need to prevent any writes to the cluster for consistency reasons.
What I suggest in that scenario is simply to restore to a brand-new cluster; see “Restore from a backup into a new environment” in the Percona Backup for MongoDB documentation.

Hi Ivan,

Thanks for your response. I’d like to clarify a few points regarding your suggestions.
From my understanding, we only need the application to be in read-only mode. I’m able to configure the application accordingly.
My main goal is to restore incremental backups with minimal downtime and without an extra cluster.

Additionally, could you please explain why you chose to perform incremental backups as physical rather than logical backups?

Best regards,
Nir

Hi @nir-elk

> Thanks for your response. I’d like to clarify a few points regarding your suggestions.
> From my understanding, we only need the application to be in read-only mode. I’m able to configure the application accordingly.

With regard to incremental restore, the complete shutdown is necessary because we are restoring physical files.
To move those files back into place, the entire cluster, whether it’s a replica set or a sharded cluster, needs to be halted; it’s not possible to restore data files while the database is running. You can find more details in the documentation.

> My main goal is to restore incremental backups with minimal downtime and without an extra cluster.

Let’s assume you have a simple replica set with 3 members and, as a workaround, you restore your backup onto just one of those 3 nodes.
In theory, that node first needs to be removed from the replica set and then receive the restore, which can take some time. After that, to bring the other 2 nodes in line with the restored data, each of them has to be wiped and go through an initial sync, which takes its own time. Only then will you have a fully functional replica set.
Not only does that seem like a longer restore process, it is also more error-prone.
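
As a rough back-of-the-envelope illustration of that comparison, here is a minimal sketch. The durations are made-up numbers, the function names are hypothetical, and it assumes a regular restore handles all members together while the single-node workaround adds one initial sync per remaining member.

```python
def full_restore_minutes(restore_min):
    # Assumption: all members are stopped and restored together,
    # so the wall-clock time is roughly one restore.
    return restore_min


def single_node_minutes(restore_min, initial_sync_min, other_nodes=2,
                        parallel_sync=False):
    # One node is restored first; every other node is then wiped and
    # re-synced, either one after another or all at the same time.
    if parallel_sync:
        syncs = initial_sync_min
    else:
        syncs = initial_sync_min * other_nodes
    return restore_min + syncs


# Hypothetical durations: 30 min per restore, 45 min per initial sync.
print(full_restore_minutes(30))                         # 30
print(single_node_minutes(30, 45))                      # 120
print(single_node_minutes(30, 45, parallel_sync=True))  # 75
```

Even with both initial syncs running at the same time, the replica set is back to full redundancy later than with a regular restore, and there are more manual steps that can go wrong along the way.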

> Additionally, could you please explain why you chose to perform incremental backups as physical rather than logical backups?

It’s important to clarify that incremental backups exist only for physical backups. From the documentation:

> Incremental backups require a physical base backup as a reference.

So you have a chain: base backup + incr1 + incr2 + incr3, and so on.

  • In an incremental restore, PBM uses the base physical backup plus the incrementals up to the point you want to restore to.
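
To make the chain idea concrete, here is a minimal conceptual sketch. It is not PBM’s code: the `Backup` class, the `restore_chain` function, and the backup labels are all hypothetical and exist only to show which pieces of a chain a given restore point depends on.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    name: str           # hypothetical label, not a real PBM backup name
    kind: str           # "base" or "incr"
    finished: datetime  # when the backup completed


def restore_chain(backups, target):
    """Return the base backup plus the incrementals needed to reach `target`."""
    chain = []
    for b in sorted(backups, key=lambda b: b.finished):
        if b.finished > target:
            break                # anything after the target is not needed
        if b.kind == "base":
            chain = [b]          # a new base backup starts a new chain
        elif chain:
            chain.append(b)      # an incremental only counts on top of a base
    if not chain:
        raise ValueError("no base backup at or before the target time")
    return chain


backups = [
    Backup("base", "base", datetime(2024, 1, 1, 0, 0)),
    Backup("incr1", "incr", datetime(2024, 1, 1, 6, 0)),
    Backup("incr2", "incr", datetime(2024, 1, 1, 12, 0)),
    Backup("incr3", "incr", datetime(2024, 1, 1, 18, 0)),
]
chain = restore_chain(backups, datetime(2024, 1, 1, 13, 0))
print([b.name for b in chain])  # ['base', 'incr1', 'incr2']
```

Note that in this model the restored state is that of the last incremental in the chain (incr2 here), not the exact target time, and that losing the base or any incremental in the middle makes the later ones unusable.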

For logical backups, you can use the point-in-time recovery (PITR) feature, which saves slices of the oplog at a configurable interval (10 minutes by default).

  • During a logical restore, you use a full logical backup and then replay the oplog via PITR up to the time you want to restore to.
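
The same kind of sketch for the logical case, again purely conceptual: `pitr_plan` is a hypothetical helper, backups are reduced to their completion times, and oplog slices are modeled as simple (start, end) pairs. PBM works this out itself when you give it a target time; the sketch only shows what a point-in-time restore depends on.

```python
from datetime import datetime


def pitr_plan(full_backups, slices, target):
    """Pick the latest full backup at or before `target`, plus the oplog
    slices needed to roll forward from that backup to `target`.

    full_backups: list of datetime (backup completion times)
    slices:       list of (start, end) datetime pairs
    """
    candidates = [t for t in full_backups if t <= target]
    if not candidates:
        raise ValueError("no full backup at or before the target time")
    base = max(candidates)

    needed = []
    covered = base  # the oplog has been replayed up to this moment
    for start, end in sorted(slices):
        if covered >= target:
            break                      # target already reached
        if end <= covered:
            continue                   # slice is older than what we have
        if start > covered:
            raise ValueError("gap in the oplog slices before the target")
        needed.append((start, end))
        covered = end
    if covered < target:
        raise ValueError("oplog slices do not reach the target time")
    return base, needed


t = lambda h, m: datetime(2024, 1, 1, h, m)
full_backups = [t(0, 0)]
slices = [(t(0, 0), t(0, 10)), (t(0, 10), t(0, 20)), (t(0, 20), t(0, 30))]

base, needed = pitr_plan(full_backups, slices, t(0, 15))
print(base, len(needed))  # 2024-01-01 00:00:00 2
```

A gap in the slices, or a target later than the last saved slice, makes that point in time unreachable, which is why the slice interval matters for how much recent data you can recover.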

Logical restore is usually slower than physical restore and adds some overhead on replication, because logical data is restored on the PRIMARY and then replicated to the SECONDARY nodes. On the other hand, a physical backup consumes more backup storage, because the files are stored with all their fragmentation and index structures, whereas a logical backup contains only the data, without fragmentation or indexes.

The correct option always depends on your business requirements.

Best,
Jean.