PBM 2.12 on MongoDB 8.0 Sharded Cluster – “storage is not initialized” + “get epoch: lookup: element not found”

I have a sharded MongoDB 8.0 cluster with 2 mongos + 3 config nodes and 2 shard replica sets (3 nodes each).
PBM 2.12 is installed on every mongod node and each PBM agent connects to its local mongod using the correct URI.

I applied PBM storage config separately to config RS, shard1 RS, and shard2 RS.
pbm config is correct everywhere: filesystem storage (I also tried S3-compatible Backblaze storage) is initialized and identical on all replica sets.

But when I run a backup through mongos:

pbm backup ...

I always get:

get backup metadata: context deadline exceeded

And in every PBM agent log (config + shards) I repeatedly see:

storage is not initialized
get epoch: lookup: element not found

Cluster is healthy, agents authenticate fine, all nodes are reachable, and nothing is blocked.

I can’t understand why PBM thinks “storage is not initialized” when the storage config is already present on all replica sets.

I need help understanding why PBM cannot create the epoch or initialize storage metadata on MongoDB 8.0.

That’s the issue. Any leads will be appreciated!
Thank you for your time.

Hi and welcome to our forum!

PBM requires shared storage (all nodes need to see the same volume or bucket). Please check the steps to add a remote storage for more details.
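For reference, a shared remote storage can be applied once with pbm config and is then propagated to every agent. Below is a minimal sketch assuming an S3-compatible Backblaze B2 endpoint; the region, endpoint, bucket name, prefix, and credentials are all placeholders you would replace with your own:

```shell
# Hypothetical values throughout -- substitute your own bucket and keys.
cat <<'EOF' > /tmp/pbm-storage.yaml
storage:
  type: s3
  s3:
    region: us-west-004
    endpointUrl: https://s3.us-west-004.backblazeb2.com
    bucket: my-pbm-backups
    prefix: prod-cluster
    credentials:
      access-key-id: "<key-id>"
      secret-access-key: "<secret>"
EOF

# Apply once; PBM stores the config in the cluster and all agents pick it up.
pbm config --file /tmp/pbm-storage.yaml
```

Because every agent reads the same config from the cluster, all nodes end up pointing at the same bucket and prefix, which is what PBM requires.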

Thank you for your response, @radoslaw.szulgo.

The storage prefix issue is now resolved, and PBM storage is successfully initialized on the remote backend. However, the backup is still failing due to an authorization error that occurs only on the Config Server PRIMARY.

From the Config Primary PBM agent logs, we consistently see the following error during every backup attempt:

I got command backup [name: 2025-12-09T21:35:22Z, compression: s2 (level: default)] <ts: 1765316122>, opid: 6938961a63b09adf1c82e4e4
 I got epoch {1765315340 7}
 [backup/2025-12-09T21:44:39Z] get balancer status: run mongo command: (Unauthorized) not authorized on admin to execute command { _configsvrBalancerStatus: 1, lsid: { id: UUID(**") }, $clusterTime: { clusterTime: Timestamp(1765316680, 15), signature: { hash: BinData(0, **), keyId: *** } }, $db: "admin" }

This error appears only on the Config Server Primary.
On the Shard replica sets (Shard1 and Shard2), PBM agents authenticate and operate without any authorization errors. The original “storage is not initialized” errors were coming from other nodes earlier and are now resolved after fixing the prefix and re-initializing storage.

The MongoDB user used by PBM is defined as follows:

{
  _id: 'admin.**',
  userId: **,
  user: '**',
  db: 'admin',
  roles: [ { role: 'root', db: 'admin' } ],
  mechanisms: [ 'SCRAM-SHA-1', 'SCRAM-SHA-256' ]
}

Despite the user having the root role on the admin database, PBM is still unable to execute _configsvrBalancerStatus on the Config Server Primary, which causes the backup to fail with:

get backup metadata: context deadline exceeded

Logs from all other nodes (screenshot omitted).

Cluster Topology:

  • 2 × mongos

  • 3 × config server nodes (config replica set)

  • 2 × shards (each a replica set)

  • 1 × HAProxy

Environment:

  • PBM version: 2.12

  • MongoDB version: 8.0 (sharded cluster)

At this point:

  • Storage is fully initialized

  • PBM agents authenticate successfully on all nodes

  • Shard nodes work without auth issues

  • Only the Config Server Primary throws the unauthorized error for _configsvrBalancerStatus, which blocks the backup from progressing

Could you please advise:

  • Whether _configsvrBalancerStatus requires any additional internal privilege on MongoDB 8.0 config servers, and

  • If any new role or privilege is required beyond root for PBM to function correctly in this version?

Thank you very much for your time and support.

Hi, the root role is not enough. Please check the documentation for creating the pbmAnyAction role: Configure authentication in MongoDB - Percona Backup for MongoDB
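As a sketch of what that documentation describes (run in mongosh against the admin database on each replica set; the user name and password below are placeholders):

```javascript
// Custom role granting anyAction on anyResource, as described in the PBM docs.
db.getSiblingDB("admin").createRole({
  role: "pbmAnyAction",
  privileges: [
    { resource: { anyResource: true }, actions: ["anyAction"] }
  ],
  roles: []
});

// Dedicated PBM user combining the built-in backup/restore roles with pbmAnyAction.
db.getSiblingDB("admin").createUser({
  user: "pbmuser",      // placeholder
  pwd: "secretpwd",     // placeholder
  roles: [
    { db: "admin", role: "readWrite", collection: "" },
    { db: "admin", role: "backup" },
    { db: "admin", role: "clusterMonitor" },
    { db: "admin", role: "restore" },
    { db: "admin", role: "pbmAnyAction" }
  ]
});
```

Please verify the exact grants against the current PBM documentation for your version, since the required roles have changed between releases.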

Hi Team,

Thanks for all the support so far. Our PBM setup is now working, storage is initialized, and all agents authenticate correctly. I just need clarification on a few behaviors we are seeing in our MongoDB 8.0 sharded cluster (2 mongos, 3 config nodes, 2 shards).


1. Backups only succeed when run from Config Server Primary

When I run:

pbm backup --mongodb-uri=configPrimary

the backup succeeds on all replica sets (config + shard1 + shard2).

But when I run:

pbm backup --mongodb-uri=mongos

it always fails with:

incompatible: Backup has no data for the config server or sole replicaset

Question:
Is it expected that PBM backups/restores must be initiated from a config primary or shard primary, and should not be run through mongos?


2. Restore performance seems very slow

Even restoring a small backup (~6 GB) takes close to 1 hour.
For our real dataset (~200 GB), PBM logical restore may take many hours.

Questions:

  • Are these restore times expected for PBM logical restores on sharded clusters?

  • Does PBM restore collections and indexes in parallel, or is it single-threaded per node?

  • Any recommended tuning to speed up restore?

  • For ~200GB clusters, is PBM logical backup still recommended for DR, or should we consider other approaches?


3. What is the recommended best practice?

Before finalizing our DR runbook, I want to confirm:

  • Should all PBM commands (backup/restore) always be run from config or shard primaries?

  • Is mongos officially unsupported for initiating PBM operations?

  • What is the realistic expected restore time for 200GB logical data?

  • Any Percona recommendations for faster DR on large sharded clusters?

4. Best Practices for Large Clusters (200GB)

  • For large sharded clusters, does Percona still recommend PBM logical backups for DR?

  • Or should we consider:

    • mongodump/mongorestore with filters,

    • or a hybrid approach?


5. Partial Restore / Last N Weeks

If I want to restore only the last 10 weeks of data:

  • Does PBM support any form of partial/filtered restore?

  • Or is the recommended approach to use mongodump with --query per collection for partial restores?

  • Is it safe to mix PBM (full cluster backup) with filtered mongodump during DR workflows?


Everything else is working fine (storage, roles, prefix, agent auth).
I just want to confirm whether this behavior is expected and understand the best strategy for scalable backups/restores.

Thank you!

Hi, let me see if I can answer your questions:

  1. Actually you can run a backup from any host that has the PBM client installed, even if it's not part of the cluster. The thing to keep in mind is that in a sharded cluster, you have to specify the mongodb-uri of the config server replica set.

  2. Restore speed depends on many factors, but you can tune parallelism and the number of download threads, among other things.

  3. I mostly answered that in question 1. For a reference of expected times you can check this post.

  4. For large datasets we recommend either physical or snapshot backups with PBM

  5. Selective backup/restore is supported, but we don't have the option to do a custom query. Feel free to open a feature request for that in our JIRA.

Hope that helps. Our consulting team is available in case you need help implementing any of this.
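To illustrate point 1, a backup initiated from any host with the PBM client would look something like this; the hostnames, port, replica set name, and credentials are placeholders, and the key point is that the URI targets the config server replica set rather than mongos:

```shell
# Hypothetical hosts: cfg1/cfg2/cfg3 form the config server replica set "cfgRS".
export PBM_MONGODB_URI="mongodb://pbmuser:secretpwd@cfg1:27019,cfg2:27019,cfg3:27019/?replicaSet=cfgRS&authSource=admin"

# Logical backup of the whole sharded cluster, coordinated via the config RS.
pbm backup --type logical

# Check progress from the same host.
pbm status
```

Pointing the client at mongos does not work for sharded clusters, which matches the "Backup has no data for the config server or sole replicaset" error seen earlier.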

Hi Ivan,

Thanks again to you and the team for all the support — really appreciate it.

Just to confirm my understanding on a couple of points:

  1. For sharded clusters, while backups can be triggered from any host with PBM installed, both backup and restore operations should use the config server replica set URI (rather than mongos or any other). Is that correct?

  2. In our testing on a sharded cluster, restoring ~6 GB of data with mongorestore completed in ~10 minutes on a smaller (4 GB RAM) server.
    The same dataset restored using PBM logical restore took close to ~1 hour on similar resources, but completed much faster (around a few minutes) when tested on a higher-memory server (~16 GB RAM).

    Is this difference in restore time expected, given PBM’s additional coordination and consistency guarantees for sharded clusters, and its higher resource requirements compared to mongorestore?

Thanks again, this really helps us set the right expectations and sizing guidelines for our DR runbook.

Hi, you are welcome.

  1. Yes
  2. mongorestore and pbm should perform roughly the same if configured identically. You can try configuring pbm with 4 parallel workers and 1 worker per collection to mimic mongorestore on a small server.
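A sketch of that tuning with pbm config; the option names below follow PBM's restore configuration reference, but please verify them against the docs for your PBM version before relying on them:

```shell
# Mimic mongorestore's small-server defaults:
# 4 collections restored in parallel, 1 insertion worker each.
pbm config --set restore.numParallelCollections=4
pbm config --set restore.numInsertionWorkers=1
```

On larger servers, raising these values (and, for physical restores, the download worker count) is the usual way to trade memory and CPU for restore speed.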