Hello Percona Community,
We are facing a persistent issue where Percona Backup for MongoDB (PBM) is not selecting our hidden, delayed secondary node for backups, even though it appears to be the highest-priority candidate. We have followed a detailed troubleshooting process but have hit a wall, and we would appreciate any insights.
Environment
PBM Version: 2.10.0
MongoDB Version: 8.0.8
Replica Set: 1 Primary, 2 standard Secondaries, 1 Hidden/Delayed Secondary.
Problem Description
Expected Behavior:
PBM should prioritize our hidden and delayed member (10.100.7.166:28017) as the backup source, according to the documented selection priority (Hidden > Secondary > Primary).
Actual Behavior:
PBM consistently skips the hidden node and performs the backup on one of the regular secondary nodes (e.g., 10.100.4.225:28017).
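For reference, our understanding is that these default priorities can also be overridden explicitly in the PBM configuration; a sketch of what that would look like for our topology (the host value is ours, and 3.0 is an arbitrary value above the defaults):
YAML
# fragment of a pbm config file - pin the hidden member as the preferred backup source
backup:
  priority:
    "10.100.7.166:28017": 3.0
We have not applied this yet (via pbm config --file), partly because we would first like to understand why the default selection skips the node, and partly because we are not sure an explicit priority would bypass whatever suitability check is failing.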
Troubleshooting Steps and Findings
We have performed the following diagnostic steps:
- Confirmed Replica Set Configuration
The hidden/delayed member is configured with hidden: true, priority: 0, and votes: 0.
JSON
// rs.conf() member details for the hidden node
{
  "_id": 3,
  "host": "10.100.7.166:28017",
  "arbiterOnly": false,
  "buildIndexes": true,
  "hidden": true,
  "priority": 0,
  "tags": {},
  "secondaryDelaySecs": NumberLong("259200"),
  "votes": 0
}
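For anyone reproducing this check, the member document above can be pulled in mongosh with a simple filter (host value is ours):
Code snippet
// show only the hidden member's replica set config entry
rs.conf().members.filter(function (m) { return m.host === "10.100.7.166:28017"; })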
- Analyzed PBM Status
The output of pbm status -p shows that PBM recognizes the hidden node but, crucially, does not assign it a backup priority (Bkp Prio), indicating it has been deemed ineligible.
Code snippet
# pbm status -p
Cluster:
========
mic-rs:
- 10.100.4.223:28017 [P], Bkp Prio: [1.0], PITR Prio: [0.5]: pbm-agent [v2.10.0] OK
- 10.100.4.224:28017 [S], Bkp Prio: [1.0], PITR Prio: [1.0]: pbm-agent [v2.10.0] OK
- 10.100.4.225:28017 [S], Bkp Prio: [1.0], PITR Prio: [1.0]: pbm-agent [v2.10.0] OK
- 10.100.7.166:28017 [D]: pbm-agent [v2.10.0] OK
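In case it helps, we can also dump the full status as JSON for a closer look at the per-node fields; we are assuming --out=json is accepted by pbm status in 2.10.0, as it is for other commands:
Code snippet
# full status in JSON, to inspect the per-node fields PBM reports
pbm status --out=json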
- Analyzed PBM Agent Logs (INFO Level)
The pbm-agent log on the hidden node (10.100.7.166) confirms that it receives the backup command but immediately decides it is unfit.
Code snippet
2025-07-21T10:06:47Z I [mic-rs/10.100.7.166:28017] got command backup [...]
2025-07-21T10:06:47Z I [mic-rs/10.100.7.166:28017] got epoch {...}
2025-07-21T10:06:47Z I [mic-rs/10.100.7.166:28017] [backup/...] node is not suitable for backup
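The same events can also be pulled from PBM's central log store, filtered to this node; flag names below are as we read them from pbm logs --help:
Code snippet
# last 50 backup-related log entries for the hidden node, at debug severity
pbm logs --tail=50 --node=mic-rs/10.100.7.166:28017 --event=backup --severity=D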
- Checked MongoDB Node Health (rs.status())
This is where the mystery deepens. The rs.status() output shows that the hidden node is perfectly healthy and in the correct state from MongoDB’s perspective.
JSON
// rs.status() output for the hidden member
{
  "_id": 3,
  "name": "10.100.7.166:28017",
  "health": 1,
  "state": 2,
  "stateStr": "SECONDARY",
  "uptime": 1202205,
  "optimeDate": ISODate("2025-07-18T09:42:23.000Z"),
  // ... other fields show normal operation
}
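Note that optimeDate lags the primary by roughly the configured secondaryDelaySecs (259200 s, i.e. 3 days), which is exactly what we would expect for a delayed member. A quick way to eyeball the lag in mongosh (our own helper, not a PBM tool):
Code snippet
// rough replication lag of each member relative to the primary's optime
var s = rs.status();
var primaryOptime = s.members.find(function (m) { return m.stateStr === "PRIMARY"; }).optimeDate;
s.members.forEach(function (m) {
    print(m.name + ": " + Math.round((primaryOptime - m.optimeDate) / 1000) + "s behind primary");
});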
- Checked System Requirements & Permissions
We reviewed the official system requirements documentation. We have verified that our PBM user (pbmuser) has the required roles (clusterAdmin, backup, restore, etc.) as specified.
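For reference, a quick way to double-check the grants from mongosh (run as an admin user; pbmuser is our user name):
Code snippet
// list the roles actually assigned to the PBM user
db.getSiblingDB("admin").getUser("pbmuser")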
- Attempted to Capture DEBUG Logs
We set the log level to DEBUG (pbm config --set log.level=DEBUG) and restarted the agent on the hidden node so that it would reload the configuration; the agent startup log confirms log-level:D. However, when we trigger a new backup, the logs still show only the INFO-level message (node is not suitable for backup), with no preceding DEBUG lines to explain why it is unsuitable.
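If it is relevant, our understanding is that running pbm config with no arguments prints the configuration PBM has stored, which can be used to confirm the log level actually took effect:
Code snippet
# print the stored PBM configuration to confirm log.level
pbm config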
Core Question
Why does the pbm-agent consider a node “not suitable for backup” when MongoDB’s own rs.status() reports that the same node is perfectly healthy (health: 1, stateStr: “SECONDARY”)?
There seems to be an additional internal check within PBM that is failing, and this check is not being logged even at the DEBUG level.
We would be very grateful for any guidance on:
- What other internal checks does PBM perform to determine node suitability?
- Is this a known issue or limitation with PBM v2.10.0 on MongoDB 8.0.8?
- What further steps can we take to diagnose the root cause of this “not suitable” status?
Thank you for your time and assistance.