Hello Percona Community,
We are facing a persistent issue where Percona Backup for MongoDB (PBM) is not selecting our hidden, delayed secondary node for backups, even though it appears to be the highest-priority candidate. We have followed a detailed troubleshooting process but have hit a wall, and we would appreciate any insights.
Environment
PBM Version: 2.10.0
MongoDB Version: 8.0.8
Replica Set: 1 Primary, 2 standard Secondaries, 1 Hidden/Delayed Secondary.
Problem Description
Expected Behavior:
PBM should prioritize our hidden and delayed member (10.100.7.166:28017) as the backup source, according to the documented selection priority (Hidden > Secondary > Primary).
Actual Behavior:
PBM consistently skips the hidden node and performs the backup on one of the regular secondary nodes (e.g., 10.100.4.225:28017).
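For reference, our understanding is that these default priorities can also be overridden explicitly in the PBM configuration; a sketch of what that would look like for our topology (the host value is ours, and 3.0 is an arbitrary value above the defaults):
YAML
# fragment of a pbm config file - pin the hidden member as the preferred backup source
backup:
  priority:
    "10.100.7.166:28017": 3.0
We have not applied this yet (via pbm config --file), partly because we would first like to understand why the default selection skips the node, and partly because we are not sure an explicit priority would bypass whatever suitability check is failing.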
Troubleshooting Steps and Findings
We have performed the following diagnostic steps:
- Confirmed Replica Set Configuration
The hidden/delayed member is configured with hidden: true, priority: 0, and votes: 0.
JSON
// rs.conf() member details for the hidden node
{
  "_id": 3,
  "host": "10.100.7.166:28017",
  "arbiterOnly": false,
  "buildIndexes": true,
  "hidden": true,
  "priority": 0,
  "tags": {},
  "secondaryDelaySecs": NumberLong("259200"),
  "votes": 0
}
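For anyone reproducing this check, the member document above can be pulled in mongosh with a simple filter (host value is ours):
Code snippet
// show only the hidden member's replica set config entry
rs.conf().members.filter(function (m) { return m.host === "10.100.7.166:28017"; })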
- Analyzed PBM Status
The output of pbm status -p shows that PBM recognizes the hidden node but, crucially, does not assign it a backup priority (Bkp Prio), indicating it has been deemed ineligible.
Code snippet
# pbm status -p
Cluster:
========
mic-rs:
- 10.100.4.223:28017 [P], Bkp Prio: [1.0], PITR Prio: [0.5]: pbm-agent [v2.10.0] OK
- 10.100.4.224:28017 [S], Bkp Prio: [1.0], PITR Prio: [1.0]: pbm-agent [v2.10.0] OK
- 10.100.4.225:28017 [S], Bkp Prio: [1.0], PITR Prio: [1.0]: pbm-agent [v2.10.0] OK
- 10.100.7.166:28017 [D]: pbm-agent [v2.10.0] OK
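In case it helps, we can also dump the full status as JSON for a closer look at the per-node fields; we are assuming --out=json is accepted by pbm status in 2.10.0, as it is for other commands:
Code snippet
# full status in JSON, to inspect the per-node fields PBM reports
pbm status --out=json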
- Analyzed PBM Agent Logs (INFO Level)
The pbm-agent log on the hidden node (10.100.7.166) confirms that it receives the backup command but immediately decides it is unfit.
Code snippet
2025-07-21T10:06:47Z I [mic-rs/10.100.7.166:28017] got command backup [...]
2025-07-21T10:06:47Z I [mic-rs/10.100.7.166:28017] got epoch {...}
2025-07-21T10:06:47Z I [mic-rs/10.100.7.166:28017] [backup/...] node is not suitable for backup
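The same events can also be pulled from PBM's central log store, filtered to this node; flag names below are as we read them from pbm logs --help:
Code snippet
# last 50 backup-related log entries for the hidden node, at debug severity
pbm logs --tail=50 --node=mic-rs/10.100.7.166:28017 --event=backup --severity=D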
- Checked MongoDB Node Health (rs.status())
This is where the mystery deepens. The rs.status() output shows that the hidden node is perfectly healthy and in the correct state from MongoDB’s perspective.
JSON
// rs.status() output for the hidden member
{
  "_id": 3,
  "name": "10.100.7.166:28017",
  "health": 1,
  "state": 2,
  "stateStr": "SECONDARY",
  "uptime": 1202205,
  "optimeDate": ISODate("2025-07-18T09:42:23.000Z"),
  // ... other fields show normal operation
}
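Note that optimeDate lags the primary by roughly the configured secondaryDelaySecs (259200 s, i.e. 3 days), which is exactly what we would expect for a delayed member. A quick way to eyeball the lag in mongosh (our own helper, not a PBM tool):
Code snippet
// rough replication lag of each member relative to the primary's optime
var s = rs.status();
var primaryOptime = s.members.find(function (m) { return m.stateStr === "PRIMARY"; }).optimeDate;
s.members.forEach(function (m) {
    print(m.name + ": " + Math.round((primaryOptime - m.optimeDate) / 1000) + "s behind primary");
});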
- Checked System Requirements & Permissions
We reviewed the official system requirements documentation. We have verified that our PBM user (pbmuser) has the required roles (clusterAdmin, backup, restore, etc.) as specified.
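For reference, a quick way to double-check the grants from mongosh (run as an admin user; pbmuser is our user name):
Code snippet
// list the roles actually assigned to the PBM user
db.getSiblingDB("admin").getUser("pbmuser")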
- Attempted to Capture DEBUG Logs
We set the log level to DEBUG (pbm config --set log.level=DEBUG) and restarted the agent on the hidden node so that it would reload the configuration; the agent startup log confirms log-level:D. However, when we trigger a new backup, the logs still show only the INFO-level message (node is not suitable for backup), with no preceding DEBUG lines to explain why it is unsuitable.
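If it is relevant, our understanding is that running pbm config with no arguments prints the configuration PBM has stored, which can be used to confirm the log level actually took effect:
Code snippet
# print the stored PBM configuration to confirm log.level
pbm config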
Core Question
Why does the pbm-agent consider a node “not suitable for backup” when MongoDB’s own rs.status() reports that the same node is perfectly healthy (health: 1, stateStr: “SECONDARY”)?
There seems to be an additional internal check within PBM that is failing, and this check is not being logged even at the DEBUG level.
We would be very grateful for any guidance on:
- What other internal checks does PBM perform to determine node suitability?
- Is this a known issue or limitation with PBM v2.10.0 on MongoDB 8.0.8?
- What further steps can we take to diagnose the root cause of this “not suitable” status?
Thank you for your time and assistance.