PBM Backup Issues on Config Server Nodes vs. Shard Nodes in Kubernetes

Hello,

I’m seeking assistance with an issue related to Percona Backup for MongoDB (PBM) agents in a Kubernetes environment. Specifically, backups initiated from shard-data pods work fine, but backups initiated from the config server nodes fail, and the PBM agents on the MongoDB shard nodes are reported as “NOT FOUND” or “FAILED” even though they are running.

Issue Description:

Backups initiated directly from shard-data pods execute without issue. However, backups attempted from the config server nodes fail, as shown by the following log entries from a config server node:

2024-02-16T12:30:10Z E [mongodb7-configsvr/mongodb7-configsvr-2.mongodb7-headless.mongodb.svc.cluster.local:27017] [pitr] streaming oplog: get backup start TS: run out of tries
2024-02-16T12:30:11Z I [mongodb7-configsvr/mongodb7-configsvr-2.mongodb7-headless.mongodb.svc.cluster.local:27017] [backup/2024-02-16T12:29:19Z] mark backup as error `couldn't get response from all shards: convergeClusterWithTimeout: reached converge timeout`: <nil>
2024-02-16T12:30:11Z E [mongodb7-configsvr/mongodb7-configsvr-2.mongodb7-headless.mongodb.svc.cluster.local:27017] [backup/2024-02-16T12:29:19Z] backup: couldn't get response from all shards: convergeClusterWithTimeout: reached converge timeout
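For context, the “NOT FOUND” / “FAILED” agent states mentioned above are what pbm status reports when run inside the pods. A rough sketch of how we check it (the config server pod name is taken from the logs above; the shard pod and agent-container names are placeholders, not the exact names in our manifests):

# View of the cluster from the config server pod
# (container name is an assumption; use whichever container in your pods runs pbm-agent / ships the pbm CLI)
kubectl exec -it mongodb7-configsvr-2 -c backup-agent -- pbm status
# Compare with the view from a shard-data pod
kubectl exec -it <shard-data-pod> -c backup-agent -- pbm status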

I would appreciate any advice, insights, or guidance the community can offer to help resolve these backup inconsistencies within our sharded MongoDB cluster.

Hi,

This isn’t enough information to diagnose the problem. Could you send the full pbm logs for this backup?
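For example, something along these lines should dump the complete log for that backup attempt (run it wherever the pbm CLI is connected to the cluster; -t 0 returns all entries, -s D includes debug-level messages, and the event name is taken from the error in your snippet):

# Full, debug-level log for the failed backup
pbm logs -t 0 -s D -e backup/2024-02-16T12:29:19Z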