I cannot take a backup via pbm on a sharded MongoDB cluster; it works fine with a single-node MongoDB.
After running pbm backup, the following error occurred: “couldn’t get response from all shards: convergeClusterWithTimeout: reached converge timeout”. The MongoDB URI is set to the config server replica set, like ‘mongodb://pbmuser:pbmpass@cs1:27019,cs2:27019,cs3:27019/?authSource=admin&replicaSet=csReplSet’. The credentials are valid, and pbm-agent is present on all mongod nodes across all shards and on the config server nodes.
In the logs I see messages like:
Jun 4 08:46:43 pbm-agent[15215]: 2020/06/04 08:46:43 Got command resyncBcpList
Jun 4 08:46:43 pbm-agent[15215]: 2020/06/04 08:46:43 [INFO] resync_list: operation has been scheduled on another replset node
Jun 4 08:47:48 pbm-agent[15215]: 2020/06/04 08:47:48 Got command backup 2020-06-04T08:47:47Z
Jun 4 08:47:48 pbm-agent[15215]: 2020/06/04 08:47:48 Backup has been scheduled on another replset node
Jun 5 03:30:01 pbm-agent[15215]: 2020/06/05 03:30:01 Got command backup 2020-06-05T03:30:01Z
Jun 5 03:30:01 pbm-agent[15215]: 2020/06/05 03:30:01 Backup has been scheduled on another replset node
Jun 5 08:00:36 pbm-agent[18799]: 2020/06/05 08:00:36 Got command backup 2020-06-05T08:00:36Z
Jun 5 08:00:36 pbm-agent[18799]: 2020/06/05 08:00:36 Backup 2020-06-05T08:00:36Z started on node csReplSet/10.1.1.33:27019
Jun 5 08:00:52 pbm-agent[18799]: 2020/06/05 08:00:52 Mark backup as failed couldn't get response from all shards: convergeClusterWithTimeout: reached converge timeout
: <nil>
Jun 5 08:00:52 pbm-agent[18799]: 2020/06/05 08:00:52 [ERROR] backup: couldn't get response from all shards: convergeClusterWithTimeout: reached converge timeout
How can I enable debug logs to see more? Or maybe you know the reason?
MongoDB version - 4.0.6
PBM version - 1.1.3
Thanks in advance.
Hi @vvol
Each pbm-agent process should connect to its localhost mongod with a standalone type of connection, that is, a single host with no replicaSet option, since each agent should serve only one node. So the agents’ URIs should be in the format “mongodb://pbmuser:pbmpass@cs1:27019”, “mongodb://pbmuser:pbmpass@cs2:27019”, “mongodb://pbmuser:pbmpass@rs0:27019” and so on.
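To illustrate the format, here is a minimal Python sketch that turns a replica-set URI like the one in the question into one standalone URI per host. The helper name per_node_uris is hypothetical and not part of PBM; it only does string handling, and it assumes the options after the "/" (such as replicaSet) can simply be dropped, as in the URIs above.

```python
def per_node_uris(replset_uri: str) -> list[str]:
    """Split a multi-host MongoDB URI into one single-host URI per node.

    Hypothetical helper for illustration only; it is not part of PBM.
    Assumes the form scheme://[user:pass@]host1:port,host2:port[/...].
    """
    scheme, rest = replset_uri.split("://", 1)
    # Everything before the first "/" is credentials plus the host list;
    # the database name and options (e.g. replicaSet) are dropped.
    authority = rest.split("/", 1)[0]
    # rpartition copes with URIs that carry no credentials at all.
    creds, _, hosts = authority.rpartition("@")
    prefix = f"{creds}@" if creds else ""
    return [f"{scheme}://{prefix}{host}" for host in hosts.split(",")]


if __name__ == "__main__":
    uri = ("mongodb://pbmuser:pbmpass@cs1:27019,cs2:27019,cs3:27019/"
           "?authSource=admin&replicaSet=csReplSet")
    for node_uri in per_node_uris(uri):
        print(node_uri)
```

Each pbm-agent would then be started with only the URI that points at its own local mongod, not with the whole list.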
The example in the documentation is currently a bit confusing. I think we will fix it.
Cheers!
Thanks for the reply! The backup succeeded after I changed the URI.