ERROR: couldn’t get response from all shards: convergeClusterWithTimeout:

An error occurred while creating a backup:

2021-06-01T17:44:24Z 0.00B [ERROR: couldn’t get response from all shards: convergeClusterWithTimeout: reached converge timeout] [2021-06-01T17:45:16]

pbm-agent v1.4.1

What can cause this error, and how can I fix it so that it does not happen again?

Automatic backups run on a schedule, so I assumed this one had completed, but it had not.


Hi, have you checked that all the cluster members have pbm-agent running properly?


All cluster members show pbm-agent v1.4.1 with status OK.

What should I do?


Hi, have you made any changes to the cluster since starting pbm-agent, such as adding a new shard? Can you try restarting the agent on every node, just in case?
Also, what connection string are you using for the agent?


After restarting pbm-agent on all nodes, the backup succeeded.
But how can I tell whether pbm-agent is actually working on every node, when the status already showed pbm-agent v1.4.1 OK for all of them?


Glad to hear that solved it. The agent will report an error if it cannot connect, but even when the agent is up and running, backups can still fail for other reasons. I recommend monitoring the backup output and checking it for errors.
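A minimal sketch of that kind of check, in Python. It assumes the backup listing lines look like the one in the original report (`<timestamp> <size> [<status>] [<completion time>]`) and flags any entry whose status starts with `ERROR`. The regular expression and the function names `find_failed_backups` and `report` are hypothetical; check the pattern against what your pbm version actually prints before relying on it.

```python
import re
from typing import List, NamedTuple

# Assumed line shape, taken from the single example in this thread:
#   2021-06-01T17:44:24Z 0.00B [ERROR: <message>] [2021-06-01T17:45:16]
# Real pbm output may differ between versions; adjust the pattern if so.
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<size>\S+)\s+\[(?P<status>[^\]]*)\]"
)


class FailedBackup(NamedTuple):
    name: str     # backup name (the timestamp in the example line)
    size: str     # reported size, e.g. "0.00B"
    message: str  # full status text, e.g. "ERROR: couldn't get response ..."


def find_failed_backups(lines: List[str]) -> List[FailedBackup]:
    """Return every backup line whose bracketed status begins with 'ERROR'."""
    failed = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip headers, blank lines, and anything not in the assumed shape
        status = match.group("status")
        if status.startswith("ERROR"):
            failed.append(
                FailedBackup(match.group("name"), match.group("size"), status)
            )
    return failed


def report(lines: List[str]) -> int:
    """Print each failed backup and return a shell-style exit code (1 = failures)."""
    problems = find_failed_backups(lines)
    for b in problems:
        print(f"FAILED {b.name} ({b.size}): {b.message}")
    return 1 if problems else 0
```

One way to wire it up is a small wrapper that saves the backup status text to a file or pipe and calls `sys.exit(report(sys.stdin.read().splitlines()))`. A non-zero exit code is something cron mail or most monitoring systems can alert on, so a scheduled backup that silently fails, as in this thread, would not go unnoticed.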
