PBM doesn't work when a mongodb replica set member is down

Hello,

I’m using PBM 1.7.0 on a replica set of 3 members (one of them is only an arbiter).
PBM is installed on all replica set members except the arbiter. Everything works fine until one of the members fails (for instance, one of the servers freezes due to a memory issue that I’m still troubleshooting). The failing server is a secondary in the replica set, and it was the first server where I installed PBM when I first deployed it.
Restarting pbm-agent doesn’t help, and when I issue a pbm command it produces no output on the console: it just keeps running, with no error and no message.
When both replica set members are up, PBM works as intended and I can run all PBM commands from both servers.
The issue seems similar to this one: https://forums.percona.com/t/pbm-doesnt-tolerate-replset-member-being-down/8293 but according to the messages in that thread, the issue should already be fixed in an earlier version of PBM.
Any idea where I could look to troubleshoot this further? Could this be a bug in PBM?
Thank you in advance for your time.

Hello, are you using the replicaSet connection string for the pbm command? For example:

export PBM_MONGODB_URI="mongodb://pbmuser:secretpwd@localhost:27017/?replicaSet=testRPL"

In this case pbm should work fine even if one member is down. Note that this differs from the contents of the pbm-agent file, because each agent must connect only to its local node, not to the replica set.
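
As a rough sketch of that difference, the URI for the pbm CLI can list every data-bearing member as a seed and add the replicaSet option, so the driver can still find a reachable member when one of them is down. The host names mongo1 and mongo2 and the helper build_pbm_uri are hypothetical and only for illustration; substitute your own values.

```shell
# Hypothetical helper: assembles a replica set connection string.
# $1 = user, $2 = password, $3 = comma-separated host:port seed list,
# $4 = replica set name
build_pbm_uri() {
    printf 'mongodb://%s:%s@%s/?replicaSet=%s' "$1" "$2" "$3" "$4"
}

# mongo1 and mongo2 are placeholder host names, not taken from this thread.
PBM_MONGODB_URI="$(build_pbm_uri pbmuser secretpwd 'mongo1:27017,mongo2:27017' testRPL)"
export PBM_MONGODB_URI

# Show the URI that the pbm CLI would pick up from the environment.
echo "$PBM_MONGODB_URI"
```

The agent's own URI, by contrast, stays pointed at its local node only.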

Hello, thank you for replying.
This is a great catch, but it doesn’t seem to be the issue.
I ran multiple tests, and for some reason pbm-agent doesn’t work if the second MongoDB server in the replica set is down.

The following are some of the last tests I did:

Test #1:

With both MongoDB servers up, I ran pbm commands from server #1 (the primary of this replica set). All commands work, and the PIT backups even seem to be made from this server.

Test #2:
I stopped the mongodb service on server #2. The pbm-agent service doesn’t fail on server #1, but journalctl shows no new logs for pbm-agent.service and the PBM CLI commands don’t work. The commands don’t fail either; they just keep waiting and never return anything. In this state pbm-agent doesn’t make PIT backups, of course.

Test #3:
With the mongodb service stopped on server #2, I stopped the pbm-agent service on server #1 and ran pbm-agent manually in a terminal on server #1: pbm-agent --mongodb-uri "mongodb://user:password@localhost:27017/"

This command never returns and stays waiting (the same behaviour as the pbm CLI commands when server #2 is down). If I then start the mongodb service on server #2 from another terminal, without terminating the pbm-agent command, pbm-agent on server #1 starts showing logs as it would if everything were working, including the "listening for the commands" message.
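
For anyone reproducing this, here is a minimal sketch of how the hang can be made visible in a script instead of blocking the terminal forever. It assumes GNU coreutils timeout is available; run_with_timeout is a hypothetical helper and not part of PBM.

```shell
# Hypothetical helper: runs a command, but gives up after N seconds.
# $1 = timeout in seconds, remaining arguments = the command to run
run_with_timeout() {
    secs="$1"
    shift
    timeout "$secs" "$@"
    rc=$?
    # GNU timeout exits with status 124 when the time limit was hit.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${secs}s: $*" >&2
    fi
    return "$rc"
}

# Example (not run here): wrap a pbm CLI call so a hang shows up as exit 124.
# run_with_timeout 30 pbm status
```

An exit status of 124 then distinguishes "the command hung" from "the command failed", which may be useful detail for a bug report.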

Note: The content of /etc/default/pbm-agent on server #1 and #2 is the same:

PBM_MONGODB_URI="mongodb://user:password@localhost:27017"

Thanks in advance for your time and reply.

This sounds like a bug to me. Would you mind opening a bug report in jira.percona.com?

I believe so too. I’ll open one. Thanks.