[error]: Percona Backup for MongoDB -- Error while doing Restore -- "not master"

Hello team,
pbm version:

Version:   1.1.1
Platform:  linux/amd64
GitCommit: 457bc0eaf861c8c15c997333ce1d8108a138874b
GitBranch: master
BuildTime: 2020-01-31_08:16_UTC
GoVersion: go1.12.9

pbm_config.yaml

storage:
  type: filesystem
  filesystem:
    path: /backup/


I have a 3-node PSS cluster, with pbm-agent set up on all 3 nodes. I take backups onto an NFS mount point. Permissions look correct, and PBM backups are successful.

When I try to restore, it automatically chose the primary for the restore, but it failed at the end saying "not master". The MongoDB URI contains all 3 nodes, and one node is the primary.

2020/04/13 12:55:12 Got command restore 2020-04-13T11:21:15Z
2020/04/13 12:55:12 [INFO] Restore of '2020-04-13T11:21:15Z' started
2020-04-13T12:55:12.860+0000    preparing collections to restore from
2020-04-13T12:55:12.879+0000    finished restoring admin.myOutput (0 documents, 0 failures)
2020/04/13 12:55:12 [ERROR] restore: restore mongo dump (successes: 0 / fails: 0): admin.myOutput: error dropping collection: (NotMaster) not master

I then changed the URI to point only to the master and tried again, and I am getting the same error.

  1. Any idea how to debug this, or what might cause this issue? The online docs do not say how to resolve it.
  2. Should we be restoring onto a new, empty cluster or onto the existing cluster?

Hi.
It sounds as though the "not master" message might be misleading, if the pbm-agent doing the restore was the one on the server with the primary node. A misleading error message is a bug in its own right, but I'll try to guess at the problem behind it.
Two hypotheses:

  • The pbm-agent connected to the wrong node.
    Q. Is it started with a config whose MongoDB URI lists all three nodes? That would be a problem if so. It should connect in standalone/direct style, to the mongod on the same host only. If it was given a replica-set-style URI, it would connect to the primary wherever it is, i.e. all three pbm-agents would act as the one on the primary (see the example after this list).
  • I notice the filesystem type of storage is being used. This is very easy to set up incompletely. (Basically I don't recommend it, but I see it as the first thing most users try when testing.) I wonder if the node that did the restore couldn't read the backup files because they weren't in whichever remote fileserver mount was mounted at /backup/.
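To illustrate the first point: each agent should be started with a direct, single-host URI rather than a replica-set-style one. The hostnames and replica set name below are placeholders, not taken from your setup.

# Correct: each pbm-agent connects only to the mongod on its own host
pbm-agent --mongodb-uri="mongodb://pbmuser:secretpwd@localhost:27017"

# Problematic: a replica-set-style URI makes every agent connect to the primary,
# wherever it currently is
pbm-agent --mongodb-uri="mongodb://pbmuser:secretpwd@node1:27017,node2:27017,node3:27017/?replicaSet=rs0"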
Cheers,
Akira

Hello Akira,
Thanks!! I changed the pbm-agent MongoDB URI to point only to the node it is running on, and that resolved the issue.
Continuing with the pbm-agent service file: when we use the systemd service file, I don't see any logs.
The logs start appearing only when we run the agent with nohup:

nohup pbm-agent --mongodb-uri="mongodb://pbmuser:secretpwd@$HOSTNAME:27017" > /var/log/pbm-agent.log 2>&1 &

Can you please share a sample systemd service file with logging?
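A minimal sketch of the kind of unit I have in mind, assuming pbm-agent is installed at /usr/bin/pbm-agent and that sending output to the systemd journal is acceptable (the user, binary path, and URI are placeholders, not from an official package):

[Unit]
Description=Percona Backup for MongoDB agent
After=network.target

[Service]
# User, binary path, and URI are placeholders -- adjust to the actual installation
User=mongod
ExecStart=/usr/bin/pbm-agent --mongodb-uri=mongodb://pbmuser:secretpwd@localhost:27017
Restart=on-failure
# stdout/stderr go to the journal; read them with: journalctl -u pbm-agent
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target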
