Server Selection Error timeout

Hi,
I'm trying to set up Percona Backup for MongoDB, and it's not as easy as I thought.

The setup is quite simple:

1x mongodb docker container
1x percona backup for mongodb container

The MongoDB server is running and working. I started the PBM container with: docker run --name percona_backup_master -e PBM_MONGODB_URI="mongodb://172.25.0.7:27017" --network trax_repox_mongo_net -d percona/percona-backup-mongodb:latest

After 20-30 seconds the container exits with the log entries below:

user@perconatest:/tmp$ docker logs -f percona_backup_master
+ '[' pbm-agent = pbm-agent ']'
++ mktemp
+ OUT=/tmp/tmp.uWEMOOzj6p
++ mktemp
+ OUT_CFG=/tmp/tmp.FwoQ7lhGzc
+ timeout=5
+ for i in {1..10}
+ '[' '' ']'
+ set +o xtrace
1
+ exit_status=0
++ grep -E '^([0-9]+)$' /tmp/tmp.uWEMOOzj6p
+ rs_size=1
+ [[ 0 == 0 ]]
+ [[ 1 -ge 1 ]]
+ break
+ rm /tmp/tmp.uWEMOOzj6p
+ exec pbm-agent
2023/08/02 08:40:36 Exit: connect to PBM: create mongo connection: mongo ping: server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: 127.0.0.1:27017, Type: Unknown, Last error: dial tcp 127.0.0.1:27017: connect: connection refused }, ] }

What I don't get is why it's trying to connect to 127.0.0.1 and not to mongodb://172.25.0.7:27017?

Please help, I'm pretty stuck here, and I have to get this working on a sharded cluster if this POC works :slight_smile:

/Morten

Hi Pjotz and welcome to the Percona Community.

It looks like you have a standalone MongoDB instance. PBM does not work against a standalone server; it works only if a Replica Set is configured.

What you can do is restart the container with the replSetName configuration option set and then run rs.initiate() on the node. This way you will get a 1-node Replica Set, and PBM should work.
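For example, something along these lines (the image, container, and replica set names below are just placeholders for your setup, and older images ship the legacy mongo shell instead of mongosh):

# start mongod with a replica set name
docker run --name mongodb -d --network trax_repox_mongo_net percona/percona-server-mongodb:latest --replSet rs0
# once mongod is up, initiate the single-node replica set
docker exec mongodb mongosh --eval "rs.initiate()"

After that, point PBM_MONGODB_URI at this node again.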

Thanks for the reply, I will have a look at it tomorrow and get back to you :slight_smile:

Hi again,
I now have my cluster running, with a master node and 3 shards, all running in Docker. They all have the pbm-agent installed and running. I have created a storage.yaml for filesystem storage, which is a mount point to shared storage on the hosts.
When I run the pbm config command inside the container nodes, e.g. node_rs1:
pbm config --file=./storage.yaml --mongodb-uri="mongodb://127.0.0.1:27017"
I'm getting an error:
Error: connect to mongodb: create mongo connection: mongo ping: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: 127.0.0.1:27017, Type: RSGhost, Average RTT: 1526614 }, ] }
When I run the same command on my master node, it succeeds:
root@455e377854f8:/opt/xci/percona# pbm config --file=./storage.yaml --mongodb-uri="mongodb://localhost:27017"
pitr:
  enabled: false
  oplogSpanMin: 0
  compression: s2
storage:
  type: filesystem
  filesystem:
    path: /opt/backup/
backup:
  compression: s2

When I start the pbm-agent on my master node, I get this:

root@455e377854f8:/opt/backup# pbm-agent --mongodb-uri mongodb://127.0.0.1:27017
2023/08/18 09:21:25 Exit: pre-start check: mongos is not supported

What should I do when running with a mongos?

Is there something I'm missing here? Do I need to run the command on all nodes? And if I make a backup on the master node, will it then use the pbm-agent on all the shard nodes?

I think I'm confused about how this works. :face_with_diagonal_mouth:

/Morten

Hi @Pjotz ,

For a sharded cluster, run any pbm-related commands against the Config Server ReplSet, since it contains the metadata of the whole cluster.
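For example, something like this (the hostnames and replica set name below are placeholders; use your own config server replica set members):

export PBM_MONGODB_URI="mongodb://cfg1:27017,cfg2:27017,cfg3:27017/?replicaSet=cfgReplSet"
pbm config --file=./storage.yaml
pbm status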

Ref.: Configuration file - Percona Backup for MongoDB

Regards,
Parag

Thanks for the reply, I tried to run everything from one of the cfg servers:

Sorry it's a screenshot, but I'm working air-gapped, so that was the only way I could get it out at this point :slight_smile:

Hi @Pjotz ,

Kindly share the details below:

  1. pbm status output.
  2. MongoDB connection URI for pbm CLI (PBM_MONGODB_URI).
  3. pbm logs.

Regards,
Parag

Thanks for the swift reply,
I'm a lot closer now; I have the backup running and everything is almost good :slight_smile:

The only issue I'm facing now is this: I'm running Percona Server for MongoDB, and in my Dockerfile I install the pbm-agent. I created a shell script that starts the pbm-agent, put it in /docker-entrypoint-initdb.d, and mounted the backup folder via docker-compose.
Everything on the config servers runs like a charm, but the data nodes aren't running the entrypoint script because the database is already initialised (I can see in the entrypoint.sh script that this is the case).
So how is the pbm-agent intended to be started inside the containers when they start up?

/Morten

Update:

I added pbm-agent & to the MongoDB entrypoint, so it now starts when the container starts up.
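For reference, the addition is roughly like this (the URI and log path are just examples of how I have it, treat it as a sketch):

# appended to the MongoDB entrypoint script: start the backup agent in the background
export PBM_MONGODB_URI="mongodb://127.0.0.1:27017"
pbm-agent >> /opt/backup/pbm-agent.log 2>&1 &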

But then I run an incremental backup and everything works perfectly; I delete one of my collections and try to run a restore.
Below is the whole restore process and the logs. After it tried to restore, the config servers suddenly lost connection, I have no idea why, and when I check my DB the collection is not restored.

I hope you can see what happens from the logs :face_with_diagonal_mouth:

[mongodb@mongodb_content_cfg_server1 percona]$pbm status
Cluster:

mongo_data_rs2:
  - mongo_data_rs2/mongodb_content_rs2n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_data_rs3:
  - mongo_data_rs3/mongodb_content_rs3n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_data_rs1:
  - mongo_data_rs1/mongodb_content_rs1n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_conf:
  - mongo_conf/mongodb_content_cfg_server1:27017 [P]: pbm-agent v2.2.1 OK
  - mongo_conf/mongodb_content_cfg_server2:27017 [S]: pbm-agent v2.2.1 OK
  - mongo_conf/mongodb_content_cfg_server3:27017 [S]: pbm-agent v2.2.1 OK

PITR incremental backup:

Status [OFF]

Currently running:

(none)

Backups:

FS /opt/backup/
Snapshots:
2023-08-28T10:30:25Z 24.09MB [restore_to_time: 2023-08-28T10:30:28Z]
2023-08-28T09:01:02Z 14.49MB <incremental, base> [restore_to_time: 2023-08-28T09:01:04Z]

[mongodb@mongodb_content_cfg_server1 percona]$ pbm list
Backup snapshots:
2023-08-28T09:01:02Z <incremental, base> [restore_to_time: 2023-08-28T09:01:04Z]
2023-08-28T10:30:25Z [restore_to_time: 2023-08-28T10:30:28Z]

PITR :

[mongodb@mongodb_content_cfg_server1 percona]$ pbm restore 2023-08-28T10:30:25Z
Starting restore 2023-08-28T10:55:42.716998345Z from ‘2023-08-28T10:30:25Z’…Restore of the snapshot from ‘2023-08-28T10:30:25Z’ has started.
Check restore status with: pbm describe-restore 2023-08-28T10:55:42.716998345Z -c </path/to/pbm.conf.yaml>
No other pbm command is available while the restore is running!

[mongodb@mongodb_content_cfg_server1 percona]$ pbm describe-restore 2023-08-28T10:55:42.716998345Z -c /opt/percona/storage.yaml
name: "2023-08-28T10:55:42.716998345Z"
opid: ""
backup: ""
type: physical
status: running
last_transition_time: "2023-08-28T10:56:09Z"
replsets:
- name: mongo_data_rs1
  status: running
  last_transition_time: "2023-08-28T10:55:59Z"
  nodes:
  - name: mongodb_content_rs1n1:27017
    status: running
    last_transition_time: "2023-08-28T10:55:54Z"
- name: mongo_data_rs2
  status: running
  last_transition_time: "2023-08-28T10:56:03Z"
  nodes:
  - name: mongodb_content_rs2n1:27017
    status: running
    last_transition_time: "2023-08-28T10:55:58Z"
- name: mongo_data_rs3
  status: running
  last_transition_time: "2023-08-28T10:56:03Z"
  nodes:
  - name: mongodb_content_rs3n1:27017
    status: running
    last_transition_time: "2023-08-28T10:55:58Z"
- name: mongo_conf
  status: running
  last_transition_time: "2023-08-28T10:56:04Z"
  nodes:
  - name: mongodb_content_cfg_server1:27017
    status: running
    last_transition_time: "2023-08-28T10:55:59Z"
  - name: mongodb_content_cfg_server2:27017
    status: running
    last_transition_time: "2023-08-28T10:55:58Z"
  - name: mongodb_content_cfg_server3:27017
    status: running
    last_transition_time: "2023-08-28T10:55:58Z"

[mongodb@mongodb_content_cfg_server1 percona]$ pbm status
Cluster:

mongo_data_rs1:
  - mongo_data_rs1/mongodb_content_rs1n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_data_rs2:
  - mongo_data_rs2/mongodb_content_rs2n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_conf:
  - mongo_conf/mongodb_content_cfg_server1:27017 [P]: pbm-agent FAILED status:
      ERROR with ERROR: lost agent, last heartbeat: 1693220172
  - mongo_conf/mongodb_content_cfg_server2:27017 [S]: pbm-agent FAILED status:
      ERROR with ERROR: lost agent, last heartbeat: 1693220171
  - mongo_conf/mongodb_content_cfg_server3:27017 [S]: pbm-agent FAILED status:
      ERROR with ERROR: lost agent, last heartbeat: 1693220172
mongo_data_rs3:
  - mongo_data_rs3/mongodb_content_rs3n1:27017 [P]: pbm-agent v2.2.1 OK

PITR incremental backup:

Status [OFF]

Currently running:

(none)

Backups:

FS /opt/backup/
Snapshots:
2023-08-28T10:30:25Z 24.09MB [restore_to_time: 2023-08-28T10:30:28Z]
2023-08-28T09:01:02Z 14.49MB <incremental, base> [restore_to_time: 2023-08-28T09:01:04Z]

[mongodb@mongodb_content_cfg_server1 percona]$ pbm list --restore
Restores history:

[mongodb@mongodb_content_cfg_server1 percona]$ pbm logs
2023-08-28T10:55:43Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_conf/node.mongodb_content_cfg_server1:27017:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_conf/node.mongodb_content_cfg_server2:27017:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_conf/node.mongodb_content_cfg_server3:27017:{}]
2023-08-28T10:55:44Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:55:42.716998345Z] moving to state starting
2023-08-28T10:55:44Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_data_rs1/node.mongodb_content_rs1n1:27017:{}]
2023-08-28T10:55:48Z I [mongo_data_rs3/mongodb_content_rs3n1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for cluster
2023-08-28T10:55:48Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for cluster
2023-08-28T10:55:48Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for shards map[.pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_conf/rs:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_data_rs1/rs:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_data_rs2/rs:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_data_rs3/rs:{}]
2023-08-28T10:55:49Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for cluster
2023-08-28T10:55:54Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for cluster
2023-08-28T10:56:23Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] pbm-agent:
Version: 2.2.1
Platform: linux/amd64
GitCommit: 5d2c6eeb81be87a21ccf8d3ead6cba762e6bf70f
GitBranch: release-2.2.1
BuildTime: 2023-07-25_13:42_UTC
GoVersion: go1.19
2023-08-28T10:56:23Z

DataNode logs:

[mongodb@mongodb_content_rs2n1 /]$ pbm logs
2023-08-28T10:51:23Z I [mongo_conf/mongodb_content_cfg_server1:27017] got epoch {1693219863 8}
2023-08-28T10:51:23Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:51:23.296062775Z] backup: 2023-08-28T10:30:25Z
2023-08-28T10:51:23Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:51:23.296062775Z] recovery started
2023-08-28T10:51:23Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] got command restore [name: 2023-08-28T10:51:23.296062775Z, snapshot: 2023-08-28T10:30:25Z] <ts: 1693219883>
2023-08-28T10:51:23Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] got epoch {1693219863 8}
2023-08-28T10:51:23Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] got command restore [name: 2023-08-28T10:51:23.296062775Z, snapshot: 2023-08-28T10:30:25Z] <ts: 1693219883>
2023-08-28T10:51:23Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] backup: 2023-08-28T10:30:25Z
2023-08-28T10:51:23Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] recovery started
2023-08-28T10:51:23Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] got epoch {1693219863 8}
2023-08-28T10:51:24Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:51:23.296062775Z] moving to state starting
2023-08-28T10:51:24Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] backup: 2023-08-28T10:30:25Z
2023-08-28T10:51:24Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] recovery started
2023-08-28T10:51:24Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_conf/node.mongodb_content_cfg_server1:27017:{} .pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_conf/node.mongodb_content_cfg_server2:27017:{} .pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_conf/node.mongodb_content_cfg_server3:27017:{}]
2023-08-28T10:51:24Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] moving to state starting
2023-08-28T10:51:24Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_data_rs2/node.mongodb_content_rs2n1:27017:{}]
2023-08-28T10:51:24Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] moving to state starting
2023-08-28T10:51:24Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_data_rs1/node.mongodb_content_rs1n1:27017:{}]
2023-08-28T10:51:28Z I [mongo_data_rs3/mongodb_content_rs3n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for cluster
2023-08-28T10:51:29Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for cluster
2023-08-28T10:51:29Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for cluster

Any chance you guys have a qualified shot at what is going wrong here?