Cannot restore incremental backup

Hi, I'm trying to restore an incremental backup. All instances are running in Docker containers.

I added pbm-agent & to the mongodb entrypoint, so the agent now starts when the container starts up. But…
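
For reference, the relevant part of my entrypoint now looks roughly like this (paths and the log redirect are from memory, so treat it as a sketch rather than the exact file):

#!/bin/bash
# Start the PBM agent in the background before handing off to mongod.
# PBM_MONGODB_URI is assumed to be set in the container environment.
pbm-agent > /var/log/pbm-agent.log 2>&1 &

# mongod stays PID 1, so the container lives and dies with mongod,
# not with the agent.
exec mongod --config /etc/mongod.conf "$@"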

I take an incremental backup and everything works perfectly. Then I delete one of my collections and try to run a restore.
Below is the whole restore process with logs. After the restore attempt, the config servers suddenly lose connection (I have no idea why), and when I check my DB, the collection is not restored.
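
For completeness, the two snapshots shown in the listings below were taken like this (a base first, then an incremental on top of it):

pbm backup --type=incremental --base   # the 09:01 <incremental, base> snapshot
pbm backup --type=incremental          # the 10:30 incremental snapshot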

Hope you can see what happens from the logs :face_with_diagonal_mouth:

[mongodb@mongodb_content_cfg_server1 percona]$ pbm status

Cluster:
mongo_data_rs2:
  - mongo_data_rs2/mongodb_content_rs2n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_data_rs3:
  - mongo_data_rs3/mongodb_content_rs3n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_data_rs1:
  - mongo_data_rs1/mongodb_content_rs1n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_conf:
  - mongo_conf/mongodb_content_cfg_server1:27017 [P]: pbm-agent v2.2.1 OK
  - mongo_conf/mongodb_content_cfg_server2:27017 [S]: pbm-agent v2.2.1 OK
  - mongo_conf/mongodb_content_cfg_server3:27017 [S]: pbm-agent v2.2.1 OK

PITR incremental backup:
Status [OFF]

Currently running:
(none)

Backups:
FS /opt/backup/
  Snapshots:
    2023-08-28T10:30:25Z 24.09MB [restore_to_time: 2023-08-28T10:30:28Z]
    2023-08-28T09:01:02Z 14.49MB <incremental, base> [restore_to_time: 2023-08-28T09:01:04Z]

[mongodb@mongodb_content_cfg_server1 percona]$ pbm list
Backup snapshots:
2023-08-28T09:01:02Z <incremental, base> [restore_to_time: 2023-08-28T09:01:04Z]
2023-08-28T10:30:25Z [restore_to_time: 2023-08-28T10:30:28Z]

PITR :

[mongodb@mongodb_content_cfg_server1 percona]$ pbm restore 2023-08-28T10:30:25Z
Starting restore 2023-08-28T10:55:42.716998345Z from '2023-08-28T10:30:25Z'…Restore of the snapshot from '2023-08-28T10:30:25Z' has started.
Check restore status with: pbm describe-restore 2023-08-28T10:55:42.716998345Z -c </path/to/pbm.conf.yaml>
No other pbm command is available while the restore is running!

[mongodb@mongodb_content_cfg_server1 percona]$ pbm describe-restore 2023-08-28T10:55:42.716998345Z -c /opt/percona/storage.yaml
name: "2023-08-28T10:55:42.716998345Z"
opid: ""
backup: ""
type: physical
status: running
last_transition_time: "2023-08-28T10:56:09Z"
replsets:
- name: mongo_data_rs1
  status: running
  last_transition_time: "2023-08-28T10:55:59Z"
  nodes:
  - name: mongodb_content_rs1n1:27017
    status: running
    last_transition_time: "2023-08-28T10:55:54Z"
- name: mongo_data_rs2
  status: running
  last_transition_time: "2023-08-28T10:56:03Z"
  nodes:
  - name: mongodb_content_rs2n1:27017
    status: running
    last_transition_time: "2023-08-28T10:55:58Z"
- name: mongo_data_rs3
  status: running
  last_transition_time: "2023-08-28T10:56:03Z"
  nodes:
  - name: mongodb_content_rs3n1:27017
    status: running
    last_transition_time: "2023-08-28T10:55:58Z"
- name: mongo_conf
  status: running
  last_transition_time: "2023-08-28T10:56:04Z"
  nodes:
  - name: mongodb_content_cfg_server1:27017
    status: running
    last_transition_time: "2023-08-28T10:55:59Z"
  - name: mongodb_content_cfg_server2:27017
    status: running
    last_transition_time: "2023-08-28T10:55:58Z"
  - name: mongodb_content_cfg_server3:27017
    status: running
    last_transition_time: "2023-08-28T10:55:58Z"
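
For context, storage.yaml is just the filesystem storage config pointing at the FS store shown in pbm status. Mine is more or less the minimal version, something like this:

storage:
  type: filesystem
  filesystem:
    path: /opt/backup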

[mongodb@mongodb_content_cfg_server1 percona]$ pbm status

Cluster:
mongo_data_rs1:
  - mongo_data_rs1/mongodb_content_rs1n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_data_rs2:
  - mongo_data_rs2/mongodb_content_rs2n1:27017 [P]: pbm-agent v2.2.1 OK
mongo_conf:
  - mongo_conf/mongodb_content_cfg_server1:27017 [P]: pbm-agent FAILED status:
      ERROR with ERROR: lost agent, last heartbeat: 1693220172
  - mongo_conf/mongodb_content_cfg_server2:27017 [S]: pbm-agent FAILED status:
      ERROR with ERROR: lost agent, last heartbeat: 1693220171
  - mongo_conf/mongodb_content_cfg_server3:27017 [S]: pbm-agent FAILED status:
      ERROR with ERROR: lost agent, last heartbeat: 1693220172
mongo_data_rs3:
  - mongo_data_rs3/mongodb_content_rs3n1:27017 [P]: pbm-agent v2.2.1 OK

PITR incremental backup:
Status [OFF]

Currently running:
(none)

Backups:
FS /opt/backup/
  Snapshots:
    2023-08-28T10:30:25Z 24.09MB [restore_to_time: 2023-08-28T10:30:28Z]
    2023-08-28T09:01:02Z 14.49MB <incremental, base> [restore_to_time: 2023-08-28T09:01:04Z]

[mongodb@mongodb_content_cfg_server1 percona]$ pbm list --restore
Restores history:

[mongodb@mongodb_content_cfg_server1 percona]$ pbm logs
2023-08-28T10:55:43Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_conf/node.mongodb_content_cfg_server1:27017:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_conf/node.mongodb_content_cfg_server2:27017:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_conf/node.mongodb_content_cfg_server3:27017:{}]
2023-08-28T10:55:44Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:55:42.716998345Z] moving to state starting
2023-08-28T10:55:44Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_data_rs1/node.mongodb_content_rs1n1:27017:{}]
2023-08-28T10:55:48Z I [mongo_data_rs3/mongodb_content_rs3n1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for cluster
2023-08-28T10:55:48Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for cluster
2023-08-28T10:55:48Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for shards map[.pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_conf/rs:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_data_rs1/rs:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_data_rs2/rs:{} .pbm.restore/2023-08-28T10:55:42.716998345Z/rs.mongo_data_rs3/rs:{}]
2023-08-28T10:55:49Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for cluster
2023-08-28T10:55:54Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:55:42.716998345Z] waiting for cluster
2023-08-28T10:56:23Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] pbm-agent:
Version: 2.2.1
Platform: linux/amd64
GitCommit: 5d2c6eeb81be87a21ccf8d3ead6cba762e6bf70f
GitBranch: release-2.2.1
BuildTime: 2023-07-25_13:42_UTC
GoVersion: go1.19
2023-08-28T10:56:23Z

DataNode logs:

[mongodb@mongodb_content_rs2n1 /]$ pbm logs
2023-08-28T10:51:23Z I [mongo_conf/mongodb_content_cfg_server1:27017] got epoch {1693219863 8}
2023-08-28T10:51:23Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:51:23.296062775Z] backup: 2023-08-28T10:30:25Z
2023-08-28T10:51:23Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:51:23.296062775Z] recovery started
2023-08-28T10:51:23Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] got command restore [name: 2023-08-28T10:51:23.296062775Z, snapshot: 2023-08-28T10:30:25Z] <ts: 1693219883>
2023-08-28T10:51:23Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] got epoch {1693219863 8}
2023-08-28T10:51:23Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] got command restore [name: 2023-08-28T10:51:23.296062775Z, snapshot: 2023-08-28T10:30:25Z] <ts: 1693219883>
2023-08-28T10:51:23Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] backup: 2023-08-28T10:30:25Z
2023-08-28T10:51:23Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] recovery started
2023-08-28T10:51:23Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] got epoch {1693219863 8}
2023-08-28T10:51:24Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:51:23.296062775Z] moving to state starting
2023-08-28T10:51:24Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] backup: 2023-08-28T10:30:25Z
2023-08-28T10:51:24Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] recovery started
2023-08-28T10:51:24Z I [mongo_conf/mongodb_content_cfg_server1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_conf/node.mongodb_content_cfg_server1:27017:{} .pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_conf/node.mongodb_content_cfg_server2:27017:{} .pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_conf/node.mongodb_content_cfg_server3:27017:{}]
2023-08-28T10:51:24Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] moving to state starting
2023-08-28T10:51:24Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_data_rs2/node.mongodb_content_rs2n1:27017:{}]
2023-08-28T10:51:24Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] moving to state starting
2023-08-28T10:51:24Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for starting status in rs map[.pbm.restore/2023-08-28T10:51:23.296062775Z/rs.mongo_data_rs1/node.mongodb_content_rs1n1:27017:{}]
2023-08-28T10:51:28Z I [mongo_data_rs3/mongodb_content_rs3n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for cluster
2023-08-28T10:51:29Z I [mongo_data_rs2/mongodb_content_rs2n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for cluster
2023-08-28T10:51:29Z I [mongo_data_rs1/mongodb_content_rs1n1:27017] [restore/2023-08-28T10:51:23.296062775Z] waiting for cluster