Initially, we successfully restored a physical backup of Percona Mongo with PBM on a single-node deployment (just one DB).
Now we have created a replica set with two data-bearing Percona Mongo nodes and one arbiter. Backup works fine, but on restore PBM first seems to wait for the containers (Percona Mongo running in Docker) to be turned off, which is fine. However, after we turn off both data-bearing nodes, the restore hangs and we just get this message repeatedly in the logs:
E [pitr] init: get conf: get: server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: perconamongo8:27017, Type: Unknown, Last error: dial tcp: lookup perconamongo8 on 127.0.0.11:53: no such host }, { Addr: perconamongo7:27017, Type: Unknown, Last error: dial tcp: lookup perconamongo7 on 127.0.0.11:53: no such host }, { Addr: perconamongoarb:27017, Type: RSArbiter, Average RTT: 343041 }, ] }
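The topology dump in that error is dense. The minimal Python sketch below pulls out each listed server with its type and last error. It assumes the `{ Addr: ..., Type: ..., Last error: ... }` layout seen in these PBM log lines (not a documented format), and `parse_topology` is a hypothetical helper, not part of PBM. Applied to the line above, it shows that both data-bearing nodes fail a DNS lookup against 127.0.0.11, Docker's embedded DNS server, which typically stops resolving a container's name once that container is stopped. Only the arbiter is reachable.

```python
import re
from typing import NamedTuple, Optional

# Matches one server entry inside the "current topology" dump, e.g.
# { Addr: host:27017, Type: Unknown, Last error: dial tcp: ... no such host }
# Assumes the layout seen in the PBM log lines above; not a documented format.
SERVER_RE = re.compile(
    r"\{ Addr: (?P<addr>[^,]+), Type: (?P<type>[^,}]+?)"
    r"(?:, Last error: (?P<error>[^}]*?))?"
    r"(?:, Average RTT: \d+)? \}"
)


class Server(NamedTuple):
    addr: str
    type: str
    error: Optional[str]  # None when the entry has no "Last error"


def parse_topology(log_line: str) -> list[Server]:
    """Return every server listed in a 'server selection error' log line."""
    return [
        Server(m["addr"], m["type"], m["error"])
        for m in SERVER_RE.finditer(log_line)
    ]


if __name__ == "__main__":
    line = (
        "E [pitr] init: get conf: get: server selection error: server selection "
        "timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: "
        "[{ Addr: perconamongo8:27017, Type: Unknown, Last error: dial tcp: "
        "lookup perconamongo8 on 127.0.0.11:53: no such host }, "
        "{ Addr: perconamongoarb:27017, Type: RSArbiter, Average RTT: 343041 }, ] }"
    )
    for server in parse_topology(line):
        print(f"{server.addr:25} {server.type:10} {server.error or 'reachable'}")
```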
For a physical backup restore, what are we supposed to do with a replica set? If we keep the nodes running, PBM seems to wait for them to be turned off. If we turn them off, PBM complains that it cannot see them.
I am providing logs from our two Percona Mongo nodes. The arbiter has been turned off. We have tried different combinations of restoring to the primary or the secondary. According to the Docker logs of the PBM containers, the PBM agent on the primary node seems to wait for the node to become secondary, and the one on the secondary waits for the node to be shut down.
The documentation here says to turn off the nodes for a physical restore, so we have tried that too (even though I understood from the above that we should only turn off the arbiter).
Log from the first node:
2025-03-25T16:16:45.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] waiting to became secondary
2025-03-25T16:16:46.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] waiting to became secondary
2025-03-25T16:16:47.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] waiting to became secondary
2025-03-25T16:16:48.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] waiting for the node to shutdown
2025-03-25T16:19:06.000+0000 E [pitr] init: get conf: get: server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: perconamongo7:27017, Type: Unknown, Last error: dial tcp: lookup perconamongo7 on 127.0.0.11:53: no such host }, { Addr: perconamongo8:27017, Type: RSSecondary, Average RTT: 349172 }, { Addr: perconamongoarb:27017, Type: Unknown, Last error: dial tcp: lookup perconamongoarb on 127.0.0.11:53: no such host }, ] }
Log from the second node:
2025-03-25T16:13:37.000+0000 I got epoch {1742919217 4}
2025-03-25T16:13:37.000+0000 I [restore/2025-03-25T16:13:36.827616408Z] backup: 2025-02-20T11:51:16Z
2025-03-25T16:13:37.000+0000 I [restore/2025-03-25T16:13:36.827616408Z] recovery started
2025-03-25T16:13:37.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] port: 27282
2025-03-25T16:13:38.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] mongod binary: mongod, version: v8.0.4-1
2025-03-25T16:13:38.000+0000 I [restore/2025-03-25T16:13:36.827616408Z] moving to state starting
2025-03-25T16:13:38.000+0000 I [restore/2025-03-25T16:13:36.827616408Z] waiting for cluster
2025-03-25T16:13:48.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] converged to state starting
2025-03-25T16:13:48.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] starting
2025-03-25T16:13:48.000+0000 I [restore/2025-03-25T16:13:36.827616408Z] moving to state running
2025-03-25T16:13:48.000+0000 I [restore/2025-03-25T16:13:36.827616408Z] waiting for cluster
2025-03-25T16:14:03.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] converged to state running
2025-03-25T16:14:03.000+0000 I [restore/2025-03-25T16:13:36.827616408Z] send to stopAgent chan
2025-03-25T16:14:03.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] stop agents heartbeats
2025-03-25T16:14:03.000+0000 I [restore/2025-03-25T16:13:36.827616408Z] stopping mongod and flushing old data
2025-03-25T16:14:03.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] shutdown server
2025-03-25T16:14:03.000+0000 D [restore/2025-03-25T16:13:36.827616408Z] waiting for the node to shutdown
2025-03-25T16:19:08.000+0000 E [pitr] init: get conf: get: server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: perconamongo8:27017, Type: RSSecondary, Average RTT: 710142 }, { Addr: perconamongo7:27017, Type: Unknown, Last error: dial tcp: lookup perconamongo7 on 127.0.0.11:53: no such host }, { Addr: perconamongoarb:27017, Type: Unknown, Last error: dial tcp: lookup perconamongoarb on 127.0.0.11:53: no such host }, ] }
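When comparing the two nodes' logs, it helps to reduce each one to the states it converged to and its last restore step. The minimal Python sketch below does that. It assumes the `<timestamp> <level> [restore/<name>] <message>` layout of the lines above, and `last_restore_step` and `state_transitions` are hypothetical helpers, not PBM tools. For both logs here, the last restore message is "waiting for the node to shutdown". The `[pitr]` errors come after it.

```python
import re
from typing import Iterable

# Assumed layout of a PBM agent restore log line, taken from the logs above:
# 2025-03-25T16:14:03.000+0000 D [restore/<restore name>] <message>
RESTORE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]) \[restore/(?P<name>[^\]]+)\] (?P<msg>.*)$"
)


def last_restore_step(lines: Iterable[str]) -> dict[str, str]:
    """Map each restore name to the last message it logged (hypothetical helper)."""
    last: dict[str, str] = {}
    for line in lines:
        m = RESTORE_RE.match(line.strip())
        if m:  # [pitr] and other non-restore lines are skipped
            last[m["name"]] = m["msg"]
    return last


def state_transitions(lines: Iterable[str]) -> list[str]:
    """List the states the node reported converging to, in log order."""
    states: list[str] = []
    for line in lines:
        m = RESTORE_RE.match(line.strip())
        if m and m["msg"].startswith("converged to state "):
            states.append(m["msg"].removeprefix("converged to state "))
    return states
```

Feeding it each node's log separately (for example `last_restore_step(open("node1.log"))`, where the file name is only illustrative) shows where that node's restore stopped making progress.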
There is no way of “restoring to the primary or secondary”. PBM restores ALL the members of the replica set to the same point in time.
The doc page you linked is about restoring from a logical backup. Since you are trying to restore a physical backup, you should be reading this instead.
The doc page about physical restores mentions stopping the mongos routers and arbiter nodes. Don’t stop the primary or secondary. Also check the instructions here for running in Docker.
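To make that split concrete, the minimal Python sketch below sorts replica set members into arbiters to stop and data-bearing nodes to leave running while PBM performs the restore. It assumes the input is the `members` array returned by the `replSetGetStatus` admin command, where each entry has `name` and `stateStr` fields. `plan_physical_restore` is a hypothetical helper, not part of PBM. mongos routers are not replica set members, so in a sharded cluster they would not appear here and would need to be stopped separately.

```python
from typing import Any, Iterable

# States of data-bearing members that should keep running during a physical restore.
DATA_BEARING = {"PRIMARY", "SECONDARY"}


def plan_physical_restore(members: Iterable[dict[str, Any]]) -> dict[str, list[str]]:
    """Split replica set members into arbiters to stop and nodes to keep running.

    `members` is assumed to be the "members" array from replSetGetStatus,
    where each entry has "name" (host:port) and "stateStr" fields.
    """
    plan: dict[str, list[str]] = {"stop": [], "keep_running": [], "check": []}
    for member in members:
        state = member.get("stateStr", "")
        if state == "ARBITER":
            plan["stop"].append(member["name"])
        elif state in DATA_BEARING:
            plan["keep_running"].append(member["name"])
        else:
            # Anything else (e.g. DOWN or RECOVERING) should be investigated first.
            plan["check"].append(member["name"])
    return plan
```

With pymongo, the input could come from `MongoClient(uri).admin.command("replSetGetStatus")["members"]`, run against a data-bearing node before starting the restore.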
If you still suspect an issue, please open a bug report here (Jira) with full instructions to reproduce it so the dev team can take a look.