Mongod-failure-after-restore

Hello Ivan,

Just a follow-up on this ticket: Mongod-failure-after-restore - #2 by Ivan_Groenewold

I attempted to perform a restore on the secondary instance despite the mongod service failure. The PBM status is shown above.

Could you please share the documentation or Linux commands for performing the restore? It would be very helpful.

Thank you.

I believe you need to follow the initial sync procedure as @Ivan_Groenewold mentioned. Here’s the guide:
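
In short, the procedure is to stop mongod on the broken member, clear its data directory, and start it again so that it performs an initial sync from the other members. A minimal sketch, assuming the default package layout (check dbPath in your mongod.conf and the user your mongod service runs as):

sudo systemctl stop mongod
sudo mv /var/lib/mongodb /var/lib/mongodb.bak          # keep the old data until the sync succeeds
sudo install -d -o mongod -g mongod /var/lib/mongodb   # recreate dbPath, owned by the service user
sudo systemctl start mongod                            # the member rejoins and runs initial sync automatically

The initial sync copies all data over the network from another replica set member, so it can take a while on larger data sets.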

Hi radoslaw.szulgo,

We have tried the initial sync procedure and have provided the output here: Mongod-failure-after-restore. We are also using PBM for backup and restore operations.

Can you please provide some more information on what doesn’t work for you? Any error messages? Without that, it’s really hard to help in your case.

We are using a cloud infrastructure running Ubuntu, with three servers: one primary and two secondaries. We are running Percona Backup for MongoDB (PBM) v2.10. For backups, we use Amazon S3 as the storage destination.

However, when we attempt to restore, the mongod service fails even though the restoration process is reported as successful. After this, all three mongod services fail. I am trying to restore the data only on a secondary server.

Command used for backup:
pbm backup --type=physical

Command used for restore:
pbm restore --time="2025-09-24T13:30:00Z"
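
For reference, before a point-in-time restore it is worth confirming that the target time is covered by a stored oplog range; a minimal sketch, assuming the pbm CLI is already configured on the node:

pbm status   # shows the latest snapshot and the PITR chunk ranges
pbm list     # lists snapshots and PITR time ranges; the --time value must fall inside one of them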

PBM status after backup:

pbm status
Cluster:

poc:

  • private_ip:27018 [S]: pbm-agent [v2.10.0] OK

  • private_ip:27018 [S]: pbm-agent [v2.10.0] OK

  • private_ip:27018 [P]: pbm-agent [v2.10.0] OK

PITR incremental backup:

Status [ON]
Running members: poc/private_ip:27018

Currently running:

(none)

Backups:

S3 ap-south-1 s3://mongo-percona-bk
Snapshots:
2025-10-01T12:13:21Z 33.59GB success [restore_to_time: 2025-10-01T12:13:24]
PITR chunks [11.10MB]:
2025-10-01T12:13:25 - 2025-10-02T02:53:23

PBM status after restore:

pbm status
Cluster:

poc:

  • private_ip:27018 : pbm-agent [NOT FOUND]

  • private_ip:27018 : pbm-agent [NOT FOUND]

  • private_ip:27018 : pbm-agent [NOT FOUND]

PITR incremental backup:

Status [OFF]

Currently running:

(none)

Backups:

S3 ap-south-1 s3://mongo-percona-bk
(none)

Thanks for the explanation and more context - this definitely helps. Why do you want to use PBM instead of initial sync to “rebuild/recover” 1/3 nodes? As @Ivan_Groenewold already wrote in the other topic - PBM is used to recover all nodes rather than a specific one.

We are taking incremental backups through S3. When we initiate the backup from the secondary server, it works successfully. However, when we try to restore the data to the secondary server, all mongod processes on both the primary and secondary servers fail. Additionally, we are unable to perform this backup and restore activity using the Initial Sync process.

Through the Initial Sync process, we cannot take or restore database backups via S3, as it is a MongoDB replication-level rebuild mechanism, not a PBM-based backup/restore method.

Right, you cannot do “backup” for initial-sync… but why would you?

Can we take one step back and clarify what’s the use case you’re trying to implement? Why do you want to restore just 1 out of 3 nodes? Something is broken or that’s a “recovery” procedure for any node? Something else?

Right now, we are performing a POC with the goal of taking a full data backup using PBM. After that, we plan to enable incremental PITR to take hourly backups. During the POC, we successfully pushed the data to S3 through PBM. However, when we tried to restore the same data after enabling PITR, the data restoration completed successfully, but all mongod processes on both the primary and secondary servers failed. We have also configured a replica set between the primary and secondary servers to ensure data synchronization.
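
For reference, a minimal sketch of how PITR is enabled and how the oplog slice interval can be adjusted through the PBM configuration (option names as in the PBM configuration reference; please verify them for v2.10):

pbm config --set pitr.enabled=true      # start saving oplog slices to the configured storage
pbm config --set pitr.oplogSpanMin=60   # slice length in minutes (the default is 10)
pbm status                              # "PITR incremental backup" should report Status [ON]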

Can you provide logs from at least 1 server? What error is shown there?

The procedure I followed for the restore was: I removed the MongoDB data directory, after which the mongod service failed to start. I started it again and then executed the restore command pbm restore 2025-10-30T07:01:38Z. A few minutes later, the mongod service went inactive. I’ve also attached the last few log lines.

2025-10-30T05:51:17.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] download stat: buf 536870912, arena 268435456, span 33554432, spanNum 8, cc 2, [{2 0} {2 0}]
2025-10-30T05:51:17.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] preparing data
2025-10-30T05:51:35.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] oplogTruncateAfterPoint: {1761737952 1}
2025-10-30T05:51:37.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] recovering oplog as standalone
2025-10-30T05:51:54.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] clean-up and reset replicaset config
2025-10-30T05:52:06.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] uploading ".pbm.restore/2025-10-30T05:38:05.927063022Z/rs.poc/node.10.200.10.98:27018.hb" [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
2025-10-30T05:52:06.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] uploading ".pbm.restore/2025-10-30T05:38:05.927063022Z/rs.poc/rs.hb" [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
2025-10-30T05:52:06.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] uploading ".pbm.restore/2025-10-30T05:38:05.927063022Z/cluster.hb" [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmAgents'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmBackups'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmRestores'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmCmd'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmPITRChunks'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmPITR'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmOpLog'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmLockOp'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmLock'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmLock'
2025-10-30T05:52:12.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] dropping 'admin.pbmLog'
2025-10-30T05:52:14.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] restore on node succeed
2025-10-30T05:52:14.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] moving to state done
2025-10-30T05:52:14.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] uploading ".pbm.restore/2025-10-30T05:38:05.927063022Z/rs.poc/node.10.200.10.98:27018.done" [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
2025-10-30T05:52:14.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] waiting for done status in rs map[.pbm.restore/2025-10-30T05:38:05.927063022Z/rs.poc/node.10.200.10.104:27018:{} .pbm.restore/2025-10-30T05:38:05.927063022Z/rs.poc/node.10.200.10.94:27018:{} .pbm.restore/2025-10-30T05:38:05.927063022Z/rs.poc/node.10.200.10.98:27018:{}]
2025-10-30T05:52:19.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] uploading ".pbm.restore/2025-10-30T05:38:05.927063022Z/rs.poc/rs.done" [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
2025-10-30T05:52:19.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] waiting for shards map[.pbm.restore/2025-10-30T05:38:05.927063022Z/rs.poc/rs:{}]
2025-10-30T05:52:24.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] uploading ".pbm.restore/2025-10-30T05:38:05.927063022Z/cluster.done" [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
2025-10-30T05:52:24.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] waiting for cluster
2025-10-30T05:52:29.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] converged to state done
2025-10-30T05:52:29.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] uploading ".pbm.restore/2025-10-30T05:38:05.927063022Z/rs.poc/stat.10.200.10.98:27018" [size hint: 73 (73.00B); part size: 10485760 (10.00MB)]
2025-10-30T05:52:29.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] writing restore meta
2025-10-30T05:52:29.000+0000 W [restore/2025-10-30T05:38:05.927063022Z] meta .pbm.restore/2025-10-30T05:38:05.927063022Z.json already exists, trying write done status with ''
2025-10-30T05:52:29.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] rm tmp conf
2025-10-30T05:52:29.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] wait for cluster status
2025-10-30T05:52:34.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] no cleanup strategy to apply
2025-10-30T05:52:34.000+0000 I [restore/2025-10-30T05:38:05.927063022Z] recovery successfully finished
2025-10-30T05:52:34.000+0000 I change stream was closed
2025-10-30T05:52:34.000+0000 D [restore/2025-10-30T05:38:05.927063022Z] hearbeats stopped
2025-10-30T05:52:34.000+0000 I Exit:
2025-10-30T05:52:34.000+0000 D [agentCheckup] deleting agent status
2025-10-30T05:53:05.000+0000 E Exit: connect to PBM: create mongo connection: ping: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: 127.0.0.1:27018, Type: Unknown, Last error: dial tcp 127.0.0.1:27018: connect: connection refused }, ] }
2025-10-30T05:53:35.000+0000 E Exit: connect to PBM: create mongo connection: ping: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: 127.0.0.1:27018, Type: Unknown, Last error: dial tcp 127.0.0.1:27018: connect: connection refused }, ] }
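
For reference, the PBM documentation lists follow-up steps after a physical restore; a minimal sketch (service names assume the default systemd units, please verify against the docs for v2.10):

# On every replica set node:
sudo systemctl restart mongod        # a physical restore leaves mongod stopped; it must be started manually
sudo systemctl restart pbm-agent     # restart the agents so they reconnect to mongod
# Then, once, from any node:
pbm config --force-resync            # resync the backup metadata from the S3 storage
pbm config --set pitr.enabled=true   # PITR is switched off during restore; re-enable it if needed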

This indicates that the PBM agent’s connection to PSMDB was unexpectedly interrupted. Can you please provide logs from the server so we can see what happened on the server side at that time?

Which logs? Do you require syslog?

I don’t understand the question. Sorry…

You asked for server logs, but we have many logs, so could you please specify which ones are required?

Ah, that’s clear now. I asked precisely about Percona Server for MongoDB logs:

By default under: /var/log/mongodb/server1.log*
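
If it helps, a minimal way to capture the relevant slice around the restore window (the log path and service name are assumptions based on a default install; adjust them to your mongod.conf):

sudo tail -n 500 /var/log/mongodb/mongod.log                                      # last entries of the mongod log
sudo journalctl -u mongod --since "2025-10-30 05:50" --until "2025-10-30 06:00"   # systemd view of the same window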

Below are the MongoDB logs taken during the restoration.

mongod.log (60.9 KB)