Hello Team,
I’ve 2 node replica setup and configure backup using pbm and it will store into the s3 bucket. Scheduled cron job for the backups as very sunday and wendesday base backup and the remaining days are incremental backup.
I’m trying to restore the latest base backup and this is output,
[root@mongodb-test3 ~]# pbm restore --time=“2024-11-05T18:30:02”
Starting restore 2024-11-07T05:53:34.381347779Z to point-in-time 2024-11-05T18:30:02 from ‘2024-11-04T18:30:02Z’…
when I check the pbm status it is showing as,
Cluster:
rs1:
- 192.168.40.41:27017 [P]: pbm-agent [v2.6.0] FAILED status:
ERROR with lost agent, last heartbeat: 1730958885
- 192.168.40.40:27017 [S]: pbm-agent [v2.6.0] FAILED status:
ERROR with lost agent, last heartbeat: 1730958890
Can anyone help me with this. What are the steps do i need to follow before restoration.
regards,
JP
Hi, it seems PBM ran into some issue. Can you check the pbm-agent logs on each node with
journalctl -u pbm-agent
?
Hello,
Thanks for your reply.
This is the command I’m using for the restore,
[root@65981f33aae5 ~]# pbm restore --time=“2024-11-05T18:30:02”
Starting restore 2024-11-07T12:15:58.27830722Z to point-in-time 2024-11-05T18:30:02 from ‘2024-11-04T18:30:02Z’
…
and I can see the output for “journalctl -u pbm-agent -f”
Nov 07 17:45:58 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:58.000+0530 I got command restore [name: 2024-11-07T12:15:58.27830722Z, snapshot: 202 4-11-04T18:30:02Z point-in-time: <1730831402,0>] <ts: 1730981758>
Nov 07 17:45:58 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:58.000+0530 I got epoch {1730958814 6}
Nov 07 17:45:58 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:58.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] to time: 2024-11-05T18:30:02Z
Nov 07 17:45:58 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:58.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] oplog slicer disabled
Nov 07 17:45:58 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:58.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] backup: 2024-11-04T18:30:02Z
Nov 07 17:45:58 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:58.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] recovery started
Nov 07 17:45:58 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:58.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] port: 27218
Nov 07 17:45:58 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:58.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11 -07T12:15:58.27830722Z/rs.rs1/node.192.168.40.41:27017.hb” [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
Nov 07 17:45:59 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:59.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11 -07T12:15:58.27830722Z/rs.rs1/rs.hb” [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
Nov 07 17:45:59 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:59.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11 -07T12:15:58.27830722Z/cluster.hb” [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
Nov 07 17:45:59 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:45:59.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] mongod binary: mongod, version: v6.0.17
Nov 07 17:46:00 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:00.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] get src 2024-11-03T18:30:02Z
Nov 07 17:46:00 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:00.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] get src 2024-11-02T18:30:01Z
Nov 07 17:46:35 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:35.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] moving to state starting
Nov 07 17:46:35 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:35.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11 -07T12:15:58.27830722Z/rs.rs1/node.192.168.40.41:27017.starting” [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
Nov 07 17:46:35 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:35.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] waiting for starting
status i n rs map[.pbm.restore/2024-11-07T12:15:58.27830722Z/rs.rs1/node.192.168.40.40:27017:{} .pbm.restore/2024-11-07T12:15:58.27830722Z/rs.rs1/node.192.168. 40.41:27017:{}]
Nov 07 17:46:43 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:43.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11 -07T12:15:58.27830722Z/rs.rs1/rs.starting” [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
Nov 07 17:46:43 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:43.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] waiting for shards map[.pbm.res tore/2024-11-07T12:15:58.27830722Z/rs.rs1/rs:{}]
Nov 07 17:46:49 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:49.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11 -07T12:15:58.27830722Z/cluster.starting” [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
Nov 07 17:46:49 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:49.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] waiting for cluster
Nov 07 17:46:55 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:55.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] converged to state starting
Nov 07 17:46:55 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:55.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] starting
Nov 07 17:46:55 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:55.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] moving to state running
Nov 07 17:46:55 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:55.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11 -07T12:15:58.27830722Z/rs.rs1/node.192.168.40.41:27017.running” [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
Nov 07 17:46:56 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:46:56.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] waiting for running
status in rs map[.pbm.restore/2024-11-07T12:15:58.27830722Z/rs.rs1/node.192.168.40.40:27017:{} .pbm.restore/2024-11-07T12:15:58.27830722Z/rs.rs1/node.192.168.4 0.41:27017:{}]
Nov 07 17:47:04 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:04.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11 -07T12:15:58.27830722Z/rs.rs1/rs.running” [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
Nov 07 17:47:04 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:04.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] waiting for shards map[.pbm.res tore/2024-11-07T12:15:58.27830722Z/rs.rs1/rs:{}]
Nov 07 17:47:10 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:10.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11 -07T12:15:58.27830722Z/cluster.running” [size hint: 10 (10.00B); part size: 10485760 (10.00MB)]
Nov 07 17:47:10 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:10.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] waiting for cluster
Nov 07 17:47:16 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:16.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] converged to state running
Nov 07 17:47:16 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:16.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] send to stopAgent chan
Nov 07 17:47:17 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:17.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] stop agents heartbeats
Nov 07 17:47:17 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:17.000+0530 I [restore/2024-11-07T12:15:58.27830722Z] stopping mongod and flushing ol d data
Nov 07 17:47:17 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:17.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] shutdown server
Nov 07 17:47:17 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:17.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] waiting to became secondary
Nov 07 17:47:18 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:18.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] waiting to became secondary
Nov 07 17:47:19 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:19.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] waiting to became secondary
Nov 07 17:47:20 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:20.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] waiting to became secondary
Nov 07 17:47:21 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:21.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] waiting to became secondary
Nov 07 17:47:22 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:22.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] waiting to became secondary
Nov 07 17:47:23 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:23.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] waiting to became secondary
Nov 07 17:47:24 mongodb-test3 pbm-agent[59256]: 2024-11-07T17:47:24.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] waiting to became secondary
and in the secondary server I can see the logs as,
[root@mongodb-test4 ~]# journalctl -u pbm-agent -f
Nov 07 17:47:13 mongodb-test4 pbm-agent[24530]: 2024-11-07T17:47:13.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] shutdown server
Nov 07 17:47:29 mongodb-test4 pbm-agent[24530]: 2024-11-07T17:47:29.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] waiting for the node to shutdown
Nov 07 17:47:30 mongodb-test4 pbm-agent[24530]: 2024-11-07T17:47:30.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11-07T12:15:58.27830722Z/rs.rs1/node.192.168.40.40:27017.error” [size hint: 115 (115.00B); part size: 10485760 (10.00MB)]
Nov 07 17:47:30 mongodb-test4 pbm-agent[24530]: 2024-11-07T17:47:30.000+0530 E [restore/2024-11-07T12:15:58.27830722Z] restore: shutdown: check for lock file /data/db/mongod.lock: stat /data/db/mongod.lock: no such file or directory
Nov 07 17:47:30 mongodb-test4 pbm-agent[24530]: 2024-11-07T17:47:30.000+0530 I change stream was closed
Nov 07 17:47:30 mongodb-test4 pbm-agent[24530]: 2024-11-07T17:47:30.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] uploading “.pbm.restore/2024-11-07T12:15:58.27830722Z/rs.rs1/log/192.168.40.40:27017.0.log” [size hint: -1 (unknown); part size: 10485760 (10.00MB)]
Nov 07 17:47:30 mongodb-test4 pbm-agent[24530]: 2024-11-07T17:47:30.000+0530 D [restore/2024-11-07T12:15:58.27830722Z] hearbeats stopped
Nov 07 17:47:31 mongodb-test4 pbm-agent[24530]: 2024/11/07 17:47:31 Exit:
Nov 07 17:47:31 mongodb-test4 systemd[1]: pbm-agent.service: Deactivated successfully.
Nov 07 17:47:31 mongodb-test4 systemd[1]: pbm-agent.service: Consumed 37.606s CPU time.
Hi, can you check the contents of this file in the S3 bucket?
“.pbm.restore/2024-11-07T12:15:58.27830722Z/rs.rs1/node.192.168.40.40:27017.error”
also check any mongo logs on mongodb-test4 as it seems that node had an issue during the recovery process
Hi, I can see the following information in the mentioned error log
1730981850:shutdown: check for lock file /data/db/mongod.lock: stat /data/db/mongod.lock: no such file or directory
Where can we check the mongo logs. As checked, i don’t find any in the mongodb-test4 machine
[root@mongodb-test4 ~]# ls -lrt /var/log/mongodb/mongod.log
-rw-r-----. 1 mongod mongod 0 Aug 14 05:03 /var/log/mongodb/mongod.log
Since the instance is running on the docker container i’m not sure where the log file is exists.
Seems you hit a bug. I have created [PBM-1435] - Jira for the dev team to take a look
Thanks a lot
I’ll wait for the updates
Hi Ivan,
Meanwhile, could you please give me the document for the PBM backup and restore (PITR). It would be helpful for me while configuring in the production environment.
Thank you.
How long does it take for the bug fix, any idea.
Hi @Jayaprakash_Duraisam !
Do I understand correctly that mongodb-test4 - is docker container which is running mongodb and pbm-agent is connecting to this instance from the outside?
Hello @oleksandr.havryliak
Sorry for the delayed response!!!
Yes, you’re right. Mongodb is running in the docker container and I’m doing the backups and restore from the outside.
regards,
Jayaprakash Duraisamy
Hi!
Unfortunately this setup is not supported for the physical/incremental backup/restore. To be able to perform physical/incremental restore pbm-agent must have R/W access to mongodb data directory and also must have direct access to the mongod executable.
Thanks @oleksandr.havryliak
Could you please help me to achieve this. I mean, how to give R/W access to the mongodb data directory and mongod executable
If we are talking about containers you can try using supervisord to start and run both mongodb and pbm-agent inside the container. You can find the examples here: mongodb.ini, pbm-agent.ini