Hi Anil,
thank you for your kind answer.
I will try to provide you with all the details you requested.
First of all I am trying to do a logical backup, so if I correctly understood how the “startingStatus: 60” configuration works this should be irrelevant because it only applies in the case of physical backups, can you confirm this?
I checked /var/log/messages and only on one of the two data nodes where the pbm-agent is running do I find this:
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got command cleanup <ts: 1721077201>
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got epoch {1721077201 1}
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [cleanup] skip: lock not acquired
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got epoch {1721077201 1}
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [cleanup] skip: lock not acquired
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got command backup [name: 2024-07-15T21:00:01Z, compression: s2 (level: default)] <ts: 1721077201>
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got command backup [name: 2024-07-15T21:00:01Z, compression: s2 (level: default)] <ts: 1721077201>
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got epoch {1721077201 1}
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got epoch {1721077201 1}
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [backup/2024-07-15T21:00:01Z] init backup meta
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [backup/2024-07-15T21:00:01Z] init backup meta
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [backup/2024-07-15T21:00:01Z] nomination list for MongoDBReplset1: [[data_node1:27101] [data_node2:27102]]
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [backup/2024-07-15T21:00:01Z] nomination list for MongoDBReplset1: [[data_node1:27101] [data_node2:27102]]
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [backup/2024-07-15T21:00:01Z] nomination MongoDBReplset1, set candidates [data_node1:27101]
pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [backup/2024-07-15T21:00:01Z] nomination MongoDBReplset1, set candidates [data_node1:27101]
pbm-agent[3059366]: 2024-07-15T23:00:07.000+0200 D [backup/2024-07-15T21:00:01Z] nomination MongoDBReplset1, set candidates [data_node2:27102]
pbm-agent[3059366]: 2024-07-15T23:00:07.000+0200 D [backup/2024-07-15T21:00:01Z] nomination MongoDBReplset1, set candidates [data_node2:27102]
pbm-agent[3059366]: 2024-07-15T23:00:27.000+0200 D [backup/2024-07-15T21:00:01Z] nomination timeout
pbm-agent[3059366]: 2024-07-15T23:00:27.000+0200 D [backup/2024-07-15T21:00:01Z] nomination timeout
pbm-agent[3059366]: 2024-07-15T23:00:27.000+0200 D [backup/2024-07-15T21:00:01Z] skip after nomination, probably started by another node
pbm-agent[3059366]: 2024-07-15T23:00:27.000+0200 D [backup/2024-07-15T21:00:01Z] skip after nomination, probably started by another node
it seems to be trying to find a candidate but failing. Mongo logs contain nothing relevant.
I paste below the other required logs:
pbm status
Cluster:
========
MongoDBReplset1:
- MongoDBReplset1/data_node1:27101 [S]: pbm-agent v2.5.0 OK
- MongoDBReplset1/data_node2:27102 [P]: pbm-agent v2.5.0 OK
- MongoDBReplset1/arbiter_node:27110 [!Arbiter]: arbiter node is not supported
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
FS /dump/mongo_backup
(none)
pbm list
Backup snapshots:
pbm logs
2024-07-13T21:00:02Z I [MongoDBReplset1/data_node2:27102] got epoch {1720904401 1}
2024-07-13T21:00:02Z I [MongoDBReplset1/data_node1:27101] got command backup [name: 2024-07-13T21:00:01Z, compression: s2 (level: default)] <ts: 1720904401>
2024-07-13T21:00:02Z I [MongoDBReplset1/data_node1:27101] got epoch {1720904401 1}
2024-07-14T21:00:02Z I [MongoDBReplset1/data_node1:27101] got command backup [name: 2024-07-14T21:00:02Z, compression: s2 (level: default)] <ts: 1720990802>
2024-07-14T21:00:02Z I [MongoDBReplset1/data_node1:27101] got epoch {1720990801 2}
2024-07-14T21:00:02Z I [MongoDBReplset1/data_node2:27102] got command backup [name: 2024-07-14T21:00:02Z, compression: s2 (level: default)] <ts: 1720990802>
2024-07-14T21:00:02Z I [MongoDBReplset1/data_node2:27102] got epoch {1720990801 2}
2024-07-14T21:00:03Z I [MongoDBReplset1/data_node1:27101] [backup/2024-07-14T21:00:02Z] backup started
2024-07-14T21:05:21Z I [MongoDBReplset1/data_node1:27101] [backup/2024-07-14T21:00:02Z] mongodump finished, waiting for the oplog
2024-07-14T21:05:26Z I [MongoDBReplset1/data_node1:27101] [backup/2024-07-14T21:00:02Z] created chunk 2024-07-14T21:00:03 - 2024-07-14T21:05:23
2024-07-14T21:05:26Z I [MongoDBReplset1/data_node1:27101] [backup/2024-07-14T21:00:02Z] dropping tmp collections
2024-07-14T21:05:28Z I [MongoDBReplset1/data_node1:27101] [backup/2024-07-14T21:00:02Z] backup finished
2024-07-15T21:00:02Z I [MongoDBReplset1/data_node1:27101] got command cleanup <ts: 1721077201>
2024-07-15T21:00:02Z I [MongoDBReplset1/data_node1:27101] got epoch {1721077201 1}
2024-07-15T21:00:02Z I [MongoDBReplset1/data_node2:27102] got command cleanup <ts: 1721077201>
2024-07-15T21:00:02Z I [MongoDBReplset1/data_node2:27102] got epoch {1721077201 1}
2024-07-15T21:00:02Z I [MongoDBReplset1/data_node2:27102] got command backup [name: 2024-07-15T21:00:01Z, compression: s2 (level: default)] <ts: 1721077201>
2024-07-15T21:00:02Z I [MongoDBReplset1/data_node2:27102] got epoch {1721077201 1}
2024-07-15T21:00:02Z I [MongoDBReplset1/data_node1:27101] got command backup [name: 2024-07-15T21:00:01Z, compression: s2 (level: default)] <ts: 1721077201>
2024-07-15T21:00:02Z I [MongoDBReplset1/data_node1:27101] got epoch {1721077201 1}
journalctl -u pbm-agent.service
Jul 15 23:00:02 srvmongodata1 pbm-agent[1315]: 2024-07-15T23:00:02.000+0200 I got command cleanup <ts: 1721077201>
Jul 15 23:00:02 srvmongodata1 pbm-agent[1315]: 2024-07-15T23:00:02.000+0200 I got epoch {1721077201 1}
Jul 15 23:00:02 srvmongodata1 pbm-agent[1315]: 2024-07-15T23:00:02.000+0200 D [cleanup] got physical restores list: 0
Jul 15 23:00:02 srvmongodata1 pbm-agent[1315]: 2024-07-15T23:00:02.000+0200 D [cleanup] got backups list: 0
Jul 15 23:00:02 srvmongodata1 pbm-agent[1315]: 2024-07-15T23:00:02.000+0200 I got command backup [name: 2024-07-15T21:00:01Z, compression: s2 (level: default)] <ts: 1721077201>
Jul 15 23:00:02 srvmongodata1 pbm-agent[1315]: 2024-07-15T23:00:02.000+0200 I got epoch {1721077201 1}
Jul 15 23:00:17 srvmongodata1 pbm-agent[1315]: 2024-07-15T23:00:17.000+0200 D [backup/2024-07-15T21:00:01Z] nomination timeout
Jul 15 23:00:17 srvmongodata1 pbm-agent[1315]: 2024-07-15T23:00:17.000+0200 D [backup/2024-07-15T21:00:01Z] skip after nomination, probably started by another node
Jul 15 23:00:02 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got command cleanup <ts: 1721077201>
Jul 15 23:00:02 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got epoch {1721077201 1}
Jul 15 23:00:02 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [cleanup] skip: lock not acquired
Jul 15 23:00:02 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got command backup [name: 2024-07-15T21:00:01Z, compression: s2 (level: default)] <ts: 1721077201>
Jul 15 23:00:02 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 I got epoch {1721077201 1}
Jul 15 23:00:02 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [backup/2024-07-15T21:00:01Z] init backup meta
Jul 15 23:00:02 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [backup/2024-07-15T21:00:01Z] nomination list for MongoDBReplset1: [[data_node1:27101] [data_node2:27102>
Jul 15 23:00:02 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:02.000+0200 D [backup/2024-07-15T21:00:01Z] nomination MongoDBReplset1, set candidates [data_node1:27101]
Jul 15 23:00:07 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:07.000+0200 D [backup/2024-07-15T21:00:01Z] nomination MongoDBReplset1, set candidates [data_node2:27102]
Jul 15 23:00:27 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:27.000+0200 D [backup/2024-07-15T21:00:01Z] nomination timeout
Jul 15 23:00:27 srvmongodata2 pbm-agent[3059366]: 2024-07-15T23:00:27.000+0200 D [backup/2024-07-15T21:00:01Z] skip after nomination, probably started by another node
I’m not sure how pbm decides which node is the best candidate to perform the backup, does it use mongo’s election mechanism? Is it possible that I also have to install the pbm-agent on the arbiter node otherwise it won’t be able to nominate a candidate?
Thank you again for your patience and help
Regards,
Giuseppe