PBM backup failed

Hello.
I am trying to back up MongoDB using PBM (Percona Backup for MongoDB).
The database is over 400 GB and runs on a Kubernetes cluster.
The setup consists of:
2 data nodes (replica set)
2 config servers (replica set)
1 arbiter
1 manager/router

bash-4.2$ pbm status
Cluster:
========
configrs:
  - configrs/config1:27017: pbm-agent v1.6.0 OK
  - configrs/config2:27017: pbm-agent v1.6.0 OK
datars:
  - datars/data1:27017: pbm-agent v1.6.0 OK
  - datars/data2:27017: pbm-agent v1.6.0 OK

After running for many hours, the backup failed with this error:

2021-11-10T13:53:10Z I [datars/data1:27017] [backup/2021-11-09T06:34:11Z] mongodump finished, waiting for the oplog
2021-11-10T13:53:12Z I [datars/data1:27017] [backup/2021-11-09T06:34:11Z] dropping tmp collections
2021-11-10T13:53:12Z I [datars/data1:27017] [backup/2021-11-09T06:34:11Z] mark RS as error `oplog: read data: oplog has insufficient range, some records since the last saved ts {1636439661 1} are missing. Run `pbm backup` to create a valid starting point for the PITR.`: <nil>
2021-11-10T13:53:12Z E [datars/data1:27017] [backup/2021-11-09T06:34:11Z] backup: oplog: read data: oplog has insufficient range, some records since the last saved ts {1636439661 1} are missing. Run `pbm backup` to create a valid starting point for the PITR.
2021-11-10T13:53:17Z I [configrs/config1:27017] [backup/2021-11-09T06:34:11Z] dropping tmp collections
2021-11-10T13:53:17Z I [configrs/config1:27017] [backup/2021-11-09T06:34:11Z] mark RS as error `check cluster for backup done: convergeCluster: backup on shard datars failed with: `: <nil>
2021-11-10T13:53:17Z I [configrs/config1:27017] [backup/2021-11-09T06:34:11Z] mark backup as error `check cluster for backup done: convergeCluster: backup on shard datars failed with: `: <nil>
2021-11-10T13:53:17Z E [configrs/config1:27017] [backup/2021-11-09T06:34:11Z] backup: check cluster for backup done: convergeCluster: backup on shard datars failed with:

Can anyone tell me where the problem may be?

Hi @patrykr
The error "backup: oplog: read data: oplog has insufficient range, some records since the last saved ts {1636439661 1} are missing. Run pbm backup to create a valid starting point for the PITR." indicates that you've run out of oplog collection (local.oplog.rs) capacity. To achieve consistency, after the data has been dumped, PBM also saves the oplog that covers the whole backup period. So by the time the dump finished, the oplog entries from the backup's start time had already been overwritten. Try increasing the oplog size so that it is big enough to hold all events written while the backup is running.
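A BSON timestamp such as {1636439661 1} stores seconds since the Unix epoch plus an ordinal counter, so the log above can be used to estimate how much time the oplog had to cover. Below is a minimal sketch in plain Node.js JavaScript (not mongo shell); the helper names are hypothetical, and it assumes the required window runs from the "last saved ts" to the "mongodump finished" log line.

```javascript
// A BSON timestamp {t i}: `t` is seconds since the Unix epoch,
// `i` is an ordinal counter within that second (ignored here).
function tsToDate(t) {
  return new Date(t * 1000);
}

// Hypothetical helper: seconds the oplog must retain so PBM can still read it
// after mongodump ends, i.e. from the starting oplog ts to the dump's finish time.
function requiredOplogWindowSeconds(startTs, dumpFinishedIso) {
  return Math.round((Date.parse(dumpFinishedIso) - startTs * 1000) / 1000);
}

const start = 1636439661; // "last saved ts {1636439661 1}" from the log
console.log(tsToDate(start).toISOString()); // 2021-11-09T06:34:21.000Z

const need = requiredOplogWindowSeconds(start, "2021-11-10T13:53:10Z");
console.log(need, (need / 3600).toFixed(1) + "h"); // 112729 31.3h
```

In this case the oplog would have had to retain roughly 31 hours of writes for the backup to succeed.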

Hi Andrew, I'm having exactly the same issue. I have already increased the oplog size to 21 GiB, and the error persists. My database is 450 GB, and I believe the transaction log is well below 21 GiB.

Using Mongo 5, pbm-agent 1.6.1

Snapshots:
    2021-11-23T20:41:47Z 0.00B [ERROR: oplog: read data: oplog has insufficient range, some records since the last saved ts {1637700109 10} are missing. Run `pbm backup` to create a valid starting point for the PITR.] [2021-11-24T06:39:32]

P.S. I am not using Docker or Kubernetes, just MongoDB on a Debian virtual machine.

Hi @robertomurta
Can you send the output of pbm status and pbm logs -e backup/2021-11-23T20:41:47Z -t 0 -s D (where 2021-11-23T20:41:47Z is the failed backup)?

$ pbm status

Cluster:
========
rs0:
  - rs0/db0.ourdomain.com:27017: pbm-agent v1.6.1 OK
  - rs0/db2.ourdomain.com:27017: pbm-agent v1.6.1 OK
  - rs0/db1.ourdomain.com:27017: pbm-agent v1.6.1 OK


PITR incremental backup:
========================
Status [OFF]

Currently running:
==================
(none)

Backups:
========
S3 location s3://https://s3.location.backblazeb2.com/ourbucket/data/pbm/backup
  Snapshots:
    2021-11-23T20:41:47Z 0.00B [ERROR: oplog: read data: oplog has insufficient range, some records since the last saved ts {1637700109 10} are missing. Run `pbm backup` to create a valid starting point for the PITR.] [2021-11-24T06:39:32]
    2021-10-29T16:28:01Z 0.00B [ERROR: oplog: read data: oplog has insufficient range, some records since the last saved ts {1635524899 25} are missing. Run `pbm backup` to create a valid starting point for the PITR.] [2021-10-30T00:28:00]

$ pbm logs -e backup/2021-11-23T20:41:47Z -t 0 -s D

2021-11-23T20:41:48Z D [rs0/db0.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] init backup meta
2021-11-23T20:41:48Z D [rs0/db0.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] nomination list for rs0: [[db1.ourdomain.com:27017 db2.ourdomain.com:27017] [db0.ourdomain.com:27017]]
2021-11-23T20:41:48Z D [rs0/db0.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] nomination rs0, set candidates [db1.ourdomain.com:27017 db2.ourdomain.com:27017]
2021-11-23T20:41:48Z I [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] backup started
2021-11-23T20:41:48Z D [rs0/db2.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] skip after nomination, probably started by another node
2021-11-23T20:41:48Z D [rs0/db0.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] skip after nomination, probably started by another node
2021-11-23T20:41:51Z D [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] wait for tmp users {1637700111 24}
2021-11-23T20:41:51Z I [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] s3.uploadPartSize is set to 45544041 (~43Mb)
2021-11-23T20:41:53Z D [rs0/db0.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] bcp nomination: rs0 won by db1.ourdomain.com:27017
2021-11-24T06:39:28Z I [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] mongodump finished, waiting for the oplog
2021-11-24T06:39:31Z D [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] set oplog span to {1637700109 10} / {1637735968 16}
2021-11-24T06:39:31Z I [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] s3.uploadPartSize is set to 10485760 (~10Mb)
2021-11-24T06:39:32Z I [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] dropping tmp collections
2021-11-24T06:39:32Z I [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] mark RS as error `oplog: read data: oplog has insufficient range, some records since the last saved ts {1637700109 10} are missing. Run `pbm backup` to create a valid starting point for the PITR.`: <nil>
2021-11-24T06:39:32Z I [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] mark backup as error `oplog: read data: oplog has insufficient range, some records since the last saved ts {1637700109 10} are missing. Run `pbm backup` to create a valid starting point for the PITR.`: <nil>
2021-11-24T06:39:32Z E [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] backup: oplog: read data: oplog has insufficient range, some records since the last saved ts {1637700109 10} are missing. Run `pbm backup` to create a valid starting point for the PITR.
2021-11-24T06:39:32Z D [rs0/db1.ourdomain.com:27017] [backup/2021-11-23T20:41:47Z] releasing lock
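The "set oplog span to {1637700109 10} / {1637735968 16}" line shows how much time the oplog had to cover for this backup. Because the oplog is capped by size, what matters is how many hours of writes it retains, not how large it is compared with the database. A minimal Node.js sketch, assuming you compare the span against the node's current oplog window (for example the timeDiff value, in seconds, reported by db.getReplicationInfo() in the mongo shell); the helper name and the 1.5x safety margin are hypothetical choices.

```javascript
// Oplog span PBM recorded for the failed backup
// ("set oplog span to {1637700109 10} / {1637735968 16}").
const spanStart = 1637700109;
const spanEnd = 1637735968;

// Seconds the oplog had to stay intact for this backup.
const neededSeconds = spanEnd - spanStart;

// Hypothetical helper: is the current oplog window (in seconds) long enough,
// with an arbitrary safety margin for busier periods?
function oplogWindowIsSufficient(windowSeconds, needed, safetyFactor = 1.5) {
  return windowSeconds >= needed * safetyFactor;
}

console.log(neededSeconds, (neededSeconds / 3600).toFixed(2) + "h"); // 35859 9.96h
console.log(oplogWindowIsSufficient(8 * 3600, neededSeconds));  // false
console.log(oplogWindowIsSufficient(20 * 3600, neededSeconds)); // true
```

So a 21 GiB oplog can still be too small if the write rate fills it in less than about 10 hours.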

Hello. The fix for me was to increase the oplog size.
I used this command:

use local
// size is in megabytes (35000 MB ≈ 34 GiB); it resizes only the member you run it on
db.adminCommand({replSetResizeOplog: 1, size: 35000})

This command fixed the problem.
Thanks for helping me.
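replSetResizeOplog takes size in megabytes (the documented minimum is 990 MB) and changes the oplog only on the member where it is run, so it has to be repeated on each replica set member. To pick a value, one rough approach is to scale the current oplog size by the ratio of the window the backup needs to the window the oplog currently retains. A minimal Node.js sketch, assuming a steady write rate; the function name, the example numbers, and the 1.5x margin are hypothetical.

```javascript
// Hypothetical sizing helper, assuming a steady write rate: if `currentSizeMB`
// currently retains `currentWindowHours` of writes, scale it to `neededHours`.
function suggestedOplogSizeMB(currentSizeMB, currentWindowHours, neededHours, safetyFactor = 1.5) {
  const mbPerHour = currentSizeMB / currentWindowHours;
  const size = Math.ceil(mbPerHour * neededHours * safetyFactor);
  // replSetResizeOplog expects megabytes, with a documented minimum of 990 MB.
  return Math.max(size, 990);
}

// Hypothetical numbers: a 20000 MB oplog spanning 8 hours,
// for a backup that needs about 10 hours of oplog.
const sizeMB = suggestedOplogSizeMB(20000, 8, 10);
console.log(`db.adminCommand({replSetResizeOplog: 1, size: ${sizeMB}})`);
// db.adminCommand({replSetResizeOplog: 1, size: 37500})
```

Real write rates vary over the day, so the margin should cover the busiest period of the backup window.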

After some other trouble, and after upgrading the agent to 1.6.1, both the snapshot and PITR backups were successful.
I used:

use local
// size is in megabytes (21000 MB ≈ 20.5 GiB); it resizes only the member you run it on
db.adminCommand({replSetResizeOplog: 1, size: 21000})