PITR issue - oplog has insufficient range

peter.surovy · March 26, 2024, 12:41pm

Hello.

We are using the following MongoDB setup:

replica set consisting of three separate servers
each server is running MongoDB version 7.0.5-3
each server is running PBM agent version 2.4.0
the environment is built from scratch; during that process a logical base backup is made and PITR is turned on
PITR is using these settings:
pitr:
  enabled: true
  oplogSpanMin: 10
  compression: gzip
  compressionLevel: -1
  oplogonly: false
oplog size is approximately 1.5 GB
logical base backup is made once every day

During standard operation, PITR saves a chunk every 10 minutes, as configured. The "pbm status" command shows something like this:
"Backups:",
"========",
"S3 local s3://http:xxxxxxxxxxxxxxxxxxxxxxxxx",
"  Snapshots:",
"    2024-03-25T07:33:03Z 292.33KB <incremental, base> [restore_to_time: 2024-03-25T07:33:11Z]",
"  PITR chunks [3.07GB]:",
"    2024-03-25T07:33:12Z - 2024-03-25T10:33:24Z"

But we have come across the following situation:

during a 10-minute PITR window there is a large data write to the DB, which fills the whole oplog and causes it to rotate
when PITR is about to create a new chunk, it detects that the whole oplog has rotated and fails with the following error:

2024-03-25T10:43:17Z E [xxxx] [pitr] streaming oplog: oplog has insufficient range, some records since the last saved ts {1711362804 17} are missing. Run pbm backup to create a valid starting point for the PITR

Okay, makes sense - PITR has lost track of some records within the 10 minute span.
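
For reference, the "last saved ts" in the error can be decoded to check which chunk it refers to. This is a minimal sketch, assuming the first number in the braces is seconds since the Unix epoch and the second is an ordinal within that second (the usual BSON timestamp layout); the variable names are only illustrative.

```python
from datetime import datetime, timezone

# Values copied from the error message: last saved ts {1711362804 17}.
# Assumption: seconds since the Unix epoch, then an ordinal within that second.
last_saved_seconds = 1711362804
ordinal = 17

last_saved = datetime.fromtimestamp(last_saved_seconds, tz=timezone.utc)
print(last_saved.strftime("%Y-%m-%dT%H:%M:%SZ"))  # 2024-03-25T10:33:24Z
```

The decoded time matches the end of the last PITR chunk range in the status output (2024-03-25T10:33:24Z), so the missing records are the ones written after that chunk was saved.
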
How do we make it work again? We have created a new base backup; shouldn't PITR automatically start a new chunk chain? See the example below:

"Backups:",
"========",
"S3 local s3://http:xxxxxxxxxxxxxxxxxxxxxxxxx",
"  Snapshots:",
"    2024-03-25T11:46:13Z 3.88GB <incremental, base> [restore_to_time: 2024-03-25T11:46:20Z]",   <- we made a new backup here
"    2024-03-25T07:33:03Z 292.33KB <incremental, base> [restore_to_time: 2024-03-25T07:33:11Z]",
"  PITR chunks [3.07GB]:",
    <- shouldn't a new chunk chain start here, using the 2024-03-25T11:46:13Z backup?
"    2024-03-25T07:33:12Z - 2024-03-25T10:33:24Z"

PITR keeps failing with the same error, regardless of the new base backup being available. Is this expected behaviour? We'd expect a new chunk chain to start.
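
To make the gap concrete, here is a rough sketch of what the status output above implies. It assumes that a point in time is restorable only if a snapshot plus an unbroken run of chunk ranges reaches it; `covered_until` and its one-second slack are hypothetical helpers for illustration, not anything PBM exposes.

```python
from datetime import datetime, timedelta

def parse(ts: str) -> datetime:
    # Timestamps in the status output are UTC with a trailing "Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

# restore_to_time of each snapshot, copied from the status output.
snapshots = [parse("2024-03-25T07:33:11Z"), parse("2024-03-25T11:46:20Z")]
# The single PITR chunk range reported by pbm status.
chunks = [(parse("2024-03-25T07:33:12Z"), parse("2024-03-25T10:33:24Z"))]

def covered_until(snapshot, chunks, slack_seconds=1):
    """Latest time reachable from `snapshot` through attached chunk ranges.

    Hypothetical rule: a range is attached when it starts no later than
    `slack_seconds` after the point already covered.
    """
    end = snapshot
    for start, stop in sorted(chunks):
        if start <= end + timedelta(seconds=slack_seconds) and stop > end:
            end = stop
    return end

for snap in snapshots:
    print(snap, "->", covered_until(snap, chunks))
```

Under these assumptions the first snapshot is covered up to 10:33:24, while the new snapshot is covered only up to its own restore_to_time of 11:46:20, so nothing between 10:33:24 and 11:46:20, or after 11:46:20, can be restored.
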

Thank you for your comments.

Sandra_Romanchenko · August 19, 2024, 1:25pm

Hi,

You're right, there is an issue on the PBM side, which will be addressed in PBM-1344 in the upcoming PBM release.

Also, for PBM to be able to save the oplog between backups, I'd suggest that you either increase the oplog size on the server side or decrease the oplogSpanMin interval on the PBM side.
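
As a rough way to reason about those two options, the sketch below estimates how long the oplog lasts under a write burst and compares that with the slicing interval. The oplog size and interval come from the original post; the burst write rate and the alternative values are hypothetical placeholders, and the function names are only illustrative. It ignores details such as how oplog entries are sized on disk, so treat it as back-of-the-envelope arithmetic.

```python
def oplog_window_minutes(oplog_size_mb: float, write_rate_mb_per_min: float) -> float:
    """Minutes of writes the oplog can hold before the oldest entries are overwritten."""
    return oplog_size_mb / write_rate_mb_per_min

def span_is_safe(oplog_size_mb: float, write_rate_mb_per_min: float, oplog_span_min: float) -> bool:
    """True when the oplog holds more than one full slicing interval of writes."""
    return oplog_window_minutes(oplog_size_mb, write_rate_mb_per_min) > oplog_span_min

OPLOG_SIZE_MB = 1536          # about 1.5 GB, from the original post
OPLOG_SPAN_MIN = 10           # pitr.oplogSpanMin, from the original post
BURST_RATE_MB_PER_MIN = 300   # hypothetical burst write rate, not from this thread

print(oplog_window_minutes(OPLOG_SIZE_MB, BURST_RATE_MB_PER_MIN))          # 5.12
print(span_is_safe(OPLOG_SIZE_MB, BURST_RATE_MB_PER_MIN, OPLOG_SPAN_MIN))  # False
# Option 1: shorter slicing interval (hypothetical value of 2 minutes).
print(span_is_safe(OPLOG_SIZE_MB, BURST_RATE_MB_PER_MIN, 2))               # True
# Option 2: larger oplog (hypothetical 16 GB).
print(span_is_safe(16384, BURST_RATE_MB_PER_MIN, OPLOG_SPAN_MIN))          # True
```

With the assumed burst rate, a 1.5 GB oplog would rotate in about five minutes, which is shorter than the 10-minute interval; either change restores the margin. On the server side the oplog can be resized with MongoDB's `replSetResizeOplog` admin command.
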
