Upload error with PITR backups to file share. Is retry possible?

la_parka · May 11, 2023, 5:33pm

Percona Backup: 2.0.3
Percona MongoDB Distribution: 6.0.4-3

Friends, I am having an issue with backups. I am backing up to a CIFS share that is mounted at startup.
After a few hours, I get the following error from the PITR backup:

2023-05-10T22:28:23.000-0500 I [pitr] created chunk 2023-05-11T03:23:15 - 2023-05-11T03:28:15. Next chunk creation scheduled to begin at ~2023-05-10T22:33:12
2023-05-10T22:33:15.000-0500 D [pitr] remove pbmPitr/xxx/xxx/20230511/20230511032815-11856.20230511033315-11361.oplog.s2 due to upload errors
2023-05-10T22:33:15.000-0500 E [pitr] streaming oplog: unable to upload chunk {1683775695 11856}.{1683775995 11361}: read data: oplog has insufficient range, some records since the last saved ts {1683775695 11856} are missing. Run pbm backup to create a valid starting point for the PITR.
2023-05-10T22:33:45.000-0500 D start_catchup
2023-05-10T22:33:45.000-0500 D lastTS set to {1683775695 11856} 2023-05-11T03:28:15
2023-05-10T22:33:45.000-0500 I streaming started from 2023-05-11 03:28:15 +0000 UTC / 1683775695
2023-05-10T22:33:45.000-0500 E streaming oplog: unable to upload chunk {1683775695 11856}.{1683775995 11361}: read data: oplog has insufficient range, some records since the last saved ts {1683775695 11856} are missing. Run pbm backup to create a valid starting point for the PITR.

The "insufficient range" error then keeps repeating until I disable PITR.
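The values such as {1683775695 11856} in these log lines are oplog timestamps, written as {seconds since epoch, increment}. The Python sketch below converts them to UTC (the helper name ts_to_utc is hypothetical, not part of PBM). It shows that the failing chunk covers 03:28:15 to 03:33:15 UTC, which matches the "lastTS set to" line and the 5-minute chunk span.

```python
from datetime import datetime, timezone

def ts_to_utc(seconds: int, increment: int) -> str:
    """Render a MongoDB oplog timestamp {seconds increment} as UTC text.

    The increment only orders operations within the same second, so it
    does not change the wall-clock time; it is kept in the output so the
    result can be matched against the PBM log lines.
    """
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{dt:%Y-%m-%dT%H:%M:%S}Z (inc {increment})"

# Timestamps copied from the failing chunk in the log above.
start = (1683775695, 11856)
end = (1683775995, 11361)

print(ts_to_utc(*start))             # 2023-05-11T03:28:15Z (inc 11856)
print(ts_to_utc(*end))               # 2023-05-11T03:33:15Z (inc 11361)
print(end[0] - start[0], "seconds")  # 300 seconds, i.e. one 5-minute chunk
```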

Interestingly, I see these files in the directory, but the final file has a different naming format:
20230511030815-3.20230511031315-3.oplog.s2
20230511031315-3.20230511031815-3.oplog.s2
20230511031815-3.20230511032315-3.oplog.s2
20230511032315-3.20230511032815-11856.oplog.s2
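In the log above, chunk {1683775695 11856}.{1683775995 11361} is removed as 20230511032815-11856.20230511033315-11361.oplog.s2. Judging from that, each file name appears to encode the start and end timestamps as <UTC time>-<increment>. If so, the -11856 on the last file would be the increment of its end timestamp rather than a different format. The Python sketch below is built on that assumption (the regex and helper names are hypothetical, not a PBM API). It parses the names and checks that consecutive chunks are contiguous.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the log above:
# <start YYYYMMDDhhmmss>-<start inc>.<end YYYYMMDDhhmmss>-<end inc>.oplog.<ext>
CHUNK_RE = re.compile(r"^(\d{14})-(\d+)\.(\d{14})-(\d+)\.oplog\.\w+$")

def parse_chunk(name):
    """Return ((start_secs, start_inc), (end_secs, end_inc)) for a chunk file name."""
    m = CHUNK_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised chunk name: {name}")

    def ts(stamp, inc):
        dt = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        return (int(dt.timestamp()), int(inc))

    return ts(m.group(1), m.group(2)), ts(m.group(3), m.group(4))

def find_gaps(names):
    """List (previous_end, next_start) pairs where consecutive chunks do not touch."""
    chunks = sorted(parse_chunk(n) for n in names)
    return [(a[1], b[0]) for a, b in zip(chunks, chunks[1:]) if a[1] != b[0]]

files = [
    "20230511030815-3.20230511031315-3.oplog.s2",
    "20230511031315-3.20230511031815-3.oplog.s2",
    "20230511031815-3.20230511032315-3.oplog.s2",
    "20230511032315-3.20230511032815-11856.oplog.s2",
]
print(find_gaps(files))           # [] -> the saved chunks are contiguous
print(parse_chunk(files[-1])[1])  # (1683775695, 11856), the "last saved ts" in the error
```

Under this reading, the saved chunks have no holes. The problem is the next chunk, which would have to start at the last saved timestamp.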

I am able to write to the directory both as myself and as the mongod user (via sudo -su).
Creating a new snapshot backup works without issue, but after a few hours I get these errors again.
I tried reducing the PITR chunk window from 10 minutes to 5 minutes, as you can see above, but that didn't help.
I thought one secondary node was having issues, so I changed the backup priority of that server to 0.1, but I still got errors when backups were running from the primary node.
I haven't tried changing the compression to gzip or to no compression.

I have development, stage, and test environments with the same configuration, and none of them have this issue.
Is there a retry option for PITR backups? Are there any other flags I can use?

Parag_Bhayani · May 11, 2023, 6:37pm

Hi @la_parka,

Welcome to the Percona Community Forum!

2023-05-10T22:33:15.000-0500 E [pitr] streaming oplog: unable to upload chunk {1683775695 11856}.{1683775995 11361}: read data: oplog has insufficient range, some records since the last saved ts {1683775695 11856} are missing.

The error above indicates that the oplog is too small. The oplog is a capped collection, so once it is full, MongoDB starts purging the oldest entries, and records that PBM has not yet saved can be lost.
To achieve consistency, PBM also saves the oplog that covers the backup period. If the oplog is too small, the oplog events from around the backup start time have already been overwritten by the time the backup finishes.

Try increasing the oplog size so that it can hold all events generated while the backup is running.
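The condition behind the "insufficient range" message can be pictured as a comparison of two oplog timestamps. The Python sketch below is a simplified model under that assumption; the function name is hypothetical and this is not PBM's source code. PITR can only continue if the oldest entry still in the oplog is no newer than the last timestamp PBM saved.

```python
from typing import Tuple

# An oplog timestamp as (seconds since epoch, increment). Python tuples compare
# the same way MongoDB orders timestamps: by seconds, then by increment.
Ts = Tuple[int, int]

def can_resume_pitr(last_saved: Ts, oplog_first: Ts) -> bool:
    """Simplified coverage check (an assumption, not PBM's actual code).

    PITR can continue only if the oldest entry still in the capped oplog
    is at or before the last timestamp PBM saved; otherwise the entries
    in between have already been overwritten.
    """
    return oplog_first <= last_saved

last_saved = (1683775695, 11856)  # "last saved ts" from the error message
print(can_resume_pitr(last_saved, (1683770000, 1)))  # True: oplog reaches back far enough
print(can_resume_pitr(last_saved, (1683775800, 1)))  # False: the gap was overwritten
```

A bigger oplog moves oplog_first further into the past. This gives PBM more slack before the comparison fails.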

Regards,
Parag Bhayani

la_parka · May 11, 2023, 7:42pm

Thanks for the reply! I ran the command from the link you provided above and got
db.oplog.rs.stats().maxSize
1073741842

In mongod.conf, I have this setting:
replication:
  oplogSizeMB: 10240

This is a pretty quiet server, so most of the .s2 files in the backup directory are around 20 KB.
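The two numbers above use different units. maxSize is reported in bytes, while oplogSizeMB is in megabytes. The quick Python conversion below shows that 1073741842 bytes is about 1024 MB, roughly a tenth of the configured 10240 MB. This is the discrepancy found later in this thread. The helper name and the 1% tolerance are illustrative assumptions.

```python
MIB = 1024 * 1024  # bytes per megabyte as used by oplogSizeMB

def oplog_size_mismatch(max_size_bytes, configured_mb, tolerance=0.01):
    """Compare db.oplog.rs.stats().maxSize (bytes) with oplogSizeMB from mongod.conf.

    Returns (actual_mb, mismatch). The relative tolerance absorbs small
    rounding differences between the two figures.
    """
    actual_mb = max_size_bytes / MIB
    return actual_mb, abs(actual_mb - configured_mb) > tolerance * configured_mb

actual, mismatch = oplog_size_mismatch(1073741842, 10240)
print(f"{actual:.0f} MB actual vs 10240 MB configured, mismatch={mismatch}")
# 1024 MB actual vs 10240 MB configured, mismatch=True
```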

Parag_Bhayani · May 12, 2023, 8:03pm

Hi,

Kindly share the output of rs.printReplicationInfo() from all the nodes in the replica set. Also share the output of pbm status and pbm list.
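rs.printReplicationInfo() reports the oplog's first and last event times. The time between them is the oplog window, which is how far back the oplog still reaches when PBM needs to fill a gap. The Python sketch below does the same arithmetic. It uses made-up timestamps (not output from this thread), a hypothetical helper name, and a required window that you would choose yourself, for example the longest backup or PITR pause you expect.

```python
from datetime import datetime

def oplog_window_hours(first_event: datetime, last_event: datetime) -> float:
    """Hours of history the oplog holds: last event time minus first event time."""
    return (last_event - first_event).total_seconds() / 3600

def window_is_enough(first_event: datetime, last_event: datetime,
                     required_hours: float) -> bool:
    """True if the oplog window is at least the required number of hours."""
    return oplog_window_hours(first_event, last_event) >= required_hours

# Made-up sample values, not taken from the thread.
first = datetime(2023, 5, 10, 20, 0)
last = datetime(2023, 5, 11, 3, 30)
print(oplog_window_hours(first, last))   # 7.5
print(window_is_enough(first, last, 6))  # True
print(window_is_enough(first, last, 8))  # False
```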

Regards,
Parag

la_parka · August 3, 2023, 3:58pm

Using rs.printReplicationInfo() I found the issue!
I found that even though I had the oplog set to 10 GB in mongod.conf, it was only using 1 GB!
I made sure to restart the service and verified that the oplog was 10 GB.
I haven't seen the error since then. Thank you for your help!
