Upload error with PITR backups to a file share. Is retry possible?

Percona Backup: 2.0.3
Percona MongoDB Distribution: 6.0.4-3

Friends, I am having an issue with PITR backups. I am backing up to a CIFS share that is mounted at startup.
After a few hours, I get the following error from the PITR backup:

2023-05-10T22:28:23.000-0500 I [pitr] created chunk 2023-05-11T03:23:15 - 2023-05-11T03:28:15. Next chunk creation scheduled to begin at ~2023-05-10T22:33:12
2023-05-10T22:33:15.000-0500 D [pitr] remove pbmPitr/xxx/xxx/20230511/20230511032815-11856.20230511033315-11361.oplog.s2 due to upload errors
2023-05-10T22:33:15.000-0500 E [pitr] streaming oplog: unable to upload chunk {1683775695 11856}.{1683775995 11361}: read data: oplog has insufficient range, some records since the last saved ts {1683775695 11856} are missing. Run pbm backup to create a valid starting point for the PITR.
2023-05-10T22:33:45.000-0500 D start_catchup
2023-05-10T22:33:45.000-0500 D lastTS set to {1683775695 11856} 2023-05-11T03:28:15
2023-05-10T22:33:45.000-0500 I streaming started from 2023-05-11 03:28:15 +0000 UTC / 1683775695
2023-05-10T22:33:45.000-0500 E streaming oplog: unable to upload chunk {1683775695 11856}.{1683775995 11361}: read data: oplog has insufficient range, some records since the last saved ts {1683775695 11856} are missing. Run pbm backup to create a valid starting point for the PITR.

The insufficient range error then repeats until I disable PITR.

Interestingly enough, in the directory I see these files, but the final file has a different naming format:
20230511030815-3.20230511031315-3.oplog.s2
20230511031315-3.20230511031815-3.oplog.s2
20230511031815-3.20230511032315-3.oplog.s2
20230511032315-3.20230511032815-11856.oplog.s2
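
For reference, the last name seems to follow the same pattern as the others; only the number after the final hyphen differs (11856 instead of 3), and it matches the timestamp {1683775695 11856} in the error. A minimal sketch of decoding such a name, assuming the layout is <start>-<increment>.<end>-<increment>.oplog.<suffix> with UTC times (inferred from the names and log lines above, not taken from PBM's source). parseChunkName is a hypothetical helper, not a PBM function:

```javascript
// Assumed layout, inferred from the file names in this thread:
//   <start YYYYMMDDhhmmss>-<inc>.<end YYYYMMDDhhmmss>-<inc>.oplog.<suffix>
const CHUNK_RE = /^(\d{14})-(\d+)\.(\d{14})-(\d+)\.oplog\.(\w+)$/;

// Convert a 14-digit UTC stamp such as "20230511032815" to epoch seconds.
function stampToEpochSeconds(stamp) {
  const n = (from, to) => Number(stamp.slice(from, to));
  return Date.UTC(n(0, 4), n(4, 6) - 1, n(6, 8), n(8, 10), n(10, 12), n(12, 14)) / 1000;
}

// Split a chunk file name into its start and end oplog timestamps.
// Returns null when the name does not match the assumed layout.
function parseChunkName(name) {
  const m = CHUNK_RE.exec(name);
  if (!m) return null;
  return {
    start: { t: stampToEpochSeconds(m[1]), i: Number(m[2]) },
    end: { t: stampToEpochSeconds(m[3]), i: Number(m[4]) },
    suffix: m[5],
  };
}

const chunk = parseChunkName("20230511032315-3.20230511032815-11856.oplog.s2");
console.log(chunk.end); // { t: 1683775695, i: 11856 }
```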

I am able to write to the directory, both as myself and the mongod user via sudo -su.
Recreating a new snapshot works without issue but after a few hours, I’ll get these errors again.
I have tried reducing the backup window from 10 minutes to 5 minutes, as you can see above. But that didn’t help.
I thought one secondary node was having issues, so I changed the backup priority of that server to 0.1, but I still got errors when backups were running from the primary node.
I haven’t tried switching the compression to gzip or turning compression off.

I have a development, stage, and test environment with the same configuration, with no issues.
Is there a retry available for pitr backup? Are there any other flags I can use?

Hi @la_parka,

Welcome to the Percona Community Forum!

2023-05-10T22:33:15.000-0500 E [pitr] streaming oplog: unable to upload chunk {1683775695 11856}.{1683775995 11361}: read data: oplog has insufficient range, some records since the last saved ts {1683775695 11856} are missing.

The error above points to an undersized oplog: the oplog is a capped collection, so once it ran out of space MongoDB started purging the oldest entries.
To make a backup consistent, PBM also saves the part of the oplog that covers the time the backup was running. If the oplog is too small, then by the time the backup finishes, the oplog events from the backup start time have already been overwritten.

Try to increase the oplog size so it is big enough to fit all events while the backup phase is running.
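
To illustrate the condition behind the error: oplog streaming can only resume from the last saved timestamp if the oldest entry still present in the oplog is not newer than that timestamp. A minimal sketch of that comparison, with timestamps written as {t, i} pairs (seconds and increment) as they appear in the log. compareTs and oplogCoversTs are hypothetical helpers for illustration, not PBM functions, and the "oldest" value below is made up:

```javascript
// Order two oplog timestamps: seconds first, then the increment within that second.
function compareTs(a, b) {
  return a.t !== b.t ? a.t - b.t : a.i - b.i;
}

// True if the oplog still holds everything from lastSaved onward,
// i.e. its oldest entry is at or before the last saved timestamp.
function oplogCoversTs(oldestInOplog, lastSaved) {
  return compareTs(oldestInOplog, lastSaved) <= 0;
}

const lastSaved = { t: 1683775695, i: 11856 }; // from the error message
const oldest = { t: 1683775900, i: 1 };        // made-up value for illustration

console.log(oplogCoversTs(oldest, lastSaved)); // false
```

In mongosh, the oldest entry can be read with db.getSiblingDB("local").oplog.rs.find().sort({ $natural: 1 }).limit(1), and rs.printReplicationInfo() reports the same window in readable form.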

Regards,
Parag Bhayani

Thanks for the reply! I ran the command from the link you provided above and got
db.oplog.rs.stats().maxSize
1073741842

In mongod.conf, I have the setting at
replication:
  oplogSizeMB: 10240

This is a pretty quiet server, so most of the .s2 files in the backup directory are around 20 KB.
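
Note that the two numbers above are in different units: maxSize is reported in bytes, while oplogSizeMB is in megabytes. A quick conversion sketch (bytesToMB is only an illustrative helper) suggests the running oplog is about 1 GB rather than the configured 10 GB:

```javascript
// Convert a size in bytes to megabytes (1 MB = 1024 * 1024 bytes).
function bytesToMB(bytes) {
  return bytes / (1024 * 1024);
}

const actualMB = Math.round(bytesToMB(1073741842)); // value from stats().maxSize
const configuredMB = 10240;                          // value from mongod.conf

console.log(actualMB);                  // 1024
console.log(actualMB === configuredMB); // false
```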

Hi,

Kindly share the output of rs.printReplicationInfo() from all the nodes in the replica set. Also share the output of pbm status and pbm list.

Regards,
Parag

Using rs.printReplicationInfo(), I found the issue!
Even though I had the oplog set to 10 GB in mongod.conf, it was only using 1 GB!
I made sure to restart the service and verified that the oplog was 10 GB.
I haven’t seen the error since then. Thank you for your help!