Partial upload of backups to GCS when using PBM 2.10.0

Hello, recently we’ve upgraded PBM from 2.9.1 to 2.10.0.

We’re uploading backups to GCS.

After the migration we’ve noticed that some objects appear to be partially uploaded; unfortunately, no errors are logged when this occurs.

With 2.9.1 we were using the S3 driver with retries set up.

With 2.10.0 we had to migrate to the GCS driver without configuring any retries (this option is not documented). Looking at the source code, it seems possible to set backoff options (backoffInitial, backoffMax, and backoffMultiplier), but I am not entirely clear how to properly configure these so there’s a maximum number of retries. Can you please clarify?

The S3 driver has been quite reliable for us and it logs errors that occur during transfer. Is there any chance to force 2.10.0 to continue using the S3 driver while uploading backups to GCS? The GCS implementation doesn’t seem as reliable as the S3 one, so we are forced to stay on 2.9.1 for now.

Hello,

Thank you for reporting this. Let me start by directly answering your questions:

Is there any chance to force 2.10.0 to continue using the S3 driver while uploading backups to GCS?

We were forced to migrate to the dedicated GCS library due to the end of support for the AWS SDK for Go v1: Announcing end-of-support for AWS SDK for Go (v1) effective July 31, 2025 | AWS Developer Tools Blog.
The AWS v2 SDK doesn’t support GCS, so the answer to your question is unfortunately no.

I am not entirely clear how to properly configure these so there’s a maximum number of retries

Retry options for GCS are documented here: Remote backup storage options - Percona Backup for MongoDB
So by default there will be retries after: 1, 2, 4, 8, 16 seconds.
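
For example, a config that makes those defaults explicit might look like this (a sketch: I’m assuming the options sit under a gcs.retryer block, mirroring the S3 driver’s retryer block, and the values shown are just the defaults):

storage:
  type: gcs
  gcs:
    bucket: ***
    prefix: pbm/backup
    retryer:
      backoffInitial: 1
      backoffMax: 30
      backoffMultiplier: 2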

In general, we’ll be able to help you more if you clarify the usage scenario a bit, especially this:

After the migration we’ve noticed that some objects appear to be partially uploaded; unfortunately, no errors are logged when this occurs.

What was the uploaded object size, and what was the expected one? How many of them failed, and how did you detect that if an error hasn’t been reported? Also, please share the GCS configuration you are using with v2.10. Having the log file can significantly clarify the situation, so please share it if that’s possible.

Hello,

Thanks for your response!

Retry options for GCS are documented here: Remote backup storage options - Percona Backup for MongoDB
So by default there will be retries after: 1, 2, 4, 8, 16 seconds.

So does backoffMax, which defaults to 30, mean that when the computed retry delay is longer than 30 seconds the library will stop retrying? I thought the delay would instead be capped at backoffMax, i.e. roughly min(backoffInitial × backoffMultiplier^n, backoffMax), giving retries after 1, 2, 4, 8, 16, 30, 30, … seconds, but then I was missing how to set the maximum number of retries.

Is there a way to make pbm log when it retries?

What was the uploaded object size, and what was the expected one? How many of them failed, and how did you detect that if an error hasn’t been reported?

We detected it while attempting a restore of the backup: we couldn’t inflate the gz file and then noticed it was smaller than previous backups. For example, a file that is usually larger than 390GB was only 111GB, and another that is usually larger than 590GB was 344GB. We did not do an in-depth review of all collections, but it looks like larger collections have a higher chance of being ‘partially uploaded’.

This is the configuration used for pbm 2.10:

pitr:
  enabled: true
  oplogSpanMin: 1
  compression: gzip
storage:
  type: gcs
  gcs:
    bucket: ***
    prefix: pbm/backup
    credentials:
      hmacAccessKey: ***
      hmacSecret: ***
backup:
  compression: gzip

For pbm 2.9 we are instead using:

pitr:
  enabled: true
  oplogSpanMin: 1
  compression: gzip
storage:
  type: s3
  s3:
    provider: aws
    region: ***
    bucket: ***
    prefix: pbm/backup
    endpointUrl: https://storage.googleapis.com
    credentials:
      access-key-id: ***
      secret-access-key: ***
    retryer:
      numMaxRetries: 10
      minRetryDelay: 1s
      maxRetryDelay: 10m
backup:
  compression: gzip

With regards to the logs, I’d rather not send these over a public forum. Are you looking for something specific? I tried to filter the logs of the backup performed with 2.10, for which some objects were not completely uploaded, and could not find any message at W, E, or F severity.
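
In case it helps, I filtered with roughly the following (assuming the usual pbm logs severity/event flags; the event name is our backup):

pbm logs -t 0 -s W -e backup/2025-08-16T05:00:05Z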

I’ve managed to look at the debug logs and found the following lines for the collection that was supposed to be uploaded with a size of ~390GB:

2025-08-16T05:01:09Z D [***] [backup/2025-08-16T05:00:05Z] uploading "2025-08-16T05:00:05Z/production/${dbName}.${collectionName}.gz" [size hint: 785705676800 (731.75GB); part size: 117855850 (112.40MB)]
2025-08-16T23:32:33Z I [***] [backup/2025-08-16T05:00:05Z] dump collection "${dbName}.${collectionName}" done (size: 1578844574699)

Hello again,

We were able to reproduce the issue. Please follow this ticket for more information and progress: https://perconadev.atlassian.net/browse/PBM-1605

Additionally, there’s a workaround for GCS; please see this comment. When using PBM v2.10, it should work if you use a service account (email/private key credentials), but I am not sure if that’s an option for you.
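
For reference, a minimal sketch of a service-account based config (assuming the clientEmail/privateKey credential fields; replace the placeholders with your service account’s values):

storage:
  type: gcs
  gcs:
    bucket: ***
    prefix: pbm/backup
    credentials:
      clientEmail: ***
      privateKey: ***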

Thanks, I’ve requested access to https://perconadev.atlassian.net to read the ticket.

Hi, you should be able to see the Jira ticket now:

https://perconadev.atlassian.net/browse/PBM-1605