createBackup to GCS fails after aws-sdk-cpp upgrade

Note: I was unable to create a Jira ticket directly (permission error on create, even when logged in). If a team member could file this under the PSMDB Jira project or enable me to do so, that would be appreciated.

Hi,

We’re hitting a regression in createBackup when streaming to Google Cloud Storage via the S3-compatible API. Every CompleteMultipartUpload request is rejected with HTTP 400.

Affected versions: 8.0.20-8, 8.0.21-9, 8.0.23-10, 7.0.31-17, 7.0.32-18, 7.0.34-19
Last working versions: 8.0.19-7 and 7.0.30-16

The error from the mongod log:

[ERROR] TransferManager: Transfer handle [...] Failed to complete multi-part upload.
  Bucket: [our-gcs-bucket]
  Key: [backup-path/local/collection/14-3454594855776759712.wt]
  HTTP response code: 400
  Exception name: MalformedCompleteMultipartUploadRequest
  Error message: The complete multipart upload request XML you provided
                 was not well-formed or did not validate against our published schema.

Single-part uploads (PutObject) succeed. Multipart part uploads (UploadPart) also succeed. Only the CompleteMultipartUpload XML finalization is rejected by GCS. After one file fails, all other in-progress transfers are cancelled and the entire backup aborts.

How we confirmed the regression:

A control instance on 8.0.19-7 backs up successfully to the same GCS bucket with the same credentials and the same endpoint. Other instances that pulled 8.0-latest (which resolved to 8.0.23-10 after a pod restart on June 9) started failing immediately. The same happened with 7.0-latest, which resolved to 7.0.34-19.

For us, approximately 133 MongoDB instances across multiple environments are affected. Different files fail on different instances (collection files, index files, journal files), which points to the XML serialization rather than anything file-specific.

Root cause (source code analysis):

PSMDB-1892 upgraded the vendored aws-sdk-cpp from 1.9.379 to 1.11.471 for SBOM compliance. This landed in 8.0.20-8 and 7.0.31-17 — exactly matching the first broken versions.

The new SDK’s TransferManagerConfiguration introduces a checksumAlgorithm field that defaults to CRC32:

// aws-sdk-cpp 1.11.471 — TransferManager.h line 148
Aws::S3::Model::ChecksumAlgorithm checksumAlgorithm = S3::Model::ChecksumAlgorithm::CRC32;

PSMDB’s backup code in wiredtiger_kv_engine.cpp sets computeContentMD5 = true but does not override checksumAlgorithm. During UploadPart, computeContentMD5 correctly overrides the checksum to NOT_SET - so parts upload without requesting CRC32 from the server. GCS doesn’t return any CRC32 header.

However, the part completion callback still checks checksumAlgorithm (still CRC32) and calls GetChecksumCRC32() on the response - getting an empty string "". When assembling the CompleteMultipartUpload XML, SetChecksumCRC32("") is called on each CompletedPart, which unconditionally sets m_checksumCRC32HasBeenSet = true. The XML serializer then emits:

<CompleteMultipartUpload xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Part>
    <ETag>"abc123..."</ETag>
    <ChecksumCRC32></ChecksumCRC32>   <!-- spurious empty element -->
    <PartNumber>1</PartNumber>
  </Part>
</CompleteMultipartUpload>

GCS rejects this because <ChecksumCRC32> is not part of its CompleteMultipartUpload schema. AWS S3 likely accepts it because it’s more lenient with additional XML elements.
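To make the mechanism concrete, here is a minimal standalone sketch of the "has been set" pattern described above. It does not use the real SDK: CompletedPartModel and BuildPartXml are hypothetical names invented for illustration, and the serializer is a simplified assumption about how a flag-driven XML writer behaves.

```cpp
#include <string>

// Hypothetical stand-in for a generated SDK model class. It is NOT the real
// Aws::S3::Model::CompletedPart; it only mimics the "has been set" pattern.
class CompletedPartModel {
public:
    void SetETag(const std::string& value) {
        m_eTag = value;
        m_eTagHasBeenSet = true;
    }

    // The flag flips even when the value is empty, as described above.
    void SetChecksumCRC32(const std::string& value) {
        m_checksumCRC32 = value;
        m_checksumCRC32HasBeenSet = true;
    }

    void SetPartNumber(int value) {
        m_partNumber = value;
        m_partNumberHasBeenSet = true;
    }

    // Emits an element for every field whose flag is set, empty or not.
    std::string ToXml() const {
        std::string xml = "<Part>";
        if (m_eTagHasBeenSet)
            xml += "<ETag>" + m_eTag + "</ETag>";
        if (m_checksumCRC32HasBeenSet)
            xml += "<ChecksumCRC32>" + m_checksumCRC32 + "</ChecksumCRC32>";
        if (m_partNumberHasBeenSet)
            xml += "<PartNumber>" + std::to_string(m_partNumber) + "</PartNumber>";
        return xml + "</Part>";
    }

private:
    std::string m_eTag;
    std::string m_checksumCRC32;
    int m_partNumber = 0;
    bool m_eTagHasBeenSet = false;
    bool m_checksumCRC32HasBeenSet = false;
    bool m_partNumberHasBeenSet = false;
};

// Hypothetical helper: builds one part from the values of an UploadPart response.
// skipEmptyChecksum = false reproduces the behaviour described in this post;
// skipEmptyChecksum = true shows what a guard on the empty string would change.
std::string BuildPartXml(const std::string& eTag,
                         const std::string& crc32FromResponse,
                         int partNumber,
                         bool skipEmptyChecksum) {
    CompletedPartModel part;
    part.SetETag(eTag);
    if (!(skipEmptyChecksum && crc32FromResponse.empty())) {
        part.SetChecksumCRC32(crc32FromResponse);
    }
    part.SetPartNumber(partNumber);
    return part.ToXml();
}
```

Calling BuildPartXml with an empty checksum and skipEmptyChecksum = false produces a Part containing an empty ChecksumCRC32 element. With skipEmptyChecksum = true the element is omitted entirely.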

Suggested fix:

TransferManagerConfiguration trManConf(executor.get());
trManConf.s3Client = s3_client;
trManConf.computeContentMD5 = true;
trManConf.checksumAlgorithm = S3::Model::ChecksumAlgorithm::NOT_SET;  // add this line
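As a rough illustration of why that one line should be enough, the sketch below models the two decision points from the analysis above. It is an assumption-laden model, not SDK code: TransferConfigModel, ChecksumAlgorithmModel and the three functions are hypothetical names, and the conditions only restate the behaviour described in this post.

```cpp
// Hypothetical mirror of the two relevant configuration fields.
enum class ChecksumAlgorithmModel { NOT_SET, CRC32 };

struct TransferConfigModel {
    bool computeContentMD5 = false;
    // Assumed default, matching the CRC32 default quoted from the new SDK.
    ChecksumAlgorithmModel checksumAlgorithm = ChecksumAlgorithmModel::CRC32;
};

// Upload side, as described: computeContentMD5 overrides the checksum to
// NOT_SET, so the part request does not ask the server for a CRC32.
bool PartRequestAsksForChecksum(const TransferConfigModel& cfg) {
    if (cfg.computeContentMD5) {
        return false;
    }
    return cfg.checksumAlgorithm != ChecksumAlgorithmModel::NOT_SET;
}

// Completion side, as described: the callback looks only at
// checksumAlgorithm and ignores computeContentMD5.
bool CompletionAttachesChecksum(const TransferConfigModel& cfg) {
    return cfg.checksumAlgorithm == ChecksumAlgorithmModel::CRC32;
}

// The failure needs both: no checksum requested, but one attached anyway.
bool HasChecksumMismatch(const TransferConfigModel& cfg) {
    return !PartRequestAsksForChecksum(cfg) && CompletionAttachesChecksum(cfg);
}
```

With computeContentMD5 = true and the default algorithm, HasChecksumMismatch returns true. Setting checksumAlgorithm to NOT_SET makes both sides agree, so it returns false.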

Steps to reproduce:

  1. Deploy PSMDB 8.0.20-8 or later (or 7.0.31-17 or later)
  2. Configure createBackup with S3 parameters pointing to a GCS bucket via storage.googleapis.com
  3. Run createBackup on any database with files larger than the multipart threshold (~5 MB)
  4. Observe MalformedCompleteMultipartUploadRequest in the mongod logs

Related Jira tickets:

  • PSMDB-1892 — the aws-sdk-cpp upgrade that introduced this regression
  • PSMDB-731 — added multipart upload support for GCS in createBackup
  • PSMDB-715 — original EntityTooLarge issue that motivated multipart uploads

Has anyone else hit this with GCS or other S3-compatible backends (MinIO, Ceph RGW)?

Disclosure: The source code analysis was performed with AI assistance and verified by a human engineer. The problem is real and actively impacts our production environment.

Hi, thanks for reaching out. For S3-compatible storage, the recommendation is to switch to the minio driver on the PBM side. For Google specifically, there is also built-in support for gcs. Can you switch to that driver? Is there any reason why you use the aws driver?

Let me know if that helps.

Sorry, in my previous answer I did not realize that you are not running PBM at all but are using the Hot Backup feature of Percona Server for MongoDB 8.0. I have created a Jira ticket for the engineering team to review. Feel free to subscribe or chime in there. In the meantime, I encourage you to try PBM for backups. It has many benefits over streaming the backup directly from mongod.