Unable to take a backup to S3 if a file size is more than 5 GB

Hi Team,

We got the error "errmsg" : "Cannot backup '/db/perconaServerBackup/collection.wt' : EntityTooLarge : Unable to parse ExceptionName: EntityTooLarge Message: Your proposed upload exceeds the maximum allowed size" while taking a backup to S3 using the command below.

Command: db.runCommand({createBackup: 1, s3: {bucket: "datalake-dev", region: "us-west-2", path: "perconaServerBackup", accessKeyId: "XXXX", secretAccessKey: "XXXX"} })

Please assist with this.

Our total MongoDB data size is 1.5 TB (Dev), and individual files range in size from 7 GB and 20 GB up to 120 GB, 950 GB, and so on.

/src/mongo/db/storage/wiredtiger_kv_engine.cpp: It seems the code below is causing the issue during the hot backup (binary backup to S3). I think the multipart API needs to be used here (I am not sure, but please have a look).

Aws::S3::Model::PutObjectRequest request;
request.SetBucket(s3params.bucket);
request.SetKey(destFile.string());
request.SetContentLength(fsize);
request.SetContentType("application/octet-stream");

auto fileToUpload = Aws::MakeShared<Aws::FStream>("AWS", srcFile.string(), std::ios_base::in | std::ios_base::binary);
if (!fileToUpload) {
    return Status(ErrorCodes::InvalidPath,
                  str::stream() << "Cannot open file '" << srcFile.string() << "' for backup"
                                << " : " << strerror(errno));
}
request.SetBody(fileToUpload);

auto outcome = s3_client.PutObject(request);
if (!outcome.IsSuccess()) {
    return Status(ErrorCodes::InternalError,
                  str::stream() << "Cannot backup '" << srcFile.string() << "'"
                                << " : " << outcome.GetError().GetExceptionName()
                                << " : " << outcome.GetError().GetMessage());
}
LOGV2_DEBUG(29004, 2, "Successfully uploaded file: {destFile}",
            "destFile"_attr = destFile.string());

Hello Surendra,

Thank you for your question and the follow-up details that you provided.

Based on your original backup command, it looks like you are using Percona Server for MongoDB, specifically the hot backup feature with the “Streaming Hot Backups to a Remote Destination” option.

As noted here: https://www.percona.com/doc/percona-server-for-mongodb/LATEST/hot-backup.html

“This feature was implemented in Percona Server for MongoDB 4.2.1-1. In this release, this feature has the EXPERIMENTAL status.”

Currently, in this experimental release, our implementation does not use the Multipart API; instead, it uploads each backup file with a single PUT operation.

Your backup attempt therefore hits the AWS S3 maximum size of 5 GB for a single PUT operation, as noted here:

https://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html

“Upload an object in a single operation using the AWS SDKs, REST API, or AWS CLI—With a single PUT operation, you can upload objects up to 5 GB in size.”

As you noted in your response above, the Multipart API could be used, which would allow up to 10,000 parts per upload and a maximum object size of up to 5 TB.

Note: each part must still be between 5 MB and 5 GB in size, so an individual part is subject to the same 5 GB maximum that applies to the single PUT operation as currently implemented.
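
As rough arithmetic only (not an official formula), the smallest usable part size works out to ceil(file size / 10,000 parts), clamped to the 5 MB minimum. A quick illustration with the 950 GB file you mentioned and the 5 TB object maximum:

// Rough arithmetic only: the smallest part size that keeps an upload within
// S3's 10,000-part limit, never going below the 5 MB per-part minimum.
#include <algorithm>
#include <cstdint>
#include <iostream>

int64_t minPartSize(int64_t fileSizeBytes) {
    const int64_t kMaxParts = 10000;
    const int64_t kMinPartBytes = 5LL * 1024 * 1024;                     // 5 MB minimum per part
    const int64_t needed = (fileSizeBytes + kMaxParts - 1) / kMaxParts;  // ceil(size / 10,000)
    return std::max(needed, kMinPartBytes);
}

int main() {
    const int64_t MB = 1024 * 1024;
    std::cout << minPartSize(950LL << 30) / MB << " MB\n";  // ~97 MB for a 950 GB file
    std::cout << minPartSize(5LL << 40) / MB << " MB\n";    // ~524 MB for a 5 TB object
}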

We will enter this as a feature request. We can see how this would be very helpful to many users; however, we cannot guarantee when it will be available.

Workaround:

In the meantime, a potential workaround would be to back up to a local file system and then use an OS-level tool to perform the multipart upload to S3.
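
Any tool or library that drives the multipart API will work for that upload step. For illustration only (a sketch, not something we ship), a minimal uploader built on the AWS SDK for C++ TransferManager, which performs a multipart upload for large files automatically, could look roughly like this; the bucket, key, and file paths below are placeholders:

// Illustrative sketch: upload one finished local backup file to S3 with the
// AWS SDK for C++ TransferManager. All names and paths are placeholders.
#include <aws/core/Aws.h>
#include <aws/core/utils/threading/Executor.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>
#include <iostream>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        // The default client picks up credentials and region from the environment.
        auto s3Client = Aws::MakeShared<Aws::S3::S3Client>("backup-upload");
        auto executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>("backup-upload", 4);

        Aws::Transfer::TransferManagerConfiguration config(executor.get());
        config.s3Client = s3Client;
        auto transferManager = Aws::Transfer::TransferManager::Create(config);

        // Placeholder local file and destination; repeat per backup file.
        auto handle = transferManager->UploadFile(
            "/db/localBackup/collection.wt",         // local file written by createBackup
            "datalake-dev",                          // bucket (placeholder)
            "perconaServerBackup/collection.wt",     // key (placeholder)
            "application/octet-stream",
            Aws::Map<Aws::String, Aws::String>());
        handle->WaitUntilFinished();

        if (handle->GetStatus() != Aws::Transfer::TransferStatus::COMPLETED) {
            std::cerr << "Upload failed: " << handle->GetLastError().GetMessage() << std::endl;
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}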

Thank you for your question and we will continue working to add functionality to our tools.

Regards,

Kimberly

MongoDB Tech Lead - Percona

Thank you, Kwilk. I believe this would be a very useful feature for many.

Is this feature available in Percona Backup for MongoDB (logical backup to S3)? I am facing the error below when I try the same (the files are not more than 5 TB in size).

Error:

2020-09-03T08:39:52.155+0000 writing AnalyticsNextGen.execution_hour to archive on stdout

2020/09/03 08:40:08 [INFO] backup/2020-09-03T08:19:07Z: mark backup as error `mongodump: write data: upload to S3: MultipartUpload: upload multipart failed

upload id: XXXX

caused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit.`:

2020/09/03 08:40:08 [ERROR] backup/2020-09-03T08:19:07Z: backup: mongodump: write data: upload to S3: MultipartUpload: upload multipart failed.

Thanks,

Surendra Babu

Hello @sbotlagunta

We are tracking this issue in our Jira. For Percona Server for MongoDB this is the ticket:

[PSMDB-715] createBackup using AWS remote location fails with “EntityTooLarge” - Percona JIRA

For Percona Backup for MongoDB the ticket is:

[PBM-485] Fix backups to S3 failing with MaxUploadParts limit by auto-adjusting uploadPartSize value - Percona JIRA

Thank you for the update, Igor.

We got the Percona Backup for MongoDB 1.3.2 release with the S3 multipart fix (logical backup), and testing is in progress on our end.

We are waiting for the Percona Server for MongoDB release with the S3 multipart fix (hot backup).

Thanks,

Surendra Babu

Hi Igor,

We initiated a Percona Backup for MongoDB backup to S3; the database size is 1.6 TB. We got the issue below after the backup to S3 completed (it took 7 hours).

Config command: pbm config --file /etc/mongodb/pbm_s3_store.yaml --mongodb-uri mongodb://xxxx:xxxx@127.0.0.1:27017 --set pitr.enabled=true

Backup Command: pbm backup --mongodb-uri mongodb://xxxx:xxxx@127.0.0.1:27017

List: pbm list --mongodb-uri mongodb://xxxx:xxxx@127.0.0.1:27017

Error:

Backup snapshots:

2020-10-16T08:33:33Z Failed with "oplog: read data: oplog has insufficient range, some records since the last saved ts {1602837232 4340} are missing. Run pbm backup to create a valid starting point for the PITR."

Error: check PITR state for shard 'zion': PITR backup failed

PITR:

!Failed to run PITR backup. Agent logs:

zion: 2020-10-16T15:26:14.000+0000 [ERROR] pitr: defining starting point for the backup: no backup found, a new backup is required to start PITR

Please assist with this.

Thanks,

Surendra Babu