Help getting past download step on restore

Any advice on getting around S3 download errors/timeouts? I feel like handling these is a pretty core responsibility of the restore tool, so I must have something misconfigured. Here are my details:

MongoDB version: 3.6
Replica set: Yes
Sharded: No
PITR: No
PBM version: tried with 1.3.4 and 1.4.0

Log:
2021-01-08T05:58:08.233+0000 using 1 insertion workers
2021-01-08T05:59:07.000+0000 W [restore/2021-01-07T17:37:32Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-01-08T05:59:07.000+0000 W [restore/2021-01-07T17:37:32Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 1s
2021-01-08T05:59:08.000+0000 I [restore/2021-01-07T17:37:32Z] session recreated, resuming download
2021-01-08T05:59:38.000+0000 W [restore/2021-01-07T17:37:32Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-01-08T05:59:38.000+0000 W [restore/2021-01-07T17:37:32Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 2s
2021-01-08T05:59:40.000+0000 I [restore/2021-01-07T17:37:32Z] session recreated, resuming download


The download is reattempted 10-15 times before giving up:


2021-01-08T06:08:38.000+0000 W [restore/2021-01-07T17:37:32Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 10s
2021-01-08T06:08:48.000+0000 I [restore/2021-01-07T17:37:32Z] session recreated, resuming download
2021-01-08T06:08:48.000+0000 E [restore/2021-01-07T17:37:32Z] download 'my-bucket/2021-01-07T17:37:32Z_rs0.dump.s2' file from S3:
2021-01-08T06:08:48.191+0000 finished restoring db_name.coll_name (619757 documents, 0 failures)
2021-01-08T06:08:48.196+0000 finished restoring db_name.coll_name2 (229667 documents, 0 failures)
2021-01-08T06:08:48.197+0000 demux finishing when there are still outs (8)
2021-01-08T06:08:48.198+0000 demux finishing (err:corruption found in archive; ParserConsumer.BodyBSON() ( corruption found in archive; read bson ( s2: corrupt input ) ))
2021-01-08T06:08:48.000+0000 E [restore/2021-01-07T17:37:32Z] restore: restore mongo dump (successes: 849451 / fails: 0): db_name.coll_name: error restoring from archive on stdin: reading bson input: error demultiplexing archive; archive io error
2021-01-08T06:08:48.000+0000 D [restore/2021-01-07T17:37:32Z] releasing lock

The download starts, and the restore starts loading data, so it's not completely unable to communicate with S3. Maybe the files are just too large and the requests are timing out? The timeout and file size aren't configurable, are they? I have tried reducing the number of insertion workers (to 1) and the batchSize (to 10), but that didn't help. Any help is appreciated. Thanks
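In case it matters, I set those via pbm config; something like the following, assuming the restore.* options are the right knobs in these PBM versions:

# lower the restore concurrency and batch size (restore.* keys assumed to apply here)
pbm config --set restore.numInsertionWorkers=1
pbm config --set restore.batchSize=10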


Hello. Can you please provide the entire command you are using?
Also, have you attempted to use the native 'aws s3' command line tool to see if you get the same behavior? If you get the same behavior from the official AWS tool, then you know our tool isn't really at fault.
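For example, something along these lines (bucket and file name taken from your log above; the local destination is just an example):

# pull the dump file down with the AWS CLI to rule out pbm
aws s3 cp s3://my-bucket/2021-01-07T17:37:32Z_rs0.dump.s2 /tmp/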


Thanks for the reply.
The command I am running is:
pbm --mongodb-uri "mongodb://xxxxxx:yyyy@127.0.0.1:27017/?authSource=admin&replicaSet=rs0" restore 2021-01-07T17:37:32Z

The download works fine using awscli:
aws s3 cp s3://my-bucket/2021-01-07T17:37:32Z_rs0.dump.s2 ~/

The dump file is 370 GB.


It's difficult to diagnose a potential pbm/S3 bug over the forums. I would ask that you first download the backup to your local machine using 'aws s3', then use pbm to restore that backup from local disk (rough sketch below). If that works fine, then there may be an issue between pbm and S3, and you'll need to open a bug report at https://jira.percona.com/ providing a repeatable test case, a public bucket with data, and detailed logs so our engineering teams can debug it.
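Something along these lines; the local path and config file name are placeholders, and this assumes the filesystem storage type and the --force-resync flag behave the same in your PBM version. First pull the backup files down:

# copy the backup's dump and metadata files from the bucket (path is a placeholder)
aws s3 cp s3://my-bucket/ /data/pbm-local/ --recursive --exclude "*" --include "2021-01-07T17:37:32Z*"

Then point PBM at the local copy with a config file (fs-config.yaml) like:

storage:
  type: filesystem
  filesystem:
    path: /data/pbm-local

Apply it, let PBM re-scan the storage for backups, and run the same restore:

pbm config --file fs-config.yaml
pbm config --force-resync
pbm restore 2021-01-07T17:37:32Z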
