Any advice on getting around S3 download errors/timeouts? I feel like this is a pretty core responsibility of the restore tool, so I must be misconfigured or something. Here are my details:
MongoDB version: 3.6
Replica set: Yes
Sharded: No
PITR: No
PBM version: tried with 1.3.4 and 1.4.0
Log:
2021-01-08T05:58:08.233+0000 using 1 insertion workers
2021-01-08T05:59:07.000+0000 W [restore/2021-01-07T17:37:32Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-01-08T05:59:07.000+0000 W [restore/2021-01-07T17:37:32Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 1s
2021-01-08T05:59:08.000+0000 I [restore/2021-01-07T17:37:32Z] session recreated, resuming download
2021-01-08T05:59:38.000+0000 W [restore/2021-01-07T17:37:32Z] errReadObj Err: context deadline exceeded (Client.Timeout or context cancellation while reading body)
2021-01-08T05:59:38.000+0000 W [restore/2021-01-07T17:37:32Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 2s
2021-01-08T05:59:40.000+0000 I [restore/2021-01-07T17:37:32Z] session recreated, resuming download
…
The download retries like this 10-15 times before giving up.
…
2021-01-08T06:08:38.000+0000 W [restore/2021-01-07T17:37:32Z] got context deadline exceeded (Client.Timeout or context cancellation while reading body), try to reconnect in 10s
2021-01-08T06:08:48.000+0000 I [restore/2021-01-07T17:37:32Z] session recreated, resuming download
2021-01-08T06:08:48.000+0000 E [restore/2021-01-07T17:37:32Z] download "my-bucket/2021-01-07T17:37:32Z_rs0.dump.s2" file from S3:
2021-01-08T06:08:48.191+0000 finished restoring db_name.coll_name (619757 documents, 0 failures)
2021-01-08T06:08:48.196+0000 finished restoring db_name.coll_name2 (229667 documents, 0 failures)
2021-01-08T06:08:48.197+0000 demux finishing when there are still outs (8)
2021-01-08T06:08:48.198+0000 demux finishing (err:corruption found in archive; ParserConsumer.BodyBSON() ( corruption found in archive; read bson ( s2: corrupt input ) ))
2021-01-08T06:08:48.000+0000 E [restore/2021-01-07T17:37:32Z] restore: restore mongo dump (successes: 849451 / fails: 0): db_name.coll_name: error restoring from archive on stdin: reading bson input: error demultiplexing archive; archive io error
2021-01-08T06:08:48.000+0000 D [restore/2021-01-07T17:37:32Z] releasing lock
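The 1s → 2s → … → 10s waits in the log look like a capped exponential backoff. I don't know PBM's internals, but the pattern I'm observing behaves roughly like this hypothetical sketch (`retryDownload` is my own illustration, not PBM's actual code):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryDownload mirrors the pattern visible in the log: retry on error,
// doubling the wait each time, capped at 10s, giving up after a fixed
// number of attempts. Purely illustrative.
func retryDownload(attempts int, download func() error) error {
	wait := time.Second
	for i := 0; i < attempts; i++ {
		if err := download(); err == nil {
			return nil
		} else {
			fmt.Printf("got %v, try to reconnect in %s\n", err, wait)
		}
		time.Sleep(wait)
		wait *= 2
		if wait > 10*time.Second {
			wait = 10 * time.Second // cap matches the 10s waits in my log
		}
	}
	return errors.New("download failed after retries")
}

func main() {
	_ = retryDownload(15, func() error {
		return errors.New("context deadline exceeded")
	})
}
```

Each retry does resume the download ("session recreated, resuming download"), but it eventually gives up anyway, so retrying alone isn't saving me here.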
The download starts, and the restore begins loading data, so it's not completely unable to communicate with S3. Maybe the files are just too large and the downloads are timing out? The timeout and file size aren't configurable, are they? I have tried reducing the number of insertion workers (1) and the batchSize (10) to very low values, but that didn't help. Any help is appreciated. Thanks!
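For what it's worth, the error text ("Client.Timeout or context cancellation while reading body") matches Go's net/http client timeout, which bounds the entire request including reading the response body. Here is a minimal sketch of why a large object could hit that kind of timeout even when the connection itself is healthy; this is assumed aws-sdk-go usage with made-up timeout, region, and object names, not PBM's actual code:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// http.Client.Timeout bounds the WHOLE request, including streaming the
	// response body. A large object that takes longer than this to read
	// fails with "Client.Timeout or context cancellation while reading body",
	// regardless of how healthy the individual reads are.
	httpClient := &http.Client{Timeout: 60 * time.Second} // hypothetical value

	sess, err := session.NewSession(&aws.Config{
		Region:     aws.String("us-east-1"), // hypothetical region
		HTTPClient: httpClient,
	})
	if err != nil {
		panic(err)
	}

	out, err := s3.New(sess).GetObject(&s3.GetObjectInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("2021-01-07T17:37:32Z_rs0.dump.s2"),
	})
	if err != nil {
		panic(err)
	}
	defer out.Body.Close()

	// Reading the body must finish before the 60s client timeout elapses.
	n, err := io.Copy(io.Discard, out.Body)
	fmt.Println(n, err)
}
```

If PBM's S3 client uses a fixed overall timeout like this, a sufficiently large .dump.s2 file would fail no matter how many times it reconnects, which would match what I'm seeing.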