Hi.
Basically I think the AWS S3 HTTP requests are being lost in the network, and that if you iterated through the files in the dbPath directory and sent them one by one using “aws s3 cp …”(*), you would hit the same error.
The AWS library used in PSMDB is AWS’s C++ SDK rather than the Python one used by the AWS CLI, so they are not exactly the same. But I trust that both implement this most common type of S3 operation correctly, even if not identically.
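If you want to test that comparison directly, here is a minimal sketch of the loop I mean (the dbPath /var/lib/mongo and the bucket/prefix below are made-up placeholders; substitute your own):

DBPATH=/var/lib/mongo
BUCKET=s3://my-backup-bucket/psmdb-test

# Upload each file individually, roughly the way createBackup does,
# and report any file on which the transfer fails.
find "$DBPATH" -type f | while read -r f; do
  aws s3 cp "$f" "$BUCKET/${f#$DBPATH/}" || echo "FAILED: $f"
done

If this loop hits the same curl error 55, that points at the network path rather than mongod.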
You could try to get more information into the log while the backup runs by raising the log level to 2. This will produce a lot more messages, because there are many debug log calls at level 2, even if we restrict it to just the “STORAGE” component. (You can search for “LOGV2” in the code of hotBackup() to see examples of the messages I mean.)
db.setLogLevel(2, "storage")
db.adminCommand({ createBackup: 1, .... });
db.setLogLevel(-1, "storage"); /* Revert storage to global log level again. That is 0 by default */
// I strongly suggest running the 'db.setLogLevel(-1, "storage");' command from
// another shell instance after 20 minutes at most, rather than waiting for
// createBackup to finish normally. If it runs for an hour, for example, the log
// file may grow to tens of GB! Log level 2 is a danger of its own.
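To revert from a second terminal while createBackup is still running, something like this should work (a sketch assuming a recent mongosh connecting to the default localhost port; use the legacy mongo binary or adjust connection options as needed):

# Drop the "storage" component back to the global log level (-1 = inherit).
mongosh --eval 'db.setLogLevel(-1, "storage")'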
But I’m not hopeful this will provide any new perspective. I believe the error messages you identified at the start (e.g. CurlHttpClient … Curl returned error code 55 - Failed sending data to the peer) will simply appear again next to otherwise normal AWS operations. The curl library is telling us that the HTTP request was abruptly terminated outside of the mongod process.
* Or “aws s3api put-object …”, or “aws s3api create-multipart-upload”+“aws s3api upload-part …”
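For completeness, the multipart variant of that test would look roughly like this (bucket, key, and file names are made-up placeholders, and the final “aws s3api complete-multipart-upload” step is omitted for brevity):

# Start a multipart upload and capture the upload id.
UPLOAD_ID=$(aws s3api create-multipart-upload \
  --bucket my-backup-bucket --key psmdb-test/collection-0.wt \
  --query UploadId --output text)

# Send a single part; a failure here would mirror the curl error seen inside mongod.
aws s3api upload-part --bucket my-backup-bucket --key psmdb-test/collection-0.wt \
  --part-number 1 --upload-id "$UPLOAD_ID" --body /var/lib/mongo/collection-0.wt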