Let's discuss the problem.
“Using xbcloud to upload to s3 is a resource intensive task. even if i increased the --read-buffer-size which is the maximum file size that can be uploaded to s3 to 1 GB, it resulted in 1000 files for a 11 GB database.”
Can you please explain why the 1000 files are a problem?
Let me explain the intention of xbstream.
xbstream is just a format, so there are encode and decode operations for the xbstream format.
With stdout, a single pipe is shared by the two processes to send data (in xbstream format). With FIFO, multiple named pipes are used instead of a single pipe to send the data across.
From our testing, streaming over stdout reaches about 1.8 Gbps, which is good enough for almost all user scenarios.
With FIFO, it can utilize more of the available network bandwidth, up to 10 Gbps. It is intended for S3-compatible storage on a local LAN.
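For illustration, here is roughly what the two modes look like on the command line. This is a sketch, not a copy of our documentation: the stdout pipeline follows the usual xtrabackup | xbcloud usage, while the FIFO options (--fifo-streams and --fifo-dir on both sides) assume a recent XtraBackup release that ships the FIFO data sink, and the bucket name, paths, and stream count are placeholders.

```
# Single pipe: xtrabackup writes xbstream to stdout, xbcloud reads it from stdin
xtrabackup --backup --stream=xbstream --target-dir=/tmp/backup \
  | xbcloud put --storage=s3 --s3-bucket=my-bucket full_backup

# FIFO: several named pipes carry the xbstream data in parallel
# (assumes a release with the FIFO data sink; flag names may differ by version)
xtrabackup --backup --stream=xbstream --target-dir=/tmp/backup \
  --fifo-streams=4 --fifo-dir=/tmp/fifo &
xbcloud put --storage=s3 --s3-bucket=my-bucket \
  --fifo-dir=/tmp/fifo --parallel=4 full_backup
```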
Regarding the split files:
Amazon S3 and other cloud providers did not support multipart uploads (or, more precisely, the ability to resume from a failed chunk), at least not until fairly recently.
xbcloud handles this problem by splitting the stream into smaller chunks. Imagine you are uploading a 100 GB file: if there is an issue with one of the smaller chunks/files, xbcloud can retry just that failed chunk instead of resending the whole 100 GB again.
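As a rough sketch of how the chunk size plays into this (bucket name and sizes are placeholders, and the exact value format accepted by --read-buffer-size may vary by version):

```
# A larger read buffer means larger and fewer chunks per file; a failed chunk
# is retried on its own, so only that chunk is resent, not the whole stream.
# 104857600 bytes = ~100 MB per chunk (illustrative value only).
xtrabackup --backup --stream=xbstream --target-dir=/tmp/backup \
  | xbcloud put --storage=s3 --s3-bucket=my-bucket \
      --read-buffer-size=104857600 --parallel=8 full_backup
```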
I understand there is room for improvement: we could get rid of the splitting done by xbcloud and use the S3 multipart upload API (Uploading and copying objects using multipart upload in Amazon S3 - Amazon Simple Storage Service). We also have to check whether other cloud providers support it and what the specifications around it are.
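For reference, native S3 multipart upload looks roughly like this with the AWS CLI. This is only a sketch of the API xbcloud could target, not what it does today; the bucket, key, part file, and parts.json are placeholders.

```
# Start a multipart upload and note the UploadId it returns
aws s3api create-multipart-upload --bucket my-bucket --key full_backup.xbstream

# Upload parts independently; a failed part can be re-uploaded on its own
aws s3api upload-part --bucket my-bucket --key full_backup.xbstream \
  --upload-id "$UPLOAD_ID" --part-number 1 --body part-0001

# Finish by telling S3 how to assemble the parts (ETags listed in parts.json)
aws s3api complete-multipart-upload --bucket my-bucket --key full_backup.xbstream \
  --upload-id "$UPLOAD_ID" --multipart-upload file://parts.json
```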