How to restore backups without using xbcloud

With the new FIFO data sink feature, xtrabackup can write data to multiple named pipes. Ref: FIFO data sink - Percona XtraBackup.

We have software that can read from the FIFO pipes and store the data in a storage system.
How can we use those files to restore?

-rw-r--r--. 1 root root 5019625592 Mar 6 00:36 thread_1.xbstream
prw-------. 1 root root 0 Mar 6 00:36 thread_1
-rw-r--r--. 1 root root 6087227589 Mar 6 00:36 thread_0.xbstream
prw-------. 1 root root 0 Mar 6 00:36 thread_0

Here, thread_0 and thread_1 were opened by xtrabackup; I read from those pipes and stored the data in these two files (thread_0.xbstream and thread_1.xbstream). How can I now use those files to carry out the restore?

Thanks

If you have files in xbstream format, you can extract them using the xbstream tool directly. If you have multiple .xbstream files (one per FIFO stream), extract each of them into the same directory.

Unpack the xbstream file to a directory:

mkdir backup
xbstream -xv -C ./backup < backup.xbstream

If the software is not proprietary, can you please share it here for the benefit of community users? What is the benefit of using that software instead of xbcloud?

@satya.bodapati Using xbcloud to upload to S3 is a resource-intensive task. Even after I increased --read-buffer-size (the maximum size of each file uploaded to S3) to 1 GB, it resulted in 1000 files for an 11 GB database. Such a huge number of files can slow down the backups. Instead, if we use, say, 8 FIFO pipes and dump the data from the pipes directly to files using a tool that handles retries properly over the network, we can back up an 11 GB database in just 8 files, which should improve performance significantly.

For the benefit of the community, this is how we can test it on a local system.

BACKUP

xtrabackup --backup --stream=xbstream --fifo-streams=8 --fifo-dir=/tmp/test/fifo --fifo-timeout=500 &

SAVE PIPE DATA TO SEPARATE XBSTREAM FILES

cat /tmp/test/fifo/thread_0 > /tmp/test/fifo/thread_0.xbstream &
cat /tmp/test/fifo/thread_1 > /tmp/test/fifo/thread_1.xbstream &
cat /tmp/test/fifo/thread_2 > /tmp/test/fifo/thread_2.xbstream &
cat /tmp/test/fifo/thread_3 > /tmp/test/fifo/thread_3.xbstream &
cat /tmp/test/fifo/thread_4 > /tmp/test/fifo/thread_4.xbstream &
cat /tmp/test/fifo/thread_5 > /tmp/test/fifo/thread_5.xbstream &
cat /tmp/test/fifo/thread_6 > /tmp/test/fifo/thread_6.xbstream &
cat /tmp/test/fifo/thread_7 > /tmp/test/fifo/thread_7.xbstream &
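The eight cat lines above can also be generated with a loop. A self-contained sketch of the same drain step (background writers with made-up data stand in for xtrabackup; the temp-dir paths are invented for the demo):

```shell
# Demo of the drain step: create FIFOs, let background writers (standing
# in for xtrabackup) fill them, and drain each pipe into a regular file.
set -e
dir=$(mktemp -d)
for i in 0 1 2 3; do
    mkfifo "$dir/thread_$i"
    printf 'stream %s data' "$i" > "$dir/thread_$i" &   # writer side
done
for i in 0 1 2 3; do
    cat "$dir/thread_$i" > "$dir/thread_$i.xbstream" &  # reader side
done
wait   # returns once every pipe has been closed and drained
ls "$dir"/*.xbstream
```

Opening a FIFO for writing blocks until a reader opens the other end, so the writer loop must be backgrounded before the readers start.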

RESTORE

CREATE FIFO PIPES

mkfifo "/tmp/test/fifo_res/thread_0"
mkfifo "/tmp/test/fifo_res/thread_1"
mkfifo "/tmp/test/fifo_res/thread_2"
mkfifo "/tmp/test/fifo_res/thread_3"
mkfifo "/tmp/test/fifo_res/thread_4"
mkfifo "/tmp/test/fifo_res/thread_5"
mkfifo "/tmp/test/fifo_res/thread_6"
mkfifo "/tmp/test/fifo_res/thread_7"

FEED THE SAVED DATA INTO THE FIFO PIPES

cat "/tmp/test/fifo/thread_0.xbstream" > "/tmp/test/fifo_res/thread_0" &
cat "/tmp/test/fifo/thread_1.xbstream" > "/tmp/test/fifo_res/thread_1" &
cat "/tmp/test/fifo/thread_2.xbstream" > "/tmp/test/fifo_res/thread_2" &
cat "/tmp/test/fifo/thread_3.xbstream" > "/tmp/test/fifo_res/thread_3" &
cat "/tmp/test/fifo/thread_4.xbstream" > "/tmp/test/fifo_res/thread_4" &
cat "/tmp/test/fifo/thread_5.xbstream" > "/tmp/test/fifo_res/thread_5" &
cat "/tmp/test/fifo/thread_6.xbstream" > "/tmp/test/fifo_res/thread_6" &
cat "/tmp/test/fifo/thread_7.xbstream" > "/tmp/test/fifo_res/thread_7" &
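The restore-side plumbing (mkfifo plus feeding each pipe) can likewise be looped. A self-contained sketch, where plain cat consumers and made-up file contents stand in for xbstream reading via --fifo-dir:

```shell
# Demo of the restore-side plumbing: recreate the FIFOs and feed each
# saved .xbstream file back through its pipe; a consumer (cat here, in
# place of xbstream) reads from the other end of each pipe.
set -e
src=$(mktemp -d); res=$(mktemp -d)
printf 'chunk A' > "$src/thread_0.xbstream"
printf 'chunk B' > "$src/thread_1.xbstream"
for i in 0 1; do
    mkfifo "$res/thread_$i"
    cat "$src/thread_$i.xbstream" > "$res/thread_$i" &  # feed the pipe
done
out0=$(cat "$res/thread_0")   # consumer side
out1=$(cat "$res/thread_1")
wait
echo "$out0 | $out1"   # -> chunk A | chunk B
```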

XBSTREAM USING THE FIFO PIPES

xbstream -x -C /tmp/test/fifo_res/restore --parallel=8 --fifo-dir=/tmp/test/fifo_res --fifo-streams=8 --verbose --fifo-timeout=300

@satya.bodapati Isn't such an approach better, as it results in very few files? Is the reason for not going with this approach network failures during large chunk uploads?

Let's discuss the problem.
"Using xbcloud to upload to S3 is a resource-intensive task. Even if I increased the --read-buffer-size, which is the maximum file size that can be uploaded to S3, to 1 GB, it resulted in 1000 files for an 11 GB database."

Can you please explain why the 1000 files are a problem?

Let me explain the intention of xbstream.

xbstream is just a format. So there is an encode and decode operation of the xbstream format.

And stdout is a single pipe used by the two processes to send data (in xbstream format). With FIFO, instead of a single pipe, multiple such pipes are used to send the data across.
From our testing, streaming over stdout reaches about 1.8 Gbps. This is good enough for almost all user scenarios.

With FIFO, xtrabackup can utilize more network bandwidth, up to 10 Gbps. It is intended for local LAN storage (S3-compatible).

Regarding the split files:

Amazon S3 and other cloud providers didn’t support multi-part uploads (or, let’s say, the ability to resume from a failed chunk), at least not until very recently.

xbcloud handles this problem by splitting the files into smaller chunks. So, imagine you are uploading a 100G file. If there is an issue with one of the smaller chunks, xbcloud can retry just that chunk instead of resending the whole 100G again.
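The retry benefit can be illustrated with a small stub; the upload function, the injected failure, and the file sizes below are all invented for the demo:

```shell
# Why splitting helps: on a failure, only the failed chunk is re-sent.
# "upload" is a stub (a local copy) with one injected transient failure.
set -e
work=$(mktemp -d)
head -c 100000 /dev/zero > "$work/backup.xbstream"      # pretend backup
split -b 10000 "$work/backup.xbstream" "$work/chunk."   # 10 chunks of 10 KB
mkdir "$work/remote"
fail_once=1
upload() {  # stand-in for an S3 PUT of one chunk
    if [ "$fail_once" -eq 1 ] && [ "${1##*/}" = "chunk.ac" ]; then
        fail_once=0
        return 1          # simulate a transient network error
    fi
    cp "$1" "$work/remote/"
}
for c in "$work"/chunk.*; do
    until upload "$c"; do
        echo "retrying ${c##*/} only"   # the other 9 chunks are untouched
    done
done
echo "uploaded: $(ls "$work/remote" | wc -l) chunks"
```

Without the split, the equivalent failure would force the whole backup file to be re-sent from the beginning.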

I understand that there is room to improve: get rid of the splitting by xbcloud and utilize S3 multipart upload (Uploading and copying objects using multipart upload in Amazon S3 - Amazon Simple Storage Service). We also have to see whether other cloud providers support it and what the specifications around it are.