Hi there!
We have a handful of relatively large (5-10TB) MySQL databases spread across multiple datacenters for redundancy. When we want to spin up a new replica node, we use xtrabackup to dump data from the primary and send it to the new replica for its initial sync.
Currently we use the --stream option to save time and avoid unnecessarily filling disk space on the primary node, but we often run into concerns over how long it can take to stream several TB over the internet (when going between datacenters).
We’ve introduced compression to help with this, but currently we have to stream compressed data to the replica, then wait for that compressed data to be decompressed, before it can be prepared and synced. This is not a huge concern, but decompression can take a notable amount of time, which adds anxiety when running up against our transaction log retention limit.
So as the title asks, I’m curious if there is any way to decompress our stream in realtime on the replica, as it is received?
Right now our command for streaming to a new replica, as executed on the primary node, is as follows:
xtrabackup --backup --stream=xbstream --compress | ssh user@replica "cd ~/xtrabackup_backupfiles && xbstream -x"
As is, however, this leaves a compressed dataset in ~/xtrabackup_backupfiles on the replica when the stream concludes. Decompressing it takes additional time, during which the replica falls further behind the primary node and needs even longer to catch up before it is finally in sync.
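For context, the follow-up work on the replica looks roughly like the sketch below. This is a dry-run illustration rather than our exact script: the RUN variable and the post_stream_steps function are names made up for this example, and the target directory is simply the one from the command above. By default it only prints the commands; setting RUN to an empty string would execute them.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the two steps that follow the stream on the replica.
# RUN and post_stream_steps are hypothetical names used only for illustration.
# RUN defaults to "echo", so the commands are printed, not executed.
# Export RUN="" to run them for real.
RUN="${RUN-echo}"

post_stream_steps() {
  local dir="$1"   # directory that "xbstream -x" extracted into

  # 1. Decompress the extracted files in place. This is the extra wait
  #    described above, since it only starts once the stream has finished.
  $RUN xtrabackup --decompress --target-dir="$dir" || return 1

  # 2. Prepare the backup so the data files are consistent.
  $RUN xtrabackup --prepare --target-dir="$dir"
}

post_stream_steps "$HOME/xtrabackup_backupfiles"
```

The decompress step is the one I would like to overlap with the transfer, so that only the prepare step remains once the stream ends.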
I have tried a few variations of commands on the replica I thought might work, e.g.
... cd ~/xtrabackup_backupfiles && mbuffer -m 1G | zstd -d | xbstream -x
… which fails because zstd can’t directly parse the xbstream data (I assume because xbstream only compresses the data, and not the metadata). Or:
... cd ~/xtrabackup_backupfiles && xbstream -x -c - | mbuffer -m 1G | zstd -d
… which would theoretically parse the xbstream format first, potentially solving the previous issue, but -x and -c can’t be paired in xbstream.
Is there some magical combination anyone is aware of that would allow realtime decompression so my replica is ready to prepare as soon as the stream concludes?