Restore errors when piping xbcloud to xbstream and using parallel options

We’re using MySQL version 5.7.34 and xbstream/xbcloud 2.4.24.

We’re using xbcloud to take backups that are larger than the S3 object size limit (5TB). When restoring these backups with xbcloud, we’re seeing a few different errors.

The general command that we’re using is:

xbcloud get --parallel=16 --storage=s3 --s3-endpoint=s3.amazonaws.com:443 --s3-bucket=<bucket> <backupname> | sudo -u mysql xbstream --extract --decrypt=aes256 --encrypt-key-file=<key file> --parallel=1 --encrypt-threads=1 --directory=/var/lib/mysql/

And the backup restores fine. However, when we try to speed up the process by setting the --parallel and --encrypt-threads options on xbstream, similarly to our existing non-xbcloud restore process, we begin to see a few different types of errors:

xbcloud get --parallel=16 --storage=s3 --s3-endpoint=s3.amazonaws.com:443 --s3-bucket=<bucket> <backupname> | sudo -u mysql xbstream --extract --decrypt=aes256 --encrypt-key-file=<key file> --parallel=8 --encrypt-threads=8 --directory=/var/lib/mysql/

xbstream:xb_crypt_decrypt invalid plaintext hash. Wrong encrytion key specified?
decrypt: failed to decrypt chunk.
xbstream: my_write() failed.
decrypt: failed to decrypt chunk.
xbstream: my_write() failed.
decrypt: failed to decrypt chunk.
xbstream: my_write() failed.
decrypt: failed to decrypt chunk.
xbstream: my_write() failed.
decrypt: failed to decrypt chunk.
xbstream: my_write() failed.
decrypt: failed to decrypt chunk.
xbstream: my_write() failed.
decrypt: failed to decrypt chunk.
xbstream: my_write() failed.
decrypt: failed to decrypt chunk.
xbstream: my_write() failed.
decrypt: incomplete data, 60 bytes are still not decrypted.
decrypt: incomplete data, 60 bytes are still not decrypted.
decrypt: incomplete data, 60 bytes are still not decrypted.
decrypt: incomplete data, 60 bytes are still not decrypted.

Another type of error I found when testing different values:

xbcloud get --parallel=4 --storage=s3 --s3-endpoint=s3.amazonaws.com:443 --s3-bucket=<bucket> <backupname> | sudo -u mysql xbstream --extract --decrypt=aes256 --encrypt-key-file=<key file> --parallel=16 --encrypt-threads=16 --directory=/var/lib/mysql/

211005 18:38:12 [04] decompressing ./sbtest/customer2.ibd.qp
qpress: Data error, not recoverable
qpress: Source file is corrupted - try the -R flag to recover
cat: write error: Broken pipe
211005 18:38:13 [04] removing ./sbtest/customer2.ibd.qp
211005 18:38:13 [04] decompressing ./sbtest/stock3.ibd.qp
211005 18:38:17 [04] removing ./sbtest/stock3.ibd.qp
211005 18:38:17 [04] decompressing ./sbtest/stock7.ibd.qp
qpress: Data error, not recoverable
qpress: Source file is corrupted - try the -R flag to recover
cat: write error: Broken pipe
Error: decrypt and decompress thread 0 failed.
Error: decrypt and decompress thread 1 failed.
Error: decrypt and decompress thread 2 failed.
Error: decrypt and decompress thread 3 failed.
Error: decrypt and decompress thread 4 failed.

Is there a different way we should be constructing the restore commands to make use of the parallel/encrypt-threads options?


Hi There,

Let me clarify a few things here:

  1. You are doing backups where a single file at the source is bigger than 5TB?
    I’m not sure if this will interfere with anything apart from more chunks being uploaded, since xbstream splits the file into 10M chunks before sending it to xbcloud.

  2. What happens if you do this in two parts: first xbcloud > somefile, then xbstream it (see the sketch after this list)? Does it work?

  3. PXB 2.4.24 has implemented retries on xbcloud operations. Can you either send us the full logs for xbcloud and xbstream, or validate whether, around the time xbstream fails to decrypt/decompress data, xbcloud faced a network issue and had to retry any chunk?

  4. Does it always happen on the same file, or is there no pattern?
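
For item 2, a minimal sketch of the two-part approach (the staging path is a placeholder; keep the same xbcloud/xbstream options you use in the piped command):

# download the stream to a staging file first
xbcloud get --parallel=16 --storage=s3 --s3-endpoint=s3.amazonaws.com:443 --s3-bucket=<bucket> <backupname> > /data/staging/<backupname>.xbstream

# then extract from the staging file instead of the pipe
sudo -u mysql xbstream --extract --decrypt=aes256 --encrypt-key-file=<key file> --parallel=8 --encrypt-threads=8 --directory=/var/lib/mysql/ < /data/staging/<backupname>.xbstream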


You are doing backups where a single file at the source is bigger than 5TB?

The backup produced by xtrabackup and xbstream is over 5TB, so we can’t put that individual file in S3 directly. So, we’re using xbcloud to break up that file.
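
For context, our backup side is a pipeline along these lines (a sketch only; the compression/encryption thread counts are placeholders and our exact options may differ):

# stream a compressed, encrypted backup and let xbcloud split it into S3 parts
xtrabackup --backup --stream=xbstream --target-dir=/tmp --compress --compress-threads=8 --encrypt=AES256 --encrypt-key-file=<key file> --encrypt-threads=8 | xbcloud put --parallel=16 --storage=s3 --s3-endpoint=s3.amazonaws.com:443 --s3-bucket=<bucket> <backupname>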

What happens if you do this in two parts: first xbcloud > somefile, then xbstream it? Does it work?

That did not change the results, the same errors occurred.

PXB 2.4.24 has implemented retries on xbcloud operations. Can you either send us the full logs for xbcloud and xbstream, or validate whether, around the time xbstream fails to decrypt/decompress data, xbcloud faced a network issue and had to retry any chunk?

This doesn’t look to be occurring at all in the xbcloud output, and xbcloud > file; xbstream < file resulted in the same behavior, so retries don’t appear to be the cause. Also, running xbstream with either --parallel=1 or --encrypt-threads=1 is 100% reliable.

Does it always happen on the same file, or is there no pattern?

For a given backup that causes errors, it’s consistent. We have a smaller cluster that is only 16 GB in size, and a backup from day 1 will restore successfully while a backup from day 2 will consistently fail to restore. Our larger backups (terabytes in size) consistently have issues restoring.


Hi @nickcanz
This seems like a bug since you can systematically reproduce it.
Can I ask you to raise a JIRA ticket describing this situation, so we can further investigate it on our side?

https://jira.percona.com/projects/PXB/

Thanks


Thanks @Marcelo_Altmann, I’ve created a ticket here: [PXB-2618] Restore errors when piping xblcoud to xbstream and using parallel options - Percona JIRA
