Xtrabackup fails intermittently for large database

Hi,

We have a large database of around 8 TB, and the backup fails intermittently with the error below. Full database backups run every day, and one fails with this error once or twice a week. Backups are streamed to an S3 bucket using xbstream. This is an on-prem system, not EC2 or GCP.

handler failed with backup failed with errors: 2024-08-19T00:25:01.046033-00:00 6 [ERROR] [MY-011825] [Xtrabackup] failed to copy datafile ./dbname/dbf.ibd

This happens intermittently. We take backups from replicas, so we tried taking backups from multiple replicas, but we see the same issue on all of them.

Server version: 8.0.35-27 Percona Server (GPL)
xtrabackup version 8.0.35-30 based on MySQL server 8.0.35 Linux (x86_64)

Please let me know if there is any solution to this.

If you are streaming to S3, you should be using XBCloud to handle that aspect. PXB and XBCloud both have improved parallel channels/FIFO support to maximize network throughput. Check out the docs and blog for how to use this.
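For illustration, here is a minimal sketch of what a streaming pipeline through xbcloud could look like. The endpoint, bucket, and backup name are hypothetical placeholders, credentials are assumed to come from the [xbcloud] section of my.cnf, and the xtrabackup options simply mirror the settings mentioned in this thread. The function only prints the pipeline so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical placeholders; replace with your own endpoint and bucket.
S3_ENDPOINT="${S3_ENDPOINT:-https://s3.example.internal}"
S3_BUCKET="${S3_BUCKET:-mysql-backups}"

# Print the streaming pipeline for the backup name given as $1.
# Credentials are assumed to be configured outside the command line
# (for example in the [xbcloud] section of my.cnf).
backup_cmd() {
  printf '%s | %s\n' \
    "xtrabackup --backup --stream=xbstream --parallel=16 --read-buffer-size=1G --target-dir=/tmp" \
    "xbcloud put --storage=s3 --s3-endpoint=$S3_ENDPOINT --s3-bucket=$S3_BUCKET --parallel=16 $1"
}
```

One way to run the printed pipeline is bash -o pipefail -c "$(backup_cmd full-2024-08-19)", so that a failure in xtrabackup fails the whole pipeline instead of being masked by the exit status of xbcloud.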

Hi,

Thanks for the update, but we cannot use the mentioned feature as it is still in tech preview, which is not allowed in our org. Please let me know if there are any other ways to make the backups work. We are using --parallel=16 and --read-buffer-size=1G.

xbcloud is not a tech preview; it is GA. Only the multi-FIFO channels are in tech preview. You should be using xbcloud at a minimum to stream into S3. With an 8TB dataset, if I were the DBA, I would be using EBS snapshots as my backup solution. You can freeze MySQL (FLUSH TABLES WITH READ LOCK) and then take the EBS snapshot from the command line. This is a one-liner example:

$ mysql -e "FLUSH TABLES WITH READ LOCK; SHOW MASTER STATUS;
  \! aws ec2 create-snapshot --volume-id vol-xxxxxxxx --description mysql1-data"
+---------------+-----------+--------------+------------------+-------------------+
| File          | Position  | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+-----------+--------------+------------------+-------------------+
| binlog.000002 | 110295083 |              |                  |                   |
+---------------+-----------+--------------+------------------+-------------------+

EBS snapshots are stored in S3. Data recovery is simply recreating an EBS volume from the snapshot and starting MySQL. InnoDB will do crash recovery and then start up.

Hi,

I am aware that xbcloud is not in tech preview; I was talking about the FIFO channels only. We are not using AWS S3 to store backups, so we can't use EBS snapshots. Our backups go to a different bucket and are uploaded directly using xbcloud.