Hello Team,
We are running Percona XtraBackup against MySQL 8 in a replication / HA setup. We use our HSB (Hot Standby) node as the backup source to rebuild read replicas (restore + CHANGE REPLICATION SOURCE / CHANGE MASTER from xtrabackup_slave_info, as applicable).
We are hitting an operational conflict between consistency guidance (we use --safe-slave-backup along with --slave-info during rebuild) and availability (HA for Master), and we are not seeing a single straightforward option that satisfies both.
Environment:
- MySQL: Percona Server for MySQL 8.0.35-27
- Topology: Traditional MySQL async replication with Master, HSB, backup node and read replicas
- XtraBackup: percona-xtrabackup-80 (8.0.34-29-1.bookworm)
- OS / orchestration: Debian 12 (12.13)
- Binlog format: ROW (some other clusters use MIXED and STATEMENT as well)
- Storage engines: In the impacted cluster all tables are InnoDB (but we have other clusters with a mix of InnoDB and MyISAM tables)
- We rebuild replicas with XtraBackup options including --slave-info and --safe-slave-backup.
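
For concreteness, the backup invocation can be sketched as below. This is a minimal sketch and not our exact tooling: the helper name `build_xtrabackup_cmd`, the target directory and the timeout value are illustrative assumptions; only the xtrabackup flags themselves are taken from our setup and the documentation.

```python
from typing import List


def build_xtrabackup_cmd(
    target_dir: str,
    safe_slave_backup: bool = True,
    timeout_s: int = 300,  # assumed value; matches the documented default
) -> List[str]:
    """Assemble the xtrabackup argv used when taking a backup on the HSB.

    Hypothetical helper for illustration only.
    """
    cmd = [
        "xtrabackup",
        "--backup",
        f"--target-dir={target_dir}",
        "--slave-info",  # records replication coordinates in xtrabackup_slave_info
    ]
    if safe_slave_backup:
        # Stops the replication SQL thread and waits for
        # Slave_open_temp_tables to reach 0 before backing up.
        cmd.append("--safe-slave-backup")
        cmd.append(f"--safe-slave-backup-timeout={timeout_s}")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; dropping --safe-slave-backup is then just `safe_slave_backup=False`, which is the change we are asking about.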
**Observed problem:**
- Replication stops on the HSB during backup: When a read replica is being rebuilt from an XtraBackup backup taken on the HSB node, the --safe-slave-backup option causes the replication SQL thread on the HSB to stop and wait (sometimes indefinitely). During this window the HSB is not usable as a failover candidate, so the master effectively has no HA.
- Rebuild failures: In some cases the rebuild itself fails, likely because of the prolonged replication stop and the resulting timeouts.
We understand from the Percona documentation that --safe-slave-backup stops the replication SQL thread and waits until Slave_open_temp_tables is 0, and that as of XtraBackup 8.0.22-15.0 the SQL thread is stopped before the InnoDB files are copied, whereas in older versions the stop happened later in the flow. That behavior change materially increases how long replication apply is paused on the backup replica, which makes our HA and orchestration setup more sensitive to it.
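
Our understanding of that wait behavior, based on the documentation, can be sketched as follows. This is a hedged illustration in Python, not XtraBackup's actual implementation: the three callables are hypothetical hooks, the 300-second value mirrors the documented default for --safe-slave-backup-timeout, the poll interval is an assumed value, and restarting the SQL thread on timeout is a choice made in this sketch.

```python
import time
from typing import Callable


def wait_for_no_open_temp_tables(
    stop_sql_thread: Callable[[], None],
    start_sql_thread: Callable[[], None],
    read_open_temp_tables: Callable[[], int],
    timeout_s: float = 300.0,
    poll_s: float = 3.0,  # assumed poll interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the SQL thread is stopped with no open temp tables.

    Returns False if Slave_open_temp_tables never reaches 0 before the
    timeout, which corresponds to the backup failing.
    """
    deadline = clock() + timeout_s
    stop_sql_thread()
    while read_open_temp_tables() != 0:
        if clock() >= deadline:
            # Sketch-level choice: do not leave replication stopped.
            start_sql_thread()
            return False
        # Let the replica apply more events so temp tables can be dropped,
        # then stop again and re-check.
        start_sql_thread()
        sleep(poll_s)
        stop_sql_thread()
    return True
```

On success the SQL thread stays stopped for the rest of the backup, which is exactly the window in which our HSB is unavailable as a failover candidate.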
**What we are unsure about:**
- What is the recommended practice in 2026 for running PXB against a MySQL 8 cluster without any downtime or replication stop?
- Under what conditions is --safe-slave-backup still considered mandatory vs. optional?
- Are there specific signals (e.g. long-lived Slave_open_temp_tables, MIXED/STATEMENT binlog formats, explicit temp table patterns) that should force keeping it?
- For mixed tables (if any MyISAM remains alongside InnoDB): if we remove --safe-slave-backup to protect HA, what is the supported / accepted mitigation for non-InnoDB consistency?
- Timeouts / failure modes: if Slave_open_temp_tables does not reach zero within --safe-slave-backup-timeout, the backup fails. What are the best practices to avoid flapping backups that interact badly with automation?
The main question is: is --safe-slave-backup completely safe to remove in a MySQL 8.0 cluster (all-InnoDB clusters vs. clusters with InnoDB + MyISAM tables)? If not, what is the safe alternative that avoids any downtime or replication stop?
Thanks in advance.