Avoiding FLUSH TABLES during backup creation

We noticed that our system sometimes gets stuck while the database (MySQL 8.0.12) has many queries stuck in the state waiting for table flush. It seems that this is caused by xtrabackup (version 8.0.9) running FLUSH TABLES statements when creating a backup.

I tried to reproduce the problem locally, so what I did was I installed xtrabackup and MySQL my machine (both with the same versions as on the production system). Then I ran the following query on my local MySQL database to simulate a long-running select statement:

create table a (id int not null);
insert into a (id) values (1);
-- sleep(60) to simulate a long-running query, fetch from table b to keep the table open
select sleep(60), id from b

I have verified that while the statement above is running, FLUSH TABLES cannot complete.

Next, I ran xtrabackup:

sudo xtrabackup --datadir=/tmp/var/lib/mysql --user=root --host=127.0.0.1 --port 3306 --password=<redacted>  --backup --target-dir=/data/backup

Surprisingly, xtrabackup is not blocked by the select-statement. The select query is still running even after the backup has completed. Which is great in general, but it means I’m not able to reproduce the problem we have on our production server on my local machine.

So my question is, under what circumstances is xtrabackup able to finish the backup without running any FLUSH TABLES statement? Do you have any other hints or ideas what we could tweak to make it less likely that xtrabackup interrupts our production server due to FLUSH TABLES? I’ve already looked into the --ftwrl-wait-query-type and --kill-long-queries-timeout options, which sound very promising. But if possible, I would prefer a solution where FLUSH TABLES does not happen in the first place.

Thanks in advance.

1 Like

Hello @Fabian_Muscariello,
Are you using Percona Server MySQL? If not, there is no workaround. Percona Server has enhanced backup capabilities that are not available in Community MySQL.

If you are using PS8 and Xtrabackup 8 and still getting FTWRL issues, then we need to see more output from your backup process.

As a side note, SELECT’s never block other reads and never block writers in InnoDB; that’s why you didn’t see the issue with your SLEEP(60). (unless FKs are used)

2 Likes

Thanks for your reply @matthewb

We’re not using Percona Server MySQL, so I guess we have to live with xtrabackup running FLUSH TABLES statements.

During our most recent outage, I noticed the following statement in the slow queries log:

FLUSH TABLES WITH READ LOCK;

The Query time is about 62 seconds. We have seen similar patterns before, so I’m fairly confident that there is a connection between those FLUSH TABLES statements and our outages.

Does that mean that, when using xtrabackup on a stock MySQL (as opposed to Percona Server MySQL) database, your database is blocked for write access for at least a brief moment (more than just a few seconds)?

I’m just wondering how to handle those outages. If possible, we would like to improve our backup method so that our database remains accessible for both read and write access during the backup creation, but perhaps it’s necessary to use a hot-standby (or a read-only replica or whatever) and create our backups from this replica.

1 Like

Correct. Percona Server is a 100% completely free, completely open-source, drop-in replacement for Community MySQL. Any particular reason you can’t adopt?

The FTWRL is necessary to backup MySQL servers with both MyISAM and InnoDB tables. Even in 5.7, there are some MyISAM tables (system tables). If you can guarantee that these tables won’t change during the backup, you can tell xtrabackup not to use any locks. This will still get you a consistent backup for all InnoDB tables without blocking any reads/writes.

2 Likes

Thanks again for your help. I’m going to discuss with my team which option to choose, if we use adopt Percona Server, if we use xtrabackup without locks, or something else entirely.

1 Like

@Fabian_Muscariello are you using a dedicated backup replica to take the backups from? If not, that is a best-practice approach so that you can take reliable backups without suffering any issues from application traffic, or equally causing issues when taking locks during backups.