Slow incremental backups even with enabled `innodb_track_changed_pages`

I am running full/incremental backups using my onlime / xtrabackup-helper · GitLab XtraBackup wrapper which I have just published. It simply runs the following:

# full backup
$ xtrabackup --backup --target-dir=/xtrabackup/base
# incremental backup
$ xtrabackup --backup --target-dir=/xtrabackup/inc --incremental-basedir=/xtrabackup/base

running this on a Debian Buster system with Percona Server for MySQL 8.0.22, with the following setup:

$ xtrabackup --version
xtrabackup: recognized server arguments: --datadir=/var/lib/mysql --tmpdir=/var/tmpfs --open_files_limit=10000 --server-id=1 --log_bin=0 --innodb_file_per_table=1 --innodb_open_files=4096 --innodb_buffer_pool_size=40G --innodb_log_file_size=128M --innodb_io_capacity=10000 --open_files_limit=100000 --innodb_open_files=100000 
xtrabackup version 8.0.22-15 based on MySQL server 8.0.22 Linux (x86_64) (revision id: fea8a0e)

/var/lib/mysql currently is 8.3GB and a full xtrabackup takes just 1:45min, but every subsequent incremental backup (after 2hrs) takes 20 - 22mins.

According to XtraDB changed page tracking — Percona Server 8.0 Documentation I have enabled innodb_track_changed_pages and /var/lib/mysql/ib_modified_log_*.xdb files are around. Still, this did not improve incremental backup speed by any ways.

[mysqld]
loose_innodb_track_changed_pages = 1
mysql> SHOW VARIABLES LIKE 'innodb_track_changed_pages';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_track_changed_pages | ON    |
+----------------------------+-------+

Why are incremental backups so slow? Does xtrabackup in my setup somehow not respect innodb_track_changed_pages?

With such long incremental backup times, I am often running into Avoiding the "An optimized (without redo logging) DDL operation has been performed" Error with Percona XtraBackup - Percona Database Performance Blog which causes aborted backups with An optimized (without redo logging) DDL operation has been performed. error.

1 Like

Hi @onlime .
20 min for a 8GB database seems a lot. Have you tried the same incremental with page tracking disabled ?
Can you please share a full log of xtrabackup while its taking the incremental so we can validate where it is spending its time?

With such long incremental backup times, I am often running into Avoiding the “An optimized (without redo logging) DDL operation has been performed” Error with Percona XtraBackup - Percona Database Performance Blog which causes aborted backups with An optimized (without redo logging) DDL operation has been performed. error.

Regarding this one, you should be using --lock-ddl while taking the backups. This is the safest way of getting a consistent backup - Please check https://www.percona.com/blog/2020/12/22/redesign-of-lock-ddl-per-table-in-percona-xtrabackup/

1 Like

Thanks @Marcelo_Altmann

20 min for 8GB is really way to much, especially since a full backup only takes 2.5 mins and the server (LXC container) is running on fast SSD local storage (Intel SSD DC S3610 / ZFS raidz1). But I am not talking about a single 8GB database - it’s a shared hosting environment with a total of 50’000 tables (100% InnoDB, no MyISAM) in 700 databases. If I check incremental xtrabackup logs, I see 51’000 such copy operations:

Copying ./db1/tbl1.ibd to /xtrabackup/inc.part/db1/tbl1.ibd.delta

As a full backup is done in 2.5mins, I don’t think this is an IO issue, rather xtrabackup spending a lot of time for delta calculations for so many tables.

I could provide you ssh access to the logs, if you provide me your contact / email and ssh pubkey via DM. Cannot publish the logs here.

About the second issue: Great article Redesign of -lock-ddl-per-table in Percona XtraBackup - Percona Database Performance Blog and nice to hear that this has been improved with the new --lock-ddl. But the whole blog post is full of technical details and I don’t quite get the essence out of it. For me (sorry if I did not study the article deeply enough) it still leaves the following important questions open:

  • Does --lock-ddl not block any DML at runtime?
  • Does --lock-ddl only lock DDL on table level or on full table schema (database) level? If on schema level, this could still take quite some time considering the slow incremental backups.
  • Or does --lock-ddl even lock all DDL during the whole xtrabackup run? (so it would not be a real replacement for the legacy --lock-ddl-per-table)

I am still hesitating to use --lock-ddl as I don’t want to harm my customers in any ways, production should not be affected by XtraBackup runs. If XtraBackup cannot guarantee consistent backups without negative impact due to some locking, it is no solution for our use case and I would need to count on ZFS full system snapshots (LXC containers run on their own ZFS datasets) instead, which should also deliver a consistent state of MySQL data.

1 Like

Hi @onlime

  • Does --lock-ddl not block any DML at runtime?

I blocks DDL, the only issue you might see is in case of a DDL on table A gets blocked, all subsequent DML will wait until the DDL (and the backup) has completed.

  • Does --lock-ddl only lock DDL on table level or on full table schema (database) level? If on schema level, this could still take quite some time considering the slow incremental backups.
  • Or does --lock-ddl even lock all DDL during the whole xtrabackup run? (so it would not be a real replacement for the legacy --lock-ddl-per-table )

It blocks all DDL operations. CREATE DATABASE been a DDL it will get blocked too

Have you tried the same incremental with page tracking disabled ?

Regarding my above question, have you tried without page tracking to validate the timing?

I could provide you ssh access to the logs, if you provide me your contact / email and ssh pubkey via DM. Cannot publish the logs here.

I would rather prefer if you could share this with a link to some pastebin service.

1 Like

Hi @Marcelo_Altmann
thanks for your reply and sorry for my late feedback. I never found the time to anonymize an incremental Xtrabackup log to share via Pastebin, and anyway I gave up on incremental backups.
It really looked like running incremental backups with Xtrabackup does not perform well enough for our use case with > 50’000 tables. When you guys enabled --lock-ddl by default in Percona XtraBackup 8.0.23-16.0 which now seems to be the recommended way, running incremental backups was no option for us anymore, so we now just stick with nightly full backups.
On a regular webserver, this runs through in < 10s, and on our largest shared hosting server, it takes < 2mins, so locking DDL at night time with a not so international customer base is acceptable. In addition, we are running frequent snapshot based (ZFS) full-system backups, which should also be consistent.
Cheers,
Philip

1 Like