Xtrabackup 2.4 error last checkpoint LSN is larger than last copied LSN

Hi,
I am having strange problem that is happening sometimes during my nightly backups. Backup does not complete OK and have the error in the end:
xtrabackup: error: last checkpoint LSN (109863974838) is larger than last copied LSN (109863967574).

Here is my “setup”
db-4 is read only slave server running running MySQL 5.7.
db-restore is the server which runs nightly cronjob script using xtrabackup (percona-xtrabackup-24-2.4.24-1.el8.x86_64) to copy the databases from db-4 to db-restore. datadir from db-4 is shared via nfs (ro,no_root_squash) and mounted on db-restore.
Both servers are in same network, OS is Oracle Linux 8.5. On db-4 I have three mysql instances. Each of them is copied (one after another) during the night to db-restore. Only one of the instances is failing. And the strange thing is that the smallest and mostly inactive instance is failing.
During the day, if I run the same backup on the same instance manually it completes successfully.

Below is the log with removed most of the bulk repeating rows and tables. (Sorry, I am new user and have no permissions to attach yet).

xtrabackup: recognized server arguments: --datadir=/srv/config/nfsmount --parallel=6
xtrabackup: recognized client arguments: --password=* --backup=1 --host=192.168.111.41 --port=3316 --user=backup --target-dir=/srv/config/backup --safe-slave-backup=1 --slave-info=1
220218 00:45:02 version_check Connecting to MySQL server with DSN ‘dbi:mysql:;mysql_read_default_group=xtrabackup;host=192.168.111.41;port=3316’ as ‘backup’ (using password: YES).
220218 00:45:02 version_check Connected to MySQL server
220218 00:45:02 version_check Executing a version check against the server…
220218 00:45:02 version_check Done.
220218 00:45:02 Connecting to MySQL server host: 192.168.111.41, user: backup, password: set, port: 3316, socket: not set
Using server version 5.7.37-log
Warning: MySQL variable ‘datadir’ points to nonexistent directory ‘/srv/cpanel/data/’
Warning: option ‘datadir’ has different values:
‘/srv/config/nfsmount’ in defaults file
‘/srv/cpanel/data/’ in SHOW VARIABLES
/usr/bin/xtrabackup version 2.4.24 based on MySQL server 5.7.35 Linux (x86_64) (revision id: b4ee263)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /srv/config/nfsmount
xtrabackup: open files limit requested 0, set to 1024
xtrabackup: using the following InnoDB configuration:
xtrabackup: innodb_data_home_dir = .
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 2
xtrabackup: innodb_log_file_size = 50331648
InnoDB: Number of pools: 1
220218 00:45:02 >> log scanned up to (109863683631)
InnoDB: Opened 2 undo tablespaces
InnoDB: 0 undo tablespaces made active
xtrabackup: Generating a list of tablespaces
InnoDB: Allocated tablespace ID 34 for mysql/servers, old maximum was 2
xtrabackup: Starting 6 threads for parallel data files transfer
220218 00:45:02 [04] Copying ./mysql/servers.ibd to /srv/config/backup/mysql/servers.ibd
220218 00:45:02 [05] Copying ./mysql/plugin.ibd to /srv/config/backup/mysql/plugin.ibd
220218 00:45:02 [01] Copying ./ibdata1 to /srv/config/backup/ibdata1

220218 00:47:43 >> log scanned up to (109863967574)
220218 00:47:44 >> log scanned up to (109863967574)
220218 00:47:44 Slave open temp tables: 0
220218 00:47:44 Slave is safe to backup.
220218 00:47:44 Executing FLUSH NO_WRITE_TO_BINLOG TABLES…
220218 00:47:44 Executing FLUSH TABLES WITH READ LOCK…
220218 00:47:44 Starting to backup non-InnoDB tables and files
220218 00:47:44 [01] Copying ./mysql/db.opt to /srv/config/backup/mysql/db.opt
220218 00:47:44 [01] …done

220218 00:47:45 [01] …done
220218 00:47:45 Finished backing up non-InnoDB tables and files
220218 00:47:45 [00] Writing /srv/config/backup/xtrabackup_slave_info
220218 00:47:45 [00] …done
220218 00:47:45 [00] Writing /srv/config/backup/xtrabackup_binlog_info
220218 00:47:45 [00] …done
220218 00:47:45 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS…
xtrabackup: The latest check point (for incremental): ‘109863974838’
xtrabackup: Stopping log copying thread.
.220218 00:47:45 >> log scanned up to (109863967574)

220218 00:47:45 Executing UNLOCK TABLES
220218 00:47:45 All tables unlocked
Starting slave SQL thread
220218 00:47:45 Backup created in directory ‘/srv/config/backup/’
MySQL binlog position: filename ‘ulo-db4-cpanel-bin.000111’, position ‘375017735’, GTID of the last change ‘1b39f160-b43d-3316-1dc1-0050569ccaa1:1-57363962,
1b4a83f7-0000-3316-2dc1-0050569ccaa1:1-5000840,
1b4a83f7-b43d-3316-2dc1-0050569ccaa1:1-123972,
1b75d65b-0000-3316-5dc1-0050569ccaa1:1-2,
1b8220fc-0000-3316-1dc2-0050569ccaa1:1-2745845,
1b8220fc-b43d-3316-1dc2-0050569ccaa1:1-380600,
1b8ec709-0000-3316-2dc2-0050569ccaa1:1-131440,
1b8ec709-b43d-3316-2dc2-0050569ccaa1:1-102,
ff9c1e99-0000-3316-2dc1-22c50dd5f44b:1-518708’
MySQL slave binlog position: master host ‘192.168.111.21’, purge list ‘1b39f160-b43d-3316-1dc1-0050569ccaa1:1-57363962, 1b4a83f7-0000-3316-2dc1-0050569ccaa1:1-5000840, 1b4a83f7-b43d-3316-2dc1-0050569ccaa1:1-123972, 1b75d65b-0000-3316-5dc1-0050569ccaa1:1-2, 1b8220fc-0000-3316-1dc2-0050569ccaa1:1-2745845, 1b8220fc-b43d-3316-1dc2-0050569ccaa1:1-380600, 1b8ec709-0000-3316-2dc2-0050569ccaa1:1-131440, 1b8ec709-b43d-3316-2dc2-0050569ccaa1:1-102, ff9c1e99-0000-3316-2dc1-22c50dd5f44b:1-518708’, channel name: ‘u2’

220218 00:47:45 [00] Writing /srv/config/backup/backup-my.cnf
220218 00:47:45 [00] …done
220218 00:47:45 [00] Writing /srv/config/backup/xtrabackup_info
220218 00:47:45 [00] …done
xtrabackup: Transaction log of lsn (109863677198) to (109863967574) was copied.
xtrabackup: error: last checkpoint LSN (109863974838) is larger than last copied LSN (109863967574).

1 Like

No one can help here?
I changed the number of parallel threads from 6 to 2, and for three days there is no problem.
Then I made parallel again 6, and still no problem. Very strange, I am not sure that this is actually xtrabackup issue. Or if it is, I cannot replicate or understand the conditions causing it.

1 Like

Your InnoDB redo log file size is too small, so it is rewritten while backup was running. You need to increase innodb_log_file_size or take backup in less busy time.

1 Like

Hi and thanks for the reply.

However, I dont think that the log_file_size is the problem here. This is small and very lightly loaded instance. During the the backup there are less than 200 rows modified.
delete - 0 rows
update - 1 rows
insert - 185 rows

The default 2x48MB log space I believe should be enough.
p.s. I dont have monitoring on the backed up server, but it is read only slave and I have monitor on the master server (and it has the same log_file_size configuration). I checked it and for the night the checkpoint writes are very very low - 3KB/s, and the max checkpoing lag is 30KB.


Anyway I will increase log_file_size and will monitor the backups.

2 Likes

changing log_file_size from 48 to 128MB, did not help, it failed again :frowning:

1 Like

The “last checkpoint LSN” is an LSN before backup started. If it is larger than the last copied LSN this means that either log files were overwritten or something went terribly wrong.

I noticed two issues with your XtraBackup run.

Warning: MySQL variable ‘datadir’ points to nonexistent directory ‘/srv/cpanel/data/’
Warning: option ‘datadir’ has different values:
‘/srv/config/nfsmount’ in defaults file
‘/srv/cpanel/data/’ in SHOW VARIABLES

Please check why datadir variable in SHOW VARIABLES points to not existent directory.

If you use NFS note that it cannot provide a reliable answer for XtraBackup and all kinds of inconsistency issues could happen.

Also, you are using MySQL 5.7.37 while the latest version of Percona Server is 5.7.36. While XtraBackup should work for MySQL Community 5.7.37 and I do not see any backward incompatibility changes in the release notes it would be good if you check how it works with the same datadir and load with Percona Server 5.7.36.

1 Like

In my opening post I explained the configuration.

About datadir warning:
This is because I run xtrabackup on a remote computer. Mysql server datadir /srv/cpanel/data/ is mounted via nfs in /srv/config/nfsmount (on the computer running xtrabackup). And I set this as a argument to xtrabackup --datadir=/srv/config/nfsmount

About the versions:
I have two other instances on the same source server, using the same mysqld 5.7.37, and they do not have problems with xtrabackup.

1 Like

While it is an unlikely reason for the redo log file issue, XtraBackup is designed to run locally.

I suggest you try a larger redo log file size or try running XtraBackup on the idle instance.

Forgot to ask earlier.

So if this is the case why the backup is not aborted in the begining when it starts to scan the logs?

Redo logs are not a problem, I already tried increasing them and showed that this does not make things any better. This instance is only 60G. If it does not fail with error, backup completes for 3 minutes.

Seems we are heading a dead end here. Is there anything useful I could try to debug or report more details or whatever without changing generally my setup that has been so far flawlessly.
Hm, in that relation, I think that the issue started to happen when I switched from centos 7 to oracle linux 8 (on both source and destination server). But it is strange, that the other two instances on the same servers are not affected.

We have mysql enterprise subscription and mysql enterprise backup, but few years ago when I was implementing our backups, I chose percona xtrabackup and I would like to stay with it.