Backup MySQL using LVM snapshot

sachin_japate · July 27, 2022, 1:17pm

Hi,

We are running a 3-node (VM) cluster: 1 master and 2 replicas. The replicas use multi-threaded, non-GTID replication with slave_parallel_workers = 8 and slave_parallel_type = LOGICAL_CLOCK. We take backups from one of the replicas using the LVM snapshot feature.

Here are the steps we perform to take a backup:

Connect to MySQL on the replica.
Run the FLUSH NO_WRITE_TO_BINLOG TABLES WITH READ LOCK command.
Record the SHOW SLAVE STATUS\G output.
Take an LV snapshot: lvcreate --size 1G --snapshot --name n_snapshot /dev/vgroot/lv_mysqldata (/dev/vgroot/lv_mysqldata is the logical volume on which the MySQL datadir, /var/lib/mysql, is mounted).
Run UNLOCK TABLES.
Upload the snapshot to remote storage.
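One detail the steps above leave implicit: FLUSH TABLES WITH READ LOCK is held only while the session that issued it stays open, so the snapshot has to be created from inside that same session. A minimal sketch, assuming a Unix mysql client (whose `system` command runs a shell command without closing the session) and the volume names above; `snapshot_session_sql` is a hypothetical helper. It reproduces the procedure as described, so on its own it does not address the Duplicate entry problem discussed in this thread.

```bash
# Hypothetical helper: print the commands for ONE mysql session that takes the
# global read lock, records the replica coordinates, creates the LVM snapshot
# while the lock is still held, and only then releases the lock.
# Assumes a Unix mysql client, where "system" runs a shell command in-session.
snapshot_session_sql() {
  local lv="${1:-/dev/vgroot/lv_mysqldata}"
  local name="${2:-n_snapshot}"
  local size="${3:-1G}"
  printf '%s\n' \
    'FLUSH NO_WRITE_TO_BINLOG TABLES WITH READ LOCK;' \
    'SHOW SLAVE STATUS\G' \
    "system lvcreate --size ${size} --snapshot --name ${name} ${lv}" \
    'UNLOCK TABLES;'
}

# Usage (as a user allowed to run lvcreate); the status output lands in the file:
#   snapshot_session_sql | mysql > slave_status_at_snapshot.txt
```

Piping everything into a single mysql process is the design point: running FTWRL and lvcreate as separate client invocations would release the lock before the snapshot is taken.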

Restore process, used when we want to create another replica:

Restore the latest backup to the VM.
Start MySQL once the restore is complete.
Configure replication using Relay_Master_Log_File and Exec_Master_Log_Pos from the SHOW SLAVE STATUS\G output recorded during the backup.
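The last restore step can be scripted from the saved status output. A minimal sketch, assuming the SHOW SLAVE STATUS\G output from the backup was saved to a text file; `change_master_from_status` is a hypothetical name. It only automates the step as described, so the resulting coordinates are exactly as trustworthy as the moment they were recorded (see the discussion of Exec_Master_Log_Pos further down).

```bash
# Hypothetical helper: turn a saved SHOW SLAVE STATUS\G dump into the
# CHANGE MASTER TO statement for the restored replica. Relay_Master_Log_File /
# Exec_Master_Log_Pos are the master binlog coordinates the SQL thread reports
# as executed.
change_master_from_status() {
  local status_file="$1"
  local log_file log_pos
  log_file=$(awk -F': ' '$1 ~ /^ *Relay_Master_Log_File$/ { print $2 }' "$status_file")
  log_pos=$(awk -F': ' '$1 ~ /^ *Exec_Master_Log_Pos$/ { print $2 }' "$status_file")
  if [ -z "$log_file" ] || [ -z "$log_pos" ]; then
    echo "replication coordinates not found in $status_file" >&2
    return 1
  fi
  printf "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" "$log_file" "$log_pos"
}

# Usage on the restored VM. The config in this post has skip_slave_start = OFF,
# so replication threads may already be running; stop them first:
#   mysql -e 'STOP SLAVE'
#   change_master_from_status slave_status_at_snapshot.txt | mysql
#   mysql -e 'START SLAVE'
```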

The issue we are facing: whenever we restore the backup and set up replication, replication breaks with Error_code: 1062 Duplicate entry.

MySQL configuration:

[mysqld]
  binlog_format = "ROW"
  relay_log_info_repository = "TABLE"
  relay_log_recovery = "ON"
  master_info_repository = "TABLE"
  log_slave_updates = "ON"
  skip_slave_start = "OFF"
  slave_parallel_type = "LOGICAL_CLOCK"
  slave_parallel_workers = "8"
  slave_preserve_commit_order = "1"

Is there a step or command in the backup or restore process that we are missing, which would explain the Duplicate entry errors?

matthewb · July 27, 2022, 4:01pm

Do you only get this issue once or twice? Or is this issue ongoing? If you only get this once, that means your ‘SHOW SLAVE STATUS’ is out of sync with what is currently flushed to disk. FLUSH TABLES does not guarantee (and has never guaranteed) that all currently dirty pages are written to disk.

You might try something more like this. On the source, run SHOW BINARY LOGS; and note the current binlog file name. Then, on the replica, run STOP SLAVE; followed by START SLAVE UNTIL MASTER_LOG_FILE = '', MASTER_LOG_POS = 4;. Then go back to the source and run FLUSH BINARY LOGS to rotate to the next binlog manually. This should cause the replica to stop at a consistent point. Then, on the replica, run SET GLOBAL innodb_max_dirty_pages_pct=0 and give it a few minutes to flush all pages to disk; after that you can run FTWRL and take your snapshot.
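The sequence above can be scripted roughly as follows. A minimal sketch under stated assumptions: the UNTIL target is read as the binlog file that FLUSH BINARY LOGS will create (the one after the file noted from SHOW BINARY LOGS), since position 4 is where the first event of a binlog file starts; the function names and hosts are hypothetical; and the Innodb_buffer_pool_pages_dirty status variable is used to tell when the flush has drained.

```bash
# Hypothetical helper: name of the binlog FLUSH BINARY LOGS should rotate to,
# given the current one (mysql-bin.000123 -> mysql-bin.000124).
next_binlog() {
  local current="$1"
  local base="${current%.*}"            # e.g. mysql-bin
  local num="${current##*.}"            # e.g. 000123
  local width=${#num}
  local digits
  digits=$(printf '%s' "$num" | sed 's/^0*//')   # drop leading zeros before arithmetic
  printf "%s.%0${width}d\n" "$base" "$(( ${digits:-0} + 1 ))"
}

# SQL for the replica, to run BEFORE rotating the binlog on the source.
replica_until_sql() {
  printf "STOP SLAVE;\nSTART SLAVE UNTIL MASTER_LOG_FILE = '%s', MASTER_LOG_POS = 4;\n" "$1"
}

# Poll until InnoDB reports no dirty pages (after innodb_max_dirty_pages_pct = 0).
# The count may never reach exactly 0 on a busy server, hence the max_tries limit.
wait_for_clean_buffer_pool() {
  local interval="${1:-10}" max_tries="${2:-60}" dirty i=0
  while [ "$i" -lt "$max_tries" ]; do
    dirty=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty'" | awk '{ print $2 }')
    [ "${dirty:-1}" -eq 0 ] 2>/dev/null && return 0
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage sketch (host names are placeholders):
#   current=mysql-bin.000123                 # noted from SHOW BINARY LOGS on the source
#   replica_until_sql "$(next_binlog "$current")" | mysql -h replica
#   mysql -h source -e 'FLUSH BINARY LOGS'
#   # wait until the replica shows Slave_SQL_Running: No, then:
#   mysql -h replica -e 'SET GLOBAL innodb_max_dirty_pages_pct = 0'
#   # wait_for_clean_buffer_pool calls plain "mysql"; point it at the replica
#   # (e.g. via ~/.my.cnf) before running it, then FTWRL and snapshot
```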

OR… to simplify all of that, use Percona Xtrabackup.

sachin_japate · July 28, 2022, 7:05am

@matthewb Thanks for your response.

This issue is ongoing. Is it caused by our use of multi-threaded replication, or is it unrelated?

This is what the MySQL 5.7 Reference Manual says (13.7.5.34 SHOW SLAVE STATUS Statement):

Exec_Master_Log_Pos
Inconsistencies in the sequence of transactions from the relay log which have been executed can cause this value to be a “low-water mark”. In other words, transactions appearing before the position are guaranteed to have committed, but transactions after the position may have committed or not. If these gaps need to be corrected, use START SLAVE UNTIL SQL_AFTER_MTS_GAPS. See Section 16.4.1.32, “Replication and Transaction Inconsistencies” for more information.

This indicates that some transactions after Exec_Master_Log_Pos may already have been committed without that being reflected in SHOW SLAVE STATUS.

Would the following steps make the backup consistent?

Connect to MySQL on the replica.
STOP SLAVE;
START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
Run the FLUSH NO_WRITE_TO_BINLOG TABLES WITH READ LOCK command.
Record the SHOW SLAVE STATUS\G output.
Take an LV snapshot: lvcreate --size 1G --snapshot --name n_snapshot /dev/vgroot/lv_mysqldata (/dev/vgroot/lv_mysqldata is the logical volume on which the MySQL datadir, /var/lib/mysql, is mounted).
Run UNLOCK TABLES.
START SLAVE;
Upload the snapshot to remote storage.
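One thing the proposed steps leave out: START SLAVE UNTIL returns immediately, so FTWRL should only be issued after the SQL thread has actually stopped (Slave_SQL_Running: No); until then, Exec_Master_Log_Pos can still be the "low-water mark" the documentation describes. A minimal polling sketch; the function names are hypothetical and it assumes the mysql client can connect to the replica without extra options.

```bash
# Hypothetical helper: succeed if a SHOW SLAVE STATUS\G dump on stdin reports
# that the SQL thread is no longer running.
sql_thread_stopped() {
  awk -F': ' '$1 ~ /^ *Slave_SQL_Running$/ { found = 1; stopped = ($2 == "No") }
              END { exit !(found && stopped) }'
}

# Poll the replica until the SQL thread stops, e.g. after
# START SLAVE UNTIL SQL_AFTER_MTS_GAPS. Gives up after max_tries polls
# instead of assuming how long that takes.
wait_for_sql_thread_stop() {
  local interval="${1:-2}" max_tries="${2:-150}" i=0
  while [ "$i" -lt "$max_tries" ]; do
    if mysql -e 'SHOW SLAVE STATUS\G' | sql_thread_stopped; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage sketch:
#   mysql -e 'STOP SLAVE; START SLAVE UNTIL SQL_AFTER_MTS_GAPS'
#   wait_for_sql_thread_stop 2 150 || { echo "SQL thread did not stop" >&2; exit 1; }
#   # then: FTWRL, record SHOW SLAVE STATUS\G, snapshot, UNLOCK TABLES, START SLAVE
```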

We do use xtrabackup and innobackupex to rebuild replicas. However, for scenarios such as:

Cross-DC rebuilds
Creating multiple replicas at the same time
Cloning a cluster for a hardware upgrade

restoring from a remote backup is faster in our case, and multiple nodes can be rebuilt at the same time, whereas with the xtrabackup method rebuilds are sequential.

XtraBackup is based on InnoDB's crash-recovery functionality. XtraBackup works by remembering the LSN when it starts copying data files. At the same time, XtraBackup runs a background process to copy the transaction logs. Once it finishes copying files, XtraBackup issues FLUSH TABLES WITH READ LOCK to prevent further changes to MySQL's data and takes a dump of the replication coordinates.

This is similar to what we do: when the backup is taken, the entire datadir (data + transaction logs) is copied to remote storage.

We use these durability settings on all nodes in the cluster:

[mysqld]
 innodb_flush_log_at_trx_commit = "1"
 sync_binlog = "1"

Ivan_Groenewold · July 28, 2022, 12:38pm

Hello, the steps you list should help you get past the issue. However, please note that the process you are following may create a snapshot that is not consistent, and MySQL might fail to start from it. Even if you flush the tables, there are many InnoDB background threads doing work. I suggest considering a different approach; as Matthew said, xtrabackup could be a good choice.

sachin_japate · August 5, 2022, 9:01am

Thanks for the suggestion. We will work towards using the Xtrabackup utility for the backup of our MySQL clusters.

We would like to understand how XtraBackup captures/records the binlog position/GTID set, which is required to configure replication once the backup is restored. Can you point us to a document that describes this?

Our understanding is that XtraBackup remembers the LSN, starts copying data files, and at regular intervals copies the transaction log contents to xtrabackup_logfile. We have a few more questions:

1. How often is the transaction log content copied to xtrabackup_logfile?
2. At what point does XtraBackup record the binlog position?
3. Does the step that records the binlog position depend on the Performance Schema, or are the positions captured from SHOW SLAVE STATUS or from the mysql.slave_relay_log_info table?
4. Does XtraBackup stop replication before recording the replication coordinates when the backup is taken from a replica?

Ivan_Groenewold · August 5, 2022, 11:20am

Here’s a page that I believe answers your questions 1-3: How Percona XtraBackup Works - Percona XtraBackup
For question 4: yes, xtrabackup can do that as well; see Taking Backups in Replication Environments - Percona XtraBackup.
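For reference, a backup taken on a replica with XtraBackup might look like the sketch below. It assumes Percona XtraBackup 2.4-era options (`--slave-info`, `--safe-slave-backup`, `--prepare`) and that credentials come from ~/.my.cnf; the wrapper names and paths are hypothetical, so check the option names against the version you run.

```bash
# Hypothetical wrappers around Percona XtraBackup for a backup taken on a replica.
#   --slave-info        records the master binlog coordinates of this replica
#                       (written to xtrabackup_slave_info in the target dir)
#   --safe-slave-backup stops the replication SQL thread before the backup takes
#                       its global read lock and restarts it when the backup ends
backup_replica() {
  local target_dir="$1"
  xtrabackup --backup --slave-info --safe-slave-backup --target-dir="$target_dir"
}

# Apply the copied redo log so the data files are consistent before restoring.
prepare_backup() {
  xtrabackup --prepare --target-dir="$1"
}

# Usage sketch (paths are placeholders):
#   backup_replica /backups/replica-full
#   prepare_backup /backups/replica-full
#   cat /backups/replica-full/xtrabackup_slave_info   # CHANGE MASTER TO ... statement
```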

yves.trudeau · May 23, 2023, 9:00pm

Hi,
With multi-threaded replication, if you are not using GTIDs, you must stop replication around the time you take the snapshot. I had a similar issue with EBS-based snapshots. Essentially, when you run FTWRL, you block some transactions in a group commit, but the first transactions of that group may already have been executed; hence the duplicate key errors.

sachin_japate · May 24, 2023, 11:28am

Thanks @yves.trudeau. As suggested by @Ivan_Groenewold, we are using the XtraBackup tool.
