And when I restore this fully backed-up DB, on the restored DB I can see higher LSN values:
mysql> show global status like "%lsn%";
+----------------------------+----------+
| Variable_name              | Value    |
+----------------------------+----------+
| Innodb_lsn_current         | 12380925 |
| Innodb_lsn_flushed         | 12380925 |
| Innodb_lsn_last_checkpoint | 12380916 |
+----------------------------+----------+
3 rows in set (0.00 sec)
I was expecting the same LSN values as when I took the backup, but I am seeing different values after restoring. Can anyone please help me understand this?
We have a documentation section that explains how xtrabackup works. A small snippet from that:
This works because InnoDB maintains a redo log, also called the transaction log. This contains a record of every change to InnoDB data. When InnoDB starts, it inspects the data files and the transaction log, and performs two steps. It applies committed transaction log entries to the data files, and it performs an undo operation on any transactions that modified data but did not commit.
I understand your expectation of matching LSNs, but keep in mind that the backup also collects changes while it is running. When you prepare the backup for restore, the changes recorded in xtrabackup_logfile are applied during the apply-log step.
If your database is idle and no changes are being made, you should see matching LSNs. But if your data changes between the start and end of the backup, those changes will need to be applied to bring the data files in sync, and hence you see a slightly different LSN.
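For reference, the LSN a backup was taken at is recorded in the xtrabackup_checkpoints file inside the backup directory, which is the value to compare against. A minimal sketch of reading it; the file contents below are made-up sample values, not taken from this thread:

```shell
# xtrabackup writes a small key = value file into the backup directory.
# These are illustrative sample contents, not real backup output.
cat > xtrabackup_checkpoints <<'EOF'
backup_type = full-backuped
from_lsn = 0
to_lsn = 12380916
last_lsn = 12380916
EOF

# Extract the checkpoint LSN the backup was taken at.
to_lsn=$(awk -F' = ' '$1 == "to_lsn" {print $2}' xtrabackup_checkpoints)
echo "$to_lsn"
```

After a restore, the server's LSN will be at or above this value, never below it.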
Let me explain what I tried and what my expectation was.
The application is not connected; everything is being done in the backend, i.e., directly on the DB nodes.
Set up a 3-node XtraDB cluster
Generated some data in the DB
Took a backup using xtrabackup and noted down the LSN values from the checkpoints file
Neither during nor after the backup did we load any data into the DB; it was idle
Now, in another cluster,
Stopped all 3 nodes and emptied the /var/lib/mysql folder
Restored the backed-up files using the prepare and copy-back commands
Started all 3 nodes
Checked the LSN values in the DB using the status variables
Saw different values, i.e., higher LSNs than the backed-up LSN
I guess my expectation of matching values is reasonable, right?
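For context, the restore steps above can be sketched roughly as follows. The backup directory path is illustrative; the --prepare step is where the redo log collected during the backup gets replayed, which is one place the LSN can move forward:

```shell
BACKUP_DIR=/backup/full    # illustrative path, adjust to your layout
DATADIR=/var/lib/mysql

# Apply the redo log captured during the backup (xtrabackup_logfile).
xtrabackup --prepare --target-dir="$BACKUP_DIR"

# Copy the prepared files into the (empty) datadir and fix ownership.
xtrabackup --copy-back --target-dir="$BACKUP_DIR" --datadir="$DATADIR"
chown -R mysql:mysql "$DATADIR"
```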
Actually, I need to do a data integrity check. As part of this, I want to verify that the data I restored is the same as the data at the time I took the backup. I can't simply compare per-table row counts after the backup and after the restore, since we are doing a hot backup. So what I am planning is to take the LSN as a reference and use it for the integrity check.
If the LSN is not feasible, do you have an alternative option, Marc?
The LSN will definitely not work in your case. Simply removing the redo logs and recreating them (which is what xtrabackup does) will advance the LSN of that datadir. Furthermore (see below), even a shutdown advances your LSN:
🐬 marcelo 📂 /work/ps/ins/8.0 ▶
╰▶ $ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_c%lsn'"
+--------------------------------+----------+
| Variable_name | Value |
+--------------------------------+----------+
| Innodb_redo_log_checkpoint_lsn | 19549482 |
| Innodb_redo_log_current_lsn | 19549482 |
+--------------------------------+----------+
🐬 marcelo 📂 /work/ps/ins/8.0 ▶
╰▶ $ mysql -e "shutdown"
🐬 marcelo 📂 /work/ps/ins/8.0 ▶
╰▶ $ bin/mysqld --defaults-file=my.cnf &
[1] 1155016
🐬 marcelo 📂 /work/ps/ins/8.0 ▶
╰▶ $ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_c%lsn'"
+--------------------------------+----------+
| Variable_name | Value |
+--------------------------------+----------+
| Innodb_redo_log_checkpoint_lsn | 19599786 |
| Innodb_redo_log_current_lsn | 19599786 |
+--------------------------------+----------+
A best practice would be to start one node as a replica of the source server and run pt-table-checksum.
Is this tool really applicable to an XtraDB cluster (where all nodes are primary)?
And I need to compare checksums between one DB node of cluster1 and at least one DB node of another cluster, say cluster2.
It works on PXC; however, for that you will have to configure one of the nodes from cluster2 as a replica of cluster1, which essentially means cluster2 will receive all new data from cluster1. If I understand correctly, that is not what you want; you want two independent clusters.
Another approach is to boot one node as a replica, run pt-table-checksum, and if it reports that the replica data is 100% consistent with the source, you can consider the backup consistent, ditch the replica node, and use the backup to provision your new cluster.
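That flow might look roughly like the sketch below. Host names, credentials, and the database name are all illustrative; the final query is the standard check against the percona.checksums table that pt-table-checksum populates:

```shell
# Run from the source node; the temporary replica must be attached via
# normal async replication. All connection details here are made up.
pt-table-checksum h=cluster1-node1,u=checksum_user,p=secret \
    --databases=mydb --no-check-binlog-format

# Then list any tables whose checksums differ on the replica.
mysql -h cluster2-replica -e "
  SELECT db, tbl, SUM(this_cnt) AS rows_checked, COUNT(*) AS chunks
  FROM percona.checksums
  WHERE master_cnt <> this_crc_is_null_mismatch OR master_cnt <> this_cnt
     OR master_crc <> this_crc
     OR ISNULL(master_crc) <> ISNULL(this_crc)
  GROUP BY db, tbl;"
```

An empty result from the final query means no chunk differed, i.e., the replica (and therefore the backup it was built from) matches the source.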