How does Percona XtraBackup copy the data files?

As far as I know, Percona XtraBackup lets users keep working with the database while it copies data, and it doesn’t lock the database. So how can it copy data while the files are still changing? What copy mechanism does it use? Or is there a mechanism that keeps writing new data somewhere else in the meantime? This confuses me. Please help me understand it.

@nguy_n_d_c_vinh_B20D Hello, this should help you: How Percona XtraBackup works - Percona XtraBackup

I’ve read it, but I didn’t see any mention of Percona XtraBackup’s copy mechanism. My question is: how can it copy data while the files are still changing? Does it use the standard Linux copy mechanism or some other mechanism?

Hi @nguy_n_d_c_vinh_B20D

At a high level, we first need to clarify how InnoDB writes data.

  1. Changes are always applied to pages in memory on the server and are also written sequentially to disk in the WAL (redo log).
  2. Periodically, the InnoDB page cleaner scans an internal list of all dirty pages (pages that have been modified), writes those pages back to disk, and advances the WAL checkpoint.
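
To make those two steps concrete, here is a minimal toy model in Python. It is only a sketch of the idea: `ToyInnoDB` and all of its fields are hypothetical names for illustration, not InnoDB internals, and real redo records describe changes inside a page rather than whole values.

```python
# Toy model only: ToyInnoDB and its fields are hypothetical, not InnoDB internals.
class ToyInnoDB:
    def __init__(self):
        self.disk_pages = {}     # page_id -> value as stored in the data file
        self.buffer_pool = {}    # page_id -> latest value in memory
        self.dirty = {}          # page_id -> LSN of the oldest unflushed change
        self.redo_log = []       # sequential list of (lsn, page_id, value)
        self.lsn = 0             # log sequence number of the latest change
        self.checkpoint_lsn = 0  # changes up to this LSN are in the data file

    def write(self, page_id, value):
        """Step 1: change the page in memory and append a redo record."""
        self.lsn += 1
        self.redo_log.append((self.lsn, page_id, value))
        self.buffer_pool[page_id] = value
        self.dirty.setdefault(page_id, self.lsn)

    def page_cleaner(self):
        """Step 2: flush dirty pages to disk and advance the checkpoint."""
        for page_id in list(self.dirty):
            self.disk_pages[page_id] = self.buffer_pool[page_id]
            del self.dirty[page_id]
        self.checkpoint_lsn = self.lsn


db = ToyInnoDB()
db.write(1, "a")
db.write(2, "b")
print(db.disk_pages, db.checkpoint_lsn)  # data file is still behind the redo log
db.page_cleaner()
print(db.disk_pages, db.checkpoint_lsn)  # data file has caught up
```

The point to notice is that between a `write` and the next `page_cleaner` run, the data file on disk is older than the redo log. Anything that copies the data file in that window also needs the redo records to catch up.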

Coming to Percona XtraBackup (PXB):

  1. PXB has a copy thread that copies the data files page by page (a page is usually 16K).
  2. To handle the case where PXB is copying a page at the exact moment the page cleaner is writing that same page, PXB validates the data it has read against the checksum stored in the page. If they don’t match, either the page is corrupted or the server was in the middle of writing it. PXB attempts to recopy and revalidate the page 10 times before aborting the backup.
  3. At the same time, PXB has a redo follow thread that keeps reading new WAL entries. Those entries are later applied during the --prepare phase. This covers the case where PXB copies a page whose latest changes were still only in memory and not yet flushed to disk: applying the redo records brings the backup to a consistent state.
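
The checksum retry and the redo replay can be sketched in Python as follows. This is an illustration under stated assumptions, not PXB's actual code: the page layout (a 4-byte CRC32 followed by the payload), the helper names `make_page`, `page_is_valid`, `copy_page` and `prepare`, and the whole-page redo records are all made up for the example. Only the 16K page size and the 10 attempts come from the description above.

```python
import zlib

PAGE_SIZE = 16 * 1024   # usual page size mentioned above
MAX_ATTEMPTS = 10       # retry count mentioned above


def make_page(payload: bytes) -> bytes:
    """Build a toy page: 4-byte CRC32 header + zero-padded payload (hypothetical layout)."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "big") + body


def page_is_valid(page: bytes) -> bool:
    """Recompute the checksum and compare it with the one stored in the page."""
    return len(page) == PAGE_SIZE and zlib.crc32(page[4:]).to_bytes(4, "big") == page[:4]


def copy_page(read_page, page_no: int) -> bytes:
    """Read one page, re-reading it when the checksum does not match."""
    for _ in range(MAX_ATTEMPTS):
        page = read_page(page_no)
        if page_is_valid(page):
            return page
    raise RuntimeError(f"page {page_no} failed validation {MAX_ATTEMPTS} times")


def prepare(pages: dict, redo_records: list) -> dict:
    """Toy prepare step: replay captured redo records in order (simplified to whole pages)."""
    for page_no, payload in redo_records:
        pages[page_no] = make_page(payload)
    return pages


# Simulate the server being halfway through writing page 0 on the first read:
old = make_page(b"row v1")
new = make_page(b"row v2")
torn = new[:4] + old[4:]          # new checksum, old body -> mismatch
reads = iter([torn, new])         # second read sees the finished write
copied = copy_page(lambda page_no: next(reads), 0)
print(copied == new)

# A page copied before its latest change was flushed is fixed by the redo replay:
backup = prepare({0: old}, [(0, b"row v2")])
print(backup[0] == new)
```

So in this sketch there is no special copy primitive and no lock. The copy is just ordinary reads of the file, and correctness comes from re-reading pages that fail validation and from replaying the redo log afterwards.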

I hope this answers your questions.