Giving up on xtrabackup

After days of failing to track down the reason for “you may have a corrupt database” and “lsn is in the future” errors, I am throwing in the towel. This is a big disappointment as I have looked forward to switching to xtrabackup for years.

Instead, I am going to use the following strategy:

  1. flush tables with read lock
  2. flush logs
  3. take LVM snapshot of the whole MySQL directory structure
  4. release the table lock
  5. rsync the snapshot to a separate server.
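
For reference, the five steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script in use: the volume path `/dev/vg0/mysql`, the snapshot name and size, the mount point, and the rsync destination are all hypothetical, and `conn` is assumed to be any Python DB-API connection to the MySQL server. The key detail is that the read lock only lasts as long as the session that took it, so the snapshot has to be created while that same connection is still open.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    # Run a system command and raise if it exits non-zero.
    subprocess.run(list(cmd), check=True)


def lvm_snapshot_backup(conn, run: Runner = _run, *,
                        origin: str = "/dev/vg0/mysql",      # hypothetical LV
                        snap_name: str = "mysql_snap",       # hypothetical
                        snap_size: str = "5G",               # hypothetical
                        mount_point: str = "/mnt/mysql_snap",
                        dest: str = "backuphost:/backups/mysql/") -> None:
    snap_dev = origin.rsplit("/", 1)[0] + "/" + snap_name
    cur = conn.cursor()

    # Steps 1-4: lock, flush logs, snapshot, unlock, all in one session.
    cur.execute("FLUSH TABLES WITH READ LOCK")
    try:
        cur.execute("FLUSH LOGS")
        run(["lvcreate", "--snapshot", "--size", snap_size,
             "--name", snap_name, origin])
    finally:
        # Always release the lock, even if the snapshot failed.
        cur.execute("UNLOCK TABLES")

    # Step 5: copy the snapshot contents away, then clean up.
    try:
        run(["mount", snap_dev, mount_point])
        try:
            run(["rsync", "-a", mount_point + "/", dest])
        finally:
            run(["umount", mount_point])
    finally:
        run(["lvremove", "-f", snap_dev])
```

The `run` parameter exists only so the command sequence can be dry-run or logged without touching LVM; by default it shells out via `subprocess.run`.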

This is fast, easy, results in almost zero downtime (the flush and snapshot complete in about 1 second), and produces a 100% perfect copy of the source MySQL instance.

Anybody see any issues with this approach?

Hi @geek_prophet .
I’m sorry to hear you gave up on xtrabackup. If you still want to, you can raise a bug at jira.percona.com with a reproducible test case in order for us to investigate it.
Just to set expectations (I see you have posted three forum questions on the same subject in a day or two): we do try to be as active and engaged here as we can, but please note that we do not offer any SLA for community bugs. So, regarding xtrabackup, I would advise you to go down the bug-report road.

Regarding LVM, please make sure to test it under significant load, such as the load you will face in production. For example:
If a user opens a transaction, makes some changes to tableA, and then runs another operation that takes a long time to complete (or simply never commits or rolls back the transaction), your FTWRL (step 1) will lock some of the tables and wait for the ones that cannot be locked. This will cause downtime.
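
One way to reduce that risk is to look for long-open transactions right before issuing FTWRL and postpone the backup if any are found. The sketch below is only an illustration under stated assumptions: it reads `information_schema.innodb_trx` (so it only sees InnoDB transactions), the 30-second threshold is arbitrary, and the function and constant names are hypothetical. The rows would come from running `OPEN_TRX_SQL` through whatever client library you use.

```python
from typing import Iterable, List, NamedTuple, Optional

# Query to run just before FLUSH TABLES WITH READ LOCK.
OPEN_TRX_SQL = (
    "SELECT trx_mysql_thread_id, "
    "TIMESTAMPDIFF(SECOND, trx_started, NOW()), trx_query "
    "FROM information_schema.innodb_trx"
)


class OpenTrx(NamedTuple):
    thread_id: int
    age_seconds: int
    query: Optional[str]  # None when the transaction is idle


def long_open_transactions(rows: Iterable[tuple],
                           max_age_seconds: int = 30) -> List[OpenTrx]:
    """Return transactions open at least max_age_seconds, oldest first."""
    trxs = (OpenTrx(*row) for row in rows)
    old = [t for t in trxs if t.age_seconds >= max_age_seconds]
    return sorted(old, key=lambda t: t.age_seconds, reverse=True)
```

If the returned list is not empty, the backup script could wait and retry, or alert someone, instead of taking the lock. Lowering the session's `lock_wait_timeout` before FTWRL should also make the lock attempt give up sooner rather than wait indefinitely, but test that on your own version before relying on it.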

There is a good blog post on this subject that still applies, and I recommend reading it: Using LVM for MySQL Backup and Replication Setup - Percona Database Performance Blog

Hi Marcelo,

We’d rather not give up on xtrabackup but we’re under time pressure. I’ve been messing with it for a week and can’t get it running reliably, which is why I have been bombarding the forum with questions. I don’t know if the problems are user error (typically true) or if there is something amiss in the code (seems unlikely) but I’m running out of time to figure it out. Like I said earlier, we can run a backup multiple times without a problem, and then suddenly it will start throwing “lsn is in the future” and “you may have a corrupt database” errors.

Regarding LVM, we’ve been using the method I described for years. Backups only kick off at night when user activity is minimal, but there are rare occasions when table locks may cause issues. That’s the main reason why we were eager to switch to xtrabackup. Thanks for the link.

Quick clarification. The link states…

  1. Connect to MySQL and run FLUSH TABLES WITH READ LOCK
    Note – this command may take a while to complete if you have long running queries. The catch here is FLUSH TABLES WITH READ LOCK actually waits for all statements to complete, even selects. So be careful if you have any long running queries. If you’re using only Innodb tables and do not need to synchronize binary log position with backup you can skip this step.

Just to be clear, is this saying that if all tables are on InnoDB, and there is no slave in production, then it is not necessary to FTWRL?
