rsync went wrong, but server started clean, now what?

Hello,

I was moving a MySQL datadir from one VM to another during an OS upgrade, and I believe my rsync command did not copy the ib_logfile0 and ib_logfile1 files (hereafter “the ib_logfiles”, meaning both). I use Xtrabackup for scheduled backups, but for this move I wanted the entire datadir with binlogs in one pass, so I stopped the server and rsync’ed.

The server is running fine against the datadir in its new home on the new VM and writes nothing to error.log, but the problem is that now when the scheduled Xtrabackup runs I get this during the apply-log step:


150115 23:00:10 innobackupex: Starting the apply-log operation

IMPORTANT: Please check that the apply-log run completes successfully.
At the end of a successful apply-log run innobackupex
prints "completed OK!".


150115 23:00:10 innobackupex: Starting ibbackup with command: xtrabackup --defaults-file="/opt/maginot/backup/dbnode/dbdir/backup-my.cnf" --defaults-group="mysqld" --prepare --target-dir=/opt/maginot/backup/dbnode/dbdir

xtrabackup version 2.2.8 based on MySQL server 5.6.22 Linux (x86_64) (revision id: )
xtrabackup: cd to /opt/maginot/backup/dbnode/dbdir
xtrabackup: This target seems to be not prepared yet.
xtrabackup: xtrabackup_logfile detected: size=2097152, start_lsn=(607055496)
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 2097152
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 2097152
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
InnoDB: Using atomics to ref count buffer pool pages
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Memory barrier is not used
InnoDB: Compressed tables use zlib 1.2.8
InnoDB: Using CPU crc32 instructions
InnoDB: Initializing buffer pool, size = 100.0M
InnoDB: Completed initialization of buffer pool
InnoDB: Highest supported file format is Barracuda.
InnoDB: 128 rollback segment(s) are active.
InnoDB: Waiting for purge to start
2015-01-15 23:00:10 7f035221e700 InnoDB: Error: page 1 log sequence number 607758329
InnoDB: is in the future! Current system log sequence number 607056925.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
2015-01-15 23:00:10 7f035221e700 InnoDB: Error: page 97 log sequence number 609499062
InnoDB: is in the future! Current system log sequence number 607056925.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
2015-01-15 23:00:10 7f035221e700 InnoDB: Error: page 39 log sequence number 607758329
InnoDB: is in the future! Current system log sequence number 607056925.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
InnoDB: 5.6.22 started; log sequence number 607055496
[B]2015-01-15 23:00:10 7f035fcb3780 InnoDB: Error: page 0 log sequence number 607859942
InnoDB: is in the future! Current system log sequence number 607056925.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.[/B]

[B] - about 15 more like these -[/B]

2015-01-15 23:00:10 7f035fcb3780 InnoDB: Error: page 0 log sequence number 607475883
InnoDB: is in the future! Current system log sequence number 607056925.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.

[notice (again)]
If you use binary log and don't use any hack of group commit,
the binary log position seems to be:
InnoDB: Last MySQL binlog file position 0 911392, file name mysql-bin.001157

xtrabackup: starting shutdown with innodb_fast_shutdown = 1
InnoDB: FTS optimize thread exiting.
InnoDB: Starting shutdown...
InnoDB: Shutdown completed; log sequence number 607059834

[B] - starts second pass -[/B]

Trying to use the resulting backup on another test host produces InnoDB errors similar to those shown above, and the sessions’ processes crash out; the clients report “MySQL server has gone away”.
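For anyone wanting to reproduce, the restore on the test host follows the usual innobackupex apply-log/copy-back flow; roughly this (the backup path is from my config, but treat the sequence as illustrative rather than my literal commands):

# prepare the backup, then copy it into an empty datadir on the test host
innobackupex --apply-log /opt/maginot/backup/dbnode/dbdir
service mysql stop
mv /var/lib/mysql /var/lib/mysql.bak && mkdir /var/lib/mysql
innobackupex --copy-back /opt/maginot/backup/dbnode/dbdir
chown -R mysql:mysql /var/lib/mysql
service mysql start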

I have also tried taking a filesystem-level snapshot of the datadir and moving it to the test host. When I start the server after doing that, I get the same “sequence number is in the future” errors in error.log; presumably it is attempting crash recovery and failing?
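In case it helps with diagnosis, the page LSNs in those errors can be compared against the live server’s current LSN (5.6 prints a “Log sequence number” line in the LOG section of the status output); if production is already well past them, that points at stale copied redo logs rather than corrupt tablespaces:

# current redo-log LSN on the running production instance
mysql -e 'SHOW ENGINE INNODB STATUS\G' | grep 'Log sequence number'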

Below I’ve copy-pasted the commands I used to copy the datadir from the old VM to the new VM by way of an intermediary host I’ll call opsVM:


root@oldVM:~# service stop mysql
-switch host-
root@opsVM:~# mkdir hmmfoo
root@opsVM:~# rsync -avz --rsync-path='sudo rsync' pordeploy@oldVM:/var/lib/mysql hmmfoo/ -e 'ssh -i /path/to/key'
root@opsVM:~# ls hmmfoo/
-switch host-
root@newVM:~# service mysql stop
-switch host-
root@opsVM:~# rsync -avz --rsync-path='sudo rsync' hmmfoo/mysql/ pordeploy@newVM:/var/lib/mysql/ -e 'ssh -i /path/to/key'
-switch host-
root@newVM:~# service mysql start

So my theory is that, because the destination server was stopped after the source server, its ib_logfiles had a newer modtime and the same size as the source’s. Because I did not include the -c option to rsync…
[URL]http://linux.die.net/man/1/rsync[/URL]


-c, --checksum skip based on checksum, not mod-time & size

…rsync skipped the ib_logfiles. There is only one DB on this server, for an internal production app. The destination host, newVM, was built by the same automated process as oldVM, so even though the newVM copy held no data before I rsync’ed, it had an identical my.cnf and an essentially identical schema and on-disk layout in /var/lib/mysql. That said, I find it highly unlikely that any other file in the datadir would have been exactly the same size, even though it wasn’t entirely empty.

So…
[LIST=1]
[*]I feel it is relatively safe to assume no other files were affected in this way.
[*]I’m not sure how to prove my theory (going to try to break another test VM the same way this morning; see the sketch right after this list).
[*]If the ib_logfiles were in fact skipped, I don’t know if I can simply remove them with the server stopped and then restart it so it can create blank ones.
[/LIST]
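For point 2, the sandbox test I have in mind (before breaking a whole VM) just exercises rsync’s quick check locally; the file names are arbitrary, and the comments describe rsync’s documented default behaviour:

# same-size, different-content files in a throwaway "source" and "destination"
mkdir -p src dst
printf 'AAAA' > src/ib_logfile0
printf 'BBBB' > dst/ib_logfile0

# case 1: the receiver's copy has a newer modtime than the sender's
touch -d '2015-01-01 00:00' src/ib_logfile0
rsync -avi src/ dst/      # modtimes differ, so the file is transferred (no -u in play)

# case 2: identical size AND modtime on both sides
printf 'BBBB' > dst/ib_logfile0
touch -r src/ib_logfile0 dst/ib_logfile0
rsync -avi src/ dst/      # quick check matches, the file is silently skipped
rsync -avic src/ dst/     # -c compares checksums and copies it after all

If case 2 is the only way to get a silent skip, that already pokes a hole in the newer-modtime part of my theory.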
I hesitate on the 3rd point because this is now in production and, for now, working.
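For point 3, if I ever do dare, the recipe I would test on a throwaway copy first (definitely not straight on production) is the usual clean-shutdown-then-remove approach; a sketch, with the error log path assumed from my my.cnf:

# test copy only: recreate the redo logs after a clean shutdown
mysql -e 'SET GLOBAL innodb_fast_shutdown = 0'   # full purge before shutdown, belt and braces
service mysql stop
pgrep -x mysqld && echo 'mysqld still running, stop here'   # confirm it is really down
mkdir -p /root/ib_logfile_stash
mv /var/lib/mysql/ib_logfile0 /var/lib/mysql/ib_logfile1 /root/ib_logfile_stash/
service mysql start      # InnoDB recreates the logs at innodb_log_file_size
tail -n 50 /var/log/mysql/error.log   # expect notes about new log files being created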

Thank you for reading, and thanks in advance if anyone has any insights here,
Mark

Okay, further testing (and refreshing myself on rsync’s operation) suggests that, since I did not pass the -u flag:
[URL]http://linux.die.net/man/1/rsync[/URL]


-u, --update skip files that are newer on the receiver

rsync would still have (correctly) copied over the top of the ib_logfiles, even if the “old” ones on newVM had a newer modtime: without -u, the quick check only skips a file when its size and modtime both match the sender’s exactly.

Given that, I am now unsure what exactly transpired to get into this state.
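One thing I can still do to narrow it down, assuming oldVM’s original datadir is intact, is a checksum-only dry run of the same rsync; it itemizes any file whose contents differ without transferring anything:

# -n dry run, -c compare by checksum, -i itemize whatever would be copied
rsync -n -c -a -i --rsync-path='sudo rsync' -e 'ssh -i /path/to/key' \
    pordeploy@oldVM:/var/lib/mysql/ hmmfoo/mysql/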

Thank you,
Mark

/etc/mysql/debian.cnf got hosed. When I “stopped” MySQL to rsync, it didn’t actually stop, and the datadir got rsynced over while the server was still running. It had JUST been rsynced with the same data (minus binlogs) before it started up, so none of the tablespace got touched on disk (this is all that saved it from being SO much worse).
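For anyone who hits this: on Debian/Ubuntu the init script shuts mysqld down with the debian-sys-maint account from /etc/mysql/debian.cnf, so when that file is stale the stop can fail without the daemon actually going away. Checks along these lines would have caught it before the rsync (a sketch; exact behaviour depends on the packaging):

# does the maintenance account in debian.cnf still work?
mysql --defaults-file=/etc/mysql/debian.cnf -e 'SELECT 1' || echo 'debian.cnf credentials are broken'

# after "stopping", confirm the daemon is really gone before touching the datadir
service mysql stop
mysqladmin ping 2>/dev/null && echo 'mysqld is STILL answering'
pgrep -x mysqld && echo 'mysqld process still present'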

I was able to replay our binlogs on top of an old known-good backup, mysqldump that, and confirm that a mysqldump out of the still-running instance matched.
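Roughly what that verification looked like (host names, binlog names, the DB name appdb, and the start position are all illustrative; the real starting point comes from the good backup’s xtrabackup_binlog_info):

# the old good Xtrabackup backup has already been restored onto restorehost;
# replay binlogs from that backup's recorded position forward
mysqlbinlog --start-position=107 /path/to/binlogs/mysql-bin.0* | mysql -h restorehost

# dump the one application DB from both servers and compare
mysqldump -h restorehost --skip-dump-date --routines appdb > restored.sql
mysqldump -h dbnode      --skip-dump-date --routines appdb > live.sql
diff restored.sql live.sql && echo 'dumps match'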

Re-deployed to a new server.