binlog positions "off" a bit after restore

I have a few shards in an application that are approaching 600-800G on-disk but aren't heavily used. I'm spinning up new off-site backup & reporting copies of all shards, so I let four streaming xtrabackup runs go last night, using a script I use frequently to clone out new slaves. Two of the shards, with active customer bases, started right up as normal (150-200G). The two shards with larger data sizes appear to have slightly wrong (behind) master coordinates in xtrabackup_slave_info.
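For context, the clone script does roughly this (a sketch, not the exact command; the host and paths below are placeholders and the real script's options may differ):

:~$ innobackupex --slave-info --stream=xbstream ./ \
      | ssh mysql@new-slave "xbstream -x -C /mysql/data"      # stream the backup to the new host
:~$ ssh mysql@new-slave "innobackupex --apply-log /mysql/data"  # prepare it before starting mysqld

--slave-info is what produces the xtrabackup_slave_info file I'm relying on below.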

So, when I do a streaming backup from an existing slave, at the end I get:
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.002238', MASTER_LOG_POS=263779110
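That line is simply the contents of the xtrabackup_slave_info file that --slave-info leaves in the backup directory, i.e. on the restored copy (the path is wherever the backup was unpacked):

:~$ cat /mysql/data/xtrabackup_slave_info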

Overnight, the master moved on to:
| mysql-bin.002239 | 30261034 |

So it passes a quick sanity check. I built a CHANGE MASTER statement with the correct IP/user/etc., fired it off, and started replication. Duplicate key error. Oops!
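Concretely, the statement is of this shape (host, user, and password redacted; the log file and position are the ones from xtrabackup_slave_info above):

mysql> CHANGE MASTER TO
    ->   MASTER_HOST='<master-ip>',           -- redacted
    ->   MASTER_USER='<repl-user>',           -- redacted
    ->   MASTER_PASSWORD='<repl-password>',   -- redacted
    ->   MASTER_LOG_FILE='mysql-bin.002238',
    ->   MASTER_LOG_POS=263779110;
mysql> START SLAVE;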

I verified from the error log that the expected slave statement was issued:
Slave SQL thread initialized, starting replication in log 'mysql-bin.002238' at position 263779110, relay log '/mysql/binlog/mysqld-relay-bin.000001' position: 4. I get:

Last_SQL_Error: Error 'Duplicate entry '68407820' for key 'PRIMARY'' on query… This is against a Rails sessions table.
mysql> select id, created_at from sessions where id = '68407821';

+----------+---------------------+
| id       | created_at          |
+----------+---------------------+
| 68407821 | 2015-03-24 03:54:15 |
+----------+---------------------+

Using mysqlbinlog, I found that insert in the original binlogs. The recorded position seems to land halfway through a transaction for thread 841292:

#150325 2:37:43 server id 130161118 end_log_pos 263780113 CRC32 0xa869770f Query thread_id=841292 exec_time=0 error_code=0
SET TIMESTAMP=1427265463/*!*/;
INSERT INTO sessions ( ... redacted ... )
/*!*/;
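For reference, I pulled that excerpt with something along these lines (the binlog path is a placeholder):

:~$ mysqlbinlog --start-position=263779110 /mysql/binlog/mysql-bin.002238 | less
# then searched the output for the sessions INSERT / that id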

Thus, the xtrabackup_slave_info position should have been at least the next one:

#150325 2:37:43 server id 130161118 end_log_pos 263780693 CRC32 0x5b59698f Query thread_id=841292 exec_time=0 error_code=0

but, as noted, this appears to split a transaction. So the entire transaction for thread 841292 was committed to disk (I verified on the restore that the data for the entire transaction is correct), AND data from the next few transactions is present as well.
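A quick way to eyeball the transaction boundaries around the recorded position is something like this (a sketch; the stop position is an arbitrary value a bit past the reported one, and the path is a placeholder):

:~$ mysqlbinlog --start-position=263779110 --stop-position=263790000 /mysql/binlog/mysql-bin.002238 \
      | grep -E 'end_log_pos|COMMIT|Xid'
# prints each event header's end_log_pos/thread_id plus the COMMIT/Xid markers,
# which makes it easy to see where thread 841292's transaction actually ends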

Info:

Source Slave:
:~$ dpkg -l|grep percona
ii libperconaserverclient18.1 5.6.19-67.0-618.wheezy amd64 Percona Server database client library
ii libperconaserverclient18.1-dev 5.6.19-67.0-618.wheezy amd64 Percona Server database development files
ii percona-server-client-5.6 5.6.19-67.0-618.wheezy amd64 Percona Server database client binaries
ii percona-server-common-5.6 5.6.19-67.0-618.wheezy amd64 Percona Server database common files (e.g. /etc/mysql/my.cnf)
ii percona-server-server 5.6.19-67.0-618.wheezy amd64 Percona Server database server
ii percona-server-server-5.6 5.6.19-67.0-618.wheezy amd64 Percona Server database server binaries
ii percona-xtrabackup 2.2.9-5067-1.wheezy amd64 Open source backup tool for InnoDB and XtraDB

Destination:
ii libperconaserverclient18.1 5.6.22-71.0-726.wheezy amd64 Percona Server database client library
ii libperconaserverclient18.1-dev 5.6.22-71.0-726.wheezy amd64 Percona Server database development files
ii percona-server-client-5.6 5.6.22-71.0-726.wheezy amd64 Percona Server database client binaries
ii percona-server-common-5.6 5.6.22-71.0-726.wheezy amd64 Percona Server database common files (e.g. /etc/mysql/my.cnf)
ii percona-server-server 5.6.22-71.0-726.wheezy amd64 Percona Server database server
ii percona-server-server-5.6 5.6.22-71.0-726.wheezy amd64 Percona Server database server binaries
ii percona-xtrabackup 2.2.9-5067-1.wheezy amd64 Open source backup tool for InnoDB and XtraDB

I feel like I'm missing something obvious here, like a failed rollback or something. If it hadn't happened on 2 of the 4 servers overnight, I probably wouldn't bother posting. What have I done wrong or misunderstood?

Thanks!

Wes

What caught my attention here is the position written to the xtrabackup_slave_info file. It seems that the wrong position caused the harm: when you started MySQL from that backup set, it began replicating from the wrong position. So, what command line are you using to run xtrabackup on these databases/shards? And are you using multi-threaded slaves?
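You can check the latter on the backup source with, for example:

mysql> SHOW GLOBAL VARIABLES LIKE 'slave_parallel_workers';
-- anything greater than 0 means the multi-threaded slave applier is enabled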