binlog positions "off" a bit after restore

I have a few shards in an application that are approaching 600-800G on-disk but aren't heavily used. I'm spinning up new off-site backup & reporting copies of all shards, so I let four streaming xtrabackup runs go last night, using a script I use frequently to clone out new slaves. Two of the shards, with active customer bases, started right up as normal (150-200G). The two shards with larger data sizes appear to have slightly wrong (behind) master coordinates in xtrabackup_slave_info.
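For context, the clone script does roughly this (a sketch, not the exact command; the host and paths below are placeholders and the real script's options may differ):

:~$ innobackupex --slave-info --stream=xbstream ./ \
      | ssh mysql@new-slave "xbstream -x -C /mysql/data"      # stream the backup to the new host
:~$ ssh mysql@new-slave "innobackupex --apply-log /mysql/data"  # prepare it before starting mysqld

--slave-info is what produces the xtrabackup_slave_info file I'm relying on below.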

So, when I do a streaming backup from an existing slave, at the end I get:
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.002238', MASTER_LOG_POS=263779110
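That line is simply the contents of the xtrabackup_slave_info file that --slave-info leaves in the backup directory, i.e. on the restored copy (the path is wherever the backup was unpacked):

:~$ cat /mysql/data/xtrabackup_slave_info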

Overnight, the master moved on to:
| mysql-bin.002239 | 30261034 |

So it passes a quick sanity check. I built a CHANGE MASTER statement with the correct IP/user/etc., fired it off, and started replication. Duplicate key error. Oops!
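Concretely, the statement is of this shape (host, user, and password redacted; the log file and position are the ones from xtrabackup_slave_info above):

mysql> CHANGE MASTER TO
    ->   MASTER_HOST='<master-ip>',           -- redacted
    ->   MASTER_USER='<repl-user>',           -- redacted
    ->   MASTER_PASSWORD='<repl-password>',   -- redacted
    ->   MASTER_LOG_FILE='mysql-bin.002238',
    ->   MASTER_LOG_POS=263779110;
mysql> START SLAVE;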

I verified from the error log that the expected slave statement was issued:
Slave SQL thread initialized, starting replication in log 'mysql-bin.002238' at position 263779110, relay log '/mysql/binlog/mysqld-relay-bin.000001' position: 4. I get:

Last_SQL_Error: Error 'Duplicate entry '68407820' for key 'PRIMARY'' on query… This is against a Rails sessions table.
mysql> select id, created_at from sessions where id = '68407821';

+----------+---------------------+
| id       | created_at          |
+----------+---------------------+
| 68407821 | 2015-03-24 03:54:15 |
+----------+---------------------+

Using mysqlbinlog, I found that insert in the original binlogs. The recorded position seems to land halfway through a transaction for thread 841292:

#150325 2:37:43 server id 130161118 end_log_pos 263780113 CRC32 0xa869770f Query thread_id=841292 exec_time=0 error_code=0
SET TIMESTAMP=1427265463/*!*/;
INSERT INTO sessions ( ... redacted ... )
/*!*/;
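For reference, I pulled that excerpt with something along these lines (the binlog path is a placeholder):

:~$ mysqlbinlog --start-position=263779110 /mysql/binlog/mysql-bin.002238 | less
# then searched the output for the sessions INSERT / that id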

Thus, the xtrabackup_slave_info position should have been at least the next one:

#150325 2:37:43 server id 130161118 end_log_pos 263780693 CRC32 0x5b59698f Query thread_id=841292 exec_time=0 error_code=0

but, as noted, this appears to split a transaction. So the entire transaction for thread 841292 was committed to disk (I verified on the restore that the data for the entire transaction is correct), AND data from the next few transactions is present as well.
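A quick way to eyeball the transaction boundaries around the recorded position is something like this (a sketch; the stop position is an arbitrary value a bit past the reported one, and the path is a placeholder):

:~$ mysqlbinlog --start-position=263779110 --stop-position=263790000 /mysql/binlog/mysql-bin.002238 \
      | grep -E 'end_log_pos|COMMIT|Xid'
# prints each event header's end_log_pos/thread_id plus the COMMIT/Xid markers,
# which makes it easy to see where thread 841292's transaction actually ends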

Info:

Source Slave:
:~$ dpkg -l|grep percona
ii libperconaserverclient18.1 5.6.19-67.0-618.wheezy amd64 Percona Server database client library
ii libperconaserverclient18.1-dev 5.6.19-67.0-618.wheezy amd64 Percona Server database development files
ii percona-server-client-5.6 5.6.19-67.0-618.wheezy amd64 Percona Server database client binaries
ii percona-server-common-5.6 5.6.19-67.0-618.wheezy amd64 Percona Server database common files (e.g. /etc/mysql/my.cnf)
ii percona-server-server 5.6.19-67.0-618.wheezy amd64 Percona Server database server
ii percona-server-server-5.6 5.6.19-67.0-618.wheezy amd64 Percona Server database server binaries
ii percona-xtrabackup 2.2.9-5067-1.wheezy amd64 Open source backup tool for InnoDB and XtraDB

Destination:
ii libperconaserverclient18.1 5.6.22-71.0-726.wheezy amd64 Percona Server database client library
ii libperconaserverclient18.1-dev 5.6.22-71.0-726.wheezy amd64 Percona Server database development files
ii percona-server-client-5.6 5.6.22-71.0-726.wheezy amd64 Percona Server database client binaries
ii percona-server-common-5.6 5.6.22-71.0-726.wheezy amd64 Percona Server database common files (e.g. /etc/mysql/my.cnf)
ii percona-server-server 5.6.22-71.0-726.wheezy amd64 Percona Server database server
ii percona-server-server-5.6 5.6.22-71.0-726.wheezy amd64 Percona Server database server binaries
ii percona-xtrabackup 2.2.9-5067-1.wheezy amd64 Open source backup tool for InnoDB and XtraDB

I feel like I'm missing something obvious here, like a failed rollback or something. If it hadn't happened on 2 of the 4 servers overnight, I probably wouldn't bother posting. What have I done wrong or misunderstood?

Thanks!

Wes

What caught my attention here is the position written to the xtrabackup_slave_info file. It seems that the wrong position caused the harm: when you started MySQL from that backup set, it began replicating from the wrong position. So, what command line are you using to run xtrabackup on these databases/shards? And are you using multi-threaded slaves?
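You can check the latter on the backup source with, for example:

mysql> SHOW GLOBAL VARIABLES LIKE 'slave_parallel_workers';
-- anything greater than 0 means the multi-threaded slave applier is enabled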