Using Innobackupex to manage Fabric servers

Managing a MySQL Fabric setup requires making new replicants fairly frequently. As I set up for production with a 300G database, mysqldump is not a very practical method for creating replicants because, even on 20 cores & 96GB RAM, it takes basically for ever - about 16 hours - to make a new replicant (not to mention the additional time that gets tacked on to that while the new slave catches up to a master that has been updating aggressively for those 16 hours). On the other hand, if I use innobackupex snapshots to create replicants and then add them to the Fabric group, it only takes me about 1 hour total. Pretty sweet.

The problem I’m having is that when the databases are configured for Fabric (which mostly means with sync-binlog = 1 and GTIDs for everything), I can only create a new slave by taking an innobackupex snapshot from the master. It seems to me that slave snapshots should be functionally identical, and are more desirable for obvious practical reasons.

So here, for jrivera, who generously offered to have a look at the problem when I posted about it on Reddit, are some details:

  1. All Percona servers in this configuration are Linodes of varying sizes set up as Ubuntu 14.04LTS with Percona 72.1 R 0503478.
  2. they all run the same my.cnf (enclosed) with the innodb buffer pool changed to match the particular server, and the server id changed.
  3. I’m taking all snapshots using the same script, enclosed.
  4. When I set up a new server, I simply scp -rp the entire snapshot into a clean /var/lib/mysql, then chown -R mysql:mysql /var/lib/mysql, start the server, and once running I reset master on the new slave and then set gtid_purged to the appropriate value (from xtrabackup_info).

Then, I go to my Fabric controller and add the server. If the snapshot was taken form a slave, I get error 1872: Slave failed to initialize relay log info structure from the repository. On the fabric controller, this error is reported as:
ServerError: Error trying to configure server (0773b58e-f44e-11e4-8143-f23c91185b4c) as slave: Command (START SLAVE , ()) failed accessing (mynewslave:3306). 1872 (HY000): Slave failed to initialize relay log info structure from the repository…

So naturally I try logging on to mynewslave from the Fabric controller with the fabric mysql account, and it works fine. Then I go to mynewslave, repeat the reset master & set grid_purged operation, restart the slave, locally, and get the exact same error.

By contrast, if I do the exact same operation, except taking the innobackupex snapshot from the master, the whole procedure goes off without a hitch:

$mysqlfabric group add mygroup mynewslave
Fabric UUID: 5ca1ab1e-a007-feed-f00d-cab3fe13249e
Time-To-Live: 1

uuid finished success result

52602f60-eae6-42c6-b89a-48377124c75a 1 1 1

state success when description

3 2 1430962750.62 Triggered by < object at 0x7f8037995fd0>.
4 2 1430962750.62 Executing action (_add_server).
5 2 1430962751.72 Executed action (_add_server).

Oh - and, in case it helps, when the new slave fails, this is what its status looks like:
mysql> show slave status\G
*************************** 1. row ***************************
Master_Host: myoldmaster
Master_User: fabric
Master_Port: 3306
Connect_Retry: 60
Read_Master_Log_Pos: 4
Relay_Log_File: relay-bin.000010
Relay_Log_Pos: 603177759
Slave_IO_Running: No
Slave_SQL_Running: No
Last_Errno: 1872
Last_Error: Slave failed to initialize relay log info structure from the repository
Skip_Counter: 0
Exec_Master_Log_Pos: 0
Relay_Log_Space: 0
Until_Condition: None
Until_Log_Pos: 0
Master_SSL_Allowed: No
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_SQL_Errno: 1872
Last_SQL_Error: Slave failed to initialize relay log info structure from the repository
Master_Server_Id: 0
Master_UUID: f43a8a6b-c136-11e4-b41c-f23c9133c356
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Master_Retry_Count: 86400
Last_SQL_Error_Timestamp: 150506 19:18:58
Executed_Gtid_Set: f43a8a6b-c136-11e4-b41c-f23c9133c356:1-150207153
Auto_Position: 1

The attachment tool (paper clip) does not appear to be working in Safari 7.5.1 under OSX 10.9.5, so here’s the relevant configurations in-line:

#========= my.cnf:=========#


port = 3306
socket = /var/lib/mysql/mysql.sock



user = mysql
default-storage-engine = InnoDB
socket = /var/lib/mysql/mysql.sock
pid-file = /var/lib/mysql/


key-buffer-size = 32M
myisam-recover = FORCE,BACKUP


max-allowed-packet = 16M
max-connect-errors = 1000000
sysdate-is-now = 1
innodb = FORCE


datadir = /var/lib/mysql/


log-bin = /var/lib/mysql/mysql-bin
expire-logs-days = 14
sync-binlog = 1


server-id = 33
read-only = 0
#skip-slave-start = 1
log-slave-updates = 1
relay-log = /var/lib/mysql/relay-bin
slave-net-timeout = 60
sync-master-info = 1
sync-relay-log = 1
sync-relay-log-info = 1

#binlog-do-db = serpsalpha
binlog-format = ‘ROW’

#skip-slave-start (=0), binlog-do-db (empty), & binlog-format (ROW) above are changed for Fabric.
#additionally, sync-master-info must be on for Fabric, and the following:
gtid-mode = ‘ON’
enforce-gtid-consistency = ‘ON’
simplified_binlog_gtid_recovery = ‘ON’ #Note: this variable has been changing name a lot. It can fail under special circumstances. See details at
master-info-repository = TABLE
relay-log-info-repository = TABLE
#report-host= might want to set this for fabric
#report-port #no need to set, as it’s reported correctly as of 5.5 or 5.6…


tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 500
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 1024
table-open-cache = 55


innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 512M
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table = 1
innodb-buffer-pool-size = 14G
innodb-rollback-on-timeout = ON
transaction-isolation = READ-COMMITTED
low-priority-updates = 1


log-error = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log = 1
slow-query-log-file = /var/log/mysql/mysql-slow.log
long-query-time = 10


if [ $SHLVL -eq 1 ]
echo “Please run $0 from within a screen… it just makes sense.” >&2
exit 1

cd /var/backups/mysql/daily_snapshot
echo “Please enter mysql root password:”
read -s p
mysql -h -u root -p$p -ABNe “select NULL;” 2>/dev/null >/dev/null
if [ $? -ne 0 ]
echo “Wrong root password.” >&2
exit 2

echo “Starting at $(date)”
sdate=$(date +%Y-%m-%d_%H)
date | tee -a /var/log/daily_snapshot_$sdate.log

innobackupex --host= --user=root --password=$p
–rsync --safe-slave-backup --slave-info /var/backups/mysql/daily_snapshot
2>&1 | egrep “innobackupex|xtrabackup” | tee -a /var/log/daily_snapshot_$sdate.log

sdir=$(ls -1 | egrep “^$sdate”)
innobackupex --host= --user=root --password=$p
–apply-log /var/backups/mysql/daily_snapshot/$( ls -1 /var/backups/mysql/daily_snapshot/ | grep $sdir)
2>&1 | egrep “innobackupex|xtrabackup” | tee -a /var/log/daily_snapshot_$sdate.log
echo “daily snapshot complete at $(date) after $SECONDS sec” | tee -a /var/log/daily_snapshot_$sdate.log


I finally had the chance to test this out.

On the new slave
a ] start up mysql
b ] reset slave all;
c ] reset master;
d ] SET GLOBAL gtid_purged="<gtid_on_xtrabackup_binlog_info_file>";
f ] start slave; show slave status\G

Make sure to execute “reset slave all”. Try it out!
Reset Slave VS Reset Slave All

Thank you! Problem solved!

Here’s what I found so far:

  • Installing Percona server and innobackupex/xtrabackup on OS X Yosemite is not quite a slam-dunk, but well worth it (for local testing).

  • On a Fabric test set-up under Yosemite running the aforementioned Percona (5.6.23-72.1-log compiled in Mavericks - doesn’t show any version comment, though… ??), the original method works fine.
    that is: configure some mysql instances & manage them from a fabric, then get a snapshot with innobackupex & install it on a clean set-up, and add this set-up to the Fabric - works fine (with just reset master followed by set global grid_purged).

  • your suggested method works fine too - preceding reset master with reset slave all.

  • On Ubuntu 14.04LTS running Percona 5.6.23-72.1-log - restoring the snapshot from a slave works correctly when I add the rest slave all.

So you figured something out, and it runs a little differently on fresh compiles in different OSes - can you give me some insight or point me to the right docs to read to understand why you suggested that change - what makes it work?

Here’s the blog from a colleague explaining about RESET SLAVE vs RESET SLAVE ALL

And from MySQL Docs

With regards to differences on OS, I can’t test with OS X but I have tested (previously) that RESET SLAVE ALL is needed across all Linux OS’s.