Can't get xtrabackup SST working

Hello,

I’m trying to set up Percona XtraDB Cluster but the xtrabackup state transfer seems to not be working.

The version I use is from the generic binary, Percona-XtraDB-Cluster-5.5.24-23.6.342.Linux.x86_64.
The system is Linux 3.3.1-gentoo x86_64.
xtrabackup is from the Gentoo’s ebuild repositories:

xtrabackup version 2.0.0 for Percona Server 5.5.16 Linux (x86_64) (revision id: undefined)

innobackupex is from the generic binary since it isn’t installed with xtrabackup in Gentoo:

InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oyand Percona Inc 2009-2012. All Rights Reserved.

I’ve had to modify the wsrep_sst_xtrabackup script so it calls the Gentoo’s nc like this: nc -l -p .

The Cluster is installed in /opt/mysql and the startup script has the following in it:

export PATH=/opt/mysql/bin:$PATH

so that the wsrep_sst_xtrabackup script is found by mysqld.

So, I have two servers:
web28, 192.168.0.28
and web90, 192.168.0.93.

The config goes like this:
web28:

[mysqld]port = 10001bind = 192.168.0.29wsrep_provider = /opt/mysql/lib/libgalera_smm.sowsrep_provider_options = "gmcast.listen_addr=tcp://192.168.0.29:10101"wsrep_sst_method = xtrabackupwsrep_sst_receive_address = 192.168.0.29:10151innodb_autoinc_lock_mode = 2innodb_locks_unsafe_for_binlog = 1max_binlog_size = 4Mlog-bin = /mysql-bin/galera1/mysql-bin.000000log-bin-index = /mysql-bin/galera1/mysql-bin.indexserver-id = 1binlog_format = ROW

web90:

[mysqld]port = 10002bind = 192.168.0.93wsrep_provider = /opt/mysql/lib/libgalera_smm.sowsrep_cluster_address = "gcomm://192.168.0.29:10101"wsrep_provider_options = “gmcast.listen_addr=tcp://192.168.0.93:10102”;wsrep_sst_method = xtrabackupwsrep_sst_receive_address = 192.168.0.93:10152innodb_autoinc_lock_mode = 2innodb_locks_unsafe_for_binlog = 1max_binlog_size = 4Mlog-bin = /mysql-bin/galera2/mysql-bin.000000log-bin-index = /mysql-bin/galera2/mysql-bin.indexserver-id = 1binlog_format = ROW

On web28, I start one instance and make it a cluster with
SET GLOBAL wsrep_cluster_address=‘gcomm://’
.
On web90, I start the second instance which has
wsrep_cluster_address = “gcomm://192.168.0.29:10101”
in the config file (because I had problems bringing the instance to the cluster in runtime with SET GLOBAL).

The second node seems to plug to the cluster ok. Then, it fires up nc which receives nothing from the first node. And I can’t find any traces of transfer activity on the first node: the wsrep_sst_xtrabackup script is not run, sst.log is not written, even the mysql.err log doesn’t mention the script.

The first node’s error log goes like this:

120710 17:35:37 [Note] WSREP: Read nil XID from storage engines, skipping position init120710 17:35:37 [Note] WSREP: wsrep_load(): loading provider library '/opt/mysql/lib/libgalera_smm.so’120710 17:35:37 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy <info@codership.com> loaded succesfully.120710 17:35:37 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1120710 17:35:37 [Note] WSREP: Reusing existing ‘/var/db/mysql/galera1//galera.cache’.120710 17:35:37 [Note] WSREP: Passing config to GCS: base_host = xx.xxx.xxx.xx; gcache.dir = /var/db/mysql/galera1/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/db/mysql/galera1//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://192.168.0.29:10101; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3120710 17:35:37 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1120710 17:35:37 [Note] Plugin ‘FEDERATED’ is disabled.120710 17:35:37 InnoDB: The InnoDB memory heap is disabled120710 17:35:37 InnoDB: Mutexes and rw_locks use GCC atomic builtins120710 17:35:37 InnoDB: Compressed tables use zlib 1.2.3120710 17:35:37 InnoDB: Using Linux native AIO120710 17:35:37 InnoDB: Initializing buffer pool, size = 2.2G120710 17:35:37 InnoDB: Completed initialization of buffer pool120710 17:35:37 InnoDB: highest supported file format is Barracuda.120710 17:35:37 InnoDB: Waiting for the background threads to start120710 17:35:38 Percona XtraDB (http://www.percona.com) 1.1.8-rel25.3 started; log sequence number 1597938120710 17:35:38 [Note] Recovering after a crash using /mysql-bin/galera1/mysql-bin120710 17:35:38 [Note] Starting crash recovery…120710 17:35:38 [Note] Crash recovery finished.120710 17:35:38 [Note] Server hostname (bind-address): ‘192.168.0.29’; port: 10001120710 17:35:38 [Note] - ‘192.168.0.29’ resolves to ‘192.168.0.29’;120710 17:35:38 [Note] Server socket created on IP: ‘192.168.0.29’.120710 17:35:38 [Note] Event Scheduler: Loaded 0 events120710 17:35:38 [Note] /opt/mysql/bin/mysqld: ready for connections.Version: ‘5.5.24-23.6-log’ socket: ‘/var/db/mysql/galera1/mysql.sock’ port: 10001 Percona XtraDB Cluster (GPL) 5.5.24-23.6, Revision 342, wsrep_23.6.r342120710 17:35:45 [Note] WSREP: Stop replication120710 17:35:47 [Note] WSREP: Start replication120710 17:35:47 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1120710 17:35:47 [Note] WSREP: protonet asio version 0120710 17:35:47 [Note] WSREP: backend: asio120710 17:35:47 [Note] WSREP: GMCast version 0120710 17:35:47 [Note] WSREP: (286ac061-ca94-11e1-0800-4872d6f9a2d8, ‘tcp://192.168.0.29:10101’) listening at tcp://192.168.0.29:10101120710 17:35:47 [Note] WSREP: (286ac061-ca94-11e1-0800-4872d6f9a2d8, ‘tcp://192.168.0.29:10101’) multicast: , ttl: 1120710 17:35:47 [Note] WSREP: EVS version 0120710 17:35:47 [Note] WSREP: PC version 0120710 17:35:47 [Note] WSREP: gcomm: connecting to group ‘my_wsrep_cluster’, peer ''120710 17:35:47 [Note] WSREP: view(view_id(PRIM,286ac061-ca94-11e1-0800-4872d6f9a2d8,1) memb { 286ac061-ca94-11e1-0800-4872d6f9a2d8,} joined {} left {} partitioned {})120710 17:35:47 [Note] WSREP: gcomm: connected120710 17:35:47 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636120710 17:35:47 [Note] WSREP: Shifting CLOSED → OPEN (TO: 0)120710 17:35:47 [Note] WSREP: Opened channel 'my_wsrep_cluster’120710 17:35:47 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1120710 17:35:47 [Note] WSREP: Starting new group from scratch: 286b1abc-ca94-11e1-0800-af44ddd69c3b120710 17:35:47 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 286b28df-ca94-11e1-0800-df69e22a5b35120710 17:35:47 [Note] WSREP: STATE EXCHANGE: sent state msg: 286b28df-ca94-11e1-0800-df69e22a5b35120710 17:35:47 [Note] WSREP: STATE EXCHANGE: got state msg: 286b28df-ca94-11e1-0800-df69e22a5b35 from 0 (web28)120710 17:35:47 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 0, members = 1/1 (joined/total), act_id = 0, last_appl. = -1, protocols = 0/4/2 (gcs/repl/appl), group UUID = 286b1abc-ca94-11e1-0800-af44ddd69c3b120710 17:35:47 [Note] WSREP: Flow-control interval: [8, 16]120710 17:35:47 [Note] WSREP: Restored state OPEN → JOINED (0)120710 17:35:47 [Note] WSREP: Member 0 (web28) synced with group.120710 17:35:47 [Note] WSREP: Shifting JOINED → SYNCED (TO: 0)120710 17:35:47 [Note] WSREP: New cluster view: global state: 286b1abc-ca94-11e1-0800-af44ddd69c3b:0, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 2120710 17:36:15 [Note] WSREP: declaring 38659211-ca94-11e1-0800-4aa997371ac8 stable120710 17:36:15 [Note] WSREP: view(view_id(PRIM,286ac061-ca94-11e1-0800-4872d6f9a2d8,2) memb { 286ac061-ca94-11e1-0800-4872d6f9a2d8, 38659211-ca94-11e1-0800-4aa997371ac8,} joined {} left {} partitioned {})120710 17:36:15 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2120710 17:36:15 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 38ae23a6-ca94-11e1-0800-32a24d285f26120710 17:36:15 [Note] WSREP: STATE EXCHANGE: sent state msg: 38ae23a6-ca94-11e1-0800-32a24d285f26120710 17:36:15 [Note] WSREP: STATE EXCHANGE: got state msg: 38ae23a6-ca94-11e1-0800-32a24d285f26 from 0 (web28)120710 17:36:15 [Note] WSREP: STATE EXCHANGE: got state msg: 38ae23a6-ca94-11e1-0800-32a24d285f26 from 1 (web90)120710 17:36:15 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 1, members = 1/2 (joined/total), act_id = 0, last_appl. = 0, protocols = 0/4/2 (gcs/repl/appl), group UUID = 286b1abc-ca94-11e1-0800-af44ddd69c3b120710 17:36:15 [Note] WSREP: Flow-control interval: [12, 23]120710 17:36:17 [Note] WSREP: Node 1 (web90) requested state transfer from ‘any’. Selected 0 (web28)(SYNCED) as donor.120710 17:36:17 [Note] WSREP: Shifting SYNCED → DONOR/DESYNCED (TO: 0)

And the second’s is like:

120710 17:36:14 [Note] WSREP: Read nil XID from storage engines, skipping position init120710 17:36:14 [Note] WSREP: wsrep_load(): loading provider library '/opt/mysql/lib/libgalera_smm.so’120710 17:36:14 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy <info@codership.com> loaded succesfully.120710 17:36:14 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1120710 17:36:14 [Note] WSREP: Reusing existing ‘/var/db/mysql/galera2//galera.cache’.120710 17:36:14 [Note] WSREP: Passing config to GCS: “gmcast.listen_addr = tcp://192.168.0.93:10102”; base_host = xx.xxx.xxx.xx; gcache.dir = /var/db/mysql/galera2/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/db/mysql/galera2//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3120710 17:36:14 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1120710 17:36:14 [Note] WSREP: wsrep_sst_grab()120710 17:36:14 [Note] WSREP: Start replication120710 17:36:14 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1120710 17:36:14 [Note] WSREP: protonet asio version 0120710 17:36:14 [Note] WSREP: backend: asio120710 17:36:14 [Note] WSREP: GMCast version 0120710 17:36:14 [Note] WSREP: (38659211-ca94-11e1-0800-4aa997371ac8, ‘tcp://0.0.0.0:10101’) listening at tcp://0.0.0.0:10101120710 17:36:14 [Note] WSREP: (38659211-ca94-11e1-0800-4aa997371ac8, ‘tcp://0.0.0.0:10101’) multicast: , ttl: 1120710 17:36:14 [Note] WSREP: EVS version 0120710 17:36:14 [Note] WSREP: PC version 0120710 17:36:14 [Note] WSREP: gcomm: connecting to group ‘my_wsrep_cluster’, peer '192.168.0.29:10101’120710 17:36:15 [Note] WSREP: declaring 286ac061-ca94-11e1-0800-4872d6f9a2d8 stable120710 17:36:15 [Note] WSREP: view(view_id(PRIM,286ac061-ca94-11e1-0800-4872d6f9a2d8,2) memb { 286ac061-ca94-11e1-0800-4872d6f9a2d8, 38659211-ca94-11e1-0800-4aa997371ac8,} joined {} left {} partitioned {})120710 17:36:15 [Note] WSREP: gcomm: connected120710 17:36:15 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636120710 17:36:15 [Note] WSREP: Shifting CLOSED → OPEN (TO: 0)120710 17:36:15 [Note] WSREP: Opened channel 'my_wsrep_cluster’120710 17:36:15 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2120710 17:36:15 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.120710 17:36:15 [Note] WSREP: Waiting for SST to complete.120710 17:36:15 [Note] WSREP: STATE EXCHANGE: sent state msg: 38ae23a6-ca94-11e1-0800-32a24d285f26120710 17:36:15 [Note] WSREP: STATE EXCHANGE: got state msg: 38ae23a6-ca94-11e1-0800-32a24d285f26 from 0 (web28)120710 17:36:15 [Note] WSREP: STATE EXCHANGE: got state msg: 38ae23a6-ca94-11e1-0800-32a24d285f26 from 1 (web90)120710 17:36:15 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 1, members = 1/2 (joined/total), act_id = 0, last_appl. = -1, protocols = 0/4/2 (gcs/repl/appl), group UUID = 286b1abc-ca94-11e1-0800-af44ddd69c3b120710 17:36:15 [Note] WSREP: Flow-control interval: [12, 23]120710 17:36:15 [Note] WSREP: Shifting OPEN → PRIMARY (TO: 0)120710 17:36:15 [Note] WSREP: State transfer required: Group state: 286b1abc-ca94-11e1-0800-af44ddd69c3b:0 Local state: 00000000-0000-0000-0000-000000000000:-1120710 17:36:15 [Note] WSREP: New cluster view: global state: 286b1abc-ca94-11e1-0800-af44ddd69c3b:0, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 2120710 17:36:15 [Warning] WSREP: Gap in state sequence. Need state transfer.120710 17:36:17 [Note] WSREP: Running: 'wsrep_sst_xtrabackup ‘joiner’ ‘192.168.0.93:10152’ ‘’ ‘/var/db/mysql/galera2/’ ‘/etc/mysql/galera2.cnf’ ‘2944’ 2>sst.err’120710 17:36:17 [Note] WSREP: Prepared SST request: xtrabackup|192.168.0.93:10152/xtrabackup_sst120710 17:36:17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.120710 17:36:17 [Note] WSREP: Assign initial position for certification: 0, protocol version: 2120710 17:36:17 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (286b1abc-ca94-11e1-0800-af44ddd69c3b): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.120710 17:36:17 [Note] WSREP: Node 1 (web90) requested state transfer from ‘any’. Selected 0 (web28)(SYNCED) as donor.120710 17:36:17 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 0)120710 17:36:17 [Note] WSREP: Requesting state transfer: success, donor: 0

At this point, nc is running on the second node but receiving nothing. I have to kill it with kill -9, and after that, the second mysql instance goes down since the transfer failed. The sst.log on the second node is ok up to this moment.

I must mention that the first mysql instance can’t shutdown cleanly after the failed transfer, I have to kill it with kill -9 too. That’s the reason for the “crash recovery” in the log. Don’t know if this problem is connected to the transfer problem in any way.

If I don’t kill the first instance and start the second one for another time, the result is different:

120710 16:44:20 [Warning] WSREP: Gap in state sequence. Need state transfer.120710 16:44:22 [Note] WSREP: Running: 'wsrep_sst_xtrabackup ‘joiner’ ‘192.168.0.93:10152’ ‘’ ‘/var/db/mysql/galera2/’ ‘/etc/mysql/galera2.cnf’ ‘1517’ 2>sst.err’120710 16:44:22 [Note] WSREP: Prepared SST request: xtrabackup|192.168.0.93:10152/xtrabackup_sst120710 16:44:22 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.120710 16:44:22 [Note] WSREP: Assign initial position for certification: 0, protocol version: 2120710 16:44:22 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (f04716a1-ca8a-11e1-0800-83a47ff7bdc5): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.120710 16:44:22 [Note] WSREP: Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s)120710 17:01:18 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup ‘joiner’ ‘192.168.0.93:10152’ ‘’ ‘/var/db/mysql/galera2/’ ‘/etc/mysql/galera2.cnf’ ‘1517’ 2>sst.err: 32 (Broken pipe)120710 17:01:18 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.120710 17:01:18 [ERROR] WSREP: SST failed: 32 (Broken pipe)120710 17:01:18 [ERROR] Aborting

Don’t know if it is related to nc not receiving the transfer. Would be glad to solve the transfer problem first.

I suspect this is something to do with binary compatibility. Try to recompile both mysql and galera locally.

I am seeing nearly identical behavior on CentOS 6.3 with XtraDB Cluster 5.5.24 on x86_64. I am testing using VirtualBox.

My initial build of the cluster appeared to be correct, all nodes running. I performed a clean shutdown of all 3 nodes to reboot my VM box and the powered each one back on starting with node 1. Node 1 started correctly and became available. However when node 2 is started it appears to join the cluster correctly but the SST fails with no data going to node 2. The MySQL service on node 2 never actually finishes coming up or is available as the .pid file is never written, so MySQL has to be forcefully killed and restarted or the system rebooted. Also, since node 1 becomes the donor the entire system becomes unavailable.

Here is some more data, hopefully it helps troubleshoot the problem as I see our problems to be related.

Logs from node 1 after restart:

120731 11:23:30 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql120731 11:23:30 [Note] Flashcache bypass: disabled120731 11:23:30 [Note] Flashcache setup error is : ioctl failed120731 11:23:30 [Note] WSREP: Read nil XID from storage engines, skipping position init120731 11:23:30 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so’120731 11:23:31 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy <info@codership.com> loaded succesfully.120731 11:23:31 [Note] WSREP: Found saved state: 28caa7c4-da8b-11e1-0800-4250fd8b1b8f:4120731 11:23:31 [Note] WSREP: Reusing existing ‘/var/lib/mysql//galera.cache’.120731 11:23:31 [Note] WSREP: Passing config to GCS: base_host = 192.168.56.51; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3120731 11:23:31 [Note] WSREP: Assign initial position for certification: 4, protocol version: -1120731 11:23:31 [Note] WSREP: wsrep_sst_grab()120731 11:23:31 [Note] WSREP: Start replication120731 11:23:31 [Note] WSREP: Setting initial position to 28caa7c4-da8b-11e1-0800-4250fd8b1b8f:4120731 11:23:31 [Note] WSREP: protonet asio version 0120731 11:23:31 [Note] WSREP: backend: asio120731 11:23:31 [Note] WSREP: GMCast version 0120731 11:23:31 [Note] WSREP: (72fe223a-db34-11e1-0800-75f38fd9825f, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567120731 11:23:31 [Note] WSREP: (72fe223a-db34-11e1-0800-75f38fd9825f, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1120731 11:23:31 [Note] WSREP: EVS version 0120731 11:23:31 [Note] WSREP: PC version 0120731 11:23:31 [Note] WSREP: gcomm: connecting to group ‘mssptest’, peer ''120731 11:23:31 [Note] WSREP: view(view_id(PRIM,72fe223a-db34-11e1-0800-75f38fd9825f,1) memb { 72fe223a-db34-11e1-0800-75f38fd9825f,} joined {} left {} partitioned {})120731 11:23:31 [Note] WSREP: gcomm: connected120731 11:23:31 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636120731 11:23:31 [Note] WSREP: Shifting CLOSED → OPEN (TO: 0)120731 11:23:31 [Note] WSREP: Opened channel 'mssptest’120731 11:23:31 [Note] WSREP: Waiting for SST to complete.120731 11:23:31 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1120731 11:23:31 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 72fecb05-db34-11e1-0800-14c5617b370a120731 11:23:31 [Note] WSREP: STATE EXCHANGE: sent state msg: 72fecb05-db34-11e1-0800-14c5617b370a120731 11:23:31 [Note] WSREP: STATE EXCHANGE: got state msg: 72fecb05-db34-11e1-0800-14c5617b370a from 0 (test-data1)120731 11:23:31 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 0, members = 1/1 (joined/total), act_id = 4, last_appl. = -1, protocols = 0/4/2 (gcs/repl/appl), group UUID = 28caa7c4-da8b-11e1-0800-4250fd8b1b8f120731 11:23:31 [Note] WSREP: Flow-control interval: [8, 16]120731 11:23:31 [Note] WSREP: Restored state OPEN → JOINED (4)120731 11:23:31 [Note] WSREP: Member 0 (test-data1) synced with group.120731 11:23:31 [Note] WSREP: Shifting JOINED → SYNCED (TO: 4)120731 11:23:31 [Note] WSREP: New cluster view: global state: 28caa7c4-da8b-11e1-0800-4250fd8b1b8f:4, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 2120731 11:23:31 [Note] WSREP: SST complete, seqno: 4120731 11:23:31 [Note] Plugin ‘FEDERATED’ is disabled.120731 11:23:31 InnoDB: The InnoDB memory heap is disabled120731 11:23:31 InnoDB: Mutexes and rw_locks use GCC atomic builtins120731 11:23:31 InnoDB: Compressed tables use zlib 1.2.3120731 11:23:31 InnoDB: Using Linux native AIO120731 11:23:31 InnoDB: Initializing buffer pool, size = 400.0M120731 11:23:31 InnoDB: Completed initialization of buffer pool120731 11:23:31 InnoDB: highest supported file format is Barracuda.120731 11:23:31 InnoDB: Waiting for the background threads to start120731 11:23:32 Percona XtraDB (http://www.percona.com) 1.1.8-rel25.3 started; log sequence number 1598911120731 11:23:32 [Note] Server hostname (bind-address): ‘192.168.56.51’; port: 3306120731 11:23:32 [Note] - ‘192.168.56.51’ resolves to ‘192.168.56.51’;120731 11:23:32 [Note] Server socket created on IP: ‘192.168.56.51’.120731 11:23:32 [Note] Event Scheduler: Loaded 0 events120731 11:23:32 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.120731 11:23:32 [Note] WSREP: Assign initial position for certification: 4, protocol version: 2120731 11:23:32 [Note] WSREP: Synchronized with group, ready for connections120731 11:23:32 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.120731 11:23:32 [Note] /usr/sbin/mysqld: ready for connections.Version: ‘5.5.24-log’ socket: ‘/var/lib/mysql/mysql.sock’ port: 3306 Percona XtraDB Cluster (GPL), wsrep_23.6.r340120731 11:31:44 [Note] WSREP: declaring 9889ff28-db35-11e1-0800-3fd5558cac34 stable120731 11:31:44 [Note] WSREP: view(view_id(PRIM,72fe223a-db34-11e1-0800-75f38fd9825f,2) memb { 72fe223a-db34-11e1-0800-75f38fd9825f, 9889ff28-db35-11e1-0800-3fd5558cac34,} joined {} left {} partitioned {})120731 11:31:44 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2120731 11:31:44 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 99112eef-db35-11e1-0800-030b8e7757e6120731 11:31:44 [Note] WSREP: STATE EXCHANGE: sent state msg: 99112eef-db35-11e1-0800-030b8e7757e6120731 11:31:44 [Note] WSREP: STATE EXCHANGE: got state msg: 99112eef-db35-11e1-0800-030b8e7757e6 from 0 (test-data1)120731 11:31:45 [Note] WSREP: STATE EXCHANGE: got state msg: 99112eef-db35-11e1-0800-030b8e7757e6 from 1 (test-data2)120731 11:31:45 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 1, members = 1/2 (joined/total), act_id = 4, last_appl. = 0, protocols = 0/4/2 (gcs/repl/appl), group UUID = 28caa7c4-da8b-11e1-0800-4250fd8b1b8f120731 11:31:45 [Note] WSREP: Flow-control interval: [12, 23]120731 11:31:45 [Note] WSREP: New cluster view: global state: 28caa7c4-da8b-11e1-0800-4250fd8b1b8f:4, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 2120731 11:31:45 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.120731 11:31:45 [Note] WSREP: Assign initial position for certification: 4, protocol version: 2120731 11:31:47 [Note] WSREP: Node 1 (test-data2) requested state transfer from ‘any’. Selected 0 (test-data1)(SYNCED) as donor.120731 11:31:47 [Note] WSREP: Shifting SYNCED → DONOR/DESYNCED (TO: 4)120731 11:31:47 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.120731 11:31:47 [Note] WSREP: Running: 'wsrep_sst_xtrabackup ‘donor’ ‘192.168.56.52:4444/xtrabackup_sst’ ‘(null)’ ‘/var/lib/mysql/’ ‘/etc/my.cnf’ ‘28caa7c4-da8b-11e1-0800-4250fd8b1b8f’ ‘4’ ‘0’'120731 11:31:47 [Note] WSREP: sst_donor_thread signaled with 0

Logs from node 2 after restart:

120731 11:31:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql120731 11:31:43 [Note] Flashcache bypass: disabled120731 11:31:43 [Note] Flashcache setup error is : ioctl failed120731 11:31:43 [Note] WSREP: Read nil XID from storage engines, skipping position init120731 11:31:43 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so’120731 11:31:43 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy <info@codership.com> loaded succesfully.120731 11:31:43 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1120731 11:31:43 [Note] WSREP: Reusing existing ‘/var/lib/mysql//galera.cache’.120731 11:31:43 [Note] WSREP: Passing config to GCS: base_host = 192.168.56.52; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3120731 11:31:43 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1120731 11:31:43 [Note] WSREP: wsrep_sst_grab()120731 11:31:43 [Note] WSREP: Start replication120731 11:31:43 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1120731 11:31:43 [Note] WSREP: protonet asio version 0120731 11:31:43 [Note] WSREP: backend: asio120731 11:31:43 [Note] WSREP: GMCast version 0120731 11:31:43 [Note] WSREP: (9889ff28-db35-11e1-0800-3fd5558cac34, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567120731 11:31:43 [Note] WSREP: (9889ff28-db35-11e1-0800-3fd5558cac34, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1120731 11:31:43 [Note] WSREP: EVS version 0120731 11:31:43 [Note] WSREP: PC version 0120731 11:31:43 [Note] WSREP: gcomm: connecting to group ‘mssptest’, peer '192.168.56.51:'120731 11:31:44 [Note] WSREP: declaring 72fe223a-db34-11e1-0800-75f38fd9825f stable120731 11:31:44 [Note] WSREP: view(view_id(PRIM,72fe223a-db34-11e1-0800-75f38fd9825f,2) memb { 72fe223a-db34-11e1-0800-75f38fd9825f, 9889ff28-db35-11e1-0800-3fd5558cac34,} joined {} left {} partitioned {})120731 11:31:44 [Note] WSREP: gcomm: connected120731 11:31:44 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636120731 11:31:44 [Note] WSREP: Shifting CLOSED → OPEN (TO: 0)120731 11:31:44 [Note] WSREP: Opened channel 'mssptest’120731 11:31:44 [Note] WSREP: Waiting for SST to complete.120731 11:31:44 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2120731 11:31:44 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.120731 11:31:44 [Note] WSREP: STATE EXCHANGE: sent state msg: 99112eef-db35-11e1-0800-030b8e7757e6120731 11:31:44 [Note] WSREP: STATE EXCHANGE: got state msg: 99112eef-db35-11e1-0800-030b8e7757e6 from 0 (test-data1)120731 11:31:44 [Note] WSREP: STATE EXCHANGE: got state msg: 99112eef-db35-11e1-0800-030b8e7757e6 from 1 (test-data2)120731 11:31:44 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 1, members = 1/2 (joined/total), act_id = 4, last_appl. = -1, protocols = 0/4/2 (gcs/repl/appl), group UUID = 28caa7c4-da8b-11e1-0800-4250fd8b1b8f120731 11:31:44 [Note] WSREP: Flow-control interval: [12, 23]120731 11:31:44 [Note] WSREP: Shifting OPEN → PRIMARY (TO: 4)120731 11:31:44 [Note] WSREP: State transfer required: Group state: 28caa7c4-da8b-11e1-0800-4250fd8b1b8f:4 Local state: 00000000-0000-0000-0000-000000000000:-1120731 11:31:44 [Note] WSREP: New cluster view: global state: 28caa7c4-da8b-11e1-0800-4250fd8b1b8f:4, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 2120731 11:31:44 [Warning] WSREP: Gap in state sequence. Need state transfer.120731 11:31:46 [Note] WSREP: Running: 'wsrep_sst_xtrabackup ‘joiner’ ‘192.168.56.52’ ‘’ ‘/var/lib/mysql/’ ‘/etc/my.cnf’ ‘1635’ 2>sst.err’120731 11:31:46 [Note] WSREP: Prepared SST request: xtrabackup|192.168.56.52:4444/xtrabackup_sst120731 11:31:46 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.120731 11:31:46 [Note] WSREP: Assign initial position for certification: 4, protocol version: 2120731 11:31:46 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (28caa7c4-da8b-11e1-0800-4250fd8b1b8f): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.120731 11:31:46 [Note] WSREP: Node 1 (test-data2) requested state transfer from ‘any’. Selected 0 (test-data1)(SYNCED) as donor.120731 11:31:46 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 4)120731 11:31:46 [Note] WSREP: Requesting state transfer: success, donor: 0

Node 1 my.cnf:

[mysqld]server_id=1binlog_format=ROWlog_bin=mysql-binwsrep_cluster_address=gcomm://wsrep_provider=/usr/lib64/libgalera_smm.sodatadir=/var/lib/mysqlwsrep_slave_threads=2wsrep_cluster_name=mssptestwsrep_sst_method=xtrabackupwsrep_node_name=test-data1bind_address=192.168.56.51wsrep_node_address=192.168.56.51log_slave_updatesinnodb_locks_unsafe_for_binlog=1innodb_autoinc_lock_mode=2innodb_buffer_pool_size=400Minnodb_log_file_size=64M

Node 2 my.cnf:

[mysqld]server_id=2binlog_format=ROWlog_bin=mysql-binwsrep_cluster_address=gcomm://192.168.56.51wsrep_provider=/usr/lib64/libgalera_smm.sodatadir=/var/lib/mysqlwsrep_slave_threads=2wsrep_cluster_name=mssptestwsrep_sst_method=xtrabackupwsrep_node_name=test-data2bind_address=192.168.56.52wsrep_node_address=192.168.56.52log_slave_updatesinnodb_locks_unsafe_for_binlog=1innodb_autoinc_lock_mode=2innodb_buffer_pool_size=400Minnodb_log_file_size=64M

tcpdump from node 1 after node 2 is up and awaiting data:

[root@test-data1 mysql]# tcpdump -nn host 192.168.56.52tcpdump: verbose output suppressed, use -v or -vv for full protocol decodelistening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes11:48:04.094683 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 2416899323, win 361, options [nop,nop,TS val 1198106 ecr 706701], length 011:48:04.094933 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [.], ack 0, win 346, options [nop,nop,TS val 706741 ecr 1198066], length 011:48:05.054775 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 0:80, ack 1, win 361, options [nop,nop,TS val 1199066 ecr 706741], length 8011:48:05.055118 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [.], ack 80, win 346, options [nop,nop,TS val 707701 ecr 1199066], length 011:48:05.055346 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 1:81, ack 80, win 346, options [nop,nop,TS val 707701 ecr 1199066], length 8011:48:05.055354 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 81, win 361, options [nop,nop,TS val 1199066 ecr 707701], length 011:48:05.055376 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 81:161, ack 80, win 346, options [nop,nop,TS val 707701 ecr 1199066], length 8011:48:05.055380 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 161, win 361, options [nop,nop,TS val 1199066 ecr 707701], length 011:48:05.055506 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 80:188, ack 161, win 361, options [nop,nop,TS val 1199066 ecr 707701], length 10811:48:05.055729 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 188:268, ack 161, win 361, options [nop,nop,TS val 1199067 ecr 707701], length 8011:48:05.056103 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [.], ack 268, win 346, options [nop,nop,TS val 707702 ecr 1199066], length 011:48:05.056110 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 161:269, ack 268, win 346, options [nop,nop,TS val 707702 ecr 1199066], length 10811:48:05.096815 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 269, win 361, options [nop,nop,TS val 1199108 ecr 707702], length 011:48:06.055030 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 268:348, ack 269, win 361, options [nop,nop,TS val 1200066 ecr 707702], length 8011:48:06.055651 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 269:349, ack 348, win 346, options [nop,nop,TS val 708701 ecr 1200066], length 8011:48:06.055666 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 349, win 361, options [nop,nop,TS val 1200067 ecr 708701], length 011:48:06.055685 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 349:429, ack 348, win 346, options [nop,nop,TS val 708701 ecr 1200066], length 8011:48:06.055688 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 429, win 361, options [nop,nop,TS val 1200067 ecr 708701], length 011:48:06.056119 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 348:456, ack 429, win 361, options [nop,nop,TS val 1200067 ecr 708701], length 10811:48:06.056272 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 456:536, ack 429, win 361, options [nop,nop,TS val 1200067 ecr 708701], length 8011:48:06.056667 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [.], ack 536, win 346, options [nop,nop,TS val 708702 ecr 1200067], length 011:48:06.056837 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 429:537, ack 536, win 346, options [nop,nop,TS val 708702 ecr 1200067], length 10811:48:06.096701 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 537, win 361, options [nop,nop,TS val 1200108 ecr 708702], length 011:48:07.055299 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 536:616, ack 537, win 361, options [nop,nop,TS val 1201066 ecr 708702], length 8011:48:07.055468 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 537:617, ack 536, win 346, options [nop,nop,TS val 709702 ecr 1200108], length 8011:48:07.055479 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 617, win 361, options [nop,nop,TS val 1201066 ecr 709702], length 011:48:07.055778 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 616:724, ack 617, win 361, options [nop,nop,TS val 1201067 ecr 709702], length 10811:48:07.055953 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 617:725, ack 616, win 346, options [nop,nop,TS val 709702 ecr 1201066], length 10811:48:07.095670 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 725, win 361, options [nop,nop,TS val 1201107 ecr 709702], length 011:48:07.095700 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [.], ack 724, win 346, options [nop,nop,TS val 709742 ecr 1201067], length 011:48:08.055553 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 724:804, ack 725, win 361, options [nop,nop,TS val 1202066 ecr 709742], length 8011:48:08.055902 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [.], ack 804, win 346, options [nop,nop,TS val 710702 ecr 1202066], length 011:48:08.056049 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 725:805, ack 804, win 346, options [nop,nop,TS val 710702 ecr 1202066], length 8011:48:08.056057 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 805, win 361, options [nop,nop,TS val 1202067 ecr 710702], length 011:48:08.056210 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 805:885, ack 804, win 346, options [nop,nop,TS val 710702 ecr 1202066], length 8011:48:08.056215 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [.], ack 885, win 361, options [nop,nop,TS val 1202067 ecr 710702], length 011:48:08.056440 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 804:912, ack 885, win 361, options [nop,nop,TS val 1202067 ecr 710702], length 10811:48:08.056607 IP 192.168.56.51.4567 > 192.168.56.52.55626: Flags [P.], seq 912:992, ack 885, win 361, options [nop,nop,TS val 1202067 ecr 710702], length 8011:48:08.056905 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [.], ack 992, win 346, options [nop,nop,TS val 710703 ecr 1202067], length 011:48:08.058675 IP 192.168.56.52.55626 > 192.168.56.51.4567: Flags [P.], seq 885:993, ack 992, win 346, options [nop,nop,TS val 710703 ecr 1202067], length 108

I’m having a similar problem, but with a bit more data. I have wsrep_sst_auth defined, and on the Joiner side, the log shows the user:pass defined (xtrabackup:xxxxxxxx). But on the Donor side, it appears to be blank B[/B], and the innobackup.backup.log seems to support that. This is on CentOS 6.3, using Percona-XtraDB-Cluster-galera 2.0-1.114 and percona-xtrabackup 2.0.2-461.

Joiner:

120927 12:36:55 [Note] WSREP: Node 2 (phx-wc-db03) requested state transfer from ‘any’. Selected 0 (phx-wc-db01)(SYNCED) as donor.120927 12:36:55 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 1069182)120927 12:36:55 [Note] WSREP: Requesting state transfer: success, donor: 0120927 12:37:00 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup ‘joiner’ ‘10.200.2.123’ ‘xtrabackup:xxxxxxxx’ ‘/var/lib/mysql/’ ‘/etc/my.cnf’ ‘17705’ 2>sst.err: 32 (Broken pipe)

Donor:

120927 12:36:55 [Note] WSREP: Running: 'wsrep_sst_xtrabackup ‘donor’ ‘10.200.2.123:4444/xtrabackup_sst’ ‘(null)’ ‘/var/lib/mysql/’ ‘/etc/my.cnf’ ‘925af2fa-fd03-11e1-0800-27e59d4c5f19’ ‘1069182’ ‘0’'120927 12:36:55 [Note] WSREP: sst_donor_thread signaled with 0innobackupex finished with error: 255. Check /var/lib/mysql//innobackup.backup.log120927 12:37:00 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup ‘donor’ ‘10.200.2.123:4444/xtrabackup_sst’ ‘(null)’ ‘/var/lib/mysql/’ ‘/etc/my.cnf’ ‘925af2fa-fd03-11e1-0800-27e59d4c5f19’ ‘1069182’ ‘0’

Donor’s innobackup.backup.log:

InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oyand Percona Inc 2009-2012. All Rights Reserved.This software is published underthe GNU GENERAL PUBLIC LICENSE Version 2, June 1991.120927 12:36:58 innobackupex: Starting mysql with options: --unbuffered --120927 12:36:58 innobackupex: Connected to database with mysql child process (pid=4504)innobackupex: Error: mysql child process has died: ERROR 1045 (28000): Access denied for user ‘root’@‘localhost’ (using password: NO)