Replicate data from normal Master-Slave setup into a new PXC-cluster

Hi all :slight_smile:
I would like to use Percona XtraDB Cluster instead of our current Master-Slave setup. Currently I’m evaluating the whole stuff and already set up the PXC cluster with 3 nodes. Here are some hardware informations about the nodes: 3 x HP ProLiant DL380 G5 with

  • 8 CPU-Cores (2 x Intel E5410)
  • 32 GB RAM
  • 2 x 72G HDD (15k RPM / RAID 1) as root partition
  • 5 x 146G HDD (15k RPM / RAID 10 / 1 hotspare) mounted on /var/lib/mysql
  • OS: Debian Wheezy
  • PXC-Version: 5.6.19-25

The problem is getting the data from the master-slave setup into the pxc-cluster.
the current database size is ~ 35G.
The pxc-cluster is already set up and running well. So the first pxc-node (let’s call him pxc-01) node is also set up as a mysql-slave replicating from the current master-slave cluster. That also works well.
I have set log-slave-updates in my.cnf.

The /etc/mysql/my.cnf of pxc-01 looks like this:

Slave settings:

skip-slave-start
server-id = 1234
log-slave-updates = 1
log_bin = /var/log/mysql/mysql-bin.log
relay-log = mysqld-relay-bin
expire_logs_days = 1
max_binlog_size = 100M

Peromance settings:

innodb_buffer_pool_size = 24G
innodb_flush_log_at_trx_commit = 2

I’ve tried many times to get the whole pxc-cluster working but until now without any success.

pxc-01 is able to replicate without any problems, as long as the other pxc-nodes do not join the cluster. Whenever I start them or make them joining the cluster (they’re getting the data via SST), the replication on pxc-01 fails/stops working.
I’m always checking replication status on pxc-01 with SHOW SLAVE STATUS NOLOCK\G, and I’m getting the information as fast as on every other slave → As long as the other pxc-nodes are not in the cluster. If they are in the cluster, the replication stops working. SHOW SLAVE STATUS NOLOCK\G on pxc-01 doesn’t print anything, master.info is not being updated anymore and the “SHOW SLAVE STATUS NOLOCK\G”-process has to be killed or I have to restart MySQL on pxc-01. I can also see in the mysql-processlist, that nothing is being done by the slave anymore. The error-log doesn’t give any information about that and I found nobody having the same problem.

The current master is running with MySQL-5.1.49. But this can’t be the problem, because the MySQL-Slaves of that master are running with MySQL 5.5 and everything works fine. Is it possible that the throughput of the replication in combination with the stuff that the pxc-cluster does is simply too much for the system?But I guess that there are more complex use cases than this one, with more throughput than in this case.
I can also ensure, that the pxc-01 replicates from the master without any errors. But only as long as the other nodes don’t join the pxc-cluster.
I’ve also tried to import the mysqldump on pxc-01 while the whole pxc-cluster was running. The import worked, but as soon as I start the replication on pxc-01, the slave hangs. I’ve really tried that several times. Sometimes the replication worked 1 hour, but never longer than that.

Please let me know if you need more informations or something.

I would really like to use PXC, but I have to ensure that everything will work as expected and I would like to have no (or just a very very short) downtime.

What is happening in the PXC cluster once you add second node? Is the cluster actually formed or failed? Paste the log error fragment showing the moment when second node joins and slave node stops working.
Also, do you have MyISAM tables that may be replicated from async master?

Hi przemek!
Thanks for your fast reply :slight_smile:

The cluster is formend although the slave is not working.
There are only innodb-tables replicated and here is the log fragment:

Additional informations: pxc-01 is named backenddb-01 and the second pxc-node is backenddb-02


2014-07-31 19:14:33 1503 [Note] WSREP: (fe9842b9, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2014-07-31 19:14:34 1503 [Note] WSREP: declaring 244a7896 at tcp://172.20.0.9:4567 stable
2014-07-31 19:14:34 1503 [Note] WSREP: Node fe9842b9 state prim
2014-07-31 19:14:34 1503 [Note] WSREP: save pc into disk
2014-07-31 19:14:34 1503 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2014-07-31 19:14:34 1503 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2014-07-31 19:14:34 1503 [Note] WSREP: STATE EXCHANGE: sent state msg: 24e38cd1-18d6-11e4-abbc-af0bcdbb236f
2014-07-31 19:14:34 1503 [Note] WSREP: STATE EXCHANGE: got state msg: 24e38cd1-18d6-11e4-abbc-af0bcdbb236f from 0 (backenddb-02)
2014-07-31 19:14:34 1503 [Note] WSREP: STATE EXCHANGE: got state msg: 24e38cd1-18d6-11e4-abbc-af0bcdbb236f from 1 (backenddb-01)
2014-07-31 19:14:34 1503 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 1,
members = 1/2 (joined/total),
act_id = 3662350,
last_appl. = 3662240,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = a296889f-13fd-11e4-9807-f2390d3c4de7
2014-07-31 19:14:34 1503 [Note] WSREP: Flow-control interval: [23, 23]
2014-07-31 19:14:34 1503 [Note] WSREP: New cluster view: global state: a296889f-13fd-11e4-9807-f2390d3c4de7:3662350, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
2014-07-31 19:14:34 1503 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 19:14:34 1503 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-07-31 19:14:34 1503 [Note] WSREP: Service thread queue flushed.
2014-07-31 19:14:34 1503 [Note] WSREP: Assign initial position for certification: 3662350, protocol version: 3
2014-07-31 19:14:34 1503 [Note] WSREP: Service thread queue flushed.
2014-07-31 19:14:37 1503 [Note] WSREP: (fe9842b9, 'tcp://0.0.0.0:4567') turning message relay requesting off
2014-07-31 19:14:37 1503 [Note] WSREP: Member 0.0 (backenddb-02) requested state transfer from '*any*'. Selected 1.0 (backenddb-01)(SYNCED) as donor.
2014-07-31 19:14:37 1503 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3662350)
2014-07-31 19:14:37 1503 [Note] WSREP: IST request: a296889f-13fd-11e4-9807-f2390d3c4de7:2201852-3662350|tcp://172.20.0.9:4568
2014-07-31 19:14:37 1503 [Note] WSREP: IST first seqno 2201853 not found from cache, falling back to SST
2014-07-31 19:14:37 1503 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 19:14:37 1503 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.20.0.9:4444/xtrabackup_sst' --auth 'sstuser:s3cretPass' --socket '/var/run/mysqld/mysqld.sock' --data
dir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' '' --gtid 'a296889f-13fd-11e4-9807-f2390d3c4de7:3662350''
2014-07-31 19:14:37 1503 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20140731 19:14:37.753)
WSREP_SST: [INFO] Using socat as streamer (20140731 19:14:37.756)
WSREP_SST: [INFO] Using /tmp/tmp.ACIfiRagij as innobackupex temporary directory (20140731 19:14:38.163)
WSREP_SST: [INFO] Streaming GTID file before SST (20140731 19:14:38.168)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:172.20.0.9:4444; RC=( ${PIPESTATUS[@]} ) (20140731 19:14:38.171)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20140731 19:14:38.201)
WSREP_SST: [INFO] Streaming the backup to joiner at 172.20.0.9 4444 (20140731 19:14:48.204)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf --no-version-check $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio T
CP:172.20.0.9:4444; RC=( ${PIPESTATUS[@]} ) (20140731 19:14:48.208)
2014-07-31 19:21:57 1503 [Note] WSREP: Provider paused at a296889f-13fd-11e4-9807-f2390d3c4de7:3702665 (48769)
2014-07-31 19:22:00 1503 [Note] WSREP: resuming provider at 48769
2014-07-31 19:22:00 1503 [Note] WSREP: Provider resumed.
2014-07-31 19:22:08 1503 [Note] WSREP: 1.0 (backenddb-01): State transfer to 0.0 (backenddb-02) complete.
2014-07-31 19:22:08 1503 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3703347)
2014-07-31 19:22:08 1503 [Note] WSREP: Member 1.0 (backenddb-01) synced with group.
2014-07-31 19:22:08 1503 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3703347)
2014-07-31 19:22:08 1503 [Note] WSREP: Synchronized with group, ready for connections
2014-07-31 19:22:08 1503 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
WSREP_SST: [INFO] Total time on donor: 0 seconds (20140731 19:22:08.571)
WSREP_SST: [INFO] Cleaning up temporary directories (20140731 19:22:08.574)
2014-07-31 19:24:48 1503 [Note] WSREP: 0.0 (backenddb-02): State transfer from 1.0 (backenddb-01) complete.

After stopping MySQL on backenddb-02 the slave started working again and everything was back to normal.

I’m getting more and more desperate :smiley:

Update:

I did the same stuff again. Stopped MySQL on backenddb-02, waited a few minutes and started again. Here is the result of the log from backenddb-01:


140731 20:10:47 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140731 20:10:47 mysqld_safe Skipping wsrep-recover for a296889f-13fd-11e4-9807-f2390d3c4de7:3878257 pair
140731 20:10:47 mysqld_safe Assigning a296889f-13fd-11e4-9807-f2390d3c4de7:3878257 to wsrep_start_position
2014-07-31 20:10:47 0 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
2014-07-31 20:10:47 0 [Note] WSREP: wsrep_start_position var submitted: 'a296889f-13fd-11e4-9807-f2390d3c4de7:3878257'
2014-07-31 20:10:47 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2014-07-31 20:10:47 31574 [Note] WSREP: Read nil XID from storage engines, skipping position init
2014-07-31 20:10:47 31574 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2014-07-31 20:10:47 31574 [Note] WSREP: wsrep_load(): Galera 3.6(r3a949e6) by Codership Oy <info&#64;codership.com> loaded successfully.
2014-07-31 20:10:47 31574 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2014-07-31 20:10:47 31574 [Note] WSREP: Found saved state: a296889f-13fd-11e4-9807-f2390d3c4de7:3878257
2014-07-31 20:10:47 31574 [Note] WSREP: Passing config to GCS: base_host = 172.20.0.8; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 50G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weigh
2014-07-31 20:10:47 31574 [Note] WSREP: Service thread queue flushed.
2014-07-31 20:10:47 31574 [Note] WSREP: Assign initial position for certification: 3878257, protocol version: -1
2014-07-31 20:10:47 31574 [Note] WSREP: wsrep_sst_grab()
2014-07-31 20:10:47 31574 [Note] WSREP: Start replication
2014-07-31 20:10:47 31574 [Note] WSREP: Setting initial position to a296889f-13fd-11e4-9807-f2390d3c4de7:3878257
2014-07-31 20:10:47 31574 [Note] WSREP: protonet asio version 0
2014-07-31 20:10:47 31574 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
2014-07-31 20:10:47 31574 [Note] WSREP: backend: asio
2014-07-31 20:10:47 31574 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
2014-07-31 20:10:47 31574 [Note] WSREP: restore pc from disk failed
2014-07-31 20:10:47 31574 [Note] WSREP: GMCast version 0
2014-07-31 20:10:47 31574 [Note] WSREP: (ff66a051, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2014-07-31 20:10:47 31574 [Note] WSREP: (ff66a051, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2014-07-31 20:10:47 31574 [Note] WSREP: EVS version 0
2014-07-31 20:10:47 31574 [Note] WSREP: PC version 0
2014-07-31 20:10:47 31574 [Note] WSREP: gcomm: bootstrapping new group 'backenddb_cluster'
2014-07-31 20:10:47 31574 [Note] WSREP: start_prim is enabled, turn off pc_recovery
2014-07-31 20:10:47 31574 [Note] WSREP: Node ff66a051 state prim
2014-07-31 20:10:47 31574 [Note] WSREP: save pc into disk
2014-07-31 20:10:47 31574 [Note] WSREP: discarding pending addr without UUID: tcp://172.20.0.10:4567
2014-07-31 20:10:47 31574 [Note] WSREP: discarding pending addr proto entry 0x2d71a50
2014-07-31 20:10:47 31574 [Note] WSREP: discarding pending addr without UUID: tcp://172.20.0.8:4567
2014-07-31 20:10:47 31574 [Note] WSREP: discarding pending addr proto entry 0x2d7a110
2014-07-31 20:10:47 31574 [Note] WSREP: discarding pending addr without UUID: tcp://172.20.0.9:4567
2014-07-31 20:10:47 31574 [Note] WSREP: discarding pending addr proto entry 0x2d82910
2014-07-31 20:10:47 31574 [Note] WSREP: gcomm: connected
2014-07-31 20:10:47 31574 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2014-07-31 20:10:47 31574 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2014-07-31 20:10:47 31574 [Note] WSREP: Opened channel 'backenddb_cluster'
2014-07-31 20:10:47 31574 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2014-07-31 20:10:47 31574 [Note] WSREP: Waiting for SST to complete.
2014-07-31 20:10:47 31574 [Note] WSREP: STATE_EXCHANGE: sent state UUID: ff675558-18dd-11e4-9055-6fd8d8ced1a3
2014-07-31 20:10:47 31574 [Note] WSREP: STATE EXCHANGE: sent state msg: ff675558-18dd-11e4-9055-6fd8d8ced1a3
2014-07-31 20:10:47 31574 [Note] WSREP: STATE EXCHANGE: got state msg: ff675558-18dd-11e4-9055-6fd8d8ced1a3 from 0 (backenddb-01)
2014-07-31 20:10:47 31574 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 0,
members = 1/1 (joined/total),
act_id = 3878257,
last_appl. = -1,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = a296889f-13fd-11e4-9807-f2390d3c4de7
2014-07-31 20:10:47 31574 [Note] WSREP: Flow-control interval: [16, 16]
2014-07-31 20:10:47 31574 [Note] WSREP: Restored state OPEN -> JOINED (3878257)
2014-07-31 20:10:47 31574 [Note] WSREP: Member 0.0 (backenddb-01) synced with group.
2014-07-31 20:10:47 31574 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3878257)
2014-07-31 20:10:47 31574 [Note] WSREP: New cluster view: global state: a296889f-13fd-11e4-9807-f2390d3c4de7:3878257, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
2014-07-31 20:10:47 31574 [Note] WSREP: SST complete, seqno: 3878257
2014-07-31 20:10:47 31574 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
2014-07-31 20:10:47 31574 [Note] Plugin 'FEDERATED' is disabled.
2014-07-31 20:10:47 31574 [Note] InnoDB: Using atomics to ref count buffer pool pages
2014-07-31 20:10:47 31574 [Note] InnoDB: The InnoDB memory heap is disabled
2014-07-31 20:10:47 31574 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2014-07-31 20:10:47 31574 [Note] InnoDB: Compressed tables use zlib 1.2.7
2014-07-31 20:10:47 31574 [Note] InnoDB: Using Linux native AIO
2014-07-31 20:10:47 31574 [Note] InnoDB: Not using CPU crc32 instructions
2014-07-31 20:10:47 31574 [Note] InnoDB: Initializing buffer pool, size = 24.0G
2014-07-31 20:10:50 31574 [Note] InnoDB: Completed initialization of buffer pool
2014-07-31 20:10:50 31574 [Note] InnoDB: Highest supported file format is Barracuda.
2014-07-31 20:10:51 31574 [Note] InnoDB: 128 rollback segment(s) are active.
2014-07-31 20:10:51 31574 [Note] InnoDB: Waiting for purge to start
2014-07-31 20:10:51 31574 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.19-67.0 started; log sequence number 381227495665
2014-07-31 20:10:51 31574 [Note] RSA private key file not found: /var/lib/mysql//private_key.pem. Some authentication plugins will not work.
2014-07-31 20:10:51 31574 [Note] RSA public key file not found: /var/lib/mysql//public_key.pem. Some authentication plugins will not work.
2014-07-31 20:10:51 31574 [Note] Server hostname (bind-address): '172.20.0.8'; port: 3306
2014-07-31 20:10:51 31574 [Note] - '172.20.0.8' resolves to '172.20.0.8';
2014-07-31 20:10:51 31574 [Note] Server socket created on IP: '172.20.0.8'.
2014-07-31 20:10:51 31574 [Note] Event Scheduler: Loaded 0 events
2014-07-31 20:10:51 31574 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.19-67.0-56-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona XtraDB Cluster (GPL), Release 25.6, wsrep_25.6.r4111


2014-07-31 20:10:51 31574 [Note] WSREP: inited wsrep sidno 2
2014-07-31 20:10:51 31574 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 20:10:51 31574 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-07-31 20:10:51 31574 [Note] WSREP: Service thread queue flushed.
2014-07-31 20:10:51 31574 [Note] WSREP: Assign initial position for certification: 3878257, protocol version: 3
2014-07-31 20:10:51 31574 [Note] WSREP: Service thread queue flushed.
2014-07-31 20:10:51 31574 [Note] WSREP: Synchronized with group, ready for connections
2014-07-31 20:10:51 31574 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 20:11:21 31574 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-07-31 20:11:21 31574 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2014-07-31 20:11:21 31574 [Note] WSREP: ready state reached
2014-07-31 20:11:21 31574 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.235894' at position 79854239, relay log './mysqld-relay-bin.001643' position: 79854398
2014-07-31 20:11:21 31574 [Note] Slave I/O thread: connected to master 'replication&#64;172.20.2.3:3306',replication started in log 'mysql-bin.235895' at position 52096169
2014-07-31 20:11:21 31574 [Warning] Slave I/O: Notifying master by SET &#64;master_binlog_checksum= &#64;&#64;global.binlog_checksum failed with error: Unknown system variable 'binlog_checksum', Error_code: 1193
2014-07-31 20:11:21 31574 [Warning] Slave I/O: Unknown system variable 'SERVER_UUID' on master. A probable cause is that the variable is not supported on the master (version: 5.1.49-3-log), even though it is on the slave (version: 5.6.19-67.0-56-log), Error_code: 1193
2014-07-31 20:11:55 31574 [Note] WSREP: (ff66a051, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2014-07-31 20:11:55 31574 [Note] WSREP: declaring 27ef482c at tcp://172.20.0.9:4567 stable
2014-07-31 20:11:55 31574 [Note] WSREP: Node ff66a051 state prim
2014-07-31 20:11:55 31574 [Note] WSREP: save pc into disk
2014-07-31 20:11:55 31574 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2014-07-31 20:11:55 31574 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2014-07-31 20:11:56 31574 [Note] WSREP: STATE EXCHANGE: sent state msg: 283c0b18-18de-11e4-9378-7b8ab369abcd
2014-07-31 20:11:56 31574 [Note] WSREP: STATE EXCHANGE: got state msg: 283c0b18-18de-11e4-9378-7b8ab369abcd from 0 (backenddb-02)
2014-07-31 20:11:56 31574 [Note] WSREP: STATE EXCHANGE: got state msg: 283c0b18-18de-11e4-9378-7b8ab369abcd from 1 (backenddb-01)
2014-07-31 20:11:56 31574 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 1,
members = 1/2 (joined/total),
act_id = 3878257,
last_appl. = 0,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = a296889f-13fd-11e4-9807-f2390d3c4de7
2014-07-31 20:11:56 31574 [Note] WSREP: Flow-control interval: [23, 23]
2014-07-31 20:11:56 31574 [Note] WSREP: New cluster view: global state: a296889f-13fd-11e4-9807-f2390d3c4de7:3878257, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
2014-07-31 20:11:56 31574 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 20:11:56 31574 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-07-31 20:11:56 31574 [Note] WSREP: Service thread queue flushed.
2014-07-31 20:11:56 31574 [Note] WSREP: Assign initial position for certification: 3878257, protocol version: 3
2014-07-31 20:11:56 31574 [Note] WSREP: Service thread queue flushed.
2014-07-31 20:11:58 31574 [Note] WSREP: (ff66a051, 'tcp://0.0.0.0:4567') turning message relay requesting off
2014-07-31 20:11:59 31574 [Note] WSREP: Member 0.0 (backenddb-02) requested state transfer from '*any*'. Selected 1.0 (backenddb-01)(SYNCED) as donor.
2014-07-31 20:11:59 31574 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3878257)
2014-07-31 20:11:59 31574 [Note] WSREP: IST request: a296889f-13fd-11e4-9807-f2390d3c4de7:3702665-3878257|tcp://172.20.0.9:4568
2014-07-31 20:11:59 31574 [Note] WSREP: IST first seqno 3702666 not found from cache, falling back to SST
2014-07-31 20:11:59 31574 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 20:11:59 31574 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.20.0.9:4444/xtrabackup_sst' --auth 'sstuser:s3cretPass' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' '' --gtid 'a296889f-13fd-11e4-9807-f2390d3c4de7:3878257''
2014-07-31 20:11:59 31574 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20140731 20:11:59.315)
WSREP_SST: [INFO] Using socat as streamer (20140731 20:11:59.317)
WSREP_SST: [INFO] Using /tmp/tmp.6i6qzI1XGo as innobackupex temporary directory (20140731 20:11:59.599)
WSREP_SST: [INFO] Streaming GTID file before SST (20140731 20:11:59.605)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:172.20.0.9:4444; RC=( ${PIPESTATUS[&#64;]} ) (20140731 20:11:59.608)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20140731 20:11:59.614)
WSREP_SST: [INFO] Streaming the backup to joiner at 172.20.0.9 4444 (20140731 20:12:09.617)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf --no-version-check $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.20.0.9:4444; RC=( ${PIPESTATUS[&#64;]} ) (20140731 20:12:09.621)
2014/07/31 20:12:11 socat[32352] E write(3, 0x14732b0, 8192): Broken pipe
WSREP_SST: [ERROR] innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20140731 20:12:11.888)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20140731 20:12:11.890)
WSREP_SST: [INFO] Cleaning up temporary directories (20140731 20:12:11.893)
2014-07-31 20:12:11 31574 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.20.0.9:4444/xtrabackup_sst' --auth 'sstuser:s3cretPass' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' '' --gtid 'a296889f-13fd-11e4-9807-f2390d3c4de7:3878257'
2014-07-31 20:12:11 31574 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.20.0.9:4444/xtrabackup_sst' --auth 'sstuser:s3cretPass' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' '' --gtid 'a296889f-13fd-11e4-9807-f2390d3c4de7:3878257': 22 (Invalid argument)
2014-07-31 20:12:11 31574 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.20.0.9:4444/xtrabackup_sst' --auth 'sstuser:s3cretPass' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' '' --gtid 'a296889f-13fd-11e4-9807-f2390d3c4de7:3878257'
2014-07-31 20:12:11 31574 [Warning] WSREP: 1.0 (backenddb-01): State transfer to 0.0 (backenddb-02) failed: -22 (Invalid argument)
2014-07-31 20:12:11 31574 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3879661)
2014-07-31 20:12:11 31574 [Note] WSREP: Member 1.0 (backenddb-01) synced with group.
2014-07-31 20:12:11 31574 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3879661)
2014-07-31 20:12:11 31574 [Note] WSREP: Synchronized with group, ready for connections
2014-07-31 20:12:11 31574 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 20:12:12 31574 [Note] WSREP: forgetting 27ef482c (tcp://172.20.0.9:4567)
2014-07-31 20:12:12 31574 [Note] WSREP: Node ff66a051 state prim
2014-07-31 20:12:12 31574 [Note] WSREP: save pc into disk
2014-07-31 20:12:12 31574 [Note] WSREP: forgetting 27ef482c (tcp://172.20.0.9:4567)
2014-07-31 20:12:12 31574 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
2014-07-31 20:12:12 31574 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 32067541-18de-11e4-ae17-bb9f72bd71e4
2014-07-31 20:12:12 31574 [Note] WSREP: STATE EXCHANGE: sent state msg: 32067541-18de-11e4-ae17-bb9f72bd71e4
2014-07-31 20:12:12 31574 [Note] WSREP: STATE EXCHANGE: got state msg: 32067541-18de-11e4-ae17-bb9f72bd71e4 from 0 (backenddb-01)
2014-07-31 20:12:12 31574 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 2,
members = 1/1 (joined/total),
act_id = 3879661,
last_appl. = 3879607,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = a296889f-13fd-11e4-9807-f2390d3c4de7
2014-07-31 20:12:12 31574 [Note] WSREP: Flow-control interval: [16, 16]
2014-07-31 20:12:12 31574 [Note] WSREP: New cluster view: global state: a296889f-13fd-11e4-9807-f2390d3c4de7:3879661, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 3


2014-07-31 20:12:12 31574 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 20:12:12 31574 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-07-31 20:12:12 31574 [Note] WSREP: Service thread queue flushed.
2014-07-31 20:12:12 31574 [Note] WSREP: Assign initial position for certification: 3879661, protocol version: 3
2014-07-31 20:12:12 31574 [Note] WSREP: Service thread queue flushed.
2014-07-31 20:12:13 31574 [Note] WSREP: remote endpoint tcp://172.20.0.9:4567 changed identity 27ef482c -> 32873135
2014-07-31 20:12:13 31574 [Note] WSREP: (ff66a051, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2014-07-31 20:12:13 31574 [Note] WSREP: declaring 32873135 at tcp://172.20.0.9:4567 stable
2014-07-31 20:12:13 31574 [Note] WSREP: Node ff66a051 state prim
2014-07-31 20:12:13 31574 [Note] WSREP: save pc into disk
2014-07-31 20:12:13 31574 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2014-07-31 20:12:13 31574 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2014-07-31 20:12:14 31574 [Note] WSREP: STATE EXCHANGE: sent state msg: 32d3fc47-18de-11e4-a846-8bde725628ce
2014-07-31 20:12:14 31574 [Note] WSREP: STATE EXCHANGE: got state msg: 32d3fc47-18de-11e4-a846-8bde725628ce from 0 (backenddb-02)
2014-07-31 20:12:14 31574 [Note] WSREP: STATE EXCHANGE: got state msg: 32d3fc47-18de-11e4-a846-8bde725628ce from 1 (backenddb-01)
2014-07-31 20:12:14 31574 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 3,
members = 1/2 (joined/total),
act_id = 3879871,
last_appl. = 3879787,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = a296889f-13fd-11e4-9807-f2390d3c4de7
2014-07-31 20:12:14 31574 [Note] WSREP: Flow-control interval: [23, 23]
2014-07-31 20:12:14 31574 [Note] WSREP: New cluster view: global state: a296889f-13fd-11e4-9807-f2390d3c4de7:3879871, view# 4: Primary, number of nodes: 2, my index: 1, protocol version 3
2014-07-31 20:12:14 31574 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 20:12:14 31574 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-07-31 20:12:14 31574 [Note] WSREP: Service thread queue flushed.
2014-07-31 20:12:14 31574 [Note] WSREP: Assign initial position for certification: 3879871, protocol version: 3
2014-07-31 20:12:14 31574 [Note] WSREP: Service thread queue flushed.
2014-07-31 20:12:16 31574 [Note] WSREP: Member 0.0 (backenddb-02) requested state transfer from '*any*'. Selected 1.0 (backenddb-01)(SYNCED) as donor.
2014-07-31 20:12:16 31574 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3880283)
2014-07-31 20:12:16 31574 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-07-31 20:12:16 31574 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.20.0.9:4444/xtrabackup_sst' --auth 'sstuser:s3cretPass' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' '' --gtid 'a296889f-13fd-11e4-9807-f2390d3c4de7:3880283''
2014-07-31 20:12:16 31574 [Note] WSREP: sst_donor_thread signaled with 0
2014-07-31 20:12:16 31574 [Note] WSREP: (ff66a051, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming with xbstream (20140731 20:12:17.094)
WSREP_SST: [INFO] Using socat as streamer (20140731 20:12:17.096)
WSREP_SST: [INFO] Using /tmp/tmp.QwS3wr9Hw4 as innobackupex temporary directory (20140731 20:12:17.391)
WSREP_SST: [INFO] Streaming GTID file before SST (20140731 20:12:17.397)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:172.20.0.9:4444; RC=( ${PIPESTATUS[&#64;]} ) (20140731 20:12:17.400)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20140731 20:12:17.406)
WSREP_SST: [INFO] Streaming the backup to joiner at 172.20.0.9 4444 (20140731 20:12:27.410)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf --no-version-check $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.20.0.9:4444; RC=( ${PIPESTATUS[&#64;]} ) (20140731 20:12:27.413)
2014-07-31 20:16:33 7fe5e69e8700 InnoDB: unable to purge a record
InnoDB: tuple DATA TUPLE: 2 fields;
0: len 4; hex 002bad15; asc + ;;
1: len 4; hex 000c8529; asc );;

InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 002bad15; asc + ;;
1: len 4; hex 000c8529; asc );;

space 551 offset 7760 (1095 records, index id 728)
InnoDB: Submit a detailed bug report to [URL]http://bugs.mysql.com[/URL]
2014-07-31 20:16:52 31574 [ERROR] Transaction not registered for MySQL 2PC, but transaction is active
2014-07-31 20:18:46 7fe5e59e6700 InnoDB: unable to purge a record
InnoDB: tuple DATA TUPLE: 2 fields;
0: len 4; hex 001604de; asc ;;
1: len 4; hex 0005951e; asc ;;

InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 001604de; asc ;;
1: len 4; hex 0005951e; asc ;;

space 551 offset 4544 (937 records, index id 728)
InnoDB: Submit a detailed bug report to [URL]http://bugs.mysql.com[/URL]
2014-07-31 20:19:04 31574 [Note] WSREP: Provider paused at a296889f-13fd-11e4-9807-f2390d3c4de7:3914349 (36433)
2014-07-31 20:19:07 31574 [Note] WSREP: resuming provider at 36433
2014-07-31 20:19:07 31574 [Note] WSREP: Provider resumed.
2014-07-31 20:19:21 31574 [Note] WSREP: 1.0 (backenddb-01): State transfer to 0.0 (backenddb-02) complete.
2014-07-31 20:19:21 31574 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3916957)
2014-07-31 20:19:21 31574 [Note] WSREP: Member 1.0 (backenddb-01) synced with group.
2014-07-31 20:19:21 31574 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3916957)
2014-07-31 20:19:21 31574 [Note] WSREP: Synchronized with group, ready for connections
2014-07-31 20:19:21 31574 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
WSREP_SST: [INFO] Total time on donor: 0 seconds (20140731 20:19:21.425)
WSREP_SST: [INFO] Cleaning up temporary directories (20140731 20:19:21.429)
2014-07-31 20:23:36 31574 [Note] WSREP: 0.0 (backenddb-02): State transfer from 1.0 (backenddb-01) complete.
2014-07-31 20:27:53 31574 [Note] WSREP: Member 0.0 (backenddb-02) synced with group.
2014-07-31 20:29:38 7fe5e71e9700 InnoDB: unable to purge a record
InnoDB: tuple DATA TUPLE: 2 fields;
0: len 4; hex 000d5d7c; asc ]|;;
1: len 4; hex 0003f422; asc ";;

InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 000d5d7c; asc ]|;;
1: len 4; hex 0003f422; asc ";;

space 551 offset 2319 (806 records, index id 728)
InnoDB: Submit a detailed bug report to [URL]http://bugs.mysql.com[/URL]
2014-07-31 20:31:50 31574 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
2014-07-31 20:31:50 31574 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.235900' at position 64232560
2014-07-31 20:31:50 31574 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2014-07-31 20:31:50 31574 [Warning] Slave I/O: Notifying master by SET &#64;master_binlog_checksum= &#64;&#64;global.binlog_checksum failed with error: Unknown system variable 'binlog_checksum', Error_code: 1193
2014-07-31 20:31:50 31574 [Warning] Slave I/O: Unknown system variable 'SERVER_UUID' on master. A probable cause is that the variable is not supported on the master (version: 5.1.49-3-log), even though it is on the slave (version: 5.6.19-67.0-56-log), Error_code: 1193

I did a SHOW SLAVE STATUS NOLOCK\G on backenddb-01 and the slave was hanging again - No output.
Here is the output of show processlist;


mysql> show processlist;
+-----+-------------+-----------+--------------------+---------+------+----------------------------------------+--------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+-----+-------------+-----------+--------------------+---------+------+----------------------------------------+--------------------------+-----------+---------------+
| 1 | system user | | NULL | Sleep | 1882 | wsrep aborter idle | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 1882 | NULL | NULL | 0 | 0 |
| 4 | root | localhost | NULL | Query | 403 | init | SHOW SLAVE STATUS NOLOCK | 0 | 0 |
| 5 | system user | | NULL | Connect | 1848 | Queueing master event to the relay log | NULL | 0 | 0 |
| 6 | system user | | NULL | Connect | 674 | wsrep in pre-commit stage | NULL | 0 | 0 |
+-----+-------------+-----------+--------------------+---------+------+----------------------------------------+--------------------------+-----------+---------------+

And thats the output on backenddb-02:


mysql> show status like 'wsrep%';
+------------------------------+--------------------------------------+
| Variable_name | Value |
+------------------------------+--------------------------------------+
| wsrep_local_state_uuid | a296889f-13fd-11e4-9807-f2390d3c4de7 |
| wsrep_protocol_version | 6 |
| wsrep_last_committed | 3959061 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 79654 |
| wsrep_received_bytes | 452958986 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.001200 |
| wsrep_local_recv_queue | 25 |
| wsrep_local_recv_queue_avg | 21744.541661 |
| wsrep_local_cached_downto | 3879872 |
| wsrep_flow_control_paused_ns | 537270776867 |
| wsrep_flow_control_paused | 0.371022 |
| wsrep_flow_control_sent | 133 |
| wsrep_flow_control_recv | 133 |
| wsrep_cert_deps_distance | 2551.067470 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 83147 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_incoming_addresses | 172.20.0.9:3306,172.20.0.8:3306 |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_cluster_conf_id | 4 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | a296889f-13fd-11e4-9807-f2390d3c4de7 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info&#64;codership.com> |
| wsrep_provider_version | 3.6(r3a949e6) |
| wsrep_ready | ON |
+------------------------------+--------------------------------------+
48 rows in set (0.00 sec

Hi, sorry for getting back so late to this. I was busy and then I forgot.
I can see several problems there. In some log the SST fails:

WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role ‘donor’ --address ‘172.20.0.9:4444/xtrabackup_sst’ --auth ‘sstuser:s3cretPass’ --socket ‘/var/run/mysqld/mysqld.sock’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/mysql/my.cnf’ ‘’ --gtid ‘a296889f-13fd-11e4-9807-f2390d3c4de7:3878257’
2014-07-31 20:12:11 31574 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role ‘donor’ --address ‘172.20.0.9:4444/xtrabackup_sst’ --auth ‘sstuser:s3cretPass’ --socket ‘/var/run/mysqld/mysqld.sock’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/mysql/my.cnf’ ‘’ --gtid ‘a296889f-13fd-11e4-9807-f2390d3c4de7:3878257’: 22 (Invalid argument)
2014-07-31 20:12:11 31574 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role ‘donor’ --address ‘172.20.0.9:4444/xtrabackup_sst’ --auth ‘sstuser:s3cretPass’ --socket ‘/var/run/mysqld/mysqld.sock’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/mysql/my.cnf’ ‘’ --gtid ‘a296889f-13fd-11e4-9807-f2390d3c4de7:3878257’
2014-07-31 20:12:11 31574 [Warning] WSREP: 1.0 (backenddb-01): State transfer to 0.0 (backenddb-02) failed: -22 (Invalid argument)











| wsrep_local_recv_queue | 25 |
| wsrep_local_recv_queue_avg | 21744.541661 |
| wsrep_local_cached_downto | 3879872 |
| wsrep_flow_control_paused_ns | 537270776867 |
| wsrep_flow_control_paused | 0.371022 |
| wsrep_flow_control_sent | 133 |
| wsrep_flow_control_recv | 133

No user created. please read: https://www.percona.com/forums/quest…2689#post42689

my.cnf

# wsrep SST Authentication
wsrep_sst_auth = user:passwd

please read: https://mariadb.com/kb/en/mariadb/ga…wsrep_sst_auth