wsrep_sst_xtrabackup fails after update

Hi all,

I just built a new cluster with Percona XtraDB Cluster 5.5.27-23.6.356 with Galera 2.0-1.114 and after noticed there is an update for Galera and Percona and updated via yum. But It didn’t work after update. (Galera to 2.0-1.117 and Percona to 5.5.28-23.7.369)

I saw error on mysqld error log and added -x parameter to wsrep_sst_xtrabackup script for detailed output because of there is no sst.err file. The error file looks like this;

121123 03:55:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql121123 03:55:49 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.EbKnfl9ZbJ121123 03:56:01 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1121123 3:56:01 [Note] WSREP: wsrep_start_position var submitted: '00000000-0000-0000-0000-000000000000:-1’121123 3:56:01 [Note] WSREP: Setting wsrep_ready to 0121123 3:56:01 [Note] WSREP: Read nil XID from storage engines, skipping position init121123 3:56:01 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so’121123 3:56:01 [Note] WSREP: wsrep_load(): Galera 2.2(r137) by Codership Oy <info@codership.com> loaded succesfully.121123 3:56:01 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat121123 3:56:01 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1121123 3:56:01 [Note] WSREP: Preallocating 134219048/134219048 bytes in ‘/var/lib/mysql//galera.cache’…121123 3:56:02 [Note] WSREP: Passing config to GCS: base_host = 10.10.10.97; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://10.10.10.97:4567; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3121123 3:56:02 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1121123 3:56:02 [Note] WSREP: wsrep_sst_grab()121123 3:56:02 [Note] WSREP: Start replication121123 3:56:02 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1121123 3:56:02 [Note] WSREP: protonet asio version 0121123 3:56:02 [Note] WSREP: backend: asio121123 3:56:02 [Note] WSREP: GMCast version 0121123 3:56:02 [Note] WSREP: (fd2457ea-3553-11e2-0800-74f8f44a0399, ‘tcp://10.10.10.97:4567’) listening at tcp://10.10.10.97:4567121123 3:56:02 [Note] WSREP: (fd2457ea-3553-11e2-0800-74f8f44a0399, ‘tcp://10.10.10.97:4567’) multicast: , ttl: 1121123 3:56:02 [Note] WSREP: EVS version 0121123 3:56:02 [Note] WSREP: PC version 0121123 3:56:02 [Note] WSREP: gcomm: connecting to group ‘mycluster’, peer '10.10.10.93:4567’121123 3:56:02 [Note] WSREP: declaring dca119ce-3553-11e2-0800-c1735c61782a stable121123 3:56:02 [Note] WSREP: view(view_id(PRIM,dca119ce-3553-11e2-0800-c1735c61782a,2) memb { dca119ce-3553-11e2-0800-c1735c61782a, fd2457ea-3553-11e2-0800-74f8f44a0399,} joined {} left {} partitioned {})121123 3:56:03 [Note] WSREP: gcomm: connected121123 3:56:03 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636121123 3:56:03 [Note] WSREP: Shifting CLOSED → OPEN (TO: 0)121123 3:56:03 [Note] WSREP: Opened channel 'mycluster’121123 3:56:03 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2121123 3:56:03 [Note] WSREP: Waiting for SST to complete.121123 3:56:03 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.121123 3:56:03 [Note] WSREP: STATE EXCHANGE: sent state msg: fd714063-3553-11e2-0800-a3bd813314a4121123 3:56:03 [Note] WSREP: STATE EXCHANGE: got state msg: fd714063-3553-11e2-0800-a3bd813314a4 from 0 (db1)121123 3:56:03 [Note] WSREP: STATE EXCHANGE: got state msg: fd714063-3553-11e2-0800-a3bd813314a4 from 1 (db2)121123 3:56:03 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 1, members = 1/2 (joined/total), act_id = 0, last_appl. = -1, protocols = 0/4/2 (gcs/repl/appl), group UUID = dca1ac1e-3553-11e2-0800-02b6027fc6c1121123 3:56:03 [Note] WSREP: Flow-control interval: [23, 23]121123 3:56:03 [Note] WSREP: Shifting OPEN → PRIMARY (TO: 0)121123 3:56:03 [Note] WSREP: State transfer required: Group state: dca1ac1e-3553-11e2-0800-02b6027fc6c1:0 Local state: 00000000-0000-0000-0000-000000000000:-1121123 3:56:03 [Note] WSREP: New cluster view: global state: dca1ac1e-3553-11e2-0800-02b6027fc6c1:0, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 2121123 3:56:03 [Warning] WSREP: Gap in state sequence. Need state transfer.121123 3:56:03 [Note] WSREP: Setting wsrep_ready to 0121123 3:56:03 [Note] WSREP: [debug]: closing client connections for PRIM121123 3:56:05 [Note] WSREP: waiting for client connections to close: 2121123 3:56:05 [Note] WSREP: Running: ‘wsrep_sst_xtrabackup --role ‘joiner’ --address ‘10.10.10.97’ --auth ‘sstuser:sstpwd’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/mysql/wsrep.cnf’ --parent ‘8621’’+ TMPDIR=/tmp++ dirname /usr//bin/wsrep_sst_xtrabackup+ . /usr//bin/wsrep_sst_common++ set -u++ WSREP_SST_OPT_BYPASS=0++ ‘[’ 12 -gt 0 ‘]’++ case “$1” in++ readonly WSREP_SST_OPT_ROLE=joiner++ WSREP_SST_OPT_ROLE=joiner++ shift++ shift++ ‘[’ 10 -gt 0 ‘]’++ case “$1” in++ readonly WSREP_SST_OPT_ADDR=10.10.10.97++ WSREP_SST_OPT_ADDR=10.10.10.97++ shift++ shift++ ‘[’ 8 -gt 0 ‘]’++ case “$1” in++ readonly WSREP_SST_OPT_AUTH=sstuser:sstpwd++ WSREP_SST_OPT_AUTH=sstuser:sstpwd++ shift++ shift++ ‘[’ 6 -gt 0 ‘]’++ case “$1” in++ readonly WSREP_SST_OPT_DATA=/var/lib/mysql/++ WSREP_SST_OPT_DATA=/var/lib/mysql/++ shift++ shift++ ‘[’ 4 -gt 0 ‘]’++ case “$1” in++ readonly WSREP_SST_OPT_CONF=/etc/mysql/wsrep.cnf++ WSREP_SST_OPT_CONF=/etc/mysql/wsrep.cnf++ shift++ shift++ ‘[’ 2 -gt 0 ‘]’++ case “$1” in++ readonly WSREP_SST_OPT_PARENT=8621++ WSREP_SST_OPT_PARENT=8621++ shift++ shift++ ‘[’ 0 -gt 0 ‘]’++ readonly WSREP_SST_OPT_BYPASS+ INNOBACKUPEX_BIN=innobackupex+ INNOBACKUPEX_ARGS=+ NC_BIN=nc+ for TOOL_BIN in INNOBACKUPEX_BIN NC_BIN+ which innobackupex+ for TOOL_BIN in INNOBACKUPEX_BIN NC_BIN+ which nc+ AUTH=(${WSREP_SST_OPT_AUTH//:confused: })+ readonly AUTH+ readonly DATA=/var/lib/mysql/+ DATA=/var/lib/mysql/+ INFO_FILE=xtrabackup_galera_info+ IST_FILE=xtrabackup_ist+ MAGIC_FILE=/var/lib/mysql//xtrabackup_galera_info+ rm -f /var/lib/mysql//xtrabackup_galera_info+ ‘[’ joiner = donor ‘]’+ ‘[’ joiner = joiner ‘]’+ MODULE=xtrabackup_sst+ rm -f ‘/var/lib/mysql//xtrabackup_'+ ADDR=10.10.10.97++ echo 10.10.10.97++ awk -F : ‘{ print $2 }’+ NC_PORT=+ ‘[’ -z ‘’ ‘]’+ NC_PORT=4444++ echo 10.10.10.97++ awk -F : ‘{ print $1 }’+ ADDR=10.10.10.97:4444+ trap cleanup_joiner HUP PIPE INT TERM+ set +e+ wait_for_nc 4444 10.10.10.97:4444 xtrabackup_sst+ local PORT=4444+ local ADDR=10.10.10.97:4444+ local MODULE=xtrabackup_sst+ nc -dl 4444+ tar xfi - -C /var/lib/mysql/++ seq 1 50+ for i in ‘$(seq 1 50)’+ netstat -nptl+ grep '/nc\s$’+ awk ‘{ print $4 }’+ sed ‘s/.*://’+ grep ‘^4444$’+ break+ echo 'ready 10.10.10.97:4444/xtrabackup_sst’121123 3:56:05 [Note] WSREP: Prepared SST request: xtrabackup|10.10.10.97:4444/xtrabackup_sst121123 3:56:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.121123 3:56:05 [Note] WSREP: Assign initial position for certification: 0, protocol version: 2121123 3:56:05 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (dca1ac1e-3553-11e2-0800-02b6027fc6c1): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():440. IST will be unavailable.121123 3:56:05 [Note] WSREP: Node 1 (db2) requested state transfer from ‘any’. Selected 0 (db1)(SYNCED) as donor.121123 3:56:05 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 0)121123 3:56:05 [Note] WSREP: Requesting state transfer: success, donor: 0tar: Skipping to next headertar: Exiting with failure status due to previous errors+ RC=(“${PIPESTATUS[@]}”)+ set -e+ wait %%+ ‘[’ 0 -ne 0 -o 2 -ne 0 ‘]’+ wsrep_log_error 'Error while getting st data from donor node: ’ ‘0, 2’+ wsrep_log ‘[ERROR] Error while getting st data from donor node: 0, 2’++ date ‘+%Y%m%d %H:%M:%S.%N’++ cut -b -21+ local readonly ‘tst=20121123 03:56:15.216’+ echo 'WSREP_SST: [ERROR] Error while getting st data from donor node: 0, 2 (20121123 03:56:15.216)'WSREP_SST: [ERROR] Error while getting st data from donor node: 0, 2 (20121123 03:56:15.216)+ exit 32121123 3:56:15 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role ‘joiner’ --address ‘10.10.10.97’ --auth ‘sstuser:sstpwd’ --datadir ‘/var/lib/mysql/’ --defaults-file ‘/etc/mysql/wsrep.cnf’ --parent ‘8621’: 32 (Broken pipe)121123 3:56:15 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.121123 3:56:15 [ERROR] WSREP: SST failed: 32 (Broken pipe)121123 3:56:15 [ERROR] Aborting121123 3:56:15 [Warning] WSREP: 0 (db1): State transfer to 1 (db2) failed: -1 (Operation not permitted)121123 3:56:15 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():712: Will never receive state. Need to abort.121123 3:56:15 [Note] WSREP: gcomm: terminating thread121123 3:56:15 [Note] WSREP: gcomm: joining thread121123 3:56:15 [Note] WSREP: gcomm: closing backend121123 3:56:16 [Note] WSREP: view(view_id(NON_PRIM,dca119ce-3553-11e2-0800-c1735c61782a,2) memb { fd2457ea-3553-11e2-0800-74f8f44a0399,} joined {} left {} partitioned { dca119ce-3553-11e2-0800-c1735c61782a,})121123 3:56:16 [Note] WSREP: view((empty))121123 3:56:16 [Note] WSREP: gcomm: closed121123 3:56:16 [Note] WSREP: /usr/sbin/mysqld: Terminated.121123 03:56:16 mysqld_safe mysqld from pid file /var/lib/mysql/db2.localhost.pid ended

My wsrep galera configuration is on 2nd node;

[mysqld]binlog_format=ROWdefault-storage-engine=innodbinnodb_autoinc_lock_mode=2innodb_locks_unsafe_for_binlog=1query_cache_size=0query_cache_type=0wsrep_provider=/usr/lib64/libgalera_smm.sowsrep_provider_options="gmcast.listen_addr=tcp://10.10.10.97:4567"wsrep_cluster_name="mycluster"wsrep_cluster_address="gcomm://10.10.10.93:4567"wsrep_node_name=db2#can’t detect bonded interfaceswsrep_node_address=10.70.148.97#wsrep_node_incoming_address=wsrep_slave_threads=1wsrep_certify_nonPK=1wsrep_max_ws_rows=131072wsrep_max_ws_size=1073741824wsrep_debug=1wsrep_convert_LOCK_to_trx=0wsrep_retry_autocommit=1wsrep_auto_increment_control=1wsrep_drupal_282555_workaround=0wsrep_causal_reads=0wsrep_notify_cmd=wsrep_sst_method=xtrabackupwsrep_sst_receive_address=10.10.10.97wsrep_sst_auth=sstuser:sstpwd#wsrep_sst_donor=#wsrep_sst_donor_rejects_queries=0# wsrep_protocol_version=

First node1 IP is 10.10.10.93 and node2 IP is 10.10.10.97

Actually same configuration works well with sst_rsync but doesn’t work with xtrabackup.

I think the problem depends ethernet bonding but I couldn’t find what’s wrong with me.

Is there anybody can see something wrong?

Thanks.