wsrep rsync IST Operation not permitted - how to troubleshoot?

Hi all,

I have a node that can’t join a cluster as wsrep rsync for IST is not permitted. How can I troubleshoot the issue, there isn’t much else in the logs.

The nodes are on the same IP subnet with no firewall. Both are new Centos 6.3 boxes with the latest Percona RPM’s.

Here is the config and log from the joiner (cluster name, hostname and IP addresses removed for security)

[client]
port=3306
socket=/var/lib/mysql/mysql.sock

[mysqld]
user=mysql
datadir=/db/mysql
port=3306
socket=/var/lib/mysql/mysql.sock
log-error=/db/logs/node2.err
log_slow_queries=/db/logs/node2-slow.log
tmpdir=/db/mysqltmp
binlog_format=ROW
wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_cluster_address=gcomm://10.10.10.10

wsrep_node_address=gcomm://10.10.10.11
wsrep_slave_threads=2
wsrep_cluster_name=test
wsrep_sst_method=rsync
wsrep_node_name=node2

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

[mysqld_safe]
datadir=/db/mysql

121201 22:08:48 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
121201 22:08:48 [Note] WSREP: wsrep_sst_grab()
121201 22:08:48 [Note] WSREP: Start replication
121201 22:08:48 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
121201 22:08:48 [Note] WSREP: protonet asio version 0
121201 22:08:48 [Note] WSREP: backend: asio
121201 22:08:48 [Note] WSREP: GMCast version 0
121201 22:08:48 [Note] WSREP: (ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567
121201 22:08:48 [Note] WSREP: (ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1
121201 22:08:48 [Note] WSREP: EVS version 0
121201 22:08:48 [Note] WSREP: PC version 0
121201 22:08:48 [Note] WSREP: gcomm: connecting to group ‘test’, peer ‘10.10.10.10:’
121201 22:08:48 [Note] WSREP: declaring 77ade5d7-3c00-11e2-0800-5270a3596162 stable
121201 22:08:48 [Note] WSREP: view(view_id(PRIM,77ade5d7-3c00-11e2-0800-5270a3596162,2) memb {
77ade5d7-3c00-11e2-0800-5270a3596162,
ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c,
} joined {
} left {
} partitioned {
})
121201 22:08:49 [Note] WSREP: gcomm: connected
121201 22:08:49 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
121201 22:08:49 [Note] WSREP: Shifting CLOSED → OPEN (TO: 0)
121201 22:08:49 [Note] WSREP: Opened channel ‘test’
121201 22:08:49 [Note] WSREP: Waiting for SST to complete.
121201 22:08:49 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
121201 22:08:49 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
121201 22:08:49 [Note] WSREP: STATE EXCHANGE: sent state msg: af3f5ae2-3c03-11e2-0800-3f9d70376806
121201 22:08:49 [Note] WSREP: STATE EXCHANGE: got state msg: af3f5ae2-3c03-11e2-0800-3f9d70376806 from 0 (node1)
121201 22:08:49 [Note] WSREP: STATE EXCHANGE: got state msg: af3f5ae2-3c03-11e2-0800-3f9d70376806 from 1 (node2)
121201 22:08:49 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 1,
members = 1/2 (joined/total),
act_id = 1,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 77ae90f5-3c00-11e2-0800-eee19d0a75f0
121201 22:08:49 [Note] WSREP: Flow-control interval: [23, 23]
121201 22:08:49 [Note] WSREP: Shifting OPEN → PRIMARY (TO: 1)
121201 22:08:49 [Note] WSREP: State transfer required:
Group state: 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1
Local state: 00000000-0000-0000-0000-000000000000:-1
121201 22:08:49 [Note] WSREP: New cluster view: global state: 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 2
121201 22:08:49 [Warning] WSREP: Gap in state sequence. Need state transfer.
121201 22:08:51 [Note] WSREP: Running: ‘wsrep_sst_rsync --role ‘joiner’ --address ‘gcomm’ --auth ‘’ --datadir ‘/db/mysql/’ --defaults-file ‘/etc/my.cnf’ --parent ‘25005’’
cat: /db/mysql//rsync_sst.pid: No such file or directory
121201 22:08:51 [Note] WSREP: Prepared SST request: rsync|gcomm:4444/rsync_sst
121201 22:08:51 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121201 22:08:51 [Note] WSREP: Assign initial position for certification: 1, protocol version: 2
121201 22:08:51 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (77ae90f5-3c00-11e2-0800-eee19d0a75f0): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():440. IST will be unavailable.
121201 22:08:51 [Note] WSREP: Node 1 (node2) requested state transfer from ‘any’. Selected 0 (node1)(SYNCED) as donor.
121201 22:08:51 [Note] WSREP: Shifting PRIMARY → JOINER (TO: 1)
121201 22:08:51 [Note] WSREP: Requesting state transfer: success, donor: 0
121201 22:08:51 [Warning] WSREP: 0 (node1): State transfer to 1 (node2) failed: -1 (Operation not permitted)
121201 22:08:51 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():712: Will never receive state. Need to abort.
121201 22:08:51 [Note] WSREP: gcomm: terminating thread
121201 22:08:51 [Note] WSREP: gcomm: joining thread
121201 22:08:51 [Note] WSREP: gcomm: closing backend
121201 22:08:52 [Note] WSREP: view(view_id(NON_PRIM,77ade5d7-3c00-11e2-0800-5270a3596162,2 ) memb {
ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c,
} joined {
} left {
} partitioned {
77ade5d7-3c00-11e2-0800-5270a3596162,
})
121201 22:08:52 [Note] WSREP: view((empty))
121201 22:08:52 [Note] WSREP: gcomm: closed
121201 22:08:52 [Note] WSREP: /usr/sbin/mysqld: Terminated.
121201 22:08:52 mysqld_safe mysqld from pid file /db/mysql/mysql.pid ended
Parent mysqld process (PID:25005) terminated unexpectedly.
WSREP_SST: [INFO] Joiner cleanup. (20121201 22:08:53.613)
done.

Here is the info from the donor:

[client]
port=3306
socket=/var/lib/mysql/mysql.sock

[mysqld]
user=mysql
datadir=/db/mysql
port=3306
socket=/var/lib/mysql/mysql.sock
log-error=/db/logs/node1.err
log_slow_queries=/db/logs/node1-slow.log
tmpdir=/db/mysqltmp
binlog_format=ROW
server-id=1

wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_cluster_address=gcomm://

wsrep_node_address=gcomm://10.10.10.10
wsrep_slave_threads=2
wsrep_cluster_name=test
wsrep_sst_method=rsync
wsrep_node_name=node1

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

[mysqld_safe]
datadir=/db/mysql

121201 22:08:50 [Note] WSREP: New cluster view: global state: 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 2
121201 22:08:50 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121201 22:08:50 [Note] WSREP: Assign initial position for certification: 1, protocol version: 2
121201 22:08:52 [Note] WSREP: Node 1 (node2) requested state transfer from ‘any’. Selected 0 (node1)(SYNCED) as donor.
121201 22:08:52 [Note] WSREP: Shifting SYNCED → DONOR/DESYNCED (TO: 1)
121201 22:08:52 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121201 22:08:52 [Note] WSREP: Running: ‘wsrep_sst_rsync --role ‘donor’ --address ‘gcomm:4444/rsync_sst’ --auth ‘(null)’ --socket ‘/var/lib/mysql/mysql.sock’ --datadir ‘/db/mysql/’ --defaults-file ‘/etc/my.cnf’ --gtid ‘77ae90f5-3c00-11e2-0800-eee19d0a75f0:1’’
121201 22:08:52 [Note] WSREP: sst_donor_thread signaled with 0
121201 22:08:52 [Note] WSREP: Flushing tables for SST…
121201 22:08:52 [Note] WSREP: Provider paused at 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1
121201 22:08:52 [Note] WSREP: Tables flushed.
rsync: failed to connect to gcomm: Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(124) [sender=3.0.6]
rsync returned code 10:
121201 22:08:52 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role ‘donor’ --address ‘gcomm:4444/rsync_sst’ --auth ‘(null)’ --socket ‘/var/lib/mysql/mysql.sock’ --datadir ‘/db/mysql/’ --defaults-file ‘/etc/my.cnf’ --gtid ‘77ae90f5-3c00-11e2-0800-eee19d0a75f0:1’
121201 22:08:52 [Note] WSREP: Provider resumed.
121201 22:08:52 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role ‘donor’ --address ‘gcomm:4444/rsync_sst’ --auth ‘(null)’ --socket ‘/var/lib/mysql/mysql.sock’ --datadir ‘/db/mysql/’ --defaults-file ‘/etc/my.cnf’ --gtid ‘77ae90f5-3c00-11e2-0800-eee19d0a75f0:1’: 255 (Unknown error 255)
121201 22:08:52 [Warning] WSREP: 0 (node1): State transfer to 1 (node2) failed: -1 (Operation not permitted)
121201 22:08:52 [Note] WSREP: Shifting DONOR/DESYNCED → JOINED (TO: 1)
121201 22:08:53 [Note] WSREP: view(view_id(PRIM,77ade5d7-3c00-11e2-0800-5270a3596162,3) memb {
77ade5d7-3c00-11e2-0800-5270a3596162,
} joined {
} left {
} partitioned {
ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c,
})
121201 22:08:53 [Note] WSREP: forgetting ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c (tcp://10.10.10.11:4567)
121201 22:08:53 [Note] WSREP: Member 0 (node1) synced with group.
121201 22:08:53 [Note] WSREP: Shifting JOINED → SYNCED (TO: 1)
121201 22:08:53 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
121201 22:08:53 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b1ad3ae3-3c03-11e2-0800-9d0489fd1a38
121201 22:08:53 [Note] WSREP: STATE EXCHANGE: sent state msg: b1ad3ae3-3c03-11e2-0800-9d0489fd1a38
121201 22:08:53 [Note] WSREP: STATE EXCHANGE: got state msg: b1ad3ae3-3c03-11e2-0800-9d0489fd1a38 from 0 (node1)
121201 22:08:53 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 2,
members = 1/1 (joined/total),
act_id = 1,
last_appl. = 0,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 77ae90f5-3c00-11e2-0800-eee19d0a75f0
121201 22:08:53 [Note] WSREP: Flow-control interval: [16, 16]
121201 22:08:53 [Note] WSREP: Synchronized with group, ready for connections
121201 22:08:53 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121201 22:08:53 [Note] WSREP: New cluster view: global state: 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 2
121201 22:08:53 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
121201 22:08:53 [Note] WSREP: Assign initial position for certification: 1, protocol version: 2
121201 22:08:58 [Note] WSREP: cleaning up ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c (tcp://10.10.10.11:4567)

Thanks,
Frank

I think this was resolved by removing

wsrep_cluster_name