WSREP: 1.0 State transfer to 0.0 failed: -1 (Operation not permitted)

I’m not exactly sure where to post this… I have 3 servers in a Galera cluster using MariaDB and XtraBackup. Everything starts out fine: with a fresh install, I can get all 3 servers up, running, and replicating. But when I stop the mysql (MariaDB) service on one node, add a database on one of the two remaining nodes, and start the mysql service again, replication fails. To my understanding, the third node should see that its contents differ from the other two in the cluster and start an SST.

Server config files and error logs are below. I’ve been scouring the web and can’t figure out what the problem is. Could someone with more experience or knowledge lend me some advice? Thank you in advance. If you need any more info, let me know.

The firewall is allowing traffic between the servers on ports 3066, 4444, 4567, and 4568, and SELinux is either set to permissive (setenforce 0) or disabled.
The cluster is hosted on CentOS 6 machines, and /etc/my.cnf.d/server.cnf is identical on all servers save for IP address information.
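
To double-check the firewall claim from each node, a small probe like the one below might help. It is only a sketch: the port table is my assumption of the usual defaults (3306 for the MariaDB client port, where the post says 3066; 4444 for SST; 4567 for group communication; 4568 for IST), and `probe` is a hypothetical helper name. Note that 4444 and 4568 normally only listen while a transfer is in progress, so "refused" there is expected; a result of "unreachable" is the one that hints at filtering.

```python
import socket

# Assumed default ports; adjust if your setup differs.
GALERA_PORTS = {3306: "client", 4444: "SST", 4567: "replication", 4568: "IST"}


def probe(host, port, timeout=2.0):
    """Classify a TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Timeout, no route, or packets dropped by a firewall.
        return "unreachable"


def check_node(host, ports=GALERA_PORTS):
    """Return {port: status} for one node."""
    return {port: probe(host, port) for port in ports}
```

Calling `check_node("10.1.102.163")` from node3, for example, returns one status per port for node1.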

node1:
wsrep_node_address=10.1.102.163
wsrep_cluster_address=gcomm://10.1.102.164,10.1.102.165

node2:
wsrep_node_address=10.1.102.164
wsrep_cluster_address=gcomm://10.1.102.163,10.1.102.165

node3:
wsrep_node_address=10.1.102.165
wsrep_cluster_address=gcomm://10.1.102.163,10.1.102.164


/etc/my.cnf.d/server.cnf

# this is read by the standalone daemon and embedded servers

[server]

[sst]
streamfmt=xbstream

# this is only for the mysqld standalone daemon

[mysqld]
datadir=/var/lib/mysql
user=mysql
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://10.1.102.163,10.1.102.164
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_address=10.1.102.165
wsrep_cluster_name="galera_test"
wsrep_sst_auth=galera:password
wsrep_sst_method=xtrabackup

# this is only for embedded server

[embedded]
[mysqld-5.5]

# These two groups are only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here

[mariadb]

[mariadb-5.5]

/var/lib/mysql/galera*err is attached.

galera.txt (7.95 KB)

I can see several problems here. First, the group (global) state and the local state do not match, so IST is impossible:

140417 15:27:22 [Note] WSREP: State transfer required:
Group state: 49954f2d-c4b4-11e3-bf16-7ead4aa0c168:24
Local state: 00000000-0000-0000-0000-000000000000:-1
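
To make the comparison concrete, here is a rough sketch of the check, written by me for illustration and not taken from Galera's code. `parse_gtid` and `needs_full_sst` are hypothetical names. It only tests a necessary condition: even with a matching UUID and a valid seqno, IST still depends on the donor's gcache holding the missing writesets.

```python
from typing import NamedTuple

# The all-zero UUID that the log prints for an undefined state.
UNDEFINED_UUID = "00000000-0000-0000-0000-000000000000"


class Gtid(NamedTuple):
    uuid: str
    seqno: int


def parse_gtid(text):
    """Split a 'uuid:seqno' string as printed in the WSREP log."""
    uuid, _, seqno = text.strip().rpartition(":")
    return Gtid(uuid, int(seqno))


def needs_full_sst(group, local):
    """True when IST cannot even be attempted (necessary condition only)."""
    return (
        local.uuid == UNDEFINED_UUID
        or local.uuid != group.uuid
        or local.seqno < 0
    )


group = parse_gtid("49954f2d-c4b4-11e3-bf16-7ead4aa0c168:24")
local = parse_gtid("00000000-0000-0000-0000-000000000000:-1")
print(needs_full_sst(group, local))  # True
```

With the position that mysqld_safe recovered (seqno 23 under the same UUID), the same check returns False, which is why the empty saved state matters.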

Why is the local state GTID empty? Earlier in the log we can see:
140417 15:27:21 mysqld_safe WSREP: Recovered position 49954f2d-c4b4-11e3-bf16-7ead4aa0c168:23
but then:
140417 15:27:21 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
which is weird and may be the effect of some bug. I'm not sure about MariaDB Cluster, but I haven’t seen such an issue in PXC.

Then we can also see that SST fails:

xbstream: Can't create/write to file './backup-my.cnf' (Errcode: 17)
xbstream: failed to create file.
WSREP_SST: [ERROR] Xbstream failed (20140417 15:27:32.933)
WSREP_SST: [ERROR] Data directory /var/lib/mysql/ needs to be empty for SST: lp:1193240 Manual intervention required in that case (20140417 15:27:32.936)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20140417 15:27:32.938)
WSREP_SST: [ERROR] Killing nc pid (20140417 15:27:32.963)
WSREP_SST: [INFO] Removing the sst_in_progress file (20140417 15:27:32.966)
140417 15:27:32 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '10.1.102.165' --auth 'galera:password' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '31147': 32 (Broken pipe)
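
The Errcode in the xbstream line is an OS error number, so it can be decoded quickly. The snippet below is just an illustration using Python's standard library; the `perror` utility shipped with MariaDB/MySQL should give the same answer.

```python
import errno
import os

code = 17  # the Errcode reported by xbstream in the log
# Print the symbolic name and the OS message for this error number.
print(errno.errorcode[code], "-", os.strerror(code))
```

On Linux this reports EEXIST (file exists), which fits the next log line: xbstream refused to overwrite a backup-my.cnf that was already sitting in the datadir.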

Check the permissions on the /var/lib/mysql/ directory and also the user/password used for SST auth (wsrep_sst_auth) on all donors.
Once those are checked and fixed, clear out the datadir on the joiner and restart that node. SST should then succeed.
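
Before restarting the joiner, it may be worth confirming that the datadir really is empty, since that is what the SST script complained about. This is a minimal sketch: `leftover_entries` and `ready_for_sst` are hypothetical helper names, and the path in the usage note is the datadir from the config above.

```python
import os


def leftover_entries(datadir):
    """Return the sorted names still present in datadir."""
    return sorted(os.listdir(datadir))


def ready_for_sst(datadir):
    """True when datadir contains nothing at all."""
    return not leftover_entries(datadir)
```

Running `leftover_entries("/var/lib/mysql")` as root or as the mysql user on the stopped node lists whatever still needs to be moved away, such as a stale backup-my.cnf.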