Node refuses to re-enter cluster

Hello,

I have set up a 5-node Galera cluster with Percona XtraDB Cluster. The servers are named galera01 through galera05. I bootstrapped the cluster, added all the nodes, and then loaded a mysqldump back in. Everything seemed to be working fine. To test it (as I had problems before), I restarted one node (service mysql restart) to see whether it could drop out and rejoin. It could not, and now nothing I do gets it back in. Even after wiping everything from the server and setting it up again, it still cannot rejoin the cluster.

The joining node is galera02 (10.10.10.173), and the cluster chose galera05 (10.10.10.177) as the donor. Here is the error log on galera05:
140602 10:13:16 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 76,
members = 4/5 (joined/total),
act_id = 15422,
last_appl. = 15147,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 655d5286-e9f4-11e3-9ad3-a7361a15dc8a
140602 10:13:16 [Note] WSREP: Flow-control interval: [36, 36]
140602 10:13:16 [Note] WSREP: New cluster view: global state: 655d5286-e9f4-11e3-9ad3-a7361a15dc8a:15422, view# 77: Primary, number of nodes: 5, my index: 1, protocol version 2
140602 10:13:16 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140602 10:13:16 [Note] WSREP: Assign initial position for certification: 15422, protocol version: 2
140602 10:13:18 [Note] WSREP: Node 0 (Galera02) requested state transfer from 'any'. Selected 1 (galera05)(SYNCED) as donor.
140602 10:13:18 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 15422)
140602 10:13:18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140602 10:13:18 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '10.10.10.173:4444/xtrabackup_sst' --auth 'root:goingforbroke' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '655d5286-e9f4-11e3-9ad3-a7361a15dc8a:15422''
140602 10:13:18 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with tar (20140602 10:13:18.504)
WSREP_SST: [INFO] Using socat as streamer (20140602 10:13:18.505)
WSREP_SST: [INFO] Streaming the backup to joiner at 10.10.10.173 4444 (20140602 10:13:18.511)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf $INNOEXTRA --galera-info --stream=$sfmt ${TMPDIR} 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:10.10.10.173:4444; RC=( ${PIPESTATUS[@]} ) (20140602 10:13:18.512)
140602 10:13:20 [Note] WSREP: declaring 5a4b91d1-e9fa-11e3-ac44-82a35eb07b16 stable
140602 10:13:20 [Note] WSREP: declaring 6cea84c0-e9fa-11e3-902a-c37fd410e57e stable
140602 10:13:20 [Note] WSREP: declaring d645de99-e9f7-11e3-8294-eec3f6cb9a92 stable
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.10.173:4567
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:21 [Note] WSREP: Node 5327be86-e9fa-11e3-8fa6-f628e7172e46 state prim
140602 10:13:21 [Note] WSREP: view(view_id(PRIM,5327be86-e9fa-11e3-8fa6-f628e7172e46,78) memb {
5327be86-e9fa-11e3-8fa6-f628e7172e46,
5a4b91d1-e9fa-11e3-ac44-82a35eb07b16,
6cea84c0-e9fa-11e3-902a-c37fd410e57e,
d645de99-e9f7-11e3-8294-eec3f6cb9a92,
} joined {
} left {
} partitioned {
2f0f5c49-ea79-11e3-b5d0-9f1f1821a243,
})
140602 10:13:21 [Note] WSREP: forgetting 2f0f5c49-ea79-11e3-b5d0-9f1f1821a243 (tcp://10.10.10.173:4567)
140602 10:13:21 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
140602 10:13:21 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:21 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:21 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
140602 10:13:21 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') turning message relay requesting off
140602 10:13:21 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d
140602 10:13:21 [Note] WSREP: STATE EXCHANGE: sent state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d
140602 10:13:21 [Note] WSREP: STATE EXCHANGE: got state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d from 0 (galera05)
140602 10:13:21 [Note] WSREP: STATE EXCHANGE: got state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d from 1 (galera04)
140602 10:13:21 [Note] WSREP: STATE EXCHANGE: got state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d from 2 (galera03)
140602 10:13:21 [Note] WSREP: STATE EXCHANGE: got state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d from 3 (galera01)
140602 10:13:21 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 77,
members = 4/4 (joined/total),
act_id = 15422,
last_appl. = 15147,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 655d5286-e9f4-11e3-9ad3-a7361a15dc8a
140602 10:13:21 [Note] WSREP: Flow-control interval: [32, 32]
140602 10:13:21 [Note] WSREP: New cluster view: global state: 655d5286-e9f4-11e3-9ad3-a7361a15dc8a:15422, view# 78: Primary, number of nodes: 4, my index: 0, protocol version 2
140602 10:13:21 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140602 10:13:21 [Note] WSREP: Assign initial position for certification: 15422, protocol version: 2
140602 10:13:26 [Note] WSREP: cleaning up 2f0f5c49-ea79-11e3-b5d0-9f1f1821a243 (tcp://10.10.10.173:4567)

I don’t understand what is going on here, help!

Thank You.

This is the log from the donor, but what happened on the joiner (10.10.10.173)? Please paste its error log too. Also, is TCP port 4444 open between the two nodes?
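To answer the port question, here is a minimal Python sketch you could run from the donor (galera05) while galera02 is trying to join. It simply attempts plain TCP connections to the joiner. Ports 4444 (xtrabackup SST) and 4567 (group communication) come from the donor log above; 4568 is Galera's default IST port and is included as an assumption, as is the script name in the usage comment. Keep in mind that the SST listener on 4444 only exists on the joiner while a state transfer is in progress, so a "closed" result outside that window is expected and does not by itself prove a firewall is in the way.

```python
import socket
import sys

# Ports seen in the donor log: 4444 (SST via socat) and 4567 (Galera group
# communication). 4568 is Galera's default IST port, included as an
# assumption; adjust if your my.cnf overrides it.
GALERA_PORTS = {4444: "SST", 4567: "group communication", 4568: "IST (default)"}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


def check_ports(host, ports=GALERA_PORTS, timeout=3.0):
    """Map each port number to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 check_galera_ports.py 10.10.10.173
    joiner = sys.argv[1]
    for port, ok in check_ports(joiner).items():
        state = "open" if ok else "closed or filtered"
        print(f"{joiner}:{port} ({GALERA_PORTS[port]}): {state}")
```

If 4567 is reachable but 4444 is refused or times out while the joiner is waiting for SST, that would be consistent with something on galera02 filtering the SST port.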