Impossible to join primary node

Two-node Percona cluster 5.6.24-72.2 + garbd,
direct cable between eth1 of the two nodes,
iptables disabled.

Node 2 is running perfectly
I installed the new release 5.6.24-72.2

Node 1 went down one night.
I used the downtime to shrink the system partition and install the new release.

What I have tried:
a. Cleaning out the contents of the /var/lib/mysql folder and starting the MySQL server (node 1 was stopped):
service mysql start


2015-06-22 18:05:54 154050 [Note] WSREP: State transfer required:
Group state: db960768-171f-11e5-847c-3349410895e7:4357233
Local state: 00000000-0000-0000-0000-000000000000:-1
2015-06-22 18:05:54 154050 [Note] WSREP: New cluster view: global state: db960768-171f-11e5-847c-3349410895e7:4357233, view# 4: Primary, number of nodes: 2, my index: 0, protocol version 3
2015-06-22 18:05:54 154050 [Warning] WSREP: Gap in state sequence. Need state transfer.
2015-06-22 18:05:54 154050 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.18.172.145' --auth 'sst_user:umi_tss_20131205' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '154050' '' '
WSREP_SST: [INFO] Streaming with xbstream (20150622 18:05:55.528)
WSREP_SST: [INFO] Using socat as streamer (20150622 18:05:55.532)
WSREP_SST: [INFO] Xtrabackup based encryption enabled in my.cnf - Supported only from Xtrabackup 2.1.4 (20150622 18:05:55.572)
WSREP_SST: [INFO] Evaluating timeout -s9 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | pv -f -i 10 -N joiner 2>>/var/log/mysql-sst-progress | xbcrypt --encrypt-algo=AES256 --encrypt-key=uo1zoo2ALoothaingookow7sho4eot4a -d | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20150622 18:05:55.582)
2015-06-22 18:05:55 154050 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.18.172.145:4444/xtrabackup_sst//1
Warning: Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
2015-06-22 18:05:55 154050 [Note] WSREP: REPL Protocols: 7 (3, 2)
2015-06-22 18:05:55 154050 [Note] WSREP: Service thread queue flushed.
2015-06-22 18:05:55 154050 [Note] WSREP: Assign initial position for certification: 4357233, protocol version: 3
2015-06-22 18:05:55 154050 [Note] WSREP: Service thread queue flushed.
2015-06-22 18:05:55 154050 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (db960768-171f-11e5-847c-3349410895e7): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():463. IST will be unavailable.
2015-06-22 18:05:55 154050 [Warning] WSREP: Member 0.0 (icts-zabbix01) requested state transfer from 'icts-zabbix02', but it is impossible to select State Transfer donor: No route to host
2015-06-22 18:05:55 154050 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2015-06-22 18:05:55 154050 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2015-06-22 18:05:55 154050 [Note] WSREP: Closing send monitor...
2015-06-22 18:05:55 154050 [Note] WSREP: Closed send monitor.
2015-06-22 18:05:55 154050 [Note] WSREP: gcomm: terminating thread
2015-06-22 18:05:55 154050 [Note] WSREP: gcomm: joining thread
2015-06-22 18:05:55 154050 [Note] WSREP: gcomm: closing backend


Node 2 does not list anything in its error log.


A tcpdump trace on both nodes shows around 140 packets exchanged on TCP port 4567;
nothing is exchanged on UDP port 4567, nor on TCP port 4568 or 4444.
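
For anyone repeating this check without tcpdump, here is a minimal sketch in Python that tries a plain TCP connection to each of the three ports named above. The function names `tcp_port_open` and `probe` are made up for illustration, and the port roles in the comment are the usual Galera defaults (an assumption about this setup, since my.cnf was never posted). Keep in mind that 4444 and 4568 are normally only listening on the joiner while a state transfer is being prepared, so "closed" on those two outside a transfer is not by itself a fault. UDP 4567 is not covered.

```python
import socket

# Ports mentioned in the post. Assumed roles (Galera defaults):
# 4567 = group communication, 4568 = IST, 4444 = SST.
GALERA_TCP_PORTS = (4567, 4568, 4444)


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def probe(host, ports=GALERA_TCP_PORTS, timeout=3.0):
    """Map each port to True (connect succeeded) or False (it did not)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Run from node 2, something like `probe("172.18.172.145")` would show which of the three ports on node 1 accept a connection at that moment.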


b. I reinstalled the server and rebooted, but nothing changed.

Does anyone have an idea of what is happening?

Xibu

Could you provide the my.cnf settings for both nodes? Please also provide the /etc/hosts file for both servers.

There are errors indicating network (firewall, port, name-resolution, IP conflict) issues:

ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
2015-06-22 18:05:55 154050 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2015-06-22 18:05:55 154050 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
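
The numbers in these messages (111, 113 and -113) are Linux errno values. As a small sketch, assuming the servers run Linux, the helper below picks them out of a log line and names them. The table is hard-coded with only the two codes seen in this thread, and `explain_errnos` is a made-up name, not part of any MySQL or Galera tool.

```python
import re

# Linux errno values for the two codes in the quoted log lines.
# Assumption: Linux numbering; other platforms use different numbers.
LINUX_ERRNO = {
    111: ("ECONNREFUSED", "Connection refused"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def explain_errnos(line):
    """Return (code, name, text) for each known errno number found in line."""
    found = []
    # Every run of digits is a candidate; the sign in "-113" is ignored.
    for token in re.findall(r"\d+", line):
        code = int(token)
        if code in LINUX_ERRNO and code not in found:
            found.append(code)
    return [(code,) + LINUX_ERRNO[code] for code in found]
```

So the first error is a refused connection to 127.0.0.1, while the two WSREP errors carry the "no route to host" code.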

Hi ShahriyarR

  • As I stated before, iptables is disabled, so nothing can be blocked.
  • tcpdump captured traffic on port 4567 when I ran service mysql start on the node without data.
  • This configuration was running perfectly before the loss of node 1; nothing has been changed in the configuration files
    (hosts / my.cnf are identical to the restore files from the time the cluster was running).
  • A ping between the two servers on the eth1 interface IP addresses is successful.

And again, is it possible to see the my.cnf settings for both nodes?

Problem solved: something was preventing the connection on port 4444. I reinstalled the cluster.
Nothing was wrong in the my.cnf configuration files.
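
Since nothing normally listens on 4444 unless a transfer is in progress, one way to test that path in isolation is to open a throwaway listener on the joining node and connect to it from the other node. The sketch below is only an illustration of that idea, with a made-up function name `accept_once`; it is not what the SST script itself does (the log shows that using socat).

```python
import socket


def accept_once(port, bind_addr="0.0.0.0", timeout=30.0):
    """Listen on bind_addr:port and return the first peer's address.

    Returns None if nobody connects before the timeout expires.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((bind_addr, port))
        srv.listen(1)
        srv.settimeout(timeout)
        try:
            conn, peer = srv.accept()
        except socket.timeout:
            return None
        conn.close()
        return peer
```

With MySQL stopped on node 1, running `accept_once(4444)` there and then connecting from node 2 (for example with socat or telnet) shows whether the port is reachable at all. A result of None means nothing got through within the timeout.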

Glad to see that the problem is solved.
As I previously stated, there was some network (firewall, port, name-resolution, IP conflict) issue,
specifically with port 4444:

[Note] WSREP: Prepared SST request: xtrabackup-v2|172.18.172.145:4444/xtrabackup_sst//1
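
To make it explicit where the port comes from, here is a small sketch that splits the request string from that log line into method, host and port. The layout (`method|host:port/...`) is inferred from this single line only, so treat the parser as an assumption and not as a description of the Galera format in general; `parse_sst_request` is a made-up name.

```python
def parse_sst_request(request):
    """Split 'method|host:port/rest' into (method, host, port)."""
    method, _, address = request.partition("|")
    # Everything after the first slash is ignored here.
    hostport, _, _rest = address.partition("/")
    host, _, port = hostport.rpartition(":")
    return method, host, int(port)


method, host, port = parse_sst_request(
    "xtrabackup-v2|172.18.172.145:4444/xtrabackup_sst//1"
)
print(method, host, port)
```

The address and port it yields are the ones the joining node advertises for receiving the transfer, which is why a block on 4444 stops the join.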