Impossible to join primary node

Xibu · June 22, 2015, 11:28am

2-node Percona cluster 5.6.24-72.2 + garbd,
direct cable between eth1 on the two nodes,
iptables disabled.

Node 2 is running perfectly
I installed the new release 5.6.24-72.2

Node 1 went down one night
I used the down time period to shrink the system partition and install the new release

What I have tried:
a. Emptying the /var/lib/mysql folder and starting the MySQL server
node 1 stopped
service mysql start

2015-06-22 18:05:54 154050 [Note] WSREP: State transfer required:
Group state: db960768-171f-11e5-847c-3349410895e7:4357233
Local state: 00000000-0000-0000-0000-000000000000:-1
2015-06-22 18:05:54 154050 [Note] WSREP: New cluster view: global state: db960768-171f-11e5-847c-3349410895e7:4357233, view# 4: Primary, number of nodes: 2, my index: 0, protocol version 3
2015-06-22 18:05:54 154050 [Warning] WSREP: Gap in state sequence. Need state transfer.
2015-06-22 18:05:54 154050 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.18.172.145' --auth 'sst_user:umi_tss_20131205' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '154050' '' '
WSREP_SST: [INFO] Streaming with xbstream (20150622 18:05:55.528)
WSREP_SST: [INFO] Using socat as streamer (20150622 18:05:55.532)
WSREP_SST: [INFO] Xtrabackup based encryption enabled in my.cnf - Supported only from Xtrabackup 2.1.4 (20150622 18:05:55.572)
WSREP_SST: [INFO] Evaluating timeout -s9 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | pv -f -i 10 -N joiner 2>>/var/log/mysql-sst-progress | xbcrypt --encrypt-algo=AES256 --encrypt-key=uo1zoo2ALoothaingookow7sho4eot4a -d | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20150622 18:05:55.582)
2015-06-22 18:05:55 154050 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.18.172.145:4444/xtrabackup_sst//1
Warning: Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
2015-06-22 18:05:55 154050 [Note] WSREP: REPL Protocols: 7 (3, 2)
2015-06-22 18:05:55 154050 [Note] WSREP: Service thread queue flushed.
2015-06-22 18:05:55 154050 [Note] WSREP: Assign initial position for certification: 4357233, protocol version: 3
2015-06-22 18:05:55 154050 [Note] WSREP: Service thread queue flushed.
2015-06-22 18:05:55 154050 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (db960768-171f-11e5-847c-3349410895e7): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():463. IST will be unavailable.
2015-06-22 18:05:55 154050 [Warning] WSREP: Member 0.0 (icts-zabbix01) requested state transfer from 'icts-zabbix02', but it is impossible to select State Transfer donor: No route to host
2015-06-22 18:05:55 154050 [ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2015-06-22 18:05:55 154050 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2015-06-22 18:05:55 154050 [Note] WSREP: Closing send monitor...
2015-06-22 18:05:55 154050 [Note] WSREP: Closed send monitor.
2015-06-22 18:05:55 154050 [Note] WSREP: gcomm: terminating thread
2015-06-22 18:05:55 154050 [Note] WSREP: gcomm: joining thread
2015-06-22 18:05:55 154050 [Note] WSREP: gcomm: closing backend

Node 2 does not log anything in its error log.

A tcpdump trace on both nodes shows around 140 packets exchanged on TCP port 4567;
nothing is exchanged on UDP port 4567, nor on TCP ports 4568 or 4444.

b. I have reinstalled the server … and rebooted … but nothing changed.

Does anyone have an idea what is happening?

Xibu

ShahriyarR · June 24, 2015, 12:47am

Could you provide my.cnf settings for both nodes? Also provide /etc/hosts file for both servers.

There are errors indicating network (firewall, port, name-resolution, ip conflict) issues:

ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
[ERROR] WSREP: Requesting state transfer failed: -113(No route to host)
2015-06-22 18:05:55 154050 [ERROR] WSREP: State transfer request failed unrecoverably: 113 (No route to host). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
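For readers decoding the numbers in parentheses: on Linux, 111 and 113 are the errno values ECONNREFUSED and EHOSTUNREACH. A minimal sketch of a lookup follows; `explain_errno` is a hypothetical helper name, not part of any Percona tool, and it only covers the two codes that appear in this thread.

```shell
# explain_errno CODE: hypothetical helper (an assumption for illustration),
# mapping the two Linux errno values seen in the log to their symbolic names.
explain_errno() {
    # Strip a leading minus sign: one log line prints the code as -113.
    case "${1#-}" in
        111) echo "ECONNREFUSED: host reachable, but nothing accepted the connection on that port" ;;
        113) echo "EHOSTUNREACH: no route to host" ;;
        *)   echo "unknown: not one of the codes seen in this thread" ;;
    esac
}

explain_errno 111
explain_errno -113
```

Note that the 111 here is reported for 127.0.0.1, so it may simply mean the local mysqld was not yet accepting client connections while the node was joining, rather than a problem between the two nodes.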

Xibu · June 24, 2015, 7:36am

Hi ShahriyarR

As I stated before, iptables is disabled, so nothing can be blocked there.
tcpdump captured traffic on port 4567 when I ran service mysql start on the node without data.
This configuration was running perfectly before the loss of node 1; nothing has been changed in the configuration files
(hosts / my.cnf are identical to the restored files from the time the cluster was running).
Ping between the two servers on the eth1 interface IP addresses is successful.
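A successful ping only shows ICMP reachability, so a TCP-level check of the ports mentioned in this thread may narrow things down further. The following is a rough sketch, not a definitive procedure. The function names are hypothetical, the port roles are the usual Galera defaults (a my.cnf can override them), and it assumes a netcat that supports `-z` and `-w`.

```shell
# Ports discussed in this thread with their usual Galera/PXC roles.
# These are defaults and therefore an assumption about this setup.
pxc_ports() {
    printf '%s\n' \
        '4567 group-communication' \
        '4568 IST' \
        '4444 SST'
}

# check_port HOST PORT: hypothetical helper; returns 0 if a TCP connect succeeds.
# Assumes a netcat supporting -z (connect only) and -w (timeout in seconds).
# stdin is redirected so nc cannot swallow the port list piped into the loop.
check_port() {
    nc -z -w 3 "$1" "$2" </dev/null >/dev/null 2>&1
}

# report_ports HOST: try every listed port and print one line per result.
report_ports() {
    pxc_ports | while read -r port role; do
        if check_port "$1" "$port"; then
            echo "$1:$port ($role) reachable"
        else
            echo "$1:$port ($role) NOT reachable"
        fi
    done
}
```

It could be run from node 2 as `report_ports 172.18.172.145` (the joiner address from the log). One caveat: in the log, port 4444 is only opened by `socat -u TCP-LISTEN:4444` for the duration of the SST, and a probe would likely use up that single connection, so it is probably safer to test 4444 with mysqld stopped and a throwaway listener on the joiner.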

ShahriyarR · June 24, 2015, 11:42pm

Again, is it possible to see the my.cnf settings for both nodes?

Xibu · June 30, 2015, 6:16am

Problem solved … something was preventing the connection on port 4444 … I reinstalled the cluster.
Nothing was wrong in the my.cnf configuration files.

ShahriyarR · July 1, 2015, 1:57am

Glad to see the problem is solved.
As I previously stated, there was some network (firewall, port, name-resolution, IP conflict) issue,
specifically with port 4444:

[Note] WSREP: Prepared SST request: xtrabackup-v2|172.18.172.145:4444/xtrabackup_sst//1
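To pull the address and port out of such a line when checking which endpoint the donor has to reach, something like the sketch below may help. `parse_sst_request` is a hypothetical helper, and it assumes the request always has the METHOD|HOST:PORT/REST shape seen in this one log line.

```shell
# parse_sst_request STRING: hypothetical helper; prints "method host port".
# Assumes the shape seen in the log line above: METHOD|HOST:PORT/REST
parse_sst_request() {
    req=$1
    method=${req%%'|'*}      # text before the first '|'
    addr=${req#*'|'}         # text after the first '|'
    hostport=${addr%%/*}     # drop everything from the first '/'
    host=${hostport%:*}      # text before the last ':'
    port=${hostport##*:}     # text after the last ':'
    echo "$method $host $port"
}

parse_sst_request 'xtrabackup-v2|172.18.172.145:4444/xtrabackup_sst//1'
```

For the line quoted above this yields the method xtrabackup-v2, the joiner address 172.18.172.145 and port 4444, which is the port the donor must be able to connect to.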
