Node is not connecting.

Hi,
We are using a three-node Percona XtraDB Cluster. Right now only one node (node 2) is running; the other two nodes (node 1 and node 3) are down. I have attached the error logs from the down nodes below.
Please suggest how to solve this.
Thanks in advance.

Node No: 3
Error Log:

131224 4:16:06 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():157
131224 4:16:06 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
131224 4:16:06 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'ecomm' at 'gcomm://X.X.X.2': -110 (Connection timed out)
131224 4:16:06 [ERROR] WSREP: gcs connect failed: Connection timed out
131224 4:16:06 [ERROR] WSREP: wsrep::connect() failed: 6
131224 4:16:06 [ERROR] Aborting

131224 4:16:06 [Note] WSREP: Service disconnected.
131224 4:16:07 [Note] WSREP: Some threads may fail to exit.
131224 4:16:07 [Note] /usr/sbin/mysqld: Shutdown complete

131224 04:16:07 mysqld_safe mysqld from pid file /mnt/data//FD-R14-db3.pid ended
131224 04:19:56 mysqld_safe Starting mysqld daemon with databases from /mnt/data/
131224 4:19:56 [Note] Flashcache bypass: disabled
131224 4:19:56 [Note] Flashcache setup error is : ioctl failed

131224 4:19:56 [Note] WSREP: Read nil XID from storage engines, skipping position init
131224 4:19:56 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
131224 4:19:56 [Note] WSREP: wsrep_load(): Galera 2.2(r114) by Codership Oy <info@codership.com> loaded succesfully.
131224 4:19:56 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
131224 4:19:56 [Note] WSREP: Reusing existing '/mnt/data//galera.cache'.
131224 4:19:56 [Note] WSREP: Passing config to GCS: base_host = X.X.X.3; gcache.dir = /mnt/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mnt/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
131224 4:19:56 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
131224 4:19:56 [Note] WSREP: wsrep_sst_grab()
131224 4:19:56 [Note] WSREP: Start replication
131224 4:19:56 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
131224 4:19:56 [Note] WSREP: protonet asio version 0
131224 4:19:56 [Note] WSREP: backend: asio
131224 4:19:56 [Note] WSREP: GMCast version 0
131224 4:19:56 [Note] WSREP: (a5560fc4-6c52-11e3-0800-6de37b8550ef, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
131224 4:19:56 [Note] WSREP: (a5560fc4-6c52-11e3-0800-6de37b8550ef, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
131224 4:19:56 [Note] WSREP: EVS version 0
131224 4:19:56 [Note] WSREP: PC version 0
131224 4:19:56 [Note] WSREP: gcomm: connecting to group 'ecomm', peer 'X.X.X.2:'
131224 4:19:57 [Note] WSREP: declaring 771b0086-6c52-11e3-0800-6686b0fba2c7 stable
131224 4:19:57 [Note] WSREP: view(view_id(PRIM,771b0086-6c52-11e3-0800-6686b0fba2c7,2) memb {
771b0086-6c52-11e3-0800-6686b0fba2c7,
a5560fc4-6c52-11e3-0800-6de37b8550ef,
} joined {
} left {
} partitioned {
})
131224 4:19:57 [Note] WSREP: gcomm: connected
131224 4:19:57 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
131224 4:19:57 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
131224 4:19:57 [Note] WSREP: Opened channel 'ecomm'
131224 4:19:57 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
131224 4:19:57 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
131224 4:19:57 [Note] WSREP: Waiting for SST to complete.
131224 4:19:57 [Note] WSREP: STATE EXCHANGE: sent state msg: a59fd8b1-6c52-11e3-0800-74c11aaf856b
131224 4:19:57 [Note] WSREP: STATE EXCHANGE: got state msg: a59fd8b1-6c52-11e3-0800-74c11aaf856b from 0 (FD-R14-db2)
131224 4:19:57 [Note] WSREP: STATE EXCHANGE: got state msg: a59fd8b1-6c52-11e3-0800-74c11aaf856b from 1 (FD-R14-db3)
131224 4:19:57 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 1,
members = 1/2 (joined/total),
act_id = 172,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 493d95e8-6c50-11e3-0800-b099dbf50e3d
131224 4:19:57 [Note] WSREP: Flow-control interval: [12, 23]
131224 4:19:57 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 172)
131224 4:19:57 [Note] WSREP: State transfer required:
Group state: 493d95e8-6c50-11e3-0800-b099dbf50e3d:172
Local state: 00000000-0000-0000-0000-000000000000:-1
131224 4:19:57 [Note] WSREP: New cluster view: global state: 493d95e8-6c50-11e3-0800-b099dbf50e3d:172, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 2
131224 4:19:57 [Warning] WSREP: Gap in state sequence. Need state transfer.
131224 4:19:59 [Note] WSREP: Running: 'wsrep_sst_rsync 'joiner' 'X.X.X.3' '' '/mnt/data/' '/etc/my.cnf' '15049' 2>sst.err'
131224 4:19:59 [Note] WSREP: Prepared SST request: rsync|X.X.X.3:4444/rsync_sst
131224 4:19:59 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
131224 4:19:59 [Note] WSREP: Assign initial position for certification: 172, protocol version: 2
131224 4:19:59 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (493d95e8-6c50-11e3-0800-b099dbf50e3d): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():440. IST will be unavailable.
131224 4:19:59 [Note] WSREP: Node 1 (FD-R14-db3) requested state transfer from 'any'. Selected 0 (FD-R14-db2)(SYNCED) as donor.
131224 4:19:59 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 172)
131224 4:19:59 [Note] WSREP: Requesting state transfer: success, donor: 0

Node No: 1
Error Log:
at galera/src/replicator_str.cpp:prepare_for_IST():440. IST will be unavailable.
131224 4:12:17 [Note] WSREP: Node 1 (FD-R14-db1) requested state transfer from 'any'. Selected 0 (FD-R14-db2)(SYNCED) as donor.
131224 4:12:17 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 172)
131224 4:12:17 [Note] WSREP: Requesting state transfer: success, donor: 0
131224 04:17:09 mysqld_safe Starting mysqld daemon with databases from /mnt/data/
131224 4:17:09 [Note] Flashcache bypass: disabled
131224 4:17:09 [Note] Flashcache setup error is : ioctl failed

131224 4:17:09 [Note] WSREP: Read nil XID from storage engines, skipping position init
131224 4:17:09 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
131224 4:17:09 [Note] WSREP: wsrep_load(): Galera 2.2(r114) by Codership Oy <info@codership.com> loaded succesfully.
131224 4:17:09 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
131224 4:17:09 [Note] WSREP: Reusing existing '/mnt/data//galera.cache'.
131224 4:17:09 [Note] WSREP: Passing config to GCS: base_host = X.X.X.1; gcache.dir = /mnt/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mnt/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
131224 4:17:09 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
131224 4:17:09 [Note] WSREP: wsrep_sst_grab()
131224 4:17:09 [Note] WSREP: Start replication
131224 4:17:09 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
131224 4:17:09 [Note] WSREP: protonet asio version 0
131224 4:17:09 [Note] WSREP: backend: asio
131224 4:17:09 [Note] WSREP: GMCast version 0
131224 4:17:09 [Note] WSREP: (41e1954d-6c52-11e3-0800-a5888b35da99, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
131224 4:17:09 [Note] WSREP: (41e1954d-6c52-11e3-0800-a5888b35da99, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
131224 4:17:09 [Note] WSREP: EVS version 0
131224 4:17:09 [Note] WSREP: PC version 0
131224 4:17:09 [Note] WSREP: gcomm: connecting to group 'ecomm', peer 'X.X.X.2:'
131224 4:17:10 [Note] WSREP: declaring 493c5523-6c50-11e3-0800-25c9bc4e1fff stable
131224 4:17:10 [Note] WSREP: view(view_id(NON_PRIM,41e1954d-6c52-11e3-0800-a5888b35da99,8) memb {
41e1954d-6c52-11e3-0800-a5888b35da99,
493c5523-6c50-11e3-0800-25c9bc4e1fff,
} joined {
} left {
} partitioned {
0975f0b3-6c52-11e3-0800-2d07df149f30,
9183e102-6c51-11e3-0800-62536ac728a9,
})
131224 4:17:40 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():157
131224 4:17:40 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
131224 4:17:40 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'ecomm' at 'gcomm://X.X.X.2': -110 (Connection timed out)
131224 4:17:40 [ERROR] WSREP: gcs connect failed: Connection timed out
131224 4:17:40 [ERROR] WSREP: wsrep::connect() failed: 6
131224 4:17:40 [ERROR] Aborting
131224 4:17:40 [Note] WSREP: Service disconnected.
131224 4:17:41 [Note] WSREP: Some threads may fail to exit.
131224 4:17:41 [Note] /usr/sbin/mysqld: Shutdown complete
131224 04:17:41 mysqld_safe mysqld from pid file /mnt/data//FD-R14-db1.pid ended

Thanks in advance,
Periyannan N.

Looks like both node 1 and node 3 need SST.
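The "needs SST" conclusion can be read straight from the logs: both joiners report a saved state of 00000000-0000-0000-0000-000000000000:-1, while the group is at 493d95e8-6c50-11e3-0800-b099dbf50e3d:172, and Galera says "IST will be unavailable". A minimal Python sketch of that comparison, assuming the "UUID:seqno" strings are copied from the "Local state" and "Group state" log lines. The function names are hypothetical, and whether IST actually succeeds also depends on the donor's gcache still holding the missing write-sets, which this sketch cannot check.

```python
from __future__ import annotations


def parse_state(state: str) -> tuple[str, int]:
    """Split a Galera 'UUID:seqno' state string into its two parts."""
    uuid, _, seqno = state.rpartition(":")
    return uuid, int(seqno)


def transfer_needed(local_state: str, group_state: str) -> str:
    """Hypothetical helper: classify the state transfer a joining node needs."""
    local_uuid, local_seqno = parse_state(local_state)
    group_uuid, group_seqno = parse_state(group_state)
    if local_uuid != group_uuid or local_seqno < 0:
        # Different (or all-zero) history, or an undefined position:
        # IST cannot be used, so a full SST is required.
        return "SST"
    if local_seqno < group_seqno:
        # Same history but behind the group: IST is possible if the donor's
        # gcache still holds the missing write-sets, otherwise SST.
        return "IST or SST"
    return "none"


if __name__ == "__main__":
    # Values copied from the "State transfer required" lines in node 3's log.
    print(transfer_needed("00000000-0000-0000-0000-000000000000:-1",
                          "493d95e8-6c50-11e3-0800-b099dbf50e3d:172"))
```

For node 1 and node 3 the UUIDs differ, so the first branch applies and rsync SST from node 2 is the only option.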

Make node 2 the primary component.
At the mysql prompt on node 2, run:
SET GLOBAL wsrep_provider_options='pc.bootstrap=1';
(If you have already done this, skip this step.)

Then put wsrep_sst_donor=node2 in the my.cnf file on both node 1 and node 3, and try to start them again.
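Before restarting node 1 and node 3, it may help to confirm what each node's my.cnf actually contains for the settings discussed in this thread (wsrep_cluster_address, wsrep_sst_donor). A minimal Python sketch, assuming all Galera settings sit in the [mysqld] section of a single file; files pulled in through !include or !includedir are not followed. The function name is hypothetical. One thing worth checking while doing this: wsrep_sst_donor is matched against the donor's wsrep_node_name, and in these logs node 2 identifies itself as FD-R14-db2.

```python
import configparser


def wsrep_settings(cnf_text: str) -> dict:
    """Hypothetical helper: return the wsrep_* options of the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags such as skip-name-resolve
        strict=False,         # tolerate repeated options; the last one wins
        interpolation=None,   # do not treat '%' in values specially
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return {}
    settings = {}
    for key, value in parser.items("mysqld"):
        # MySQL treats dashes and underscores in option names as equivalent.
        name = key.replace("-", "_")
        if name.startswith("wsrep_"):
            settings[name] = value.strip("\"'") if value else value
    return settings


if __name__ == "__main__":
    # Illustrative sample built from values seen in this thread.
    sample = """
[mysqld]
wsrep_cluster_address=gcomm://X.X.X.2
wsrep_sst_method=rsync
wsrep-sst-donor=FD-R14-db2
"""
    print(wsrep_settings(sample))
```

On a real node, the same function can be called on the text of /etc/my.cnf read with open().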

It would help if you could provide more information about your setup, such as the OS, the Percona version, and the contents of innobackup.backup.log and innobackup.prepare.log (if any).

Hi Madhusudan,

First, thanks for your reply. Please find my server setup below:

OS Version: CentOS 6.3 (Final)

Percona Version: Percona-XtraDB-Cluster-server-5.5.27-23.6.356.rhel6.x86_64

We are using rsync for SST, so I think we don't have innobackup.backup.log or innobackup.prepare.log.

Please give any further suggestions; I am new to Percona XtraDB Cluster.

Thanks in advance,

Regards,
Periyannan N.

Did you try what I suggested? And is your cluster in production or in testing?

More info about recovering a cluster:
Percona XtraDB Cluster

Hi,

This is a production database. Our downtime is scheduled only for tomorrow, so we cannot check right now.

Also, for your information, we are using galera-2.0.

Thanks and Regards,
Periyannan N.

Hi,

For your information:

All three nodes act as masters (multi-master).

Node 2 is running now.
We have not yet executed SET GLOBAL wsrep_provider_options='pc.bootstrap=1'; on it.
We have not yet put wsrep_sst_donor=node2 in the my.cnf file.

Thanks and Regards,
Periyannan N.

Hi,

Do I need to delete the following files before running the rsync command?
GRA_1_151865761.log
rsync_sst.conf
rsync_sst.pid

Thanks and Regards,
Periyannan N.

While adding a node to the cluster, I am not able to start the mysql service.
I read the logs and found the errors below:
Failed to open Gcomm backend connections:  -110 (connection timeout)
Failed to open channel 'mysql1-cluster' at 'Gcomm:/(cluster IP)' : -110 (connection time out)

Please help me resolve this issue!
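The -110 errors in this thread all mean the starting node could not reach a primary component through its gcomm:// address. The logs above show two possible causes: the peer cannot be reached at all (firewall, wrong address), or, as in node 1's log with its NON_PRIM view, the peer answers but is not itself part of a primary component. A minimal Python sketch for ruling out the first cause, to run on the node that fails to start against the address in its gcomm:// setting. Port 4567 (group communication) and 4444 (rsync SST) appear in the logs; 4568 is the usual default IST port. SST and IST ports are opened by the joiner only while it waits for a transfer, so test those from the donor toward the joiner. The script and function names are hypothetical, and an open port only shows that something is listening, not that Galera is healthy.

```python
import socket
import sys

# Default Galera / Percona XtraDB Cluster ports and what uses them.
GALERA_PORTS = {
    4567: "group communication (gcomm), needed between all nodes",
    4444: "SST, opened by the joiner only while it waits for a transfer",
    4568: "IST, opened by the joiner only while it waits for a transfer",
}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and names that do not resolve.
        return False


def check(host: str, ports=GALERA_PORTS) -> dict:
    """Map each port to True/False depending on whether it accepted a connection."""
    return {port: port_open(host, port) for port in ports}


if __name__ == "__main__":
    # Usage: python galera_ports.py <peer-address>
    peer = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    for port, ok in check(peer).items():
        status = "open" if ok else "NOT reachable"
        print(f"{peer}:{port}  {status}  ({GALERA_PORTS[port]})")
```

If 4567 is reachable in both directions and the error persists, the second cause is more likely, and the running node's own state (wsrep_cluster_status) is the next thing to look at.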