Percona XtraDB Cluster 5.5.29-23.7.2: 7th node can't join ([Warning] WSREP: no nodes coming from prim view)

I have a 6-node PXC cluster, but I cannot add a seventh node to it.

I have tried three times. I took a backup with innobackupex --galera-info and ran --apply-log on it, and I also copied the data directory from the primary node while it was down. All five other nodes connected to the cluster fine and did IST.
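For reference, innobackupex --galera-info writes the cluster state UUID and sequence number that the backup corresponds to into a file named xtrabackup_galera_info in the backup directory. Below is a minimal sketch for reading that position back to confirm it was recorded. It assumes the file holds a single `uuid:seqno` line, and the function names are hypothetical:

```python
import re
from pathlib import Path

# Assumption: xtrabackup_galera_info contains one line "<cluster-uuid>:<seqno>".
_GALERA_INFO_RE = re.compile(
    r"^(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r":(?P<seqno>-?\d+)$"
)


def parse_galera_info(text: str) -> tuple[str, int]:
    """Return (cluster_uuid, seqno) parsed from the galera info text."""
    match = _GALERA_INFO_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unexpected galera info format: {text!r}")
    return match.group("uuid"), int(match.group("seqno"))


def read_galera_info(backup_dir: str) -> tuple[str, int]:
    """Read and parse xtrabackup_galera_info from a prepared backup directory."""
    path = Path(backup_dir) / "xtrabackup_galera_info"
    return parse_galera_info(path.read_text())
```

A seqno of -1 would indicate that no usable position was recorded. The backup directory passed to `read_galera_info` is whatever path the backup was written to.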

So six nodes work fine, but not the seventh. I can reproduce this every time. The OS and data are the same as on the other nodes.

Am I hitting the maximum number of nodes in Percona XtraDB Cluster?

Below are the most significant warnings, followed by the complete error log.

/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
130408 10:44:36 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel ‘pxcluster’ at ‘gcomm://ip1’
(Connection timed out)
130408 10:44:36 [ERROR] WSREP: gcs connect failed: Connection timed out
130408 10:44:36 [ERROR] WSREP: wsrep::connect() failed: 6
130408 10:44:36 [ERROR] Aborting

130408 10:44:36 [Note] WSREP: Service disconnected.
130408 10:44:37 [Note] WSREP: Some threads may fail to exit.
130408 10:44:37 [Note] /home/mysql-cluster/percona/bin/mysqld: Shutdown complete

130408 10:44:37 mysqld_safe mysqld from pid file /home/mysql-cluster/data/mysqld.pid ended
130408 10:44:28 [Warning] WSREP: no nodes coming from prim view, prim not possible

Here is the complete log, including the warnings:

130408 10:44:21 [Note] WSREP: no install message received
130408 10:44:21 [Warning] WSREP: no nodes coming from prim view, prim not possible
130408 10:44:21 [Note] WSREP: view(view_id(NON_PRIM,3c3fd986-a039-11e2-0800-1249fa0c4c31,1 ) memb {
3c3fd986-a039-11e2-0800-1249fa0c4c31,
} joined {
} left {
} partitioned {
})
130408 10:44:23 [Note] WSREP: (3c3fd986-a039-11e2-0800-1249fa0c4c31, ‘tcp://0.0.0.0:4567’) reconnecting to 7463a0d2-a037-11e2-0800-c0f7dd8c9991 (tcp://ip1:4567), attempt 0
130408 10:44:25 [Note] WSREP: (3c3fd986-a039-11e2-0800-1249fa0c4c31, ‘tcp://0.0.0.0:4567’) reconnecting to 7463a0d2-a037-11e2-0800-c0f7dd8c9991 (tcp://ip1:4567), attempt 0
130408 10:44:26 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1)) suspecting node: fd02fef1-a037-11e2-0800-1c2ae8cfc882
130408 10:44:26 [Note] WSREP: (3c3fd986-a039-11e2-0800-1249fa0c4c31, ‘tcp://0.0.0.0:4567’) reconnecting to 7463a0d2-a037-11e2-0800-c0f7dd8c9991 (tcp://ip1:4567), attempt 0
130408 10:44:26 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1)) suspecting node: fd02fef1-a037-11e2-0800-1c2ae8cfc882
130408 10:44:27 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1)) suspecting node: 1aa3dfd5-a039-11e2-0800-c875c67e965d
130408 10:44:27 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1)) suspecting node: 20b5dce4-a038-11e2-0800-fd5636b90ff6
130408 10:44:27 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1)) suspecting node: 35900757-a038-11e2-0800-255626c12fbb
130408 10:44:27 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1)) suspecting node: 63d24c78-a038-11e2-0800-1ecd090a0499
130408 10:44:27 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1)) suspecting node: 7463a0d2-a037-11e2-0800-c0f7dd8c9991
130408 10:44:27 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1)) suspecting node: 82e5c399-a038-11e2-0800-ca20eebd35fc
130408 10:44:27 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1)) suspecting node: fd02fef1-a037-11e2-0800-1c2ae8cfc882

130408 10:44:28 [Warning] WSREP: subsequent views have same members, prev view view(view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,1) memb {
3c3fd986-a039-11e2-0800-1249fa0c4c31,
} joined {
} left {
} partitioned {
}) current view view(view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2) memb {
3c3fd986-a039-11e2-0800-1249fa0c4c31,
} joined {
} left {
} partitioned {
})
130408 10:44:28 [Warning] WSREP: no nodes coming from prim view, prim not possible
130408 10:44:28 [Note] WSREP: view(view_id(NON_PRIM,3c3fd986-a039-11e2-0800-1249fa0c4c31,2 ) memb {
3c3fd986-a039-11e2-0800-1249fa0c4c31,
} joined {
} left {
} partitioned {
})
130408 10:44:33 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 82e5c399-a038-11e2-0800-ca20eebd35fc
130408 10:44:34 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 63d24c78-a038-11e2-0800-1ecd090a0499
130408 10:44:34 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 82e5c399-a038-11e2-0800-ca20eebd35fc
130408 10:44:34 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: fd02fef1-a037-11e2-0800-1c2ae8cfc882
130408 10:44:34 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 63d24c78-a038-11e2-0800-1ecd090a0499
130408 10:44:34 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 82e5c399-a038-11e2-0800-ca20eebd35fc
130408 10:44:34 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: fd02fef1-a037-11e2-0800-1c2ae8cfc882
130408 10:44:35 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 63d24c78-a038-11e2-0800-1ecd090a0499
130408 10:44:35 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 82e5c399-a038-11e2-0800-ca20eebd35fc
130408 10:44:35 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 63d24c78-a038-11e2-0800-1ecd090a0499
130408 10:44:35 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 82e5c399-a038-11e2-0800-ca20eebd35fc
130408 10:44:36 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 63d24c78-a038-11e2-0800-1ecd090a0499
130408 10:44:36 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 82e5c399-a038-11e2-0800-ca20eebd35fc
130408 10:44:36 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 63d24c78-a038-11e2-0800-1ecd090a0499
130408 10:44:36 [Note] WSREP: evs::proto(3c3fd986-a039-11e2-0800-1249fa0c4c31, GATHER, view_id(REG,3c3fd986-a039-11e2-0800-1249fa0c4c31,2)) suspecting node: 82e5c399-a038-11e2-0800-ca20eebd35fc
130408 10:44:36 [Note] WSREP: view((empty))
130408 10:44:36 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():139
130408 10:44:36 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
130408 10:44:36 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel ‘pxcluster’ at ‘gcomm://ip1’
(Connection timed out)
130408 10:44:36 [ERROR] WSREP: gcs connect failed: Connection timed out
130408 10:44:36 [ERROR] WSREP: wsrep::connect() failed: 6
130408 10:44:36 [ERROR] Aborting

130408 10:44:36 [Note] WSREP: Service disconnected.
130408 10:44:37 [Note] WSREP: Some threads may fail to exit.
130408 10:44:37 [Note] /home/mysql-cluster/percona/bin/mysqld: Shutdown complete

130408 10:44:37 mysqld_safe mysqld from pid file /home/mysql-cluster/data/mysqld.pid ended
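The full log above repeats "suspecting node" for several peer UUIDs. To summarise which peers the joining node reported as suspected (not heard from in time), and how often, a short hypothetical helper can tally those lines. It assumes the line format shown in the log above:

```python
import re
from collections import Counter

# Matches EVS lines like the ones above, e.g.
# "... evs::proto(<self>, GATHER, view_id(...)) suspecting node: <peer-uuid>"
_SUSPECT_RE = re.compile(r"suspecting node: ([0-9a-f-]{36})")


def suspected_peers(log_lines) -> Counter:
    """Count how often each peer UUID is reported as suspected."""
    counts = Counter()
    for line in log_lines:
        match = _SUSPECT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage: open the error log (its path depends on the local setup) and iterate over `suspected_peers(log_file).most_common()`. Comparing the resulting UUIDs with wsrep_gcomm_uuid on each existing node may help show which peers the new node could not talk to.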

Hello,

Could you please show us the cluster configuration? It looks like the new node can't reach port 4567 on ip1. You could try connecting to that IP and port from the new node to rule out network issues (for example: telnet ip1 4567).
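If telnet isn't installed on the new node, the same TCP check can be done with a few lines of Python. This is a minimal sketch. The function name is hypothetical, and 4567 is used as the default only because it is the port in the log above:

```python
import socket


def port_is_reachable(host: str, port: int = 4567, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and name-resolution failures (socket.gaierror).
        return False
```

For example, `port_is_reachable("ip1")` on the new node mirrors the telnet test. Like telnet, it only checks one direction, so it may also be worth running it from the existing nodes toward the new node.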

You mention that you copied the data directory from the primary node while it was down; maybe that was the problem. You can list multiple IPs in the gcomm:// address, e.g. gcomm://ip1,ip2,ip3 …, so that if ip1 is down the node can connect to ip2 or ip3 to join the cluster.
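To make the multi-address form concrete, here is a small hypothetical parser that splits a gcomm:// address into the (host, port) peers a node would try. It is a sketch, not Galera's own parser. It assumes hostnames or IPv4 addresses only, ignores any "?option=value" suffix, and falls back to port 4567 when none is given:

```python
GALERA_DEFAULT_PORT = 4567  # default Galera group communication port


def gcomm_peers(address: str) -> list[tuple[str, int]]:
    """Split e.g. "gcomm://ip1,ip2:4568" into [(host, port), ...].

    An empty result corresponds to a bare "gcomm://", which in Galera
    bootstraps a new cluster instead of joining an existing one.
    """
    if not address.startswith("gcomm://"):
        raise ValueError(f"not a gcomm address: {address!r}")
    body = address[len("gcomm://"):].split("?", 1)[0]  # drop options
    peers = []
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.partition(":")
        peers.append((host, int(port) if sep else GALERA_DEFAULT_PORT))
    return peers
```

For example, `gcomm_peers("gcomm://ip1,ip2,ip3")` returns `[("ip1", 4567), ("ip2", 4567), ("ip3", 4567)]`. Each of those peers needs to be reachable from the new node for the extra addresses to help.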

Alternatively, you can simply join the node to the cluster and let it perform a full SST to bring the node up to date.

Regards,

Martin

@martinarrietac

Is “gcomm://ip1” valid, or is “ip1” simply a placeholder? If this name cannot be resolved, the node will not be able to connect to the cluster. Try replacing it with a valid IP address, or make sure host name resolution is working.
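One quick way to check name resolution from the new node is to ask the system resolver directly. Below is a minimal sketch using Python's socket.getaddrinfo. The function name is hypothetical:

```python
import socket


def resolve_host(host: str) -> list[str]:
    """Return the distinct addresses host resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []  # the resolver could not find the name
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:  # keep order, drop duplicates
            addresses.append(sockaddr[0])
    return addresses
```

Running `resolve_host("ip1")` on the new node would be expected to return an empty list if "ip1" is not a resolvable name there. A literal IP address resolves to itself.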