Primary node restart failed

marat.komarov · May 29, 2013, 7:36am

I've set up a simple master-slave pair using weighted quorum: http://www.codership.com/wiki/doku.php?id=weighted_quorum

node-1 is a master:
wsrep_provider_options="pc.weight=1; gcs.fc_master_slave=yes"
wsrep_cluster_address=gcomm://node-1,node-2

node-2 is a slave:
wsrep_provider_options="pc.weight=0; gcs.fc_master_slave=yes"
wsrep_cluster_address=gcomm://node-1,node-2

/etc/hosts on both servers maps node-1 and node-2 to their IP addresses.

When I restart MySQL on the master with service mysql restart, it fails to rejoin the cluster and the server log shows a connection timed out error:

130529 8:59:31 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'node-1:,node-2:'
130529 8:59:31 [Warning] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://0.0.0.0:4567') address 'tcp://10.137.48.175:4567' points to own listening address, blacklisting
130529 8:59:31 [Note] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://0.0.0.0:4567') address 'tcp://10.137.48.175:4567' pointing to uuid 9a694329-c85f-11e2-0800-3f3cae6230d1 is blacklisted, skipping
130529 8:59:31 [Note] WSREP: declaring b27d9646-c85b-11e2-0800-eeeca83be563 stable
130529 8:59:31 [Note] WSREP: view(view_id(NON_PRIM,9a694329-c85f-11e2-0800-3f3cae6230d1,16) memb {
9a694329-c85f-11e2-0800-3f3cae6230d1,
b27d9646-c85b-11e2-0800-eeeca83be563,
} joined {
} left {
} partitioned {
faec897c-c85e-11e2-0800-2721f22ff1fc,
})
130529 9:00:01 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():139
130529 9:00:01 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
130529 9:00:01 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'my_wsrep_cluster' at 'gcomm://node-1,node-2': -110 (Connection timed out)
130529 9:00:01 [ERROR] WSREP: gcs connect failed: Connection timed out
130529 9:00:01 [ERROR] WSREP: wsrep::connect() failed: 6
130529 9:00:01 [ERROR] Aborting

To get node-1 back into the cluster, I started it with service mysql start --wsrep-cluster-address="gcomm://"

My question: is this expected behavior? It is a bit strange to get a connectivity error while node-2 is up and listening on port 4567. When a cluster contains only one primary node, is it safe to restart the MySQL daemon on it?

percona.jayj · May 30, 2013, 12:00pm

Can you pin this behavior on the weighted quorum? Does it behave normally if you get rid of the weight?

marat.komarov · May 31, 2013, 4:31am

Yes, without ‘weight’ settings it works normally.

percona.jayj · May 31, 2013, 6:59am

Then by all means file a bug! http://www.percona.com/doc/percona-x...bugreport.html (Percona XtraDB Cluster)

I haven’t tested this feature, so I can’t vouch for it.
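
For reference, the quorum rule given in the Galera weighted quorum documentation can be sketched in a few lines. This is only an illustration under stated assumptions: the function name has_quorum and its arguments are hypothetical, the weights are the ones from the configuration above, and nothing in this thread confirms that this rule is the actual cause of the failed restart. It does show that, on paper, a node with pc.weight=0 cannot hold quorum alone once the weight-1 node has left, which would be consistent with the NON_PRIM view in the log.

```python
from typing import Dict, Set


def has_quorum(last_primary: Dict[str, int],
               left_gracefully: Set[str],
               current: Set[str]) -> bool:
    """Hypothetical helper applying the documented weighted quorum rule:

        (sum(p_i * w_i) - sum(l_i * w_i)) / 2 < sum(m_i * w_i)

    p = members of the last primary component,
    l = members known to have left gracefully,
    m = members of the current component.
    """
    p = sum(last_primary.values())
    l = sum(last_primary[n] for n in left_gracefully)
    m = sum(last_primary.get(n, 0) for n in current)
    return (p - l) / 2 < m


# Weights taken from the configuration in the first post.
weights = {"node-1": 1, "node-2": 0}

# node-1 is stopped gracefully, node-2 stays up: (1 - 1) / 2 < 0 is false.
node2_alone = has_quorum(weights, {"node-1"}, {"node-2"})

# node-2 is stopped gracefully, node-1 stays up: (1 - 0) / 2 < 1 is true.
node1_alone = has_quorum(weights, {"node-2"}, {"node-1"})

# Default equal weights, one node stopped gracefully: (2 - 1) / 2 < 1 is true.
equal = {"node-1": 1, "node-2": 1}
equal_weights_survivor = has_quorum(equal, {"node-1"}, {"node-2"})

print(node2_alone, node1_alone, equal_weights_survivor)
```

Under this reading, the surviving node keeps quorum with equal weights, which matches the report that the restart works once the weight settings are removed.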

f.demassis · October 15, 2015, 2:48pm

Hello,

I'm experiencing a similar situation in an identical two-node configuration (useful for small shops with no shared storage).
When restarting both nodes, it is necessary to clear the quorum (service mysql bootstrap-pxc on the master node; it probably also works on the slave) in order to restart the Galera cluster cleanly. So every restart of both nodes requires manual intervention.
I tried to reproduce the problem by stopping each of the two nodes in turn. The sequence of tests follows. Is there a patch, or a configuration of the evs.* timing parameters, that avoids these problems?

I cannot post the tests due to the character limit.

Thanks in advance,
Fabrizio
