Primary node restart failed

I’ve set up a simple master-slave cluster using weighted quorum: http://www.codership.com/wiki/doku.php?id=weighted_quorum

node-1 is the master:
wsrep_provider_options="pc.weight=1; gcs.fc_master_slave=yes"
wsrep_cluster_address=gcomm://node-1,node-2

node-2 is the slave:
wsrep_provider_options="pc.weight=0; gcs.fc_master_slave=yes"
wsrep_cluster_address=gcomm://node-1,node-2

/etc/hosts on both servers has the IP address mappings for node-1 and node-2.
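
For reference, a minimal sketch of how this looks on disk (the 10.137.48.175 address for node-1 is taken from the log below; node-2's address is only a placeholder, and wsrep_provider and the other usual settings are assumed to already be in place):

/etc/hosts on both servers:
10.137.48.175   node-1
10.137.48.176   node-2

my.cnf on node-1 (master):
wsrep_cluster_name=my_wsrep_cluster
wsrep_cluster_address=gcomm://node-1,node-2
wsrep_provider_options="pc.weight=1; gcs.fc_master_slave=yes"

my.cnf on node-2 (slave), identical except for the zero weight:
wsrep_cluster_name=my_wsrep_cluster
wsrep_cluster_address=gcomm://node-1,node-2
wsrep_provider_options="pc.weight=0; gcs.fc_master_slave=yes"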

When I restart MySQL on the master with service mysql restart, it fails to rejoin the cluster, with a connection timeout error in the server log:

130529 8:59:31 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'node-1:,node-2:'
130529 8:59:31 [Warning] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://0.0.0.0:4567') address 'tcp://10.137.48.175:4567' points to own listening address, blacklisting
130529 8:59:31 [Note] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://0.0.0.0:4567') address 'tcp://10.137.48.175:4567' pointing to uuid 9a694329-c85f-11e2-0800-3f3cae6230d1 is blacklisted, skipping
130529 8:59:31 [Note] WSREP: declaring b27d9646-c85b-11e2-0800-eeeca83be563 stable
130529 8:59:31 [Note] WSREP: view(view_id(NON_PRIM,9a694329-c85f-11e2-0800-3f3cae6230d1,16) memb {
9a694329-c85f-11e2-0800-3f3cae6230d1,
b27d9646-c85b-11e2-0800-eeeca83be563,
} joined {
} left {
} partitioned {
faec897c-c85e-11e2-0800-2721f22ff1fc,
})
130529 9:00:01 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():139
130529 9:00:01 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
130529 9:00:01 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'my_wsrep_cluster' at 'gcomm://node-1,node-2': -110 (Connection timed out)
130529 9:00:01 [ERROR] WSREP: gcs connect failed: Connection timed out
130529 9:00:01 [ERROR] WSREP: wsrep::connect() failed: 6
130529 9:00:01 [ERROR] Aborting

To get node-1 back into the cluster, I started it with: service mysql start --wsrep-cluster-address="gcomm://"
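
In other words, recovery goes roughly like this (a sketch; the node-2 restart and the status checks are just how I would confirm the result, not something from the log above):

# on node-1: bootstrap a new Primary Component instead of a normal start
service mysql start --wsrep-cluster-address="gcomm://"

# on node-2, only if it is stuck non-primary: restart so it reconnects to the new Primary Component
service mysql restart

# on either node: check that the cluster is Primary again with both members
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'"
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"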

My question: is this expected behavior? It’s a bit strange to get a connectivity error while node-2 is up and port 4567 is listening. When the cluster contains only one primary node, is it safe to restart the mysql daemon on it?

Can you pin this behavior on the weighted quorum? Does it behave normally if you get rid of the weight?

Yes, without ‘weight’ settings it works normally.
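
By “without weight” I mean dropping pc.weight from wsrep_provider_options on both nodes, e.g. leaving just:

wsrep_provider_options="gcs.fc_master_slave=yes"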

Then by all means file a bug! http://www.percona.com/doc/percona-x...bugreport.html :smiley:

I haven’t tested this feature, so I can’t vouch for it.

Hello,

I’m experiencing a similar situation with an identical two-node configuration (useful for small shops with no shared storage).
When restarting both nodes, it is necessary to re-establish quorum (service mysql bootstrap-pxc on the master node; it probably also works on the slave) in order to cleanly restart the Galera cluster. So every restart of both nodes requires manual intervention.
I tried to reproduce the problem by stopping each of the two nodes in turn. Is there any patch, or any configuration of the evs.* timing parameters, that avoids this problem?
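
For clarity, the manual sequence I’m describing above is roughly this (a sketch, assuming the standard PXC init scripts):

# on the master node: bootstrap a new Primary Component
service mysql bootstrap-pxc

# on the slave node: a normal start, so it joins the bootstrapped master
service mysql start

# on either node: expect wsrep_cluster_status = Primary and wsrep_cluster_size = 2
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%'"

If evs.* timings were the answer, I assume they would go in wsrep_provider_options, e.g. (values purely illustrative, not a verified fix):

wsrep_provider_options="pc.weight=0; gcs.fc_master_slave=yes; evs.suspect_timeout=PT10S; evs.inactive_timeout=PT30S"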

I cannot post the full test sequence here due to the forum’s character limit.

Thanks in advance,
Fabrizio