Hi,
We installed and configured a 3-node Galera cluster. Replication works fine, but when I stop MySQL on one node, all the other nodes become desynchronized and stop accepting new connections.
==================== Configuration of galera: ====================
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_name="db_cluster"
wsrep_slave_threads=12
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_replicate_myisam=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_sst_method=rsync
server-id=3
wsrep_node_address=192.168.10.3
wsrep_cluster_address="gcomm://"
wsrep_provider_options="pc.weight=0; gcache.size=8G; evs.keepalive_period=PT3S; evs.inactive_check_period=PT10S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.consensus_timeout=PT1M; evs.send_window=1024; evs.user_send_window=512;"
===========================================================
Can you help us, please?
EDIT:
To add some information, here is the log from one of the desynchronised nodes (MySQL is still running on it):
2014-02-05 16:02:05 19183 [Note] WSREP: view(view_id(NON_PRIM,e7516d17-8e6a-11e3-b85c-6a6eb0de5350,2) memb {
e7516d17-8e6a-11e3-b85c-6a6eb0de5350,0
} joined {
} left {
} partitioned {
fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9,0
})
2014-02-05 16:02:05 19183 [Note] WSREP: view(view_id(NON_PRIM,e7516d17-8e6a-11e3-b85c-6a6eb0de5350,3) memb {
e7516d17-8e6a-11e3-b85c-6a6eb0de5350,0
} joined {
} left {
} partitioned {
fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9,0
})
2014-02-05 16:02:05 19183 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-02-05 16:02:05 19183 [Note] WSREP: Flow-control interval: [16, 16]
2014-02-05 16:02:05 19183 [Note] WSREP: Received NON-PRIMARY.
2014-02-05 16:02:05 19183 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 192992574)
2014-02-05 16:02:05 19183 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-02-05 16:02:05 19183 [Note] WSREP: Flow-control interval: [16, 16]
2014-02-05 16:02:05 19183 [Note] WSREP: Received NON-PRIMARY.
2014-02-05 16:02:05 19183 [Note] WSREP: New cluster view: global state: 03b25294-7b07-11e3-ac2e-362fc6d31d98:192992574, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
2014-02-05 16:02:05 19183 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-02-05 16:02:05 19183 [Note] WSREP: New cluster view: global state: 03b25294-7b07-11e3-ac2e-362fc6d31d98:192992574, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
2014-02-05 16:02:05 19183 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-02-05 16:02:06 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.10.1:4567
2014-02-05 16:02:07 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') reconnecting to fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9 (tcp://192.168.10.1:4567), attempt 0
2014-02-05 16:02:52 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') reconnecting to fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9 (tcp://192.168.10.1:4567), attempt 30
2014-02-05 16:03:37 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') reconnecting to fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9 (tcp://192.168.10.1:4567), attempt 60
2014-02-05 16:04:22 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') reconnecting to fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9 (tcp://192.168.10.1:4567), attempt 90
So this node keeps trying to reconnect to a node which is down, instead of staying in the cluster on its own.
To make it form a synchronised single-node cluster by itself, I have to force it by issuing:
mysql> set global wsrep_cluster_address="gcomm://";
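For anyone trying to reproduce this, the node's own view of the cluster can be checked before and after that command with the standard wsrep status variables. This is only a minimal sketch, assuming a Galera-enabled MySQL build; I have not pasted the values our nodes report, so none are shown here.

```sql
-- Component state as seen by this node; the log above suggests it would
-- report a non-primary component while the node is stuck.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';

-- Number of nodes this node currently sees in its component.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';

-- Local node state (the log shows a SYNCED -> OPEN transition).
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';

-- Whether the node currently accepts queries.
SHOW GLOBAL STATUS LIKE 'wsrep_ready';
```

Running the same four statements on each surviving node should show whether they all lost the primary component at the same moment or only some of them did.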