Cluster down with 1/3 node down

Delard (Contributor)
Hi,

We installed and configured a cluster of 3 nodes. Synchronization works, but when I stop MySQL on one node, the remaining nodes become desynchronized and stop accepting new connections.

==================== Configuration of galera: ====================
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_name="db_cluster"
wsrep_slave_threads=12
wsrep_certify_nonPK=1
wsrep_max_ws_rows=131072
wsrep_max_ws_size=1073741824
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_replicate_myisam=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_sst_method=rsync

server-id=3
wsrep_node_address=192.168.10.3
wsrep_cluster_address="gcomm://"
wsrep_provider_options="pc.weight=0; gcache.size=8G; evs.keepalive_period=PT3S; evs.inactive_check_period=PT10S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.consensus_timeout=PT1M; evs.send_window=1024; evs.user_send_window=512;"

=================================================================

Can you help us, please?

EDIT:

To add some information, here is the log I get on one of the desynchronized nodes (MySQL still running):

2014-02-05 16:02:05 19183 [Note] WSREP: view(view_id(NON_PRIM,e7516d17-8e6a-11e3-b85c-6a6eb0de5350,2) memb {
e7516d17-8e6a-11e3-b85c-6a6eb0de5350,0
} joined {
} left {
} partitioned {
fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9,0
})
2014-02-05 16:02:05 19183 [Note] WSREP: view(view_id(NON_PRIM,e7516d17-8e6a-11e3-b85c-6a6eb0de5350,3) memb {
e7516d17-8e6a-11e3-b85c-6a6eb0de5350,0
} joined {
} left {
} partitioned {
fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9,0
})
2014-02-05 16:02:05 19183 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-02-05 16:02:05 19183 [Note] WSREP: Flow-control interval: [16, 16]
2014-02-05 16:02:05 19183 [Note] WSREP: Received NON-PRIMARY.
2014-02-05 16:02:05 19183 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 192992574)
2014-02-05 16:02:05 19183 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-02-05 16:02:05 19183 [Note] WSREP: Flow-control interval: [16, 16]
2014-02-05 16:02:05 19183 [Note] WSREP: Received NON-PRIMARY.
2014-02-05 16:02:05 19183 [Note] WSREP: New cluster view: global state: 03b25294-7b07-11e3-ac2e-362fc6d31d98:192992574, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
2014-02-05 16:02:05 19183 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-02-05 16:02:05 19183 [Note] WSREP: New cluster view: global state: 03b25294-7b07-11e3-ac2e-362fc6d31d98:192992574, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
2014-02-05 16:02:05 19183 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-02-05 16:02:06 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.10.1:4567
2014-02-05 16:02:07 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') reconnecting to fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9 (tcp://192.168.10.1:4567), attempt 0
2014-02-05 16:02:52 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') reconnecting to fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9 (tcp://192.168.10.1:4567), attempt 30
2014-02-05 16:03:37 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') reconnecting to fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9 (tcp://192.168.10.1:4567), attempt 60
2014-02-05 16:04:22 19183 [Note] WSREP: (e7516d17-8e6a-11e3-b85c-6a6eb0de5350, 'tcp://0.0.0.0:4567') reconnecting to fc04cf52-8e6a-11e3-b0f9-93a4b1f2a1d9 (tcp://192.168.10.1:4567), attempt 90

So this node keeps trying to connect to a node which is down instead of staying in the cluster on its own.
To make it form a synchronized one-node cluster, I have to force it by issuing:
mysql> set global wsrep_cluster_address="gcomm://";
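
Before forcing a node like this, it can help to confirm that it really is in a non-primary component. The statements below are a minimal sketch, not a verified procedure for this cluster. The second statement assumes the installed Galera provider supports the pc.bootstrap option, which is not mentioned anywhere else in this thread.

```sql
-- Reports 'non-Primary' while the node is in the state shown by the
-- "Received NON-PRIMARY" lines in the log above.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';

-- Possible alternative to resetting wsrep_cluster_address (assumption:
-- pc.bootstrap is available in this Galera version). It asks the current
-- non-primary component to become primary.
SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
```

Either way of forcing a primary component should be run on one node only, ideally the one with the most recent data. Otherwise two independent clusters can form.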

Comments

  • Delard (Contributor)
    Up :)

    To add some more information: I found a way to work around the problem by adding pc.ignore_sb = yes to wsrep_provider_options.

    Does anybody have an idea about this, please?
  • madhusudan (Contributor)
    Do not enable pc.ignore_sb (ignore split-brain) unless it is an emergency.
    How did you set up the cluster? Did you follow the standard procedure? http://www.percona.com/doc/percona-x...tallation.html

    Try this:
    Disable pc.ignore_sb by commenting it out.
    Double-check the my.cnf configuration on all nodes and set the gcomm values accordingly (replace node1, node2, node3 with their IPs):
    node1 -> gcomm://
    node2 -> gcomm://node1,node2,node3
    node3 -> gcomm://node1,node2,node3

    Then, after all nodes have synced, change the gcomm value of node1 to gcomm://node1,node2,node3 and restart MySQL on node1.

    To check whether the nodes are synced, log in to the MySQL prompt on any node and run this command:
    show status like 'wsrep%';
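
    The full 'wsrep%' output is long. As a rough sketch, the query below narrows it to the variables that most directly show cluster health. The selection of variables is a suggestion, not part of the original advice.

    ```sql
    -- On a healthy 3-node cluster one would expect:
    --   wsrep_cluster_size         3
    --   wsrep_cluster_status       Primary
    --   wsrep_local_state_comment  Synced
    --   wsrep_connected            ON
    --   wsrep_ready                ON
    SHOW GLOBAL STATUS WHERE Variable_name IN (
      'wsrep_cluster_size',
      'wsrep_cluster_status',
      'wsrep_local_state_comment',
      'wsrep_connected',
      'wsrep_ready'
    );
    ```

    Running it on each node after stopping MySQL on one of them should show whether the survivors still report Primary with a cluster size of 2.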
  • Delard (Contributor)
    Yeah, I am not using pc.ignore_sb. I only mentioned it to try to be more explicit.

    The thing is, I used to leave each node's own IP out of its gcomm:// list, like this:

    node1 -> gcomm://
    node2 -> gcomm://node1,node3
    node3 -> gcomm://node1,node2

    And yes, the nodes were synced with this configuration; I checked via show status like 'wsrep%';
    I will give your config a try to see if anything changes.
    I also upgraded to the latest stable release and the problem is the same.

    # dpkg -l | grep percona
    ii percona-toolkit 2.2.6 all Advanced MySQL and system command-line tools
    ii percona-xtrabackup 2.1.7-721-1.wheezy amd64 Open source backup tool for InnoDB and XtraDB
    ii percona-xtradb-cluster-client-5.6 5.6.15-25.3-711.wheezy amd64 Percona Server database client binaries
    ii percona-xtradb-cluster-common-5.6 5.6.15-25.3-711.wheezy amd64 Percona Server database common files (e.g. /etc/mysql/my.cnf)
    ii percona-xtradb-cluster-galera-3.x 189.wheezy amd64 Galera components of Percona XtraDB Cluster
    ii percona-xtradb-cluster-server-5.6 5.6.15-25.3-711.wheezy amd64 Percona Server database server binaries
  • madhusudan (Contributor)
    The reason I suggested using IPs is that no DNS lookup is needed; if DNS fails, the nodes cannot see each other. The only thing you have to make sure of is that the IPs are static.
    Also check for any firewall or other network issue that is preventing these nodes from connecting to each other.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.