How to restart cluster after shutting down all nodes

Simple question. I have 3 nodes (this is just a test environment). I shut down all 3 nodes. Now I am trying to start one up, and it fails because it cannot peer with the other nodes.

The setting is wsrep_cluster_address=gcomm://192.168.1.2:13306,192.168.1.3:13306

and the current host is 192.168.1.1

Do I need to reset anything or change the address back to “gcomm://”?

This is the error log excerpt:

130118 18:51:08 mysqld_safe Starting mysqld daemon with databases from /home/y/var/mysql/data
130118 18:51:08 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.ScYNmlX5mh
130118 18:51:14 mysqld_safe WSREP: Recovered position 2d891e1c-6191-11e2-0800-eb98ba2f6aa0:4
130118 18:51:14 [Note] WSREP: wsrep_start_position var submitted: '2d891e1c-6191-11e2-0800-eb98ba2f6aa0:4'
130118 18:51:14 [Note] WSREP: Read nil XID from storage engines, skipping position init
130118 18:51:14 [Note] WSREP: wsrep_load(): loading provider library '/home/y/lib64/libgalera_smm.so'
130118 18:51:14 [Note] WSREP: wsrep_load(): Galera 2.2(rexported) by Codership Oy <info@codership.com> loaded succesfully.
130118 18:51:14 [Note] WSREP: Found saved state: 2d891e1c-6191-11e2-0800-eb98ba2f6aa0:-1
130118 18:51:14 [Note] WSREP: Reusing existing '/home/y/var/mysql/data//galera.cache'.
130118 18:51:14 [Note] WSREP: Passing config to GCS: base_host = 10.193.85.22; base_port = 4567; cert.log_conflicts = no; evs.inactive_check_period = PT10S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.keepalive_period = PT3S; evs.suspect_timeout = PT30S; gcache.dir = /home/y/var/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /home/y/var/mysql/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:13306; ist.receive_addr = bf1-daredb-001.mysql.ngd.bf1.yahoo.com:9999; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
130118 18:51:14 [Note] WSREP: Assign initial position for certification: 4, protocol version: -1
130118 18:51:14 [Note] WSREP: wsrep_sst_grab()
130118 18:51:14 [Note] WSREP: Start replication
130118 18:51:14 [Note] WSREP: Setting initial position to 2d891e1c-6191-11e2-0800-eb98ba2f6aa0:4
130118 18:51:14 [Note] WSREP: protonet asio version 0
130118 18:51:14 [Note] WSREP: backend: asio
130118 18:51:14 [Note] WSREP: GMCast version 0
130118 18:51:14 [Note] WSREP: (08a0cf10-61a0-11e2-0800-839711d030e6, 'tcp://0.0.0.0:13306') listening at tcp://0.0.0.0:13306
130118 18:51:14 [Note] WSREP: (08a0cf10-61a0-11e2-0800-839711d030e6, 'tcp://0.0.0.0:13306') multicast: , ttl: 1
130118 18:51:14 [Note] WSREP: EVS version 0
130118 18:51:14 [Note] WSREP: PC version 0
130118 18:51:14 [Note] WSREP: gcomm: connecting to group 'galeratest', peer 'bf1-daredb-002.mysql.ngd.bf1.yahoo.com:13306,gq1-daredb-001.mysql.ngd.gq1.yahoo.com:13306'
130118 18:51:17 [Warning] WSREP: no nodes coming from prim view, prim not possible
130118 18:51:17 [Note] WSREP: view(view_id(NON_PRIM,08a0cf10-61a0-11e2-0800-839711d030e6,1) memb { 08a0cf10-61a0-11e2-0800-839711d030e6,} joined {} left {} partitioned {})
130118 18:51:27 [Note] WSREP: declaring 10241359-61a0-11e2-0800-bb625e54cfae stable
130118 18:51:27 [Warning] WSREP: no nodes coming from prim view, prim not possible
130118 18:51:27 [Note] WSREP: view(view_id(NON_PRIM,08a0cf10-61a0-11e2-0800-839711d030e6,2) memb { 08a0cf10-61a0-11e2-0800-839711d030e6, 10241359-61a0-11e2-0800-bb625e54cfae,} joined {} left {} partitioned {})
130118 18:51:47 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():157
130118 18:51:47 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
130118 18:51:47 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'galeratest' at 'gcomm://bf1-daredb-002.mysql.ngd.bf1.yahoo.com:13306,gq1-daredb-001.mysql.ngd.gq1.yahoo.com:13306': -110 (Connection timed out)
130118 18:51:47 [ERROR] WSREP: gcs connect failed: Connection timed out
130118 18:51:47 [ERROR] WSREP: wsrep::connect() failed: 6
130118 18:51:47 [ERROR] Aborting
130118 18:51:47 [Note] WSREP: Service disconnected.
130118 18:51:48 [Note] WSREP: Some threads may fail to exit.
130118 18:51:48 [Note] /home/y/libexec64/mysqld: Shutdown complete
130118 18:51:48 mysqld_safe mysqld from pid file /home/y/var/mysql/mysqld.pid ended

Thanks, Partha

You should choose one node, usually the one you know has the latest changes. Then start it with gcomm:// to create a new cluster:

/etc/init.d/mysql start --wsrep-cluster-address="gcomm://"
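
To make "the node with the latest changes" concrete: at startup mysqld_safe prints a "WSREP: Recovered position <uuid>:<seqno>" line (it appears in the log above), and the node with the highest seqno is the most advanced one. Here is a minimal Python sketch, assuming you have collected that log text from each node yourself. The function names, the node names, and the seqno values for node2 and node3 are made up for illustration; only the node1 position comes from the log in this thread.

```python
import re

# Matches the line mysqld_safe prints after position recovery, e.g.
# "WSREP: Recovered position 2d891e1c-6191-11e2-0800-eb98ba2f6aa0:4"
RECOVERED = re.compile(r"WSREP: Recovered position ([0-9a-fA-F-]+):(-?\d+)")


def recovered_position(log_text):
    """Return (uuid, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def pick_bootstrap_node(positions):
    """Pick the node with the highest seqno.

    positions maps a node name (hypothetical labels) to (uuid, seqno).
    Raises ValueError if the nodes do not share one cluster UUID, because
    seqnos from different clusters are not comparable.
    On a tie, the first node in dict order wins.
    """
    uuids = {uuid for uuid, _ in positions.values()}
    if len(uuids) != 1:
        raise ValueError("nodes report different cluster UUIDs: %s" % sorted(uuids))
    return max(positions, key=lambda name: positions[name][1])


# Example: node1's position is taken from the log in this thread;
# the other two seqnos are invented for the example.
CLUSTER_UUID = "2d891e1c-6191-11e2-0800-eb98ba2f6aa0"
example_positions = {
    "node1": (CLUSTER_UUID, 4),
    "node2": (CLUSTER_UUID, 7),
    "node3": (CLUSTER_UUID, 5),
}
best = pick_bootstrap_node(example_positions)
```

In this example `best` is the node you would start with gcomm://, and the other two would then be started with their normal cluster address so they join it.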

Hi!

I have the same question.
If I configure all three servers with wsrep-cluster-address="gcomm://,, " and a power outage hits all three, none of them will create the cluster when they come back.

If I set one server to wsrep-cluster-address="gcomm://", then every time this server restarts it will create a new cluster, so that is not a solution.

Other problems I can imagine: suppose a server could create the cluster whenever it is not already created. What happens if one server starts while its network interface is down? It will create a cluster, but another one may already exist.

I was thinking of a script that checks the state of the other nodes and, if it can reach at least two of them (assuming the cluster has three servers), selects one node to start the cluster.
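
Roughly what I have in mind, as a Python sketch and not a tested solution. Everything here is an assumption of mine: the function names, the "join" / "bootstrap" / "wait" results, and the rule that the node with the lowest name is the one selected. It also ignores which node has the latest data, so it only covers the "who starts the cluster" part.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def decide(self_name, peers, host_up, galera_up, min_reachable=2):
    """Decide what this node should do at startup.

    self_name     -- this node's name (hypothetical label, e.g. "node1")
    peers         -- names of the other nodes
    host_up       -- callable(name) -> bool, True if the peer machine is reachable
    galera_up     -- callable(name) -> bool, True if the peer's Galera port answers
    min_reachable -- how many peers must be reachable before anyone may bootstrap

    Returns "join", "bootstrap", or "wait".
    """
    # A cluster may already exist somewhere: never create a second one.
    if any(galera_up(peer) for peer in peers):
        return "join"

    # Too few peers reachable: our own network may be down, so do nothing.
    reachable = [peer for peer in peers if host_up(peer)]
    if len(reachable) < min_reachable:
        return "wait"

    # Every node that sees the same reachable set picks the same winner.
    elected = min([self_name] + reachable)
    return "bootstrap" if elected == self_name else "wait"
```

The probes could be built from can_connect, for example checking port 13306 (the Galera port used in this thread) for galera_up and some always-on port such as SSH for host_up; the SSH part is just a guess at what would work. If the nodes see different sets of reachable peers they may not agree on the winner, which is one reason I am not sure this approach is safe.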

What is the real solution?

Thanks!