First cluster crash: do I need a fourth node or garbd?

Hello,

I have had a cluster with 3 nodes running for 8 months. I'm using it only as Master / Backup / Backup (my applications don't handle multi-master writes very well) with HAProxy. I'm happy with this configuration: I can update everything (kernel every week…), do backup / restore for testing purposes on a backup node, etc.

Today I had my first crash:

  • in the morning I lost node C, but my monitoring didn't alert me and I was busy with another issue…
  • in the afternoon, a little network latency and nodes A and B just stopped:

NODE A:

2018-09-17T12:36:30.074323Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer b3803219 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:30.074514Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 0dde40fe with addr tcp://10.0.0.64:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:30.074556Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.0.64:4567 tcp://10.0.0.65:4567
2018-09-17T12:36:31.074793Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') reconnecting to 0dde40fe (tcp://10.0.0.64:4567), attempt 0
2018-09-17T12:36:31.075179Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') reconnecting to b3803219 (tcp://10.0.0.65:4567), attempt 0
2018-09-17T12:36:31.910630Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection established to 0dde40fe tcp://10.0.0.64:4567
2018-09-17T12:36:31.987476Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection established to b3803219 tcp://10.0.0.65:4567
2018-09-17T12:36:32.075490Z 0 [Note] WSREP: declaring node with index 0 suspected, timeout PT5S (evs.suspect_timeout)
2018-09-17T12:36:32.075548Z 0 [Note] WSREP: declaring node with index 2 suspected, timeout PT5S (evs.suspect_timeout)
2018-09-17T12:36:32.075566Z 0 [Note] WSREP: evs::proto(2e38e14b, OPERATIONAL, view_id(REG,0dde40fe,693)) suspecting node: 0dde40fe
2018-09-17T12:36:32.075607Z 0 [Note] WSREP: evs::proto(2e38e14b, OPERATIONAL, view_id(REG,0dde40fe,693)) suspected node without join message, declaring inactive
2018-09-17T12:36:32.075634Z 0 [Note] WSREP: evs::proto(2e38e14b, OPERATIONAL, view_id(REG,0dde40fe,693)) suspecting node: b3803219
2018-09-17T12:36:32.075644Z 0 [Note] WSREP: evs::proto(2e38e14b, OPERATIONAL, view_id(REG,0dde40fe,693)) suspected node without join message, declaring inactive
2018-09-17T12:36:32.077115Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection established to 0dde40fe tcp://10.0.0.64:4567
2018-09-17T12:36:32.082668Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection established to b3803219 tcp://10.0.0.65:4567
2018-09-17T12:36:33.075902Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,0dde40fe,693)
memb {
2e38e14b,0
}
joined {
}
left {
}
partitioned {
0dde40fe,0
b3803219,0
}
)
2018-09-17T12:36:33.076069Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,2e38e14b,694)
memb {
2e38e14b,0
}
joined {
}
left {
}
partitioned {
0dde40fe,0
b3803219,0
}
)
2018-09-17T12:36:33.076083Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-09-17T12:36:33.076135Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2018-09-17T12:36:33.076143Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-17T12:36:33.076150Z 0 [Note] WSREP: Received NON-PRIMARY.
2018-09-17T12:36:33.076158Z 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 293303616)
2018-09-17T12:36:33.076244Z 2 [Note] WSREP: New cluster view: global state: e94fa648-0d4e-11e7-b53b-cf35fa571ba4:293303616, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2018-09-17T12:36:33.076293Z 2 [Note] WSREP: Setting wsrep_ready to false
2018-09-17T12:36:33.076301Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-09-17T12:36:33.076275Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-09-17T12:36:33.076361Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2018-09-17T12:36:33.076369Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-17T12:36:33.076375Z 0 [Note] WSREP: Received NON-PRIMARY.


NODE B:


2018-09-17T12:36:29.910338Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection to peer 2e38e14b with addr tcp://10.0.0.61:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:29.910529Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.0.0.61:4567
2018-09-17T12:36:30.910501Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') reconnecting to 2e38e14b (tcp://10.0.0.61:4567), attempt 0
2018-09-17T12:36:31.911475Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection established to 2e38e14b tcp://10.0.0.61:4567
2018-09-17T12:36:32.077385Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection established to 2e38e14b tcp://10.0.0.61:4567
2018-09-17T12:36:32.081116Z 0 [Note] WSREP: declaring b3803219 at tcp://10.0.0.65:4567 stable
2018-09-17T12:36:32.083570Z 0 [Note] WSREP: Node 0dde40fe state primary
2018-09-17T12:36:32.085508Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,0dde40fe,694)
memb {
0dde40fe,0
b3803219,0
}
joined {
}
left {
}
partitioned {
2e38e14b,0
}
)
2018-09-17T12:36:32.085528Z 0 [Note] WSREP: Save the discovered primary-component to disk
2018-09-17T12:36:32.085769Z 0 [Note] WSREP: forgetting 2e38e14b (tcp://10.0.0.61:4567)
2018-09-17T12:36:32.085803Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2018-09-17T12:36:32.086112Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 4e82208f-ba76-11e8-9d76-8360ec68fa9d
2018-09-17T12:36:34.912390Z 0 [Note] WSREP: cleaning up 2e38e14b (tcp://10.0.0.61:4567)
2018-09-17T12:36:35.412545Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection to peer b3803219 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:36.412788Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') reconnecting to b3803219 (tcp://10.0.0.65:4567), attempt 0
2018-09-17T12:36:37.079189Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection established to 2e38e14b tcp://10.0.0.61:4567
2018-09-17T12:36:37.590590Z 0 [Note] WSREP: declaring node with index 1 suspected, timeout PT5S (evs.suspect_timeout)
2018-09-17T12:36:37.590668Z 0 [Note] WSREP: evs::proto(0dde40fe, GATHER, view_id(REG,0dde40fe,694)) suspecting node: b3803219
2018-09-17T12:36:37.590691Z 0 [Note] WSREP: evs::proto(0dde40fe, GATHER, view_id(REG,0dde40fe,694)) suspected node without join message, declaring inactive
2018-09-17T12:36:38.090835Z 0 [Note] WSREP: declaring node with index 1 inactive (evs.inactive_timeout)
2018-09-17T12:36:39.091407Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,0dde40fe,694)
memb {
0dde40fe,0
}
joined {
}
left {
}
partitioned {
b3803219,0
}
)
2018-09-17T12:36:39.091515Z 0 [Note] WSREP: declaring 2e38e14b at tcp://10.0.0.61:4567 stable
2018-09-17T12:36:39.091724Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-09-17T12:36:39.091755Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2018-09-17T12:36:39.091770Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-17T12:36:39.091783Z 0 [Note] WSREP: Received NON-PRIMARY.
2018-09-17T12:36:39.091797Z 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 293303616)
2018-09-17T12:36:39.091884Z 2 [Note] WSREP: New cluster view: global state: e94fa648-0d4e-11e7-b53b-cf35fa571ba4:293303616, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2018-09-17T12:36:39.091903Z 2 [Note] WSREP: Setting wsrep_ready to false
2018-09-17T12:36:39.091931Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.



I just had to restart node A with /etc/init.d/mysql bootstrap-pxc after editing grastate.dat (the file was identical on nodes A and B; node C's was different). Then the cluster came back with 2 nodes and I restarted node C.
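For reference, what I did can be sketched like this (the grastate.dat contents below are illustrative, not my real values; the real file lives in the datadir, usually /var/lib/mysql/grastate.dat, and this assumes the safe_to_bootstrap flag of PXC 5.7):

```shell
# Illustrative copy of grastate.dat -- compare this file across all nodes
# and pick the node with the most recent state to bootstrap from.
cat > grastate.dat <<'EOF'
# GALERA saved state
version: 2.1
uuid:    e94fa648-0d4e-11e7-b53b-cf35fa571ba4
seqno:   -1
safe_to_bootstrap: 0
EOF

# On the node chosen to bootstrap, flip safe_to_bootstrap so that
# "/etc/init.d/mysql bootstrap-pxc" is accepted:
sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' grastate.dat
grep safe_to_bootstrap grastate.dat
```

After bootstrapping that node, the others are started normally and rejoin.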

What is the best solution for me? Just adding a new node? Or is Galera Arbitrator an option?

A fourth node would follow the same process as the others; Galera Arbitrator would be a new thing to manage, but without consuming space for data.

What happens if I reboot / crash / lose the Galera Arbitrator?

Thanks

Yathus

Hi Yathus,

After you lost C, your cluster no longer had the minimal size for HA, which is 3 nodes. So any issue with one of the remaining nodes would lead to a non-primary state, as quorum cannot be established.
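The quorum arithmetic behind this can be sketched as follows: a component stays Primary only if it sees strictly more than half of the last known membership (assuming default, equally weighted nodes).

```shell
# With node C already gone, membership is 3 and A+B still see each other:
total=3; alive=2
[ $(( alive * 2 )) -gt $total ] && echo PRIMARY || echo NON-PRIMARY   # 2/3 > 50%

# After C is dropped from the membership and the blip splits A from B,
# each side sees only itself out of 2 known members:
total=2; alive=1
[ $(( alive * 2 )) -gt $total ] && echo PRIMARY || echo NON-PRIMARY   # 1/2 is NOT > 50%
```

That is why a two-node cluster cannot survive any further failure or split: an exact 50% is deliberately not enough, otherwise both halves of a split would keep writing (split brain).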

In case of a network split, both nodes, as long as they were still up, should keep trying to reconnect. I wonder why that did not happen here. Did you restart them before the network was back up? "A little latency" should not break the cluster permanently, even down to two nodes. Can you share more of what happened next in the logs, to get some idea on this?

Anyway, if the whole cluster is stopped, bootstrapping is indeed needed.
And yes, 3 nodes are the minimum needed to tolerate any single node or connection failure. The Arbitrator is treated as a normal node, except that it does not need SST and does not store data. But if you have, let's say, one real node and one arbitrator, killing the latter will also stop the former (quorum is needed).
You may find this article useful btw: [url]https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/[/url]

Yes, node C wasn't down. On the console screen I saw some logs about memory / hang / maybe "OOM". I have no logs about this problem after the reboot (maybe a disk error? It's a VM).

After the hang, on node A:

2018-09-17T12:36:39.091822Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
2018-09-17T12:36:39.091851Z 0 [Note] WSREP: Flow-control interval: [141, 141]
2018-09-17T12:36:39.091858Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-17T12:36:39.091864Z 0 [Note] WSREP: Received NON-PRIMARY.
2018-09-17T12:36:39.091937Z 2 [Note] WSREP: New cluster view: global state: e94fa648-0d4e-11e7-b53b-cf35fa571ba4:293303616, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 3
2018-09-17T12:36:39.091985Z 2 [Note] WSREP: Setting wsrep_ready to false
2018-09-17T12:36:39.091992Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-09-17T12:36:40.077770Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:44.078484Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:48.079201Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:52.579776Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:57.080387Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:37:01.081165Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:37:05.581968Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:37:09.582640Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:37:13.583271Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:37:17.583903Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:37:21.584580Z 0 [Note] WSREP: (2e38e14b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)


On node B:

2018-09-17T12:36:39.092365Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
2018-09-17T12:36:39.092404Z 0 [Note] WSREP: Flow-control interval: [141, 141]
2018-09-17T12:36:39.092419Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-17T12:36:39.092432Z 0 [Note] WSREP: Received NON-PRIMARY.
2018-09-17T12:36:39.092478Z 2 [Note] WSREP: New cluster view: global state: e94fa648-0d4e-11e7-b53b-cf35fa571ba4:293303616, view# -1: non-Primary, number of nodes: 2, my index: 0, protocol version 3
2018-09-17T12:36:39.092492Z 2 [Note] WSREP: Setting wsrep_ready to false
2018-09-17T12:36:39.092504Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-09-17T12:36:39.095673Z 580178 [Warning] WSREP: Send action {(nil), 528, TORDERED} returned -107 (Transport endpoint is not connected)
2018-09-17T12:36:39.095714Z 580179 [Warning] WSREP: Send action {(nil), 1472, TORDERED} returned -107 (Transport endpoint is not connected)
2018-09-17T12:36:39.413579Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:43.414439Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:47.415673Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:51.416491Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:55.417342Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:36:59.418238Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-09-17T12:37:02.189337Z 580224 [Note] Access denied for user 'root'@'localhost' (using password: NO)
2018-09-17T12:37:03.919100Z 0 [Note] WSREP: (0dde40fe, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.0.65:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)


I think my HAProxy detected node A as "down", then sent connections to node B, creating a split brain…

I know this doc, I used it to restart my cluster. I will add a fourth node then.