Bootstrap node creates another cluster after network problem

Hi all,
I encountered a strange problem with an installation of Percona XtraDB Cluster v. 5.7.
My configuration follows the standard installation guide: 3 nodes (all masters), with one node started in bootstrap mode at cluster startup time; HAProxy is configured on the cluster with failover checks and the “leastconn” balance algorithm (as described here: …s/haproxy.html).
My cluster works well, but we hit a disaster scenario when, for some reason, node-1 (the bootstrapped one) had network problems, so HAProxy excluded it from the pool while node-2 and node-3 preserved all functionality as expected. And here is the problem: node-1 came back online and, for reasons unknown to me, if a write had occurred on the Primary Component (node-2 & node-3) in the meantime, node-1 created another cluster. As a result, HAProxy, based on the “clustercheck” script, re-added node-1 to the “global cluster”, causing a split-brain scenario with writes landing randomly on node-1 and on node-2/node-3.
Is this normal behavior? Is there a problem in my configuration? If the connectivity problem happens on node-2 or node-3, the cluster behaves as expected: node-2 goes offline for a few seconds, node-1 and node-3 keep the cluster active, some data is written to the cluster, node-2 reconnects, resyncs its data from a donor node, and finally node-2 rejoins the cluster ready to work.

Thanks so much.

Up please!

Hi pindi,

How are you bootstrapping node-1? It sounds like node-1 is restarting as a bootstrapped node (so it thinks of itself as a new cluster).
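For reference, on CentOS 7 with PXC 5.7 the two start modes use different systemd units, so it is easy to accidentally (re)start a node in bootstrap mode. A minimal sketch, assuming the standard package layout:

```shell
# Bootstrap ONLY the very first node of a brand-new cluster
# (or the most advanced node after a full cluster shutdown):
systemctl start mysql@bootstrap.service

# Any later start of that node -- including rejoining after a
# network outage -- must use the normal unit, so it joins the
# existing Primary Component instead of forming a new cluster:
systemctl stop mysql@bootstrap.service
systemctl start mysql
```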

Hi Kenn,
yes, node-1 starts in bootstrap mode (as described in the official guide), but in my case it did not restart: it lost connectivity to the rest of the cluster and, when it came back online, it created a new cluster instead of rejoining the remaining nodes. Is that a wrong configuration? How should it be set up to survive connectivity problems? I assumed that bootstrapping one node is the standard way of starting the cluster: why does the node create a new cluster when it comes back online after an accidental network problem? More generally, what would be the ideal configuration for a production environment to avoid problems of this type?

Thanks so much.

Can all nodes ping each other?

What is the error log?

It is likely that you have to tweak grastate.dat.
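For context, grastate.dat lives in the data directory and records the node's last known cluster state; Galera reads it at startup to decide how to (re)join. A typical file looks like this (the UUID here is a placeholder):

```
# GALERA saved state
version: 2.4
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
```

On recent Galera 3 builds, `safe_to_bootstrap: 1` is set only on the last node to leave the cluster; bootstrapping a node where it is 0 is refused unless you edit the file deliberately.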

Yes, all nodes can ping each other, and for the error log I have to replicate the scenario (stay tuned!).
What do you mean by “It is likely that you have to tweak grastate.dat”? My installation (3 CentOS 7 VMs with HAProxy + Keepalived + Percona XtraDB Cluster) follows the configuration in the official guide, where there is a bootstrap node and two other nodes, all with weight “1”. Why, if

  1. I unplug the bootstrap node from the network
  2. write some data to other two node
  3. reconnect bootstrap node to the network

does node-1 (the bootstrapper) create a new cluster (producing the disastrous scenario in which you can write to 2 different clusters — a split-brain)?
Is my Percona XtraDB Cluster, configured based on this guide - -, in multi-master configuration or not?
As I understand it, a multi-master cluster should avoid this particular (but not rare) situation.
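As a point of reference, a node started with the normal (non-bootstrap) service finds the surviving nodes through the gcomm:// member list, while a node started via the bootstrap unit effectively ignores that list and forms its own cluster — which matches the symptom described. A minimal sketch of the relevant wsrep settings (all IPs, file paths, and names are placeholders, not taken from this thread):

```
# /etc/percona-xtradb-cluster.conf.d/wsrep.cnf (sketch)
[mysqld]
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_cluster_name=pxc-cluster
# Full member list, so a restarted node can locate the
# existing Primary Component and rejoin it:
wsrep_cluster_address=gcomm://192.168.70.61,192.168.70.62,192.168.70.63
wsrep_node_address=192.168.70.61
```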

P.s.: bdelmedico, are you Italian?

I’m half Italian and I live in Brazil

my scenario is very similar to yours, but I use Ubuntu… you only use bootstrap to bring up the first node; after that you can start the others normally. Once the cluster is up, you can stop any node and the cluster will continue to function as usual… even if it is the bootstrap node.

at least in my case, HAProxy relies on the clustercheck response to mark a node as active in the load balancer.
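For comparison, the check described in the Percona HAProxy guide looks roughly like this; clustercheck answers over HTTP on port 9200 (usually via xinetd), returning 200 when the node is synced and 503 otherwise. Server names and IPs below are placeholders:

```
# /etc/haproxy/haproxy.cfg (sketch)
backend pxc-back
    mode tcp
    balance leastconn
    option httpchk
    server node1 192.168.70.61:3306 check port 9200 inter 2000 rise 3 fall 3
    server node2 192.168.70.62:3306 check port 9200 inter 2000 rise 3 fall 3
    server node3 192.168.70.63:3306 check port 9200 inter 2000 rise 3 fall 3
```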

  1. I unplug the bootstrap node from the network
    HAProxy will identify this and send the requests to the next node.

  2. write some data to the other two nodes

  3. reconnect bootstrap node to the network
    Percona will be responsible for replicating the data.

What do you mean by “then you can go up normally”? The official guide does not mention anywhere that I have to “disable” bootstrap mode on node-1 after starting the MySQL process. I assume that, once started, node-1 is a “master” like the others.
Furthermore, in your scenario you skipped a step: after point 1 you must write some data to node-1, then write some data to the other two nodes, and only then try to reconnect node-1’s network. This can happen when HAProxy does not notice in time that node-1 is “out of sync” (the result of the clustercheck command), or, for example, if there is a network error between the nodes BUT node-1 is still reachable on the network. Of course, I can adjust the HAProxy parameters to mitigate this problem, but this is exactly what happened during my tests.
It is important to note that this problem does not happen when node-2 or node-3 loses connectivity (in that case the cluster works as expected).
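To confirm whether node-1 really formed a second cluster, the standard Galera status variables can be compared across the nodes (these are regular MySQL status variables, nothing specific to this setup):

```sql
-- Run on each node. In a healthy 3-node cluster all nodes report
-- the same wsrep_cluster_state_uuid and wsrep_cluster_size = 3;
-- a split-off node shows its own UUID and a size of 1.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';      -- expect 'Primary'
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_state_uuid';
```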