Hello,
I just joined here. We’re experimenting with a Galera cluster on MariaDB, and while we were able to set up a 3-node cluster without issues, we ended up a bit perplexed while experimenting to gain some confidence.
So, we have a 3-node setup on
- mariadb 10.5.10
- galera cluster 4.6
and it’s working flawlessly.
- We decided to add a 4th node in another datacenter for the purpose of evaluating latency. By mistake, the network for the new node is configured such that it can connect to the cluster, but the cluster can’t connect to it.
- The new node joins the cluster, and a donor node is selected for rsync SST.
- SST fails because the rsync connection can’t be established and, for some reason, the donor node immediately sees the network as partitioned and shuts itself down. It ends up in a state where wsrep_ready=OFF and wsrep_connected=OFF; mariadb appears to be running, but the wsrep provider appears dead.
- The new node is disconnected from the cluster, reconnects, a new donor is selected, and the same thing repeats until the entire cluster is down and unresponsive.
- Each node is left in a state where the mysql process can’t be terminated gracefully, so we need to kill each one and go through the process of bootstrapping the cluster.
That’s a bit unsettling. I understand I can fix the network issues, but, still…
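
For reference, here is a minimal sketch of the kind of check that would have caught our one-way network problem before the node joined. It is not something we ran at the time. It assumes the default ports (3306 for clients, 4567 for Galera group communication, 4568 for IST, 4444 for SST), and the function names are just ones I made up for illustration. The idea is to run it from each existing cluster node against the new node, and from the new node against each cluster node.

```python
import socket

# Default ports; an assumption, adjust if your configuration overrides them.
GALERA_PORTS = {
    3306: "MySQL client connections",
    4567: "Galera group communication",
    4568: "IST (incremental state transfer)",
    4444: "SST (state snapshot transfer, e.g. rsync)",
}


def probe(host, port, timeout=3.0):
    """Try a TCP connection and classify the result.

    "open"        - something accepted the connection
    "refused"     - the host answered, but nothing is listening on the port
    "unreachable" - timeout or other network error (possibly filtered)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts, "no route to host", DNS failures, etc.
        return "unreachable"


def check_node(host, ports=GALERA_PORTS, timeout=3.0):
    """Return {port: status} for every port in `ports`, as seen from this machine."""
    return {port: probe(host, port, timeout) for port in ports}
```

Calling `check_node("<address of the new node>")` from a donor would give one status per port. Note that "refused" on 4444 or 4568 is normal when no state transfer is in progress, since the joiner only listens on those ports during a transfer; "unreachable" in one direction only is the symptom that matches our misconfiguration.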