Hello. I am in the beginning stages of setting up a Percona XtraDB Cluster that is distributed across two separate data centers. While I have had great luck with the cluster in a single data center, I’m finding myself a bit confused about the ideal setup for this new build.
Would anyone have any suggestions as to the best way to distribute the nodes across two data centers? While we would like to keep costs down, we do not want to end up with an outage (split-brain, etc.) because the nodes were not distributed properly.
Here are some thoughts that I’ve been rolling around thus far:
2 nodes in DC1, single node in DC2. Not sure if an arbitrator is needed here or not.
2 nodes in DC1, 2 nodes in DC2, with an arbitrator somewhere (not sure which DC).
3 nodes in DC1, 3 nodes in DC2, with an arbitrator somewhere (not sure which DC).
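A rough way to compare these layouts is to count votes on each side when the link between the two data centers fails. The sketch below is only a model, not anything taken from Percona's tooling: it assumes every node and the arbitrator carry one vote each, that a side keeps running only with a strict majority of all votes, and the names `has_quorum` and `link_failure` are made up for illustration.

```python
from typing import Dict, Optional


def has_quorum(votes_reachable: int, votes_total: int) -> bool:
    """True when the reachable votes form a strict majority (assumed rule)."""
    return 2 * votes_reachable > votes_total


def link_failure(dc1_nodes: int, dc2_nodes: int,
                 arbitrator_in: Optional[str]) -> Dict[str, bool]:
    """Report which data center keeps quorum when the DC1-DC2 link fails.

    arbitrator_in is "DC1", "DC2", or None (hypothetical labels).
    """
    dc1_votes = dc1_nodes + (1 if arbitrator_in == "DC1" else 0)
    dc2_votes = dc2_nodes + (1 if arbitrator_in == "DC2" else 0)
    total = dc1_votes + dc2_votes
    return {
        "DC1": has_quorum(dc1_votes, total),
        "DC2": has_quorum(dc2_votes, total),
    }


# The three layouts from the list above, with the arbitrator tried in each DC.
layouts = [
    (2, 1, None),
    (2, 2, "DC1"),
    (2, 2, "DC2"),
    (3, 3, "DC1"),
    (3, 3, "DC2"),
]
for dc1_nodes, dc2_nodes, arbitrator_in in layouts:
    print(dc1_nodes, dc2_nodes, arbitrator_in,
          link_failure(dc1_nodes, dc2_nodes, arbitrator_in))
```

In this model, with only two sites, the data center that hosts the arbitrator is the one that stays up when the link fails, and losing that data center entirely leaves the other one short of a majority.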
Hi, I’m also at a similar stage in planning a Percona deployment and am a little confused about an optimal setup between two datacentres.
I suspect, like me, you are looking at two potential failure states: the loss of a machine (node) and the loss of connectivity between the datacentres.
In the first case I believe the nodes that remain must form a majority (more than half) of the cluster, so in order to allow for 1 to fail at least 2 must continue to be online - this would imply a minimum of 3 nodes.
However, if you had 3 nodes distributed with 2 in one datacentre and the third in the other, then if the connection between the datacentres went down the lone node would be cut off from the majority, and if the datacentre holding the 2 nodes went offline you could be in the position where only 1 of your 3 nodes is online - in which case the cluster would essentially be offline.
To avoid this you would need at least 2 nodes in each datacentre, but since there would now be 4 nodes, a network outage between the datacentres would leave 2 pairs of nodes. I don’t know what would happen in this scenario, since the quorum vote would be split 50/50. Perhaps someone from Percona could comment on this configuration?
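The arithmetic behind the 3-node and 4-node cases can be written out as a small sketch. It assumes one vote per node and that the surviving nodes need a strict majority of the cluster, so an exact 50/50 split counts as no majority for either half; the function name `max_failures_tolerated` is hypothetical.

```python
def max_failures_tolerated(cluster_size: int) -> int:
    """Largest number of nodes that can be lost while the survivors
    still hold a strict majority of the original cluster size."""
    # Survivors s must satisfy 2 * s > cluster_size, so the number of
    # nodes that may fail is (cluster_size - 1) // 2.
    return (cluster_size - 1) // 2


for size in range(2, 7):
    print(size, "nodes -> tolerates", max_failures_tolerated(size), "failure(s)")
```

Under these assumptions a 4-node cluster tolerates no more lost nodes than a 3-node one, which is why an evenly split pair of datacentres gains nothing from the fourth node.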
After reading the documentation I’m finally starting to understand why it suggests thinking in 3s. If you were to have 3 datacentre locations with a total of 5 nodes (2 nodes in each of two datacentres and 1 node in the third), then even if one of the datacentres went offline the quorum vote would always be in favour of the working nodes (3 to 2 OR 4 to 1), so the cluster should remain active.
In this 5 node scenario I believe it would allow for a node outage at any of the datacentres, or the complete outage of one datacentre, without the cluster going offline. There is the possibility that a machine (node) failure and the loss of a datacentre could happen at the same time, which could be a problem, so having the option to spin up an arbitrator at short notice at any of your datacentres might be wise (if you can use arbitrators like this in an emergency?).
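The 5-node, 3-datacentre reasoning above can be checked with a short sketch. It is a simplification under stated assumptions: one vote per node, a strict majority needed, and all failures counted at once against the original 5 nodes (failures that arrive one after another may be judged differently by the cluster). The site names and the helper `dc_outage_report` are hypothetical.

```python
from typing import Dict


def has_quorum(nodes_reachable: int, nodes_total: int) -> bool:
    """True when the reachable nodes form a strict majority (assumed rule)."""
    return 2 * nodes_reachable > nodes_total


def dc_outage_report(layout: Dict[str, int],
                     extra_node_failures: int = 0) -> Dict[str, bool]:
    """For each datacentre, say whether the rest of the cluster keeps
    quorum if that datacentre is lost together with extra_node_failures
    further nodes elsewhere."""
    total = sum(layout.values())
    return {
        dc: has_quorum(total - nodes - extra_node_failures, total)
        for dc, nodes in layout.items()
    }


three_sites = {"DC1": 2, "DC2": 2, "DC3": 1}

# Loss of any single datacentre: 3 of 5 or 4 of 5 nodes remain.
print(dc_outage_report(three_sites))

# Loss of a datacentre plus one more node at the same moment.
print(dc_outage_report(three_sites, extra_node_failures=1))
```

In this model every single-datacentre outage is survivable, while a simultaneous extra node failure is only survivable when the lost datacentre is the one holding a single node.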
I’d love to hear any further thoughts on this as I’m new to this area of computing (only been reading a couple of days) and would value any further insight into an optimal deployment plan with the minimum amount of outlay.