Hi, I’m also at a similar stage in planning a Percona deployment and am a little confused about an optimal setup between two datacentres.
I suspect that, like me, you are looking at two potential failure states: the loss of a machine (node) and the loss of connectivity between the datacentres.
In the first case, I believe the cluster needs a majority of its nodes to stay online in order to keep quorum. With 2 nodes, losing 1 leaves only 1 of 2, which is not a majority, so to allow for 1 failure at least 2 of 3 must continue to be online - this would imply a minimum of 3 nodes.
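To check my own arithmetic here, this is a minimal sketch of the majority rule as I understand it. It assumes every node carries one equal vote and that quorum means strictly more than half of the votes; the function names are my own, not anything from Percona.

```python
def has_quorum(alive: int, total: int) -> bool:
    """True if `alive` nodes are a strict majority of `total` equally weighted nodes."""
    return 2 * alive > total


def survives_one_failure(total: int) -> bool:
    """True if the remaining nodes still hold a majority after any single node is lost."""
    return total >= 1 and has_quorum(total - 1, total)


if __name__ == "__main__":
    # Assumption: one vote per node, no arbitrator, no custom weights.
    for total in range(1, 6):
        print(total, "node(s), survives one failure:", survives_one_failure(total))
```

Under these assumptions the smallest cluster size that survives one failure is 3, which matches the reasoning above.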
However, if you had 3 nodes distributed with 2 at one datacentre and the third at the second datacentre, then if the connection between the datacentres went down, the lone node would lose quorum and only the pair would stay active. Worse, if the datacentre holding the pair went offline, you would be left with only 1 of your 3 nodes online - in which case the cluster would essentially be offline.
To avoid this you would need at least 2 nodes in each datacentre, but since there would now be 4 nodes, a network outage between the datacentres would leave 2 pairs of nodes. I don’t know what would happen in this scenario, since the quorum vote would be split 50/50. Perhaps someone from Percona could comment on this configuration?
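A rough sketch of the two-datacentre split may help frame the question. It assumes one equal vote per node and a strict-majority rule (more than half of the votes), and it only counts votes; it does not model how a real cluster detects or recovers from the partition. The names `partition_outcome`, "A" and "B" are hypothetical.

```python
def has_quorum(alive: int, total: int) -> bool:
    """True if `alive` nodes are a strict majority of `total` equally weighted nodes."""
    return 2 * alive > total


def partition_outcome(nodes_a: int, nodes_b: int) -> dict:
    """Quorum status of each side when the link between two datacentres fails."""
    total = nodes_a + nodes_b
    return {"A": has_quorum(nodes_a, total), "B": has_quorum(nodes_b, total)}


if __name__ == "__main__":
    # Assumption: both datacentres stay up and only the link between them is lost.
    print("2 + 1 nodes:", partition_outcome(2, 1))
    print("2 + 2 nodes:", partition_outcome(2, 2))
```

With a 2 + 1 layout only the pair keeps a majority. With a 2 + 2 layout neither side does under this rule, so on paper a 50/50 split would leave both halves without quorum, though confirmation from someone at Percona would still be welcome.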
After reading the documentation I’m finally starting to understand why it suggests thinking in threes. If you had 3 datacentre locations with a total of 5 nodes (2 nodes in each of two datacentres and 1 node in the third), then even if one of the datacentres went offline the quorum vote would always be in favour of the working nodes (3 to 2 if a 2-node datacentre is lost, or 4 to 1 if the single-node datacentre is lost), so the cluster should remain active.
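The 3 to 2 and 4 to 1 counts can be reproduced with a short sketch. It assumes one equal vote per node and a strict-majority rule; the datacentre names are placeholders of my own.

```python
def has_quorum(alive: int, total: int) -> bool:
    """True if `alive` nodes are a strict majority of `total` equally weighted nodes."""
    return 2 * alive > total


def votes_after_dc_loss(layout: dict) -> dict:
    """Map each datacentre to (surviving votes, lost votes) if that datacentre goes offline."""
    total = sum(layout.values())
    return {dc: (total - count, count) for dc, count in layout.items()}


if __name__ == "__main__":
    # Hypothetical layout: 2 + 2 + 1 nodes across three datacentres.
    layout = {"dc1": 2, "dc2": 2, "dc3": 1}
    total = sum(layout.values())
    for dc, (alive, lost) in votes_after_dc_loss(layout).items():
        print(f"lose {dc}: {alive} to {lost}, quorum kept: {has_quorum(alive, total)}")
```

Losing either 2-node datacentre leaves 3 of 5 votes and losing the single-node datacentre leaves 4 of 5, so the survivors hold a majority in every case.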
In this 5-node scenario I believe the cluster could survive a node outage at any of the datacentres, or the complete outage of one datacentre, without going offline. A machine (node) failure and the loss of a datacentre could still happen at the same time, which could be a problem, so having the option to spin up an arbitrator at short notice at any of your datacentres might be wise (if arbitrators can be used like this in an emergency?).
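To see when the combined failure actually hurts, here is a sketch that compares two cases. It assumes one equal vote per node and a strict-majority rule. The "sequential" case additionally assumes the cluster re-counts its membership after the first failure, which is my understanding of how quorum is evaluated but is worth confirming against the documentation. All names are hypothetical.

```python
def has_quorum(alive: int, total: int) -> bool:
    """True if `alive` nodes are a strict majority of `total` equally weighted nodes."""
    return 2 * alive > total


def simultaneous(layout: dict, lost_dc: str) -> bool:
    """A datacentre and one further node vanish at once; count against the original size."""
    total = sum(layout.values())
    alive = total - layout[lost_dc] - 1
    return has_quorum(alive, total)


def sequential(layout: dict, lost_dc: str) -> bool:
    """The datacentre fails first, membership shrinks to the survivors, then one node fails."""
    total = sum(layout.values())
    after_dc = total - layout[lost_dc]
    if not has_quorum(after_dc, total):
        return False
    # Assumption: the second failure is judged against the reduced membership.
    return has_quorum(after_dc - 1, after_dc)


if __name__ == "__main__":
    layout = {"dc1": 2, "dc2": 2, "dc3": 1}  # hypothetical 2 + 2 + 1 layout
    for dc in layout:
        print(f"lose {dc} + 1 node: simultaneous={simultaneous(layout, dc)}, "
              f"sequential={sequential(layout, dc)}")
```

In this model the only losing case is a 2-node datacentre and another node disappearing at the same moment, which leaves 2 of 5 votes. Whether an arbitrator started after such an event could rescue the cluster is exactly the open question above; the sketch does not answer it.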
I’d love to hear any further thoughts on this, as I’m new to this area of computing (I’ve only been reading about it for a couple of days) and would value any insight into an optimal deployment plan with the minimum outlay.