DR Setup

Hi,

I'm currently using XtraDB servers in a cluster for MySQL HA, and MaxScale servers for load balancing. But I only have a single site, and I need to create a secondary site which is also active and has traffic coming to it too. From what I understand, I can't have DR if all of the nodes are in a single cluster, because I won't have a quorum if one of the sites disconnects: you need more than 50% of the nodes for it.
What is the best way to set up a DR solution for 2 active sites?
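
The quorum concern can be checked with a little arithmetic. This is a minimal sketch, assuming every node carries equal weight and an illustrative layout of three nodes per site; the function name and the node counts are made up for the example:

```python
def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Return True if the reachable nodes form a strict majority (> 50%)."""
    return 2 * reachable_nodes > total_nodes


# Hypothetical layout: one cluster stretched over two sites, 3 nodes each.
total = 6
print(has_quorum(3, total))  # one whole site cut off: 3 of 6 is exactly 50% -> False
print(has_quorum(4, total))  # only two nodes lost: 4 of 6 -> True
```

With an even split across two sites, neither half can reach a strict majority on its own after a link failure, which is the situation described above.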

@mkl262 Every time a customer tries to implement active/active like this, it ends up being a huge pain or an eventual failure. You should rearchitect to active/passive instead of active/active. Then you can use standard async replication between the two clusters to keep the DR side in sync with the active side. Use GTIDs and pick one node from site A to be the source/master, and one node from site B to be the replica/slave. Set up normal async replication and you're good to go.
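
As a rough illustration of the replication step, the sketch below only builds the SQL statements one would run on the chosen site B node. It assumes MySQL/PXC 8.0.23 or later syntax (older versions use `CHANGE MASTER TO ... MASTER_AUTO_POSITION=1` and `START SLAVE`), GTID mode already enabled on both clusters, and a replication user that already exists on site A. The host name, user, and password are placeholders, and the helper function is hypothetical, not part of any Percona tooling:

```python
from typing import List


def replica_setup_sql(source_host: str, user: str, password: str,
                      port: int = 3306) -> List[str]:
    """Build the statements to run on the site B node acting as replica."""
    return [
        "CHANGE REPLICATION SOURCE TO "
        f"SOURCE_HOST='{source_host}', SOURCE_PORT={port}, "
        f"SOURCE_USER='{user}', SOURCE_PASSWORD='{password}', "
        "SOURCE_AUTO_POSITION=1",  # GTID auto-positioning instead of file/offset
        "START REPLICA",
    ]


# Placeholder values for illustration only.
for stmt in replica_setup_sql("pxc-a1.example.com", "repl", "change-me"):
    print(stmt + ";")
```

The statements are printed rather than executed so they can be reviewed first; in practice they would be run from a MySQL client on the replica node.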

Greetings @matthewb … so regarding multi-DC configurations where there is a desire to have an active/active deployment scenario, is PXC the wrong choice? I am considering options that will require north of six DCs that operate in an active/active manner (could be much higher than six DCs spread across the USA) and synchronize data. I do like the notion of a PXC cluster in a single site so that numerous app servers can talk to any and all of the cluster nodes. What are your thoughts on this type of configuration?

@aredman Hey there, and welcome. If you came to us as an official client with that design desire, I’d ask you why, and would want to hear all of your logic behind the choice. Many, many times our clients come to us with big massive complex desires which, in reality and in practice, never pan out and end up as an over-engineered solution to an extremely standard problem. With only your paragraph, 6 DCs attempting to be all active and in-sync feels over-engineered, and/or attempting to solve a problem that doesn’t exist.

In my head, this is where I compare you to other clients of ours. I’ve been directly involved with several high-end retailers setting up their MySQL environments for high-traffic Black Friday, and nobody has a WAN DB setup that complex.

Is PXC wrong? Not necessarily. You can certainly implement a 6-node PXC with one node in each DC, local applications connecting to the local DB, and PXC maintaining quorum. But your commit speed on any node cannot be faster than the slowest link between any two nodes, since PXC uses synchronous replication.
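
To put a number on the slowest-link point, here is a small sketch that treats the highest round-trip time between any two nodes as a floor for commit latency, as described above. The DC names and RTT figures are invented for illustration, and real commit times will include other costs on top of this:

```python
from typing import Dict, Tuple

Link = Tuple[str, str]


def slowest_link(rtt_ms: Dict[Link, float]) -> Tuple[Link, float]:
    """Return the DC pair with the highest round-trip time and that RTT."""
    pair = max(rtt_ms, key=rtt_ms.get)
    return pair, rtt_ms[pair]


# Hypothetical round-trip times (ms) between three of the DCs.
rtt_ms = {
    ("nyc", "chi"): 20,
    ("chi", "lax"): 50,
    ("nyc", "lax"): 70,
}

pair, floor_ms = slowest_link(rtt_ms)
print(pair, floor_ms)  # ('nyc', 'lax') 70
```

Under these made-up numbers, a commit on any node would wait at least about 70 ms, even for an application sitting next to its local node.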

My example above is very typical: the primary DC has a full, local cluster and apps talk directly to all nodes (using ProxySQL or other middleware to load-balance read-only queries across all nodes while sending write traffic to a single node). In other DCs, using native MySQL async replication, stand up another PXC and replicate traffic. App servers in this DC are also on standby. If your latency was acceptable, you could have apps in DR DCs talk directly to the primary PXC DC (this is a common setup too). Should you need to fail over, the PXC in DR might be just a few transactions behind in lag, but otherwise in sync. Switch apps to connect to this PXC.
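
The read/write split in the primary DC can be pictured with a toy router. This is only a sketch of the routing idea, not how ProxySQL is actually configured (it uses its own query rules and hostgroups); the class, the node names, and the naive "starts with SELECT" check are all assumptions made for the example:

```python
from itertools import cycle
from typing import List


class ReadWriteRouter:
    """Send plain reads round-robin to all nodes, everything else to one writer."""

    def __init__(self, writer: str, readers: List[str]) -> None:
        self.writer = writer
        self._readers = cycle(readers)

    def route(self, sql: str) -> str:
        stmt = sql.lstrip().upper()
        # Locking reads are treated as writes so they land on the single writer.
        is_plain_read = stmt.startswith("SELECT") and "FOR UPDATE" not in stmt
        return next(self._readers) if is_plain_read else self.writer


nodes = ["pxc1", "pxc2", "pxc3"]  # hypothetical node names
router = ReadWriteRouter(writer="pxc1", readers=nodes)

print(router.route("SELECT * FROM posts"))          # pxc1
print(router.route("SELECT * FROM users"))          # pxc2
print(router.route("INSERT INTO posts VALUES (1)")) # pxc1
```

Keeping all writes on one node is what avoids write conflicts between cluster members, while reads still use every node.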

Greetings @matthewb Thanks very much, Sir … pleasure to be here in the community, so thanks for the warm welcome. Let’s simplify this scenario down to two DCs. The logic for active/active is that, from an application experience lens, users that are geographically closer to one DC would not want to be directed to the farther DC, which of course would impact their latency and overall application experience. In this configuration, active/active is without question an essential element of ensuring an optimum user experience.

With geographic load balancing based on very capable filter logic provided by next-gen DNS providers, it’s easy to groom users to one site vs. the other site. If, for example, the ‘app’ was a social media app that involved 5% writes and 95% reads as general behavior, ensuring equal access to ‘stream’ data, regardless of which DC the user may have been forwarded to, is critical. Taking that sort of scenario, is PXC a good fit at each respective DC, with sync occurring between defined ‘donor’ nodes?

Note: I’m not suggesting one PXC node per DC, but a PXC cluster per DC.

@aredman What I’ve typically seen implemented with such a geolocalized architecture is that the apps do their 95% reads from the local cluster, but take the WAN trip for those 5% writes to the primary database. This methodology keeps all of the writes going to one node, helping to prevent the write conflicts that may occur during active/active replication. And because writes happen so infrequently, the write latency isn’t generally “felt” by the user.
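
A quick worked example shows why the WAN trip for writes is rarely felt. The 95/5 split comes from the discussion above; the 2 ms local and 70 ms WAN latencies are invented figures, and the function is just a weighted average written for this illustration:

```python
def average_latency_ms(read_pct: int, local_ms: float, wan_ms: float) -> float:
    """Weighted average latency when reads stay local and writes cross the WAN."""
    write_pct = 100 - read_pct
    return (read_pct * local_ms + write_pct * wan_ms) / 100


# Hypothetical latencies: 2 ms to the local cluster, 70 ms to the primary DC.
print(average_latency_ms(95, 2, 70))  # 5.4  (reads local, writes over the WAN)
print(average_latency_ms(0, 2, 70))   # 70.0 (everything sent to the remote DC)
```

Under these assumed numbers the average query costs about 5.4 ms, against 70 ms if every query had to travel to the remote primary.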

Can you do PXC1 <–> PXC2 as master/master? Yes. You pick one node in each cluster to be the async source for the other cluster and a replica that receives from the other cluster’s source. This script can help manage failover in such a setup:

https://github.com/y-trudeau/Mysql-tools/blob/master/PXC/replication_manager.sh

Excellent input @matthewb. Thanks much for the detail.
