Application DR test

I have application servers on DC1 that are actively connected to the database. There are also application servers on DC2, but they do not send any requests to the database.

My database servers follow a similar architecture. The production database is a 5-node Percona MySQL cluster, version 8.0.29, running as master-master. 3 nodes are located in DC1 and 2 nodes are located in DC2.

What I want to do is bring my application servers on DC2 into service if there is an access problem on DC1. I want to keep the production environment running from DC2 by having the applications there access the database nodes in DC2. I have set my firewall and network rules accordingly.

It’s a small-scale disaster scenario.

There are three questions I want to ask here.

Question 1: Is continuity ensured when I route database writes to one of the nodes in DC2? Are there any problems with this?

Question 2: In a fallback scenario, will the databases in DC1 cause problems when I point the apps back to DC1?

Question 3: If I want to shut down the DC1 databases: I bootstrapped the cluster from node 1, so how do I stop and start this node?

Hi @bthnklc thank you for posting your question to the Percona forums!

  1. Continuity is ensured when you route queries to DC2 because Percona XtraDB Cluster (PXC) automatically replicates all database changes to every member of the cluster. The only issue to be aware of is latency on the link between DC1 <-> DC2: if events have not yet been applied on the DC2 node you read from, you could experience a stale read. This scenario arises with read-after-write, where you write to DC1 and then immediately read from DC2. You may need to work with the variable wsrep_sync_wait (see "Index of wsrep system variables" in the Percona XtraDB Cluster documentation).
  2. You are permitted to move queries between DC1 and DC2 as you wish. For performance reasons we recommend you only write to a single member of the cluster. Issues will arise if the network between DC1 <-> DC2 has high latency or is unstable, as mentioned in #1.
  3. If you want to turn off the databases in DC1 and run entirely in DC2, it is simply a matter of turning off your instances in DC1 one at a time. They will gracefully leave the cluster and DC2 will remain in PRIMARY state, able to accept queries.
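
As a rough illustration of the read-after-write point in #1, the sketch below enables causality checks for reads on the session that reads from DC2. The table and column names (`orders`, `id`) are hypothetical and not taken from this thread. The value `1` covers read statements, and the trade-off is extra latency on each read while the node catches up.

```sql
-- Session on a DC2 node, right after a write was sent to a DC1 node.
-- Assumption: `orders` is a hypothetical table used only for illustration.

-- Make reads wait until this node has applied all earlier write-sets
-- (bit 1 = causality checks for READ statements).
SET SESSION wsrep_sync_wait = 1;

SELECT * FROM orders WHERE id = 42;

-- Restore the default (no causality checks) for the rest of the session.
SET SESSION wsrep_sync_wait = 0;
```

Setting it per session rather than globally keeps the latency cost limited to the reads that actually need it.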

Hi @Michael_Coburn,
Thank you for your reply. Let me restate the scenario to be clear. I want to implement the DR scenario at the application level. The applications on DC1 will be shut down and the applications on DC2 will be started. The applications started on DC2 will write only to the database node in DC2.

Once I shut down the applications on DC1 and start the applications on DC2, no requests will reach the database nodes in DC1, and the requests from DC2 (app) to DC2 (db) will go to one of the nodes in DC2. Will there be any inconsistency at the database level at this point?

Do I have to shut down the databases in DC1 during this process?

If I am going to shut them down: I started the database with bootstrap on node1, so how should I stop this node?

Hi @bthnklc, thank you for clarifying your post.

So it depends on your failure condition. If the network between DC1 <-> DC2 remains up and all PXC nodes remain communicating while you stop your app servers in DC1 and start them in DC2, then you will be fine - no action needed on the PXC side.
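
Before moving the applications, one way to confirm that all nodes are still communicating is to look at the wsrep status counters on any node. This is only a sketch, and the expected values in the comments assume the 5-node layout described in this thread.

```sql
-- Run on any PXC node before switching the applications to DC2.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';        -- expect 5 with all nodes joined
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';      -- expect Primary
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'; -- expect Synced on a healthy node
```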

If, however, you envision a DR scenario where DC1 goes offline and the network connection is severed, you will need to take action. Specifically, you will have lost quorum: 3 out of 5 servers have gone offline, so the 2 remaining PXC servers in DC2 hold less than 50% membership (2/5) and will go into NON-PRIMARY state, which means they will not accept queries. To remedy this you will need to do what is known as “bootstrapping”: take down 1 of the 2 instances so that you have a single running PXC server, then connect to it and run:

SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';

Then you can start up the 2nd PXC node in DC2, which will connect to the cluster and you’ll be running on two PXC nodes in DC2 in PRIMARY state.
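
The state at each step can be checked with the wsrep status variables. The following is a sketch under the assumptions of this scenario (DC1 unreachable, two nodes in DC2); the expected values in the comments follow from that layout and should be confirmed on your own cluster.

```sql
-- On the remaining DC2 node, before bootstrapping:
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';  -- expect non-Primary

-- After running SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';  -- expect Primary
SHOW GLOBAL STATUS LIKE 'wsrep_ready';           -- expect ON

-- After the 2nd DC2 node has been started and has rejoined:
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';    -- expect 2
```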