I’m planning to implement Percona XtraDB Cluster, and was wondering if it’s possible to synchronise two clusters. Example:
Nodes in Cluster A (production environment): A1, A2, A3, A4
Nodes in Cluster B (disaster recovery environment): B1, B2, B3, B4
Ideally, both clusters would be fully synchronised in real-time. My current idea is to set up one node in each cluster to use MySQL replication (i.e. A1 → B1), with the replicated changes then propagated to the rest of Cluster B.
In a disaster recovery situation, the replication would be broken, making Cluster B “live”.
Would this kind of setup work?
Are they in two datacenters?
Do you use a WAN for your replication?
If so, you should tune the wsrep_provider_options parameters in your configuration (e.g. evs.keepalive_period, evs.suspect_timeout, …)
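As a rough sketch of what that could look like in my.cnf: the values below are illustrative starting points for a high-latency link, not tested recommendations, and evs.inactive_timeout is an extra parameter not named above. It is included on the assumption that it should stay larger than evs.suspect_timeout.

```ini
[mysqld]
# Illustrative WAN-oriented values (ISO 8601 durations); tune them against
# the measured latency and stability of your own link.
# keepalive_period: how often keepalive messages are sent when idle
# suspect_timeout:  how long before an unresponsive node is suspected dead
# inactive_timeout: hard limit after which a silent node is declared dead
wsrep_provider_options = "evs.keepalive_period=PT3S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M"
```

Longer timeouts make the cluster more tolerant of brief WAN hiccups, at the cost of taking longer to notice a node that has really failed.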
Yes, the two clusters would indeed be in two datacentres, linked via an IPsec VPN over a WAN.
Your suggestion seems to be to create one big cluster and split the nodes between two datacentres, with replication over a WAN. From the Percona documentation, I understand that cluster replication is as quick as replication to your slowest node, so our intra-cluster replication would basically be limited to the speed of the WAN. Is this correct?
My other concern is that Cluster B should be read-only until the time when we switch to our disaster recovery solution. We should also be able to fail-back to Cluster A when the outage is over.
Would a “read_only” configuration line in my.cnf on the Cluster B nodes work (assuming we go with a two-cluster system as I described in my first post)?
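For reference, a minimal sketch of the configuration in question, assuming the two-cluster layout from the first post. The log_slave_updates line is an assumption on top of the original question: it applies only to the node receiving async replication (B1), so that the changes it applies from A1 are passed on to the rest of Cluster B.

```ini
# my.cnf on the Cluster B nodes (sketch only)
[mysqld]
# Reject writes from ordinary clients. Accounts with the SUPER privilege
# and the replication threads are not restricted by this setting.
read_only = 1

# Assumption: needed only on B1, the async replica of A1, so that events
# applied by its slave thread are forwarded to the other Cluster B nodes.
log_slave_updates = 1
```

On a DR failover, read_only would have to be switched off on the Cluster B nodes (and removed from my.cnf so it does not come back after a restart).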
Craig: A single cluster over a WAN will have a commit penalty of about the single worst RTT between any two nodes in your cluster. That is likely the ping time of your WAN link.
In a two cluster scenario, you should be able to make cluster B read_only. This would effectively function like one big dual-master (or master/slave) DR setup with regular MySQL, and all the same caveats would apply about data loss on failover, repointing and recovering on a DR Failover, etc.
However, here’s a gotcha. If A1 replicates to B1 and either of those two fails, you basically have to rebuild all of B. There’s no easy way to fail over async replication channels to other nodes in the cluster. You can’t guarantee that binlog positions are the same across nodes, though you could fudge this a bit if you tried hard. This will be solved in Galera 3.0 on 5.6 using GTIDs, but for now you’re stuck.
Jay, I’m investigating the same setup. Any timeframe on PXC 5.6 and Galera 3.0?