Advice on application connectivity to a PXC cluster deployed on nodes spread across two regions

Hi,
We have a PXC 8.0 cluster deployed on Kubernetes as a 5-node StatefulSet in multi-master mode, with the nodes spread across two regions. A separate service was created in each region to load balance traffic across the available PXC pods in the cluster. HAProxy was also deployed on the cluster.

Can you please advise on the below?

  1. Is it recommended to create separate services in each region so that applications connect to the database within the same region?
  2. If yes, how does resiliency work when one of the pods goes offline and comes back after, say, 30 minutes? When the pod rejoins the Kubernetes service, is the data replicated automatically as part of pod startup before the service starts routing traffic to it?
  3. If not, do you recommend having the application connect to the HAProxy service (as it is already deployed) so that writes are directed to only one node and resiliency is taken care of automatically? Currently, the application has no configuration to separate write and read operations.
  4. High SQL commit latency (~100 ms) is observed. How can I debug this further?
  5. PMM is not enabled yet. Do you recommend enabling it in the production environment?

Thank you!

Hi VithalAkunuri,

I will only comment on the PXC side and leave the K8s questions for someone else to answer.

Deploying PXC across multiple regions will severely affect performance and is strongly discouraged. For reference, there is a blog post about this: https://www.percona.com/blog/how-not-to-do-mysql-high-availability-geographic-node-distribution-with-galera-based-replication-misuse/

The reason is that all nodes in the topology need to communicate in real time (not only when a write comes through), so all activity is funneled through the network and slowed down to its speed. For reference, you can read more about this in this blog post: https://www.percona.com/blog/investigating-mysql-replication-latency-in-percona-xtradb-cluster/

Regards


The high commit latency is because your cluster is split between regions. A commit can be no faster than the slowest network link between any 2 nodes. Put all 5 nodes into the same region, and configure HAProxy to write to a single node.
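As a rough illustration of that bound, here is a minimal Python sketch. The pod names, region labels, and round-trip times are made-up assumptions, not measurements from this cluster; substitute your own figures.

```python
from itertools import combinations

# Hypothetical pod-to-region placement and RTTs; replace with measured values.
NODE_REGION = {
    "pxc-0": "region-a",
    "pxc-1": "region-a",
    "pxc-2": "region-a",
    "pxc-3": "region-b",
    "pxc-4": "region-b",
}
SAME_REGION_RTT_MS = 0.5
CROSS_REGION_RTT_MS = 45.0


def pairwise_rtt_ms(node_region):
    """Build an RTT table for every pair of nodes from the assumed figures."""
    table = {}
    for a, b in combinations(sorted(node_region), 2):
        same = node_region[a] == node_region[b]
        table[(a, b)] = SAME_REGION_RTT_MS if same else CROSS_REGION_RTT_MS
    return table


def commit_latency_floor_ms(rtt_table):
    """Approximate lower bound on commit time: the worst RTT between any two nodes."""
    return max(rtt_table.values())


if __name__ == "__main__":
    floor = commit_latency_floor_ms(pairwise_rtt_ms(NODE_REGION))
    print(f"approximate commit latency floor: {floor} ms")
```

With every node placed in one region, the same calculation drops to the in-region figure, which is the point of consolidating the cluster.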

To get the most out of PXC, you will need to make this change. HAProxy does not understand SQL, so the app must decide where to send reads and writes. Alternatively, you can deploy ProxySQL, which does understand SQL and can route queries based on whether they are SELECTs or not.
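To make the "SELECT or not" idea concrete, here is a small Python sketch of that kind of routing decision. It is only an illustration of the concept, not ProxySQL's implementation; the patterns and the "writer"/"reader" labels are assumptions chosen for the example, and it ignores multi-statement transactions.

```python
import re

# Illustrative patterns only: locking reads must go to the writer node.
LOCKING_SELECT = re.compile(
    r"^\s*SELECT\b.*\bFOR\s+(UPDATE|SHARE)\b", re.IGNORECASE | re.DOTALL
)
PLAIN_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)


def route(sql):
    """Return 'reader' for plain SELECTs and 'writer' for everything else."""
    if LOCKING_SELECT.search(sql):
        return "writer"
    if PLAIN_SELECT.search(sql):
        return "reader"
    # Anything that is not a plain SELECT is treated as a write.
    return "writer"


if __name__ == "__main__":
    for stmt in (
        "SELECT name FROM users WHERE id = 1",
        "SELECT name FROM users WHERE id = 1 FOR UPDATE",
        "UPDATE users SET name = 'x' WHERE id = 1",
    ):
        print(route(stmt), "<-", stmt)
```

An application that cannot separate reads from writes itself would need a SQL-aware proxy to make this decision for it; with HAProxy alone, all traffic should go to the single writer.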

Thanks @matthewb for your reply. Can you please comment on whether it's recommended to use a Kubernetes service for application connections?

Yes, use K8S service for application connections. The K8S service knows when/if pods restart and automatically maintains the backend mapping of IP->pod.

Thanks @matthewb

After a pod restarts, does it immediately accept write requests? Since the pod may not be in sync with the other nodes of the cluster for a brief period, could this node be in a non-primary state until replication completes?