HA-Proxy doesn't start to allow Connections as soon as a minimum number of required pxc Pods are ready

hamster · February 22, 2022, 11:44am

Hello!

I’ve installed version 1.10.0 of the Percona Operator for MySQL on a k8s cluster. The cluster is physically spread across 3 different locations. Two locations have 2 nodes each and the third location contains the fifth node. I’ve configured manual PVs and a StorageClass. In cr.yaml I’ve configured the StatefulSets of haproxy and pxc to deploy 5 pods each. My frontend application can connect to the PXC cluster and everything works fine. I can kill single pods and my frontend application still works. The deleted pod comes back up and rejoins the cluster. Great.

I want to test further realistic HA failure scenarios. E.g. all physical sites lose power, so the whole k8s cluster is gone. Power is restored, but only at two sites. The third site has an electrical fault which can’t be fixed easily.

When I now kill all PXC pods with k delete pod cluster1-pxc-{0..4} to simulate this unplanned downtime of the whole cluster, the following happens:

all cluster1-pxc pods get terminated (as expected)
the readiness of all the cluster1-haproxy pods changes from 2/2 to 1/2 due to the unavailability of the PXC cluster (as expected)
the cluster1-pxc pods get started one after another, starting with pod suffix zero (as expected)

…but here comes the problem! The critical issue is that unless ALL (meaning 5/5) PXC pods are up and running again, the haproxy pods won’t allow connections to the PXC cluster. In the scenario explained above, one site doesn’t have electricity for now. So I am stuck with 3/5 ready PXC pods and a 1/2 ready state on the haproxy pods, and my frontend application is no longer working, even though the PXC cluster is formed.

How do I tell haproxy to allow connections as soon as a quorum is reached in the PXC cluster?
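For reference, the majority arithmetic behind "3 of 5" can be sketched as follows. This is only an illustration: the function name is hypothetical, it assumes every node carries equal weight, and it describes quorum for a cluster that is already running, which may not be the same thing as what is required after every node has gone down at once.

```python
def has_quorum(ready_pods: int, cluster_size: int) -> bool:
    """Return True when ready_pods is a strict majority of cluster_size.

    Hypothetical helper; assumes all nodes have equal weight.
    """
    return ready_pods > cluster_size // 2


if __name__ == "__main__":
    # Walk through every possible number of ready pods in a 5-node cluster.
    for ready in range(6):
        print(f"{ready}/5 ready -> quorum: {has_quorum(ready, 5)}")
```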

Thank you for your input.

P.S.: Interestingly, when deploying the cr.yaml for the first time, haproxy starts serving connections as soon as PXC quorum is reached.

I also tried to play with the PodDisruptionBudget, but alas, no improvement on that front either.

Sergey_Pronin · February 28, 2022, 8:09am

Hey @hamster ,

This is a good question and use case. We will research ways to fix it.

I have raised the bug here: [K8SPXC-953] HAProxy does not serve traffic unless all PXC Pods are up and running - Percona JIRA

Sergey_Pronin · May 1, 2023, 12:01pm

We reviewed it, and it seems that this is valid behavior.
When all the nodes go down, the cluster is considered crashed.
We must wait for all nodes to come back up to recover from the crash, otherwise there is a chance that we miss some transactions.
It is not related to the way the Operator works, but more to how synchronous replication should operate.
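The reasoning can be illustrated with a small sketch. It assumes each node records the sequence number (seqno) of the last transaction it committed, and that the safe node to restart the cluster from is the one with the highest seqno. The function name, the pod-to-seqno mapping and the sample numbers are made up for illustration; this is not the Operator's actual recovery code.

```python
from typing import Dict, Optional


def safe_bootstrap_candidate(seqnos: Dict[str, int], cluster_size: int) -> Optional[str]:
    """Pick the pod with the highest seqno, but only if every node has reported.

    Hypothetical helper: if any node is missing, it may hold transactions
    newer than anything we can see, so no candidate is returned.
    """
    if len(seqnos) < cluster_size:
        return None
    return max(seqnos, key=lambda pod: seqnos[pod])


if __name__ == "__main__":
    # Illustrative values only: pxc-4 committed the most recent transaction.
    all_nodes = {"pxc-0": 100, "pxc-1": 100, "pxc-2": 101, "pxc-3": 101, "pxc-4": 103}
    print(safe_bootstrap_candidate(all_nodes, 5))

    # With one site still down, the most advanced node may be among the missing ones.
    reachable = {pod: all_nodes[pod] for pod in ("pxc-0", "pxc-1", "pxc-2")}
    print(safe_bootstrap_candidate(reachable, 5))
```

In the second call the three reachable pods would form a majority, yet starting from any of them would drop the transactions that only the missing node has seen.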
