I’ve run into the same problem, but:
- CNI is already Calico 3.26.1 on Kubernetes 1.27.2
- Operator is 1.13
- ANSI_QUOTES is not enabled
The difference in my case is that the pxc/mysql pods are all working fine with no errors, and I can reach them directly from the HAProxy pod using their DNS names on the various ports without a problem.
kk -n percona get pods
NAME                                              READY   STATUS    RESTARTS       AGE
dev01-1-haproxy-0                                 2/3     Running   3 (3m7s ago)   27m
dev01-1-pxc-0                                     2/2     Running   0              28m
dev01-1-pxc-1                                     2/2     Running   0              28m
dev01-1-pxc-2                                     2/2     Running   0              29m
percona-xtradb-cluster-operator-f879dfdf4-f2nzs   1/1     Running   0              58m
Failure on the HAProxy pod:
+ exec haproxy -W -db -f /etc/haproxy-custom/haproxy-global.cfg -f /etc/haproxy/pxc/haproxy.cfg -p /etc/haproxy/pxc/haproxy.pid -S /etc/haproxy/pxc/haproxy-main.sock
[NOTICE] (1) : New worker (10) forked
[NOTICE] (1) : Loading success.
[WARNING] (10) : kill 27
[WARNING] (10) : Server galera-nodes/dev01-1-pxc-0 is DOWN, reason: External check timeout, code: 0, check duration: 10003ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (10) : kill 28
[WARNING] (10) : Backup Server galera-nodes/dev01-1-pxc-2 is DOWN, reason: External check timeout, code: 0, check duration: 10003ms. 0 active and 1 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (10) : kill 29
[WARNING] (10) : Backup Server galera-nodes/dev01-1-pxc-1 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] (10) : backend 'galera-nodes' has no server available!
[WARNING] (10) : kill 30
[WARNING] (10) : Server galera-admin-nodes/dev01-1-pxc-0 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (10) : kill 31
[WARNING] (10) : Backup Server galera-admin-nodes/dev01-1-pxc-2 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 0 active and 1 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (10) : kill 32
[WARNING] (10) : Backup Server galera-admin-nodes/dev01-1-pxc-1 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] (10) : backend 'galera-admin-nodes' has no server available!
[WARNING] (10) : kill 33
[WARNING] (10) : Server galera-replica-nodes/dev01-1-pxc-0 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (10) : kill 34
[WARNING] (10) : Server galera-replica-nodes/dev01-1-pxc-1 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (10) : kill 35
[WARNING] (10) : Server galera-replica-nodes/dev01-1-pxc-2 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] (10) : backend 'galera-replica-nodes' has no server available!
[WARNING] (10) : kill 36
[WARNING] (10) : Server galera-mysqlx-nodes/dev01-1-pxc-0 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (10) : kill 37
[WARNING] (10) : Backup Server galera-mysqlx-nodes/dev01-1-pxc-2 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 0 active and 1 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] (10) : kill 38
[WARNING] (10) : Backup Server galera-mysqlx-nodes/dev01-1-pxc-1 is DOWN, reason: External check timeout, code: 0, check duration: 10001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] (10) : backend 'galera-mysqlx-nodes' has no server available!
I built a custom image in which each echo in check_pxc.sh also appends (>>) its output to /tmp/external_check.log, but the file is never created. That leads me to believe the external check is either not being called at all, or is hitting some permission error when it is. Running the script manually from a pod shell works, and it creates and populates the file without issue:
bash-5.1$ cat /usr/local/bin/check_pxc.sh |grep ">>"
echo $PXC_SERVER_IP >> /tmp/external_check.log
echo "The following values are used for PXC node $PXC_SERVER_IP in backend $HAPROXY_PROXY_NAME: " >> /tmp/external_check.log
echo "wsrep_local_state is ${PXC_NODE_STATUS[0]}; pxc_maint_mod is ${PXC_NODE_STATUS[1]}; wsrep_cluster_status is ${PXC_NODE_STATUS[2]}; $AVAILABLE_NODES nodes are available" >> /tmp/external_check.log
echo "PXC node $PXC_SERVER_IP for backend $HAPROXY_PROXY_NAME is ok" >> /tmp/external_check.log
echo "PXC node $PXC_SERVER_IP for backend $HAPROXY_PROXY_NAME is not ok" >> /tmp/external_check.log
bash-5.1$ /usr/local/bin/check_pxc.sh '' '' dev1-pxc-1.dev1-pxc.percona.svc.cluster.local
bash-5.1$ cat /tmp/external_check.log
dev1-pxc-1.dev1-pxc.percona.svc.cluster.local
The following values are used for PXC node dev1-pxc-1.dev1-pxc.percona.svc.cluster.local in backend :
wsrep_local_state is 4; pxc_maint_mod is DISABLED; wsrep_cluster_status is Primary; 3 nodes are available
PXC node dev1-pxc-1.dev1-pxc.percona.svc.cluster.local for backend is ok
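A manual run from an interactive shell inherits the full login environment, so it may not reproduce what happens when HAProxy launches the check. As far as I understand it, HAProxy's external-check calls the command with four positional arguments (proxy address, proxy port, server address, server port), a reduced environment, and kills it on timeout. The sketch below tries to approximate that. The function name run_like_haproxy, the CHECK_TIMEOUT variable, the PATH value, and the default port and backend name are assumptions for illustration, not taken from the operator image, and it assumes env and timeout from coreutils are available in the container.

```shell
#!/bin/bash
# Sketch only: run a check script roughly the way an HAProxy external-check
# would, i.e. with a scrubbed environment, four positional args and a timeout.
# All defaults below are assumptions for illustration.
run_like_haproxy() {
    local script="$1"                      # e.g. /usr/local/bin/check_pxc.sh
    local server_addr="$2"                 # passed to the script as $3
    local server_port="${3:-3306}"         # passed to the script as $4 (assumed default)
    local proxy_name="${4:-galera-nodes}"  # exported as HAPROXY_PROXY_NAME
    local limit="${CHECK_TIMEOUT:-10}"     # seconds before the check is killed
    local rc

    # env -i drops everything from the calling shell, so only the variables
    # listed here are visible to the script.
    env -i \
        PATH="/usr/local/bin:/usr/bin:/bin" \
        HAPROXY_PROXY_NAME="$proxy_name" \
        HAPROXY_SERVER_ADDR="$server_addr" \
        HAPROXY_SERVER_PORT="$server_port" \
        timeout "$limit" "$script" "" "" "$server_addr" "$server_port"
    rc=$?

    # GNU timeout exits with 124 when it had to kill the command.
    if [ "$rc" -eq 124 ]; then
        echo "check timed out after ${limit}s"
    else
        echo "check exited with status $rc"
    fi
    return "$rc"
}
```

For example, `run_like_haproxy /usr/local/bin/check_pxc.sh dev1-pxc-1.dev1-pxc.percona.svc.cluster.local` from the HAProxy pod shell. If the script hangs or fails this way but succeeds when run directly, that would point at something the script picks up from the interactive environment rather than at the network.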