HAProxy is failing on the basic path

I am following the basic installation steps from the guide "Install with kubectl - Percona Operator for MySQL based on Percona XtraDB Cluster", but HAProxy seems to be failing.

These are the logs I am seeing in the HAProxy container:

The following values are used for PXC node 10.244.2.5 in backend galera-admin-nodes:
wsrep_local_state is 4; pxc_maint_mod is DISABLED; wsrep_cluster_status is Primary; 1 nodes are available
PXC node 10.244.2.5 for backend galera-admin-nodes is ok
[WARNING]  (1) : Process 249 exited with code 0 (Exit)
[NOTICE]   (1) : Reloading HAProxy
[WARNING]  (272) : Proxy galera-in stopped (cumulated conns: FE: 1, BE: 0).
[WARNING]  (272) : Proxy galera-admin-in stopped (cumulated conns: FE: 1, BE: 0).
[WARNING]  (272) : Proxy galera-replica-in stopped (cumulated conns: FE: 0, BE: 0).
[WARNING]  (272) : Proxy galera-mysqlx-in stopped (cumulated conns: FE: 0, BE: 0).
[WARNING]  (272) : Proxy stats stopped (cumulated conns: FE: 0, BE: 0).
[WARNING]  (272) : Proxy galera-nodes stopped (cumulated conns: FE: 0, BE: 1).
[WARNING]  (272) : Proxy galera-admin-nodes stopped (cumulated conns: FE: 0, BE: 1).
[WARNING]  (272) : Proxy galera-replica-nodes stopped (cumulated conns: FE: 0, BE: 0).
[WARNING]  (272) : Proxy galera-mysqlx-nodes stopped (cumulated conns: FE: 0, BE: 0).
[WARNING]  (272) : Proxy <HTTPCLIENT> stopped (cumulated conns: FE: 0, BE: 0).
[NOTICE]   (1) : New worker (304) forked
[NOTICE]   (1) : Loading success.
[NOTICE]   (1) : haproxy version is 2.6.12
[NOTICE]   (1) : path to executable is /usr/sbin/haproxy
[WARNING]  (1) : Former worker (272) exited with code 0 (Exit)
[NOTICE]   (1) : Reloading HAProxy

and after a while I see:

[...]
[WARNING]  (695) : kill 1067
[WARNING]  (695) : kill 1103
[WARNING]  (695) : kill 1116
[WARNING]  (695) : kill 1130
[WARNING]  (695) : kill 1131
[WARNING]  (695) : kill 1132
[WARNING]  (695) : kill 1145
[WARNING]  (695) : kill 1146
[WARNING]  (695) : kill 1147
[WARNING]  (1) : Exiting Master process...
[...]
[WARNING]  (1) : Process 1172 exited with code 0 (Exit)
[WARNING]  (1) : All workers exited. Exiting... (0)

which causes the container to restart.

In the operator logs I am seeing the following repeating:

2023-11-08T15:33:59.238Z	INFO	Waiting for HAProxy to be ready before smart update	{"controller": "pxc-controller", "namespace": "default", "name": "cluster1", "reconcileID": "ae623b71-6b4f-44dd-a90e-beb3bc06cb38"}
2023-11-08T15:34:02.278Z	INFO	reconcile replication error	{"controller": "pxc-controller", "namespace": "default", "name": "cluster1", "reconcileID": "ae623b71-6b4f-44dd-a90e-beb3bc06cb38", "err": "get primary pxc pod: failed to get proxy connection: dial tcp 10.96.181.22:3306: connect: connection refused"}

and the Pods' state is something like this:

$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS      AGE
cluster1-haproxy-0                                 1/2     Running   2 (21s ago)   9m12s
cluster1-haproxy-1                                 1/2     Running   2 (7s ago)    7m22s
cluster1-haproxy-2                                 0/2     Pending   0             5m46s

The version is v1.13.0 and I am using the default steps from the guide above.
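For reference, by the default steps I mean essentially applying the operator bundle and the sample custom resource as the guide describes, roughly like this (the URLs assume the v1.13.0 manifests and that everything goes into the default namespace):

$ kubectl apply --server-side -f https://raw.githubusercontent.com/percona/percona-xtradb-cluster-operator/v1.13.0/deploy/bundle.yaml
$ kubectl apply -f https://raw.githubusercontent.com/percona/percona-xtradb-cluster-operator/v1.13.0/deploy/cr.yaml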

Any input would be appreciated!

Hello @Marios_Cako,

Thanks for reaching out! The first thing you could do is check why cluster1-haproxy-2 is Pending with this command: kubectl describe pod cluster1-haproxy-2.

Events will provide a clue as to why it is pending and from there we can troubleshoot further.
More often than not it is related to the affinity configuration; see the sketch below.
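Something along these lines should show it (the pod name is taken from your output; the last command assumes the cluster is named cluster1 as in the default cr.yaml and uses the pxc short name of the custom resource):

$ kubectl describe pod cluster1-haproxy-2             # look at the Events section at the bottom
$ kubectl get nodes                                    # are there enough schedulable worker nodes?
$ kubectl get pxc cluster1 -o yaml | grep -B2 -A4 antiAffinityTopologyKey

If I remember correctly, the default antiAffinityTopologyKey of kubernetes.io/hostname makes the scheduler spread the HAProxy Pods across nodes, so with fewer than three schedulable workers one of them will stay Pending.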

Thanks for the reply @Inel_Pandzic!

The cluster1-haproxy-2 Pod becomes “1/2 Running” after a while, with 1/2 containers Ready just like the rest of the HAProxy Pods. The output above was just an early snapshot.

Here are some of the events:

$ kubectl get events
38m         Normal    Killing                 pod/cluster1-haproxy-0                                  Stopping container haproxy
38m         Normal    Killing                 pod/cluster1-haproxy-0                                  Stopping container mysql-monit
[...]
26s         Warning   Unhealthy               pod/cluster1-haproxy-0                                  Readiness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
33m         Warning   Unhealthy               pod/cluster1-haproxy-0                                  Liveness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2

or specifically for the cluster1-haproxy-2:

$ kubectl get events | grep 'pod/cluster1-haproxy-2'
41m         Normal    Killing                 pod/cluster1-haproxy-2                                  Stopping container haproxy
41m         Normal    Killing                 pod/cluster1-haproxy-2                                  Stopping container mysql-monit
34m         Normal    Scheduled               pod/cluster1-haproxy-2                                  Successfully assigned default/cluster1-haproxy-2 to playground-worker2
34m         Normal    Pulling                 pod/cluster1-haproxy-2                                  Pulling image "percona/percona-xtradb-cluster-operator:1.13.0"
34m         Normal    Pulled                  pod/cluster1-haproxy-2                                  Successfully pulled image "percona/percona-xtradb-cluster-operator:1.13.0" in 1.483983731s (1.483990595s including waiting)
34m         Normal    Created                 pod/cluster1-haproxy-2                                  Created container pxc-init
34m         Normal    Started                 pod/cluster1-haproxy-2                                  Started container pxc-init
34m         Normal    Pulling                 pod/cluster1-haproxy-2                                  Pulling image "percona/percona-xtradb-cluster-operator:1.13.0-haproxy"
34m         Normal    Pulled                  pod/cluster1-haproxy-2                                  Successfully pulled image "percona/percona-xtradb-cluster-operator:1.13.0-haproxy" in 10.434196568s (10.434202779s including waiting)
34m         Normal    Created                 pod/cluster1-haproxy-2                                  Created container haproxy
34m         Normal    Started                 pod/cluster1-haproxy-2                                  Started container haproxy
34m         Normal    Pulling                 pod/cluster1-haproxy-2                                  Pulling image "percona/percona-xtradb-cluster-operator:1.13.0-haproxy"
34m         Normal    Pulled                  pod/cluster1-haproxy-2                                  Successfully pulled image "percona/percona-xtradb-cluster-operator:1.13.0-haproxy" in 1.516360305s (1.516366458s including waiting)
34m         Normal    Created                 pod/cluster1-haproxy-2                                  Created container pxc-monit
34m         Normal    Started                 pod/cluster1-haproxy-2                                  Started container pxc-monit
9m54s       Warning   Unhealthy               pod/cluster1-haproxy-2                                  Readiness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
24m         Warning   Unhealthy               pod/cluster1-haproxy-2                                  Liveness probe failed: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
4m53s       Warning   BackOff                 pod/cluster1-haproxy-2                                  Back-off restarting failed container haproxy in pod cluster1-haproxy-2_default(79b765c3-b4c1-4e0a-a520-d1569a25efee)

Note that this is the default installation, and it is consistently failing. I have tried a fresh install multiple times.

Hi @Marios_Cako, the HAProxy Pods have liveness and readiness checks, which is why you see these restarts. See e.g. the liveness check: https://github.com/percona/percona-docker/blob/main/haproxy/dockerdir/usr/local/bin/liveness-check.sh#L11. The root of the issue is that the script can't connect to the database through HAProxy. You need to check your DB status, or something is wrong with the password for the monitor user. A rough way to check both is sketched below.
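These are only example commands, assuming the default names from cr.yaml (cluster cluster1, secret cluster1-secrets); adjust if yours differ:

$ kubectl get pxc cluster1                                # is the PXC part of the cluster actually ready?
$ kubectl get secret cluster1-secrets -o jsonpath='{.data.monitor}' | base64 -d; echo
$ kubectl exec -it cluster1-pxc-0 -c pxc -- mysql -umonitor -p'<password printed above>' -e 'SELECT 1'
$ kubectl logs cluster1-haproxy-0 -c haproxy --previous   # haproxy logs from before the last restart

If the direct login to the PXC Pod fails, the monitor credentials are the problem; if it works, the issue is more likely on the cluster/HAProxy side.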