Frequent sporadic "MySQL server has gone away" errors with Operator for MySQL (XtraDB Cluster) and HAProxy

Description:

We are running a Percona Operator for MySQL (XtraDB Cluster) deployment with 3 PXC replicas and 3 HAProxy replicas. The cluster serves a PHP application that connects via PDO to the <cluster-name>-haproxy Service.

The cluster appears to be healthy; however, every few minutes we see an error like this (as reported by PHP):

SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

Sometimes, this message is seen instead:

SQLSTATE[08S01]: Communication link failure: 1158 Got an error reading communication packets

Debugging attempted so far

We disabled HAProxy and connected directly to the <cluster-name>-pxc Service instead. For the whole time that change was live, we saw no further errors. As soon as we switched back to <cluster-name>-haproxy, the errors returned.

We are using the default HAProxy configuration provided by the operator, and I'm not sure where to start troubleshooting. As far as I can see, the HAProxy logs show a mixture of CD and SD termination_state codes, e.g.:

[pod/mysql-haproxy-0/haproxy] {"time":"16/May/2025:07:52:20.337", "client_ip": "10.244.6.140", "client_port":"35524", "backend_source_ip": "10.244.7.208", "backend_source_port": "34858", "frontend_name": "galera-in", "backend_name": "galera-nodes", "server_name":"mysql-pxc-0", "tw": "1", "tc": "1", "Tt": "2", "bytes_read": "83", "termination_state": "SD", "actconn": "233", "feconn" :"232", "beconn": "231", "srv_conn": "231", "retries": "0", "srv_queue": "0", "backend_queue": "0" }

[pod/mysql-haproxy-0/haproxy] {"time":"16/May/2025:07:47:07.227", "client_ip": "10.244.6.140", "client_port":"46498", "backend_source_ip": "10.244.7.208", "backend_source_port": "40846", "frontend_name": "galera-in", "backend_name": "galera-nodes", "server_name":"mysql-pxc-0", "tw": "1", "tc": "148", "Tt": "318405", "bytes_read": "1189188", "termination_state": "CD", "actconn": "245", "feconn" :"244", "beconn": "243", "srv_conn": "243", "retries": "0", "srv_queue": "0", "backend_queue": "0" }
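For what it's worth, if the eventual fix turns out to be a client/server timeout change, my understanding from the operator documentation is that a custom HAProxy configuration can be supplied through spec.haproxy.configuration in the cluster CR. A minimal, untested sketch of what I think that would look like (the field name follows my reading of the docs; the timeout values are placeholders, not what we currently run):

# Hypothetical sketch only, not applied to our cluster. Field name and layout
# are from my reading of the operator docs for spec.haproxy.configuration;
# timeout values are placeholders.
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: mysql                     # stands in for <cluster-name>
spec:
  haproxy:
    enabled: true
    size: 3
    configuration: |
      defaults
        log global
        retries 10
        timeout connect 10s
        timeout client  28800s    # assumption: align with MySQL wait_timeout
        timeout server  28800s

My understanding is that a supplied configuration replaces the operator's default HAProxy config rather than merging with it, so a real override would need to carry over the full global/defaults sections from the default config, not just this fragment. We have not tried this yet.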

Any guidance on what to change to resolve this would be very much appreciated, as these errors are arriving regularly.

Version:

  • Operator: 1.15.0
  • PXC: percona/percona-xtradb-cluster:8.0.35
  • HAProxy: percona/haproxy:2.8.5

Hi @jhwalker, please check this Jira task.
Do you have any errors or warnings in the PXC log?

Thanks for the reply; unfortunately, it looks like I can’t see this task:

(Logged into JIRA using my forum email address)

There were no errors or warnings in the PXC log.


Please let me know what other information would help. We are currently bypassing HAProxy again because this is a production service, with PHP handling up to 100 requests/second at peak. We typically see about 1-2 errors per minute, sometimes more: a very small percentage of overall requests, but enough to impact users and flood us with error reports.

We can temporarily switch back to HAProxy to collect debugging data. In the meantime, we are connecting to PXC via a custom Kubernetes Service that always targets the <cluster-name>-pxc-0 Pod; if that Pod fails and we need to switch to a replica, we will fail over manually by updating the Service's selector.
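For context, this is roughly what that interim Service looks like (the Service name is a placeholder; we pin it to the first Pod via the standard StatefulSet pod-name label):

# Interim Service, names are placeholders. Traffic is pinned to the first PXC
# Pod via the label the StatefulSet controller sets on each Pod; manual
# failover means editing this selector to point at the -1 or -2 Pod.
apiVersion: v1
kind: Service
metadata:
  name: mysql-pxc-primary                              # placeholder name
spec:
  selector:
    statefulset.kubernetes.io/pod-name: mysql-pxc-0    # changed manually on failover
  ports:
    - name: mysql
      port: 3306
      targetPort: 3306

Failover is then just a manual edit (or kubectl patch) of that selector.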