[Frequent But Random] General error: 2013 Lost connection to MySQL server during query on simple select with limit

Hi,

I apologize for opening a new topic related to the “Lost connection issue” as there are many other similar topics.
My case is a bit different in that I frequently (once or twice a day) see a random “General error: 2013 Lost connection to MySQL server during query” on a simple select with a LIMIT clause against a 5-10 record table, according to our API log.
The strange thing is that the same API is called thousands of times per day without any issues.

Could someone give me directions on how to debug/find the root cause of this?
Thanks in advance.

Debug/Info
Our PXC runs on Kubernetes (Vultr Cloud).
Our error logs show nothing alarming.
There is no restart information shown.
There seems to be no issue with network communication.
Our monitoring shows normal CPU/Memory usage.
The connection count also never reached even half of the configured maximum.

Usual Error
SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 10 after 10457ms (SQL: select * from x limit 1)
SQLSTATE[HY000]: General error: 2013 Lost connection to MySQL server during query (SQL: select DISTINCT with join)

Have you looked at ProxySQL’s logs? Looks like there’s a timeout occurring between proxysql and the backend mysql.
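If you have access to the ProxySQL admin interface (by default on port 6032), the monitor tables record the recent connect and ping checks against each backend. A sketch of what to look at (table and column names are ProxySQL's standard monitor schema):

```sql
-- Run against the ProxySQL admin interface (port 6032 by default).

-- Most recent connect checks against the backends:
SELECT hostname, port, time_start_us, connect_error
FROM monitor.mysql_server_connect_log
ORDER BY time_start_us DESC LIMIT 10;

-- Most recent ping checks:
SELECT hostname, port, time_start_us, ping_error
FROM monitor.mysql_server_ping_log
ORDER BY time_start_us DESC LIMIT 10;

-- Current backend status as ProxySQL sees it:
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers;
```

A non-NULL `connect_error` or `ping_error` around the time of the 2013 errors would point at the connection between ProxySQL and that backend.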

Hi @matthewb
Thank you for helping me.

I managed to find the following errors in the ProxySQL logs:

00:00:31 MySQL_Monitor.cpp:7780:monitor_galera_process_ready_tasks(): [ERROR] Timeout on Galera health check for cluster1-pxc-0.X.X:3306 after 1229ms. If the server is overload, increase mysql-monitor_galera_healthcheck_timeout.

00:00:35 MySQL_Monitor.cpp:2214:monitor_galera_thread(): [ERROR] Error on Galera check for cluster1-pxc-0.X.X:3306 after 1001ms. Unable to create a connection. If the server is overload, increase mysql-monitor_connect_timeout. Error: timeout or error in creating new connection: Can't connect to MySQL server on 'Y.Y.Y.Y' (110)

00:00:36 MySQL_Session.cpp:1706:handler_again___status_PINGING_SERVER(): [ERROR] Ping timeout during ping on cluster1-pxc-0.X.X:3306 after 200245us (timeout 200ms)

Then, after 30 seconds or so, the cluster is online again.

What's strange is that, when I look at the ping error logs in ProxySQL, I see no errors at all.

select * from mysql_server_ping_log;
> No error log shown (null)

If I look at the error logs for the MySQL backend, the following is the only error I see:

select * from stats_mysql_errors limit 100;

> error: WSREP has not yet prepared node for application use
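For reference, that WSREP error means the node was not in the Synced state at that moment (for example, it was re-joining or catching up with the cluster). A quick way to check a node's sync state, run directly on the PXC node (these are standard Galera status variables):

```sql
-- Run on the PXC node itself:
SHOW STATUS LIKE 'wsrep_local_state_comment';  -- should be 'Synced'
SHOW STATUS LIKE 'wsrep_ready';                -- should be 'ON'
SHOW STATUS LIKE 'wsrep_cluster_status';       -- should be 'Primary'
```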

If the error is caused by a network issue at certain moments, will increasing mysql-monitor_galera_healthcheck_timeout help in such a case?
Thanks in advance.

A ping timeout would be an indication of a network issue. When ProxySQL is having this issue, can you connect directly to the MySQL backend? Then try connecting manually from the ProxySQL server to the MySQL backend and see if that also has issues.

Hi @matthewb
Thank you for your help.
Because the issue only lasts around 30 seconds at a time, it is not really possible for me to debug it that way.
But it seems the issue is that our ProxySQL cannot reach the MySQL backend at those points in time.

In the case of the network loss for 30 seconds or so like this, will adjusting mysql-monitor_galera_healthcheck_timeout help?
I am sorry I am very new to this. I am not sure what might be the complications if I adjust the health check timeout.

Please kindly give me any suggestions.

That’s not normal network behavior. Do you have a faulty switch? Are you using hostnames/DNS anywhere? NAT/Firewall device?

Set up fping on proxysql server to run in the background, pinging several machines, local and remote. If a remote always has ping uptime, but you see loss to local network, then something else is wrong.

@matthewb Thank you for your help.

Sorry for not being able to get back to you earlier.
Actually, we host our PXC on a cloud provider (no hostname, only k8s SVC names).
It seems the issue occurs at times of network bottlenecks / high traffic load on the cloud provider's side.
That is why it appears at random.

If the issue only happens for 30-40 seconds at that random time, would it be possible to increase the health check timeout for this matter?

Thank you in advance for your help.

Yes, you can increase the health check timeout for this purpose.
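A sketch of how to do that from the ProxySQL admin interface (the 3000 ms values here are an assumption — tune them for your environment; the variable names are ProxySQL's standard monitor settings):

```sql
-- Values are in milliseconds.
UPDATE global_variables SET variable_value='3000'
 WHERE variable_name='mysql-monitor_galera_healthcheck_timeout';
UPDATE global_variables SET variable_value='3000'
 WHERE variable_name='mysql-monitor_connect_timeout';

-- Apply and persist the change:
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```

Note the trade-off: a longer health check timeout keeps nodes from being marked offline during brief network blips, but it also delays detection of a genuinely failed node, so failover takes correspondingly longer.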