MySQL 8.0.24 / 8.0.25 connection problems with `FEDERATED` storage engine (Communication link failure)

Since the latest upgrade to Percona Server for MySQL 8.0.25, I have been struggling with severe connection issues between two servers. I was able to track this down to the FEDERATED storage engine.

At the application/client level, the sporadic errors look like this:

Communication link failure: 1160 Got an error writing communication packets
Communication link failure: 1156 Got packets out of order

This is hard to debug because these errors don’t appear in mysql.err, not even with log_error_verbosity = 3. It looks like the internal connections from FEDERATED tables to the remote server are not logged.

The errors only start to appear after MySQL has been running for a couple of hours. Initially, my workaround was to restart MySQL on the host with the FEDERATED tables (host backend, see below) every 2-3 hours, which made the dropped connections disappear.

Here’s my setup (all on Percona Server for MySQL 8.0.25):

  • Host backend: PHP 8.0 application (using PDO/mysqlnd) with MySQL triggers that copy data from the main application db to a second database, mailsync. The mailsync database contains only FEDERATED tables, which write data to the remote host mail
  • Host mail: REPLICATION MASTER for database mailsync
  • Host mx1: 1st REPLICATION SLAVE / REPLICA for database mailsync
  • Host mx2: 2nd REPLICATION SLAVE / REPLICA for database mailsync

I have had this rather complex setup running in production for 7 months now on Percona Server for MySQL 8.0, without any issues up to and including MySQL 8.0.23. Data in the FEDERATED tables is accessed fairly regularly (5-10 times/hour). Only the latest upgrade to 8.0.25 broke it. When they occur, the connection errors are returned immediately, without the query running into any timeout, and they also affect simple SELECT queries that access very little data.

I suspect this has something to do with the newly introduced connection management in MySQL 8.0.24:

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-24.html#mysqld-8-0-24-connection-management

Connection Management Notes

Previously, if a client did not use the connection to the server within the period specified by the wait_timeout system variable and the server closed the connection, the client received no notification of the reason. Typically, the client would see Lost connection to MySQL server during query (CR_SERVER_LOST) or MySQL server has gone away (CR_SERVER_GONE_ERROR).

In such cases, the server now writes the reason to the connection before closing it, and client receives a more informative error message, The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior. (ER_CLIENT_INTERACTION_TIMEOUT).

The previous behavior still applies for client connections to older servers and connections to the server by older clients.

As I have the same setup of 4 related MySQL servers running in production at two different companies (my own webhosting company and a similarly sized hosting company whose infrastructure I manage), I have a good way to show that the problem really is related to MySQL > 8.0.23 and the FEDERATED storage engine. I downgraded the 4 servers of one company to Percona Server for MySQL 8.0.23 (full data dump, fresh MySQL reinstall, reload of all data), and the problem no longer occurs there!

On MySQL 8.0.25 I have also found a workaround that makes the problem disappear (or nearly so; 2 days of testing is not enough to be sure yet):

[mysqld]
interactive_timeout     = 86400
wait_timeout            = 86400

Raising wait_timeout from the default 8h to 24h on all 3 hosts is currently my best workaround (the host that matters is mail, as the “server” behind the FEDERATED tables; I also raised the values on the MySQL replication slaves, which was probably not needed). Before that, I tried lowering wait_timeout to 1h, and the connection issues indeed became much more frequent, starting a little over 1 hour after restarting MySQL. So it looks like MySQL internally keeps track of its FEDERATED connections to the remote server, and once the remote server drops inactive connections after wait_timeout, FEDERATED still tries to re-use them and struggles with the new MySQL 8.0.24+ connection management. In my eyes, FEDERATED should silently try to reconnect when a previous server connection has been dropped due to a timeout, as it did before.
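
To make the expected behavior concrete, here is a minimal sketch of a "reconnect once on a stale connection" pattern, written in Python purely for illustration. It is not how FEDERATED is implemented: FakeRemoteLink, ReconnectingLink and StaleConnectionError are hypothetical names, and the dropped connection is simulated instead of coming from a real server.

```python
class StaleConnectionError(Exception):
    """Raised by the stand-in link when the server has already closed it."""


class FakeRemoteLink:
    """Hypothetical stand-in for a cached connection to the remote server."""

    def __init__(self):
        self.alive = True

    def execute(self, statement):
        if not self.alive:
            raise StaleConnectionError("closed by server after idle timeout")
        return f"ok: {statement}"


class ReconnectingLink:
    """Runs a statement and reconnects once if the cached link is stale."""

    def __init__(self, connect):
        self._connect = connect   # factory that opens a fresh connection
        self._link = connect()
        self.reconnects = 0

    def execute(self, statement):
        try:
            return self._link.execute(statement)
        except StaleConnectionError:
            # The cached link was dropped: open a new one, retry exactly once.
            self._link = self._connect()
            self.reconnects += 1
            return self._link.execute(statement)


link = ReconnectingLink(FakeRemoteLink)
link._link.alive = False          # simulate the server dropping the idle link
result = link.execute("SELECT 1")
print(result, link.reconnects)
```

A blind retry like this is only safe if the failed statement never reached the server or is harmless to repeat, so a real implementation would need more care than this sketch shows.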

Can you tell me whether this is a known bug that may already be fixed in the latest MySQL 8.0.26, or whether I am the first one to report these issues?

Thanks, Philip


Hi, same issue here. Are you sure that this is connected to the MySQL update to > 8.0.24?

On Debian there were very long timeout settings by default. After changing them we also started to get these error messages on federated tables.
