Communications link failure in Percona MySQL 5.7.24


We are intermittently getting errors in our Percona MySQL database and in the application that connects to it when running a “select * from Table” query on a table that has about 19,000,000 rows and 26 columns.

In the MySQL error log we are getting the following error:

[Note] Aborted connection xxx to db: '****' user: '****' host: '**********' (Got an error reading communication packets)

In the application we get two of the following errors for every error in the db error log:

ERROR Database access problem. Killing off this connection and all remaining connections in the connection pool. SQL State = 08S01
ERROR Communications link failure - The last packet successfully received from the server was {some value around 7,200,000} milliseconds ago. The last packet sent successfully to the server was {some value around 7,200,000} milliseconds ago.

In the last 48 h we got 9 errors and there is a very interesting correlation between the database and application errors.

Whenever we get an Aborted connection error in MySQL, the application errors follow exactly 3 h 45 min later.
The ~7,200,000 ms is only 2 h, so I don’t really know where the remaining 1 h 45 min went.
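To make the arithmetic explicit, here is a minimal sketch that converts the figures quoted above and computes the part of the delay that the driver message does not account for. The class and method names are made up for illustration, and the two durations are the approximate values from our logs, not measured constants.

```java
import java.time.Duration;

// Hypothetical helper; the values are the approximate figures from the logs above.
class TimeoutGap {
    // Delay between the server-side "Aborted connection" note and the application error.
    static final Duration OBSERVED_DELAY = Duration.ofHours(3).plusMinutes(45);
    // Idle time reported in the "Communications link failure" message (~7,200,000 ms).
    static final Duration DRIVER_IDLE = Duration.ofMillis(7_200_000L);

    // Part of the observed delay that the driver's idle time does not explain.
    static Duration unexplained() {
        return OBSERVED_DELAY.minus(DRIVER_IDLE);
    }

    public static void main(String[] args) {
        System.out.println("driver idle: " + DRIVER_IDLE.toMinutes() + " min");
        System.out.println("unexplained: " + unexplained().toMinutes() + " min");
    }
}
```

So the driver reports 120 minutes of silence, and 105 minutes of the 3 h 45 min delay remain unaccounted for.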

I have searched for answers for the past two days and changed the slave_net_timeout and net_read_timeout settings, but that did not help.
The next thing I want to try is increasing max_allowed_packet, but I’m not sure what effect that would have on the db.

I’m wondering if anyone has any solutions or thoughts on this problem.

The MySQL version is 5.7.25 and it is running in Docker. The application is Java-based and uses Hibernate as the db connector.
The config is the following:

collation-server = utf8_unicode_ci
init-connect = 'SET NAMES utf8'
character-set-server = utf8
lower_case_table_names = 1
innodb_buffer_pool_size = 48G # (adjust value here, 50%-70% of total RAM)
innodb_log_file_size = 2G
innodb_flush_log_at_trx_commit = 1 # may change to 2 or 0
innodb_flush_method = O_DIRECT
innodb_print_all_deadlocks = 'ON'
slave_net_timeout = 3600
net_read_timeout = 300
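As a quick sanity check, the sketch below compares the two timeouts set in this config (both are in seconds) with the ~7,200,000 ms idle time from the application error. The class and method names are hypothetical, and the check only covers the values explicitly set above, not server defaults or anything configured outside MySQL.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; only covers the timeouts explicitly set in the config above.
class ConfiguredTimeouts {
    // Both MySQL variables are expressed in seconds.
    static Map<String, Duration> configured() {
        Map<String, Duration> timeouts = new LinkedHashMap<>();
        timeouts.put("slave_net_timeout", Duration.ofSeconds(3600));
        timeouts.put("net_read_timeout", Duration.ofSeconds(300));
        return timeouts;
    }

    // True if any explicitly configured timeout equals the observed idle time.
    static boolean anyMatches(Duration observed) {
        return configured().containsValue(observed);
    }

    public static void main(String[] args) {
        Duration observed = Duration.ofMillis(7_200_000L); // approximate value from the error message
        configured().forEach((name, value) ->
                System.out.println(name + " = " + value.toMinutes() + " min"));
        System.out.println("matches observed idle time: " + anyMatches(observed));
    }
}
```

Neither configured value (60 min and 5 min) equals the 120 minutes in the error message, which suggests the 2 h figure comes from a setting not shown in this config.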