Mysqldump: Got error: 1160: Got an error writing communication packets when using LOCK TABLES

We get this error almost every second day on our nightly DB dump cron job. Often for two days in a row.

This is on Debian 10.13, all recent updates installed.
It started around February when using 8.0.34-26. It’s still there with 8.0.36-28.

mysqldump is stared locally with the --opt --routines -u -p --hex-blob parameters

mysqldump setting are:

[mysqldump]
quick
quote-names
max_allowed_packet      = 16M

The error arises at the very start of the dump. The resulting file always contains these lines only:

-- MySQL dump 10.13  Distrib 8.0.36-28, for Linux (x86_64)
--
-- Host: localhost    Database: XXXXXXX
-- ------------------------------------------------------
-- Server version       8.0.36-28

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!50503 SET NAMES utf8mb4 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
/*!50717 SELECT COUNT(*) INTO @rocksdb_has_p_s_session_variables FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'performance_schema' AND TABLE_NAME = 'session_variables' */;
/*!50717 SET @rocksdb_get_is_supported = IF (@rocksdb_has_p_s_session_variables, 'SELECT COUNT(*) INTO @rocksdb_is_supported FROM performance_schema.session_variables WHERE VARIABLE_NAME=\'rocksdb_bulk_load\'', 'SELECT 0') */;
/*!50717 PREPARE s FROM @rocksdb_get_is_supported */;
/*!50717 EXECUTE s */;
/*!50717 DEALLOCATE PREPARE s */;
/*!50717 SET @rocksdb_enable_bulk_load = IF (@rocksdb_is_supported, 'SET SESSION rocksdb_bulk_load = 1', 'SET @rocksdb_dummy_bulk_load = 0') */;
/*!50717 PREPARE s FROM @rocksdb_enable_bulk_load */;
/*!50717 EXECUTE s */;
/*!50717 DEALLOCATE PREPARE s */;

IMHO this can’t be a network issue because mysqldump connects locally i.e. using the file socket.

Cheers

Since you are on 8.0, I’m guessing all of your tables are InnoDB. If that is true, you should be using --single-transaction to execute mysqldump. Table locks are unnecessary.

Another, better, option, is to switch to mydumper which is faster, multi-threaded, logical dump solution.

Hi @matthewb thanks for your reply.

Yes, all tables are InnoDB. I will give –single-transaction a try. But I wonder why it worked before for years…

@matthewb: After adding –single-transaction it worked for six days. But last night we again got a similar error:

mysqldump: Couldn't execute 'show create table sys_collection': Got an error writing communication packets (1160)

So this is not solved, unfortunately…

@bitone That’s a temporary error. Simply try again. In your script you should be checking the exit status of the mysqldump command. If the exit code is not 0, then your script should simple try again. Also, I would edit your script to ignore the sys schema. you don’t need to dump that in your backup.

@matthewb But this is only a workaround. If we have to check after each DB operation if it was successful then the DB is not reliable enough for produciton.

And the script was working before. So something has changed that affects the reliability.

IMHO this is a bug.

edited:

sys_collection is a table of the CMS. It’s not related to DB sys.

I just had this error four times in a row:

What exactly does the error mean? Maybe it’s misleading?
Does this happen when a table is locked by another process and therefore the DB connection times out?

As I mentioned in my first post, mysqldump is running locally, so we can rule out network issues.

That’s not correct. Every application should always check the result code of any database operation. If you don’t check, then you don’t really know if the commit succeeded or not. That’s a design flaw of your code, not a flaw of the database. The database will always tell you if something succeeded or not; it’s your responsibility to act on that response. If the database had a temporary error, it’s not the job of the DB to try again, it’s your code’s job to act accordingly.

It means there was some communication error between mysqldump and your local mysql server. What does the mysql error log say? Have you tried running those commands directly yourself?

No offense, but this must be a joke. I’m working with different kinds of DBs for over 25 years.
We’re not talking about issues on SQL statement level here.

Yes we also check for low level errors. To correct them. But when a tested statement on an existing DB regularly returns some communication error, even when we’re connecting through a local file socket, there’s something wrong with the DB engine.

If we need to execute instructions more than once for the server to do it, then that server is not reliable enough for production.

There’s nothing in the log that’s related to this.
Yes I tried those commands; see the screenshot in my last reply. This error appeared four times in a row.

My mistake. Above you said “check after each DB operation if it was successful” and I took “DB operation” to mean literally any operation which includes things like INSERT, COMMIT, etc and also extends to tools that operate against the database.

The screenshot shows using mysqldump. Have you tried those failing SQL manually?

Also, please add --verbose --force to your command so we can see where exactly the dump is failing. Have you also tried without --routines and without --hex-blob purely for testing/checking?

Hi @matthewb Thanks for you understanding, I’m glad you didn’t take my last message the wrong way.

Today it was like when you go to the doctor and as soon you’re there the pain has gone…

I wrote a script that does the dump every few seconds. And I tried to force it a bit by writing to and reading from a table in the same DB at the same time.
But the error showed up only once in the first 5 minutes and then never again.

As you suggested, I used the --verbose --force arguments when running it:

-- Retrieving table structure for table tx_eqbericht_domain_model_module2021...
-- Sending SELECT query...
-- Retrieving rows...
-- Rolling back to savepoint sp...
-- Retrieving table structure for table tx_eqbericht_domain_model_module2022...
mysqldump: Couldn't execute 'show create table `tx_eqbericht_domain_model_module2022`': Got an error writing communication packets (1160)
-- Skipping dump data for table 'tx_eqbericht_domain_model_module2022', it has no fields
-- Rolling back to savepoint sp...
-- Retrieving table structure for table tx_eqbericht_domain_model_navigation...
-- Sending SELECT query...
-- Retrieving rows...
-- Rolling back to savepoint sp...

The error occurs at line 486 of 1075.

I run the statement show create tablle ... a few times without any issues.

When the dump script is running in the late evening there’s hardly anything going on on the server.
The whole dump takes only about 5 seconds.
This is on a physical four core Xeon 3.3GHz server with 16GB RAM and hardware RAID-1.
The DB in question has 166 tables and takes 543MB on disk.

Therefore I think that we can rule out resource issues.

I rather get the suspicion that this error potentially occurs when the DB is in some “calm” state.

Are you running other DDLs or locking the table in some way?

I know you are not looking to make major changes, and “it’s worked before”, but could you give mydumper a try? I’m curious to see if the issue occurs here.

No DDL, but the CMS (TYPO3) seems to use row locking at some points. And transactions on inserts, so there’s more locking…

But then we should rather run into a lock timeout than a read error…?

I guess I have to give this a try.
What I don’t like about it is that we have to keep track of one more repository on a production critical server.

Simple transactions should not be causing such an issue, but if you’re doing explicit LOCKs then that might be the problem. There’s really no such reason at all to use LOCKs because transactions exist and are ACID.

TYPO3 uses Doctrine as DB abstraction layer. I grepped the code but couldn’t find any explicit locking.
Therefore the record locks I saw must be from update or delete statements.

I just logged into the CMS and got this:

It looks like this is a general issue now.

Verify you’re actually connecting to MySQL via socket. Are you explicitly providing the socket in your connection string? A connection to 127.0.0.1 still uses networking.

mysqldump connects through the file socket. The CMS uses indeed a network connection via 127.0.1 i.e. the loop device.
If we could not rely on the loop device, a lot of other issues would occur on Linux…

Have you looked over these MySQL "Got an error reading communication packet"

Are you tracking Aborted_connections stats? Do you see 1:1 increase in aborted cons when you get these errors?

Note that the error says “writing communication packets” which means the client application (ie: your code) had an issue talking to mysql.

We actually monitor them (Zabbix), but in a rather useless way for this purpose:

image

What we have here is the relationship between open and aborted connections. Therefore I can’t say what is causing a spike in the graph. If the ratio increases, it could mean we have fewer open connections or more aborted connections.

I have to investigate a bit more.

But either way I couldn’t find any correlation with the one error message in the CMS.