"MySQL server has gone away"

Hello together!

I have only recently become familiar with the Percona XtraDB cluster and am trying to set up a cluster for my app. I was already able to do this - the servers are synchronized and doing their job in multi-master mode (3 nodes).

After importing my backup, I found out by testing my application that I had defined two Triggers (BI & BU) for a single table. After I could finally delete them, I now get an error message before each insert in the earlier mentioned table (“MySQL Server has gone away”) and one of my nodes says goodbye.

I was able to get the following from the logfile:

    2021-01-08T17:28:55.823567Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
    17:28:55 UTC - mysqld got signal 11 ;
    Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.

    Build ID: 5db8d861b54b2ccfe4ce14befed66f20e02cc707
    Server Version: 8.0.21-12.1 Percona XtraDB Cluster (GPL), Release rel12, Revision 4d973e2, WSREP version 26.4.3, wsrep_26.4.3

    Thread pointer: 0x7fbd6c813e10
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 7fbde862bd90 thread_stack 0x46000
    /usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x564dabedee51]
    /usr/sbin/mysqld(handle_fatal_signal+0x393) [0x564daaf01143]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7fbe004803c0]
    /usr/lib/galera4/libgalera_smm.so(+0x22fcce) [0x7fbdf404fcce]
    /usr/sbin/mysqld(wsrep::wsrep_provider_v26::replay(wsrep::ws_handle const&, wsrep::high_priority_service*)+0x3c) [0x564dac67b02c]
    /usr/sbin/mysqld(Wsrep_client_service::replay()+0xbb) [0x564daaf0f03b]
    /usr/sbin/mysqld(wsrep::transaction::after_statement()+0x13d) [0x564dac6797ed]
    /usr/sbin/mysqld(wsrep::client_state::after_statement()+0x136) [0x564dac665226]
    /usr/sbin/mysqld(+0x12006ca) [0x564daadac6ca]
    /usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x38ce) [0x564daadb0c1e]
    /usr/sbin/mysqld(do_command(THD*)+0x27f) [0x564daadb130f]
    /usr/sbin/mysqld(+0x1345048) [0x564daaef1048]
    /usr/sbin/mysqld(+0x28a9ed2) [0x564dac455ed2]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7fbe00474609]
    /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fbdffc18293]

    Trying to get some variables.
    Some pointers may be invalid and cause the dump to abort.
    Query (0): Connection ID (thread ID): 0
    Status: NOT_KILLED

    You may download the Percona XtraDB Cluster operations manual by visiting
    http://www.percona.com/software/percona-xtradb-cluster/. You may find information
    in the manual which will help you identify the cause of the crash.

Since I already have my system active, I would be very happy about any help!

Many thanks in advance!

Kind regards,

1 Like

Hello. Which version of PXC is this? It sounds like you have some inconsistency with the triggers across your nodes. Without knowing anything else about your setup, I would suggest you gracefully stop 2 of the 3 nodes and make sure everything functions well with the 1 node. If so, I would then trash the datadir on the other nodes and let them both SST a fresh dataset from node 1.

1 Like

Hi Matthewb! Thank you for your message. I have already done this as I deleted the trigger. The cause of the actual problem was that in the “CREATE TRIGGER” statement there was still a user of my old server inside. So I wanted to delete this and create it again - which I did, until I found out that triggers in the cluster can cause problems → thus I removed it again. After a long back & forth the clusters are running again, in between it worked once … and suddenly I’m facing the same problem again.

I also don’t see the trigger in any of my nodes, so it’s hard for me to narrow down the error, though I’m sure that was the root cause.

I am using the latest version of PCX (version 8)

1 Like

Does anyone else have any ideas what I can do? I have tested it with 2 nodes so far - both were gone after an insert via the WebApp. Since this is a production system, I don’t necessarily want to “kill” the main node.

The funny thing is that the server does not “go away” when I insert data from the shell. This problem only occurs when my app (CakePHP) inserts a line.

Best regards,

1 Like

@andrekat If the problem only happens when your app inserts, then your app is doing something additional that you are not on the shell. You need to determine what is different between the two.

I’m confused a bit. Do you have the trigger on the table on all 3 nodes now? Or did you completely delete the trigger? Have you tested inserts without the trigger?

1 Like

Sorry for the confusion. The trigger no longer exists on any of the nodes, but the error is still present.

An insert into the table works, but not via the PHP MySQLi provider. My first guess was the cache, but I have cleared it several times.

1 Like

The error of ‘MySQL Server has gone away’ still happens when you run your php script? Are nodes still crashing?

1 Like

Yes, that is correct. I tried this with two nodes and both crashed.

1 Like

Ok. So we know the issue is not the trigger since it does not exist. And we know the issue only happens when PHP/mysqli access the database. Running the SQL yourself on mysql CLI does not cause the issue. Your next course of action is to trace exactly what php is sending to mysql. Turn on or add logging everywhere in your code. There is clearly something php is doing that you are not which is causing the crash. You can use mysql’s general log too but do not have this enabled for a long period of time. Enable it, run php code, turn it off. examine log for your query and see what comes before it.

1 Like

Thanks! I will definitely give it a try and let you know as soon as I find out more

2 Likes

Okay, I don’t see through it at all, unfortunately. I have now enabled query logging at the application level and can see which statements were executed last (see log). I have executed the last statements 1:1 in the CLI (except for the “BEGIN”) and I have not been able to find a similar error anywhere. Is this related to the “BEGIN”?

2021-01-18 10:01:36 Debug: duration=2 rows=1 SHOW TABLE STATUS WHERE Name = 'tags'
2021-01-18 10:01:36 Debug: duration=1 rows=1 SELECT Tags.id AS `Tags__id`, Tags.title AS `Tags__title`, Tags.date_change AS `Tags__date_change`, Tags.date_create AS `Tags__date_create` FROM tags Tags WHERE Tags.id in (27)
2021-01-18 10:01:36 Debug: duration=0 rows=0 BEGIN
2021-01-18 10:01:36 Debug: duration=2 rows=11 SHOW FULL COLUMNS FROM `music_contents`
2021-01-18 10:01:36 Debug: duration=2 rows=2 SHOW INDEXES FROM `music_contents`
2021-01-18 10:01:36 Debug: duration=3 rows=1 SELECT * FROM information_schema.key_column_usage AS kcu
            INNER JOIN information_schema.referential_constraints AS rc
            ON (
                kcu.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
                AND kcu.CONSTRAINT_SCHEMA = rc.CONSTRAINT_SCHEMA
            )
            WHERE kcu.TABLE_SCHEMA = 'REPLACED' AND kcu.TABLE_NAME = 'music_contents' AND rc.TABLE_NAME = 'music_contents'
2021-01-18 10:01:36 Debug: duration=2 rows=1 SHOW TABLE STATUS WHERE Name = 'music_contents'
2021-01-18 10:01:36 Debug: duration=1 rows=1 SELECT 1 AS `existing` FROM music_contents MusicContents WHERE MusicContents.id = 2941 LIMIT 1
2021-01-18 10:01:36 Debug: duration=2 rows=1 INSERT INTO single_tracks (music_content_id, name, interpreter, licensed, fsk, tempo, dynamics, arrangement, status, voice, language, publish_year, date_create, date_change) VALUES (2941, 'Andre Karner', 'Test', 1, 0, 1, 1, 'vocal/instrumental', 1, 'm', 'en', '', '2021-01-18 10:01:36', '2021-01-18 10:01:36')
2021-01-18 10:01:36 Debug: duration=2 rows=2 SHOW FULL COLUMNS FROM `single_track_tags`
2021-01-18 10:01:36 Debug: duration=160 rows=0 SHOW INDEXES FROM `single_track_tags`
2021-01-18 10:01:36 Debug: duration=0 rows=0 ROLLBACK
2021-01-18 10:01:36 Debug: duration=0 rows=0 SHOW FULL COLUMNS FROM `channels`
2021-01-18 10:01:36 Debug: duration=0 rows=0 BEGIN

Immediately after the last statement “BEGIN” it writes the following error message

2021-01-18 10:01:36 Warning: Warning (2): PDOStatement::execute(): MySQL server has gone away in [/var/www/XXXXXX.at/public_html/one/vendor/cakephp/cakephp/src/Database/Statement/MysqlStatement.php, line 39]
Request URL: /single-tracks/add
Referer URL: https://XXXXX.at/single-tracks/add
Client IP: xxxxxx
Trace:
Cake\Error\BaseErrorHandler::handleError() - CORE/src/Error/BaseErrorHandler.php, line 153
PDOStatement::execute() - [internal], line ??
Cake\Database\Statement\MysqlStatement::execute() - CORE/src/Database/Statement/MysqlStatement.php, line 39
Cake\Database\Statement\StatementDecorator::execute() - CORE/src/Database/Statement/StatementDecorator.php, line 174
Cake\Database\Log\LoggingStatement::execute() - CORE/src/Database/Log/LoggingStatement.php, line 56
Cake\Database\Connection::query() - CORE/src/Database/Connection.php, line 315
Cake\Database\Connection::execute() - CORE/src/Database/Connection.php, line 271
Cake\Database\Schema\Collection::_reflect() - CORE/src/Database/Schema/Collection.php, line 129
Cake\Database\Schema\Collection::describe() - CORE/src/Database/Schema/Collection.php, line 102
Cake\ORM\Table::getSchema() - CORE/src/ORM/Table.php, line 531
Cake\ORM\Query::addDefaultTypes() - CORE/src/ORM/Query.php, line 210
Cake\ORM\Query::__construct() - CORE/src/ORM/Query.php, line 167
Cake\ORM\Table::query() - CORE/src/ORM/Table.php, line 1569
Cake\ORM\Table::find() - CORE/src/ORM/Table.php, line 1171
Cake\ORM\Association::find() - CORE/src/ORM/Association.php, line 1030
Cake\ORM\Association\BelongsToMany::Cake\ORM\Association\{closure}() - CORE/src/ORM/Association/BelongsToMany.php, line 1174
Cake\Database\Connection::transactional() - CORE/src/Database/Connection.php, line 668
Cake\ORM\Association\BelongsToMany::replaceLinks() - CORE/src/ORM/Association/BelongsToMany.php, line 1205
Cake\ORM\Association\BelongsToMany::saveAssociated() - CORE/src/ORM/Association/BelongsToMany.php, line 691
Cake\ORM\AssociationCollection::_save() - CORE/src/ORM/AssociationCollection.php, line 260
1 Like

The ‘BEGIN’ simply refers to starting a new SQL transaction. I see the ROLLBACK before it which means the INSERT did not happen. There should be a COMMIT but not seeing that.

SHOW INDEXES FROM single_track_tags took 160 seconds to run and returned nothing. That does not seem correct. You have a 3-node PXC right? And you are sending these connections to 1 node right?

Have you stopped the other 2 nodes and ran this PHP using only 1 member of the cluster?

1 Like

All right, thanks! Fact, that sounds weird. Yes exactly, I have a 3-node PXC and control these 3 with ProxySQL. However, the error also happens when I only address 1 of the 3 nodes. I have not stopped 2 so far, unless it is absolutely necessary to narrow down or fix the error.

EDIT: Running the query “SHOW INDEXES FROM single_track_tags” on the CLI is also working.

1 Like

I would remove ProxySQL from the equation and, for testing, tell your application to connect directly to node 1 and run some tests. Does the node still crash? Have you checked if proxysql is crashing?

1 Like

I already did that for testing purposes. The DEV environment is connected directly to one node. I‘ve already tested it with each node.

1 Like

Well, if this issue is still happening with proxysql missing, then you are hitting some random bug that is only triggered by the SQL from your ORM. I would indeed stop nodes 2 & 3, then restart node 1 and comment out all wsrep_ lines in your config. This essentially starts node 1 as a plain mysql server. Run your app again. If mysql still crashes, then you’ve found some bug and you’ll need to report it. If mysql does not crash, restart node 1 but turn all the Galera stuff back on. Run again against 1 node. If crashes this time, then your SQL is triggering something strange within Galera and you need to report the bug. If your code only crashes the node when all 3 are running/connected, then it sounds like you are trying to do something incompatible with galera.

2 Likes

Okay, now it’s getting weird. I have just enabled logging in MySQL (SET global general_log = on) to see if my application is not logging everything. After I executed the command, I wanted to try again to reproduce the error - and lo and behold, suddenly it works?

The application addresses the database via ProxySQL … should I still worry about this?

UPDATE: Ok I mean I have found the solution to the problem. Before I enabled logging, I ran the “SHOW INDEXES” command again to see if it really takes that long. The command was executed in a few ms, then I ran my tests and it worked (in the DEV environment).

Now I tried the same in the LIVE environment (without running the “SHOW INDEXES” command earlier) → MySQL Server has gone away. Well, I first restarted the affected node, ran the “SHOW INDEXES” command in each schema + node → ran test again and no crash!!!

1 Like

To be honest, you have a crappy ORM. No ORM should be executing SHOW INDEXES and SELECT FROM information_schema on a regular basis. The data from these queries is only needed when you are creating a new model, not every time you execute.
If you found a solution, great!

1 Like