Database crashes at regular intervals - mysqld got signal 11

Hello,

We are testing Percona Server for MySQL 8.0.33-25 on Debian 11.7 (installed from the repository). At regular intervals we get the error "mysqld got signal 11", followed by a database crash. We have tested both Percona Server and Percona XtraDB Cluster, and the error occurs on both in versions 8.0.22 and later. Version 8.0.21 runs without error. Our code also runs without problems on Percona Server and XtraDB Cluster 5.7.X.

The crash is caused by our Java applications, which send data to the database with simple insert/update/delete operations. These operations fire database triggers.

The crash repeats regularly, for example every 4 minutes. The maxLifetime parameter in our Java client applications seems to be related to the crashes: when we decrease maxLifetime, the interval between crashes also decreases.

Can someone help us interpret the backtrace of this error? Is it a database bug? Can we change the database configuration on our side to prevent the crashes?

PS: I found this bug report, https://bugs.mysql.com/bug.php?id=102036, which looks similar to our problem (they also got signal 11 while executing a trigger).

Thank you in advance

2023-08-25T10:51:54Z UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=7712c421dbb5f98622a8ada6b288d73ab447e90a
Server Version: 8.0.33-25 Percona Server (GPL), Release '25', Revision '60c9e2c5'

Thread pointer: 0x7f82c9dee050
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f8b9c110c20 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x558d672b8b9e]
/usr/sbin/mysqld(print_fatal_signal(int)+0x38b) [0x558d663d403b]
/usr/sbin/mysqld(handle_fatal_signal+0xc5) [0x558d663d4105]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f8f866e3140]
/usr/sbin/mysqld(find_type2(TYPELIB const*, char const*, unsigned long, CHARSET_INFO const*)+0x25) [0x558d66357275]
/usr/sbin/mysqld(Field_enum::store(char const*, unsigned long, CHARSET_INFO const*)+0xaf) [0x558d664c1d0f]
/usr/sbin/mysqld(Item::save_in_field(Field*, bool)+0x53) [0x558d65f9add3]
/usr/sbin/mysqld(sp_eval_expr(THD*, Field*, Item**)+0xab) [0x558d6619485b]
/usr/sbin/mysqld(sp_lex_instr::reset_lex_and_exec_core(THD*, unsigned int*, bool)+0x5fa) [0x558d661a8a0a]
/usr/sbin/mysqld(sp_lex_instr::validate_lex_and_execute_core(THD*, unsigned int*, bool)+0xa5) [0x558d661a92a5]
/usr/sbin/mysqld(sp_head::execute(THD*, bool)+0x5c1) [0x558d6619faf1]
/usr/sbin/mysqld(sp_head::execute_function(THD*, Item**, unsigned int, Field*)+0x5d5) [0x558d661a1fe5]
/usr/sbin/mysqld(Item_func_sp::execute_impl(THD*)+0x117) [0x558d6600fac7]
/usr/sbin/mysqld(Item_func_sp::execute()+0x7c) [0x558d6600fbfc]
/usr/sbin/mysqld(Item_func_sp::val_str(String*)+0x41) [0x558d6600ff51]
/usr/sbin/mysqld(eval_string_arg(CHARSET_INFO const*, Item*, String*)+0x7d) [0x558d66002f8d]
/usr/sbin/mysqld(Arg_comparator::compare_string()+0x26) [0x558d65f9f126]
/usr/sbin/mysqld(Item_func_eq::val_int()+0x28) [0x558d65fa4ba8]
/usr/sbin/mysqld(Item::val_bool()+0xad) [0x558d65f7d4ad]
/usr/sbin/mysqld(sp_instr_jump_if_not::exec_core(THD*, unsigned int*)+0x2d) [0x558d661a62ad]
/usr/sbin/mysqld(sp_lex_instr::reset_lex_and_exec_core(THD*, unsigned int*, bool)+0x5fa) [0x558d661a8a0a]
/usr/sbin/mysqld(sp_lex_instr::validate_lex_and_execute_core(THD*, unsigned int*, bool)+0xa5) [0x558d661a92a5]
/usr/sbin/mysqld(sp_head::execute(THD*, bool)+0x5c1) [0x558d6619faf1]
/usr/sbin/mysqld(sp_head::execute_trigger(THD*, MYSQL_LEX_CSTRING const&, MYSQL_LEX_CSTRING const&, GRANT_INFO*)+0x26d) [0x558d661a04ed]
/usr/sbin/mysqld(Trigger::execute(THD*)+0xec) [0x558d6638685c]
/usr/sbin/mysqld(Trigger_chain::execute_triggers(THD*)+0x18) [0x558d66387c48]
/usr/sbin/mysqld(Table_trigger_dispatcher::process_triggers(THD*, enum_trigger_event_type, enum_trigger_action_time_type, bool)+0x4d) [0x558d6637fc6d]
/usr/sbin/mysqld(Sql_cmd_update::update_single_table(THD*)+0x2041) [0x558d663456c1]
/usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x16e) [0x558d662b791e]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0xaef) [0x558d6625d57f]
/usr/sbin/mysqld(sp_instr_stmt::exec_core(THD*, unsigned int*)+0x4f) [0x558d661a61bf]
/usr/sbin/mysqld(sp_lex_instr::reset_lex_and_exec_core(THD*, unsigned int*, bool)+0x169) [0x558d661a8579]
/usr/sbin/mysqld(sp_lex_instr::validate_lex_and_execute_core(THD*, unsigned int*, bool)+0xa5) [0x558d661a92a5]
/usr/sbin/mysqld(sp_instr_stmt::execute(THD*, unsigned int*)+0xea) [0x558d661aabea]
/usr/sbin/mysqld(sp_head::execute(THD*, bool)+0x5c1) [0x558d6619faf1]
/usr/sbin/mysqld(sp_head::execute_procedure(THD*, mem_root_deque<Item*>*)+0x858) [0x558d661a2c68]
/usr/sbin/mysqld(Sql_cmd_call::execute_inner(THD*)+0x154) [0x558d661d5a14]
/usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x16e) [0x558d662b791e]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0xaef) [0x558d6625d57f]
/usr/sbin/mysqld(sp_instr_stmt::exec_core(THD*, unsigned int*)+0x4f) [0x558d661a61bf]
/usr/sbin/mysqld(sp_lex_instr::reset_lex_and_exec_core(THD*, unsigned int*, bool)+0x169) [0x558d661a8579]
/usr/sbin/mysqld(sp_lex_instr::validate_lex_and_execute_core(THD*, unsigned int*, bool)+0xa5) [0x558d661a92a5]
/usr/sbin/mysqld(sp_instr_stmt::execute(THD*, unsigned int*)+0xea) [0x558d661aabea]
/usr/sbin/mysqld(sp_head::execute(THD*, bool)+0x5c1) [0x558d6619faf1]
/usr/sbin/mysqld(sp_head::execute_procedure(THD*, mem_root_deque<Item*>*)+0x858) [0x558d661a2c68]
/usr/sbin/mysqld(Sql_cmd_call::execute_inner(THD*)+0x154) [0x558d661d5a14]
/usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0x16e) [0x558d662b791e]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0xaef) [0x558d6625d57f]
/usr/sbin/mysqld(dispatch_sql_command(THD*, Parser_state*, bool)+0x5d4) [0x558d66261384]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0xf71) [0x558d662628c1]
/usr/sbin/mysqld(do_command(THD*)+0x252) [0x558d66264be2]
/usr/sbin/mysqld(+0x1353280) [0x558d663c5280]
/usr/sbin/mysqld(+0x26b0185) [0x558d67722185]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f8f866d7ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f8f85e84a2f]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f82fdc22e40): UPDATE                         Object                     SET                         Pushed = 1,                         PushedTime = aTime,                         PushedConfigurationId = lConfigurationId                     WHERE                         ObjectId = lObjectId
Connection ID (thread ID): 3640
Status: NOT_KILLED

Please help us make Percona Server better by reporting any
bugs at https://bugs.percona.com/

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash

Hi Jaromir,

Can you repeat this behavior with upstream (community) MySQL packages, or is this only happening when you use Percona Server?

Also, can you reproduce this at will? That will help with the investigation, and with testing and reporting a bug (if it’s still happening in the latest release).

Hi @lebeda.jaromir and welcome to the Percona community,

In addition to the previous question, can you also provide two more pieces of information:

  • The table definition:
    SHOW CREATE TABLE table_that_is_causing_crash;
  • Anything in the error log that is specific to that table, and whether the table itself is corrupt:
    CHECK TABLE table_that_is_causing_crash;

Thanks,
K

If the crash is repeatable, can you turn on core dumps, cause a crash, then share the core with us? This would help immensely.

Hello,
thank you for your reply.

I have also run tests with community MySQL (on the same server with the same data dir):

  1. mysql-8.0.21-linux-glibc2.12-x86_64 - works fine
  2. mysql-8.0.34-linux-glibc2.28-x86_64 - fails with signal 11 after approx. 30 minutes
  3. mysql-8.1.0-linux-glibc2.28-x86_64 - fails with signal 11 after approx. 30 minutes

I can reproduce this behavior at any time: I just turn on our Java applications that send data to the DB.

Hello,
I have caused the crash again (on Percona Server for MySQL 8.0.33-25). In the error log I see that the failing query is an UPDATE.

I attach the error log:
error.log (10.1 KB)
I also attach the output of SHOW CREATE TABLE and CHECK TABLE for the table that caused the crash (because of our policy I had to anonymize attribute/table names):
create_check_table.txt (2.3 KB)
I do not see any suspicious records in the error log other than this crash.

Unfortunately, our logic is quite complicated: the table that caused the crash has an ON UPDATE trigger, and this trigger invokes stored procedures that can INSERT/UPDATE/DELETE in other tables. I ran CHECK TABLE on those tables too and they seem OK. Personally, I do not think the problem is table corruption.
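
For a trigger chain like the one described above, a quick inventory of what fires on the table can help narrow things down. The following is only a sketch: my_schema and my_table are placeholder names (the real names in this thread are anonymized), and it uses the standard information_schema views.

```sql
-- Assumption: 'my_schema' and 'my_table' are placeholders for the real
-- (anonymized) schema and table names.

-- Triggers that fire on the table, in firing order.
SELECT TRIGGER_NAME, ACTION_TIMING, EVENT_MANIPULATION, ACTION_ORDER
FROM information_schema.TRIGGERS
WHERE EVENT_OBJECT_SCHEMA = 'my_schema'
  AND EVENT_OBJECT_TABLE = 'my_table'
ORDER BY ACTION_TIMING, EVENT_MANIPULATION, ACTION_ORDER;

-- Stored functions in the schema with their declared return types.
-- The backtrace passes through sp_head::execute_function and
-- Field_enum::store, so a function returning ENUM is worth a look;
-- the full return type appears in DTD_IDENTIFIER.
SELECT ROUTINE_NAME, DTD_IDENTIFIER, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = 'my_schema'
  AND ROUTINE_TYPE = 'FUNCTION';
```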

One more detail: our clients (Java apps) read data from Kafka and send hundreds of these queries per second to the database. After a certain amount of time, the database crashes with signal 11.

Hello,
I have turned on core dumps and caused a crash. The core dump file is 9 GB.

But I am not allowed to share the core dump file because management is afraid it could contain sensitive data :frowning: Is there any other way to give you the information you need to investigate? For example, could you give us instructions on what to look for in the core dump, so that we do not have to send you the whole 9 GB?
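
One option that may be relevant to the size and sensitivity concern is the innodb_buffer_pool_in_core_file system variable (available since MySQL 8.0.14), which excludes InnoDB buffer pool pages from core files on Linux. The sketch below only shows how to inspect and change the settings. It is an assumption that this is acceptable here: excluding the buffer pool reduces, but does not eliminate, the chance of sensitive data in the dump, and it may also remove information that is useful for debugging.

```sql
-- Read-only at runtime; core files are enabled with the core-file
-- option in my.cnf or on the mysqld command line.
SELECT @@GLOBAL.core_file;

-- Exclude InnoDB buffer pool pages from future core files.
-- Other memory (query text, session buffers) can still end up in the dump.
SET GLOBAL innodb_buffer_pool_in_core_file = OFF;

-- Confirm the new value.
SELECT @@GLOBAL.innodb_buffer_pool_in_core_file;
```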

What to look for varies, since there could be any number of reasons for the crash. Having the core dump allows our engineers to view the entire stack and all threads that could be contributing to the problem.

We can provide a secure means of uploading the core dump. If your company engages in a Support contract, then the various NDAs and “legal stuff” would obviate any sensitive data issues that your management may be concerned with.

I would first remove the trigger and run the application, and verify that everything works well that way. Then add the trigger calling the proc, but simplify the proc to just 1 statement. Again, verify this works under load. Then add a bit more complexity to the proc. Repeat this process until the crash happens. Then remove what you added last and confirm that adding X back to the proc reproduces the crash. Also check that there are no string-int comparisons, and that all collations and charsets are the same.
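
The collation and charset check suggested above could be sketched as follows. Again, my_schema is a placeholder, and the queries only surface candidates for manual review. They do not detect string-int comparisons inside routine bodies.

```sql
-- Assumption: 'my_schema' is a placeholder for the real schema name.

-- Charsets and collations used by character columns in the schema.
-- More than one row means the schema mixes charsets or collations.
SELECT CHARACTER_SET_NAME, COLLATION_NAME, COUNT(*) AS column_count
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'my_schema'
  AND COLLATION_NAME IS NOT NULL
GROUP BY CHARACTER_SET_NAME, COLLATION_NAME;

-- Session settings each stored routine was created under.
-- Routines created under different settings may compare strings differently.
SELECT ROUTINE_NAME, ROUTINE_TYPE,
       CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = 'my_schema';
```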

Hello, thank you for your reply. For now we are trying to reduce the triggers to find exactly which code causes the crash. I will let you know when we have some results.
Regarding the core dump and the possibility of NDAs, I will discuss this again with management and let you know next week. Thank you.

Do you have any update?
I have the exact same problem.

Hello, sorry for the delay. I think we have finally found a solution to our problem with ‘signal 11’.

Our databases crash during trigger execution. These triggers are called often (many times per second). In the triggers we have a condition that limits which database instance part of our code runs on:

IF getInstance() = 'INS1' THEN
    -- execute some code
END IF;

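The triggers fail on the comparison getInstance() = 'INS1'. This matches the backtrace above, where Item_func_eq::val_int() calls the stored function and the crash occurs in Field_enum::store() / find_type2().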

The function getInstance looks like this:

CREATE
DEFINER=db_definer@localhost
FUNCTION getInstance()
RETURNS enum('INS1','INS2','INS3') CHARSET utf8

BEGIN
    RETURN 'INS2';
END
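
For a bug report, the pattern above could be reduced to a standalone test case along these lines. This is a sketch only: all repro_* names are hypothetical, the BEFORE UPDATE timing is an assumption, and it has not been confirmed to crash. In this thread the crash appeared only after sustained load from many client connections, so a single UPDATE is unlikely to trigger it by itself. DELIMITER is a mysql command-line client directive.

```sql
-- Hypothetical minimal test case; all repro_* names are invented.
CREATE TABLE repro_object (
    ObjectId INT PRIMARY KEY,
    Pushed   TINYINT NOT NULL DEFAULT 0
);

DELIMITER //

-- Same shape as the original function: ENUM return type, constant value.
CREATE FUNCTION repro_getInstance()
RETURNS enum('INS1','INS2','INS3') CHARSET utf8
DETERMINISTIC
NO SQL
BEGIN
    RETURN 'INS2';
END//

-- Trigger that evaluates the ENUM function in a string comparison.
CREATE TRIGGER repro_object_bu
BEFORE UPDATE ON repro_object
FOR EACH ROW
BEGIN
    IF repro_getInstance() = 'INS1' THEN
        SET NEW.Pushed = 0;
    END IF;
END//

DELIMITER ;

INSERT INTO repro_object (ObjectId, Pushed) VALUES (1, 0);

-- Run this repeatedly, from many connections, to approximate the load.
UPDATE repro_object SET Pushed = 1 WHERE ObjectId = 1;
```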

If we change the function getInstance so that (for example) the value 'INS2' comes from a SELECT on a database table, the database seems to run stably (we have tested with PXC cluster 8.0.34).

CREATE
DEFINER=db_definer@localhost
FUNCTION getInstance()
RETURNS varchar(4)

BEGIN
    DECLARE lIns varchar(4);
    SELECT Instance
    FROM DbInstance
    INTO lIns;
    RETURN lIns;
END
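
The rewrite above changes two things at once: the return type (ENUM to VARCHAR) and the source of the value (constant to table lookup). To tell which change matters, a variant that changes only the return type could be tested. This is an untested sketch, and getInstanceVarchar is a hypothetical name chosen so it can coexist with the existing function.

```sql
DELIMITER //

-- Hypothetical variant: same constant as the original function,
-- but returned as VARCHAR instead of ENUM, with no table access.
CREATE
DEFINER=db_definer@localhost
FUNCTION getInstanceVarchar()
RETURNS varchar(4)
DETERMINISTIC
NO SQL
BEGIN
    RETURN 'INS2';
END//

DELIMITER ;
```

If the server stays stable with this variant under the same load, that would point at the ENUM return type rather than at the constant value. If it still crashes, the table lookup is what made the difference.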

Maybe the problem is with the ENUM in the original function? Could someone explain this behaviour to us, please?

Thank you :slight_smile: