Hello,
We upgraded to MySQL 8 in December and have been struggling with recurring Signal 11 crashes ever since. We’ve managed to stabilise the environment (5 clusters with 2 nodes each) considerably, but we still see about 2-10 Signal 11 occurrences per month. We just can’t get rid of them and have no clue what the actual cause could be, so we’re hoping for some help from you guys out there!
First of all, the general information:
Packages:
- percona-server-client-8.0.19-10.1.el8.x86_64
- percona-release-1.0-25.noarch
- percona-server-shared-8.0.19-10.1.el8.x86_64
- percona-server-shared-compat-8.0.19-10.1.el8.x86_64
- percona-server-devel-8.0.19-10.1.el8.x86_64
- percona-server-server-8.0.19-10.1.el8.x86_64
Yes, I know, it is not the newest release. But we already tried upgrading to a newer release, and the number of Signal 11 incidents increased. Hence we’d like to nail down the cause before we try to upgrade again (which takes a lot of time, and don’t forget it’s a one-way upgrade, so getting back to the original version is quite a lot of work).
OS:
- CentOS 8.2 (4.18.0-193.19.1.el8_2.x86_64)
More info on the situation:
Out of the 5 clusters, only 1 has never had Signal 11 errors. Apart from having fewer resources, fewer databases, fewer tables, etc., its setup is identical (same packages, OS, …). So I’d say it is fair to assume that the Signal 11 crashes are triggered either by the data or by the way the application uses the Percona Servers.
Of the other 4 clusters that have experienced Signal 11 crashes, only 1 still crashes regularly (at least twice a month). On the others the crashes are very rare (maybe every 2-3 months), but they still happen. A lot of people here have looked into it, but we really can’t put a finger on what is causing it.
Here is the most recent Signal 11 error:
11:22:07 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7e7e77f8a360
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
stack_bottom = 7f03e46f8d50 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x1f88161]
/usr/sbin/mysqld(handle_fatal_signal+0x333) [0x11eda53]
/lib64/libpthread.so.0(+0x12dd0) [0x7f0d93aacdd0]
/usr/sbin/mysqld(MDL_ticket::has_stronger_or_equal_type(enum_mdl_type) const+0xb) [0xf3ccbb]
/usr/sbin/mysqld(MDL_ticket_store::find_in_hash(MDL_request const&) const+0x65) [0xf3eab5]
/usr/sbin/mysqld(MDL_context::find_ticket(MDL_request*, enum_mdl_duration*)+0x19) [0xf3eb59]
/usr/sbin/mysqld(MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**)+0x45) [0xf40735]
/usr/sbin/mysqld(MDL_context::acquire_lock(MDL_request*, unsigned long)+0xa7) [0xf411f7]
/usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, Open_table_context*)+0x12b8) [0x101b808]
/usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*)+0x473) [0x1020ee3]
/usr/sbin/mysqld(dd::Open_dictionary_tables_ctx::open_tables()+0xb6) [0x1d321f6]
/usr/sbin/mysqld(bool dd::cache::Storage_adapter::get<dd::Item_name_key, dd::Abstract_table>(THD*, dd::Item_name_key const&, enum_tx_isolation, bool, dd::Abstract_table const**)+0xc4) [0x1da7184]
/usr/sbin/mysqld(bool dd::cache::Shared_dictionary_cache::get<dd::Item_name_key, dd::Abstract_table>(THD*, dd::Item_name_key const&, dd::cache::Cache_element<dd::Abstract_table>**)+0x7f) [0x1d9ba9f]
/usr/sbin/mysqld(bool dd::cache::Dictionary_client::acquire<dd::Item_name_key, dd::Abstract_table>(dd::Item_name_key const&, dd::Abstract_table const**, bool*, bool*)+0x28d) [0x1d5d38d]
/usr/sbin/mysqld(bool dd::cache::Dictionary_client::acquire<dd::Abstract_table>(std::__cxx11::basic_string<char, std::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, dd::Abstract_table const**)+0x1a9) [0x1d5ec19]
/usr/sbin/mysqld(get_table_share(THD*, char const*, char const*, char const*, unsigned long, bool, bool)+0x7d8) [0x1019cc8]
/usr/sbin/mysqld(open_table(THD*, TABLE_LIST*, Open_table_context*)+0xd10) [0x101b260]
/usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*)+0x473) [0x1020ee3]
/usr/sbin/mysqld(open_tables_for_query(THD*, TABLE_LIST*, unsigned int)+0x93) [0x10219d3]
/usr/sbin/mysqld(Sql_cmd_dml::prepare(THD*)+0xd8) [0x10f2358]
/usr/sbin/mysqld(Sql_cmd_dml::execute(THD*)+0xcf) [0x10fd2df]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x3bd0) [0x10aaac0]
/usr/sbin/mysqld(mysql_parse(THD*, Parser_state*, bool)+0x408) [0x10acb78]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2025) [0x10af125]
/usr/sbin/mysqld(do_command(THD*)+0x20c) [0x10afe4c]
/usr/sbin/mysqld() [0x11decd0]
/usr/sbin/mysqld() [0x2473a40]
/lib64/libpthread.so.0(+0x82de) [0x7f0d93aa22de]
/lib64/libc.so.6(clone+0x43) [0x7f0d91a8ce83]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7e99c41b4408): UPDATE `email_289_detailclick` AS `DC` INNER JOIN `email_289_subscriber` AS `S` ON `DC`.`email_subscriber_id` = `S`.`id` SET `DC`.`email_subscriber_id` = NULL, `DC`.`ident_hash` = IFNULL(`DC`.`ident_hash`, MD5(CONCAT(' r J th g K\nA < nLc % \nH % |s A _ M '0TY l ', UNHEX(MD5(CONCAT(' x \r !+ V & O
" O + 5c{ ' MZ>3 J ', `DC`.`email_subscriber_id`)))))) WHERE `S`.`type` != 'test' AND `S`.`subscriber_id` IN (SELECT `id` FROM `mw3_account_2412`.`aggregator_643941203`)
Connection ID (thread ID): 70373877
Status: NOT_KILLED
The statements are usually very, very long, so the statements shown in the Signal 11 errors are of course not complete. The statements also vary from crash to crash.
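Since the statements differ every time, one thing that could be compared across crashes is the top frame of each backtrace. Below is only a minimal sketch of how that might be done, not something we run in production: the function names `crash_signatures` and `summarize` are made up for illustration, and it assumes every backtrace in the error log has the same line format as the one above.

```python
import re
from collections import Counter

# Frames that belong to the signal handler itself rather than the faulting code.
HANDLER_MARKERS = ("my_print_stacktrace", "handle_fatal_signal", "libpthread")

# Matches lines such as:
#   /usr/sbin/mysqld(symbol(args)+0x41) [0x1f88161]
FRAME_RE = re.compile(
    r"^(?P<path>/[^(\s]+)\((?P<sym>.*)\)\s+\[0x[0-9a-fA-F]+\]"
)


def crash_signatures(log_text):
    """Return the first non-handler frame symbol of each signal 11 backtrace."""
    signatures = []
    in_crash = False
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if "mysqld got signal 11" in line:
            in_crash = True  # start looking for the faulting frame
            continue
        if not in_crash:
            continue
        match = FRAME_RE.match(line)
        if match is None:
            continue  # banner text, not a stack frame
        if any(marker in line for marker in HANDLER_MARKERS):
            continue  # skip the crash handler's own frames
        # Drop the trailing "+0x..." offset so identical functions compare equal.
        symbol = re.sub(r"\+0x[0-9a-fA-F]+$", "", match.group("sym"))
        signatures.append(symbol or "<no symbol>")
        in_crash = False  # only the top frame of each crash is recorded
    return signatures


def summarize(log_text):
    """Count how often each top frame occurs, most frequent first."""
    return Counter(crash_signatures(log_text)).most_common()
```

Calling `summarize()` on the contents of the mysqld error log would show whether all crashes end in the same function (for the trace above, that would be `MDL_ticket::has_stronger_or_equal_type(enum_mdl_type) const`) or whether the crash location varies as much as the statements do.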
We can’t really see any errors in the logs other than:
[Warning] [MY-011825] [InnoDB] Cannot add field `<<field>>` in table `<<database>>`.`<<table>>` because after adding it, the row size is 8139 which is greater than maximum allowed size (8126) for a record on index leaf page.
The customer is aware that they need to redesign their tables … but I don’t think this is the reason for the signal 11 errors.
Any help is very much appreciated!!
Thanks & Best Regards