MySQL crash: 'boost::wrapexcept<std::system_error>'

I just inherited a web server cluster that uses Percona XtraDB Cluster 8.0 for MySQL clustering, and in the past couple of days I've seen the following error on multiple nodes:

2022-07-14T14:05:41.175929Z 12 [Note] [MY-000000] [WSREP] wsrep_notify_cmd is not defined, skipping notification.
terminate called after throwing an instance of 'boost::wrapexcept<std::system_error>'
  what():  remote_endpoint: Transport endpoint is not connected
2022-07-14T14:05:42.299411Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
14:05:42 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.

Build ID: 2165eff2f1909b2f032b76b423382ec097755ae3
Server Version: 8.0.27-18.1 Percona XtraDB Cluster (GPL), Release rel18, Revision ac35177, WSREP version 26.4.3, wsrep_26.4.3

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x55c658319ea1]
/usr/sbin/mysqld(handle_fatal_signal+0x393) [0x55c657338f63]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7fb2e24a1420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7fb2e1b7900b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7fb2e1b58859]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911) [0x7fb2e1f30911]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c) [0x7fb2e1f3c38c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7) [0x7fb2e1f3c3f7]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9) [0x7fb2e1f3c6a9]
/usr/lib/galera4/libgalera_smm.so(+0x1e569) [0x7fb2d5c64569]
/usr/lib/galera4/libgalera_smm.so(+0xa050a) [0x7fb2d5ce650a]
/usr/lib/galera4/libgalera_smm.so(+0xa20ab) [0x7fb2d5ce80ab]
/usr/lib/galera4/libgalera_smm.so(+0xa4428) [0x7fb2d5cea428]
/usr/lib/galera4/libgalera_smm.so(+0xaa9d3) [0x7fb2d5cf09d3]
/usr/lib/galera4/libgalera_smm.so(+0x9b707) [0x7fb2d5ce1707]
/usr/lib/galera4/libgalera_smm.so(+0x893a2) [0x7fb2d5ccf3a2]
/usr/lib/galera4/libgalera_smm.so(+0x193258) [0x7fb2d5dd9258]
/usr/lib/galera4/libgalera_smm.so(+0x1bc5ae) [0x7fb2d5e025ae]
/usr/lib/galera4/libgalera_smm.so(+0x1bc6d6) [0x7fb2d5e026d6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fb2e2495609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fb2e1c55133]
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
Log of wsrep recovery (--wsrep-recover):
 INFO: WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery_verbose.v3wZh5' --pid-file='/var/lib/mysql/webcluster1-recover.pid'
 INFO: WSREP: Recovered position 528ba769-fc26-11ec-a8f6-1f7beb5ef71d:3407779

The nodes restart mysqld on their own and rejoin the cluster (as long as the cluster is still up), but I'm not experienced enough to pin down the cause from that trace.
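If a crash takes every node down at the same time, the usual Galera practice is to bootstrap the cluster again from the node with the most advanced (highest) recovered seqno. The --wsrep-recover output above reports that position as cluster-UUID:seqno. The Python sketch below picks that node, assuming you have already collected each node's error log text yourself; the function names and the seqnos for the other two nodes are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Matches the wsrep recovery line from the error log, e.g.
#  INFO: WSREP: Recovered position 528ba769-fc26-11ec-a8f6-1f7beb5ef71d:3407779
# The seqno may be -1 when the position could not be recovered.
POSITION_RE = re.compile(r"WSREP: Recovered position ([0-9a-f-]{36}):(-?\d+)")


def recovered_position(log_text: str) -> Optional[Tuple[str, int]]:
    """Return (cluster_uuid, seqno) from the last recovery line, or None."""
    matches = POSITION_RE.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def most_advanced_node(logs: Dict[str, str]) -> Optional[str]:
    """Return the name of the node with the highest recovered seqno.

    Raises ValueError if the nodes disagree on the cluster UUID, because
    seqnos from different cluster histories are not comparable.
    """
    best_node: Optional[str] = None
    best_seqno: Optional[int] = None
    uuids = set()
    for node, text in logs.items():
        position = recovered_position(text)
        if position is None:
            continue  # no recovery line for this node
        uuid, seqno = position
        uuids.add(uuid)
        if best_seqno is None or seqno > best_seqno:
            best_node, best_seqno = node, seqno
    if len(uuids) > 1:
        raise ValueError(f"nodes report different cluster UUIDs: {sorted(uuids)}")
    return best_node


if __name__ == "__main__":
    # Hypothetical sample: only webcluster1's position comes from the log
    # above; the other seqnos are made up for illustration.
    uuid = "528ba769-fc26-11ec-a8f6-1f7beb5ef71d"
    logs = {
        "webcluster1": f" INFO: WSREP: Recovered position {uuid}:3407779",
        "webcluster2": f" INFO: WSREP: Recovered position {uuid}:3407781",
        "webcluster3": f" INFO: WSREP: Recovered position {uuid}:3407770",
    }
    print(most_advanced_node(logs))  # webcluster2
```

Refusing to compare positions across different cluster UUIDs is deliberate: a higher seqno from another cluster history says nothing about which node holds the newest data.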

Below are the relevant dpkg package list and the contents of my.cnf.

user@webcluster3:/$ dpkg -l|egrep -i "mysql|percona|maria|boost|sysbench"
ii  libboost-date-time1.71.0:amd64             1.71.0-6ubuntu6                     amd64        set of date-time libraries based on generic programming concepts
ii  libboost-filesystem1.71.0:amd64            1.71.0-6ubuntu6                     amd64        filesystem operations (portable paths, iteration over directories, etc) in C++
ii  libboost-iostreams1.71.0:amd64             1.71.0-6ubuntu6                     amd64        Boost.Iostreams Library
ii  libboost-locale1.71.0:amd64                1.71.0-6ubuntu6                     amd64        C++ facilities for localization
ii  libboost-thread1.71.0:amd64                1.71.0-6ubuntu6                     amd64        portable C++ multi-threading
ii  libdbd-mysql-perl:amd64                    4.050-3                             amd64        Perl5 database interface to the MariaDB/MySQL database
ii  libmysqlclient21:amd64                     8.0.29-0ubuntu0.20.04.3             amd64        MySQL database client library
ii  mysql-common                               5.8+1.0.5ubuntu2                    all          MySQL database common files, e.g. /etc/mysql/my.cnf
ii  percona-release                            1.0-27.generic                      all          Package to install Percona gpg key and APT repos
ii  percona-xtradb-cluster                     1:8.0.27-18-1.focal                 amd64        Percona XtraDB Cluster with Galera
ii  percona-xtradb-cluster-client              1:8.0.27-18-1.focal                 amd64        Percona XtraDB Cluster database client binaries
ii  percona-xtradb-cluster-common              1:8.0.27-18-1.focal                 amd64        Percona XtraDB Cluster database common files (e.g. /etc/mysql/my.cnf)
ii  percona-xtradb-cluster-server              1:8.0.27-18-1.focal                 amd64        Percona XtraDB Cluster database server binaries
ii  php7.4-mysql                               7.4.3-4ubuntu2.12                   amd64        MySQL module for PHP

# The Percona Server 8.0 configuration file.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html
#
#
# * IMPORTANT: Additional settings that can override those from this file!
#   The files must end with '.cnf', otherwise they'll be ignored.
#

!includedir /etc/mysql/mysql.conf.d/
!includedir /etc/mysql/conf.d/
[mysqld]
ssl-ca=/etc/mysql/certs/ca.pem
ssl-cert=/etc/mysql/certs/server-cert.pem
ssl-key=/etc/mysql/certs/server-key.pem

[client]
ssl-ca=/etc/mysql/certs/ca.pem
ssl-cert=/etc/mysql/certs/client-cert.pem
ssl-key=/etc/mysql/certs/client-key.pem

Any help would be appreciated.


I have a fresh install of Percona XtraDB on a single node. It is exposed to the internet, and I'm getting random crashes like yours. I suspect a scan is hitting the node and killing it. There's a similar report here:
[MDEV-25068] Node crashes with Transport endpoint is not connected mysqld got signal 6 ; - Jira
It says the fix is in Galera library 26.4.12.

I'm not sure whether that's what is happening here, or if and when the fix will be incorporated into Percona's software.
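Whether a node already has that fix comes down to comparing its Galera provider version with 26.4.12, and that comparison has to be numeric: as plain strings, 26.4.3 sorts after 26.4.12. Note that the WSREP version in the server banner above is not necessarily the Galera library version; SHOW GLOBAL STATUS LIKE 'wsrep_provider_version' reports the version of the provider that is actually loaded. Below is a minimal Python sketch of the check. It assumes the value starts with a dotted version number, and it treats the upstream 26.4.N form and a short 4.N form as the same Galera 4 release, an assumption worth confirming against Percona's release notes. The function names are hypothetical.

```python
import re
from typing import Tuple


def galera_release(raw: str) -> Tuple[int, ...]:
    """Parse a provider version string into a tuple of integers.

    Examples: '26.4.3' -> (4, 3), '4.10(r1234)' -> (4, 10).
    Only the leading dotted number is used; a suffix such as a revision
    in parentheses is ignored.
    """
    m = re.match(r"\s*(\d+(?:\.\d+)*)", raw)
    if m is None:
        raise ValueError(f"unrecognized version string: {raw!r}")
    parts = tuple(int(p) for p in m.group(1).split("."))
    # ASSUMPTION: upstream Galera versions are prefixed with the wsrep API
    # version (26.), so 26.4.12 and 4.12 are treated as the same release.
    if len(parts) >= 3 and parts[0] == 26:
        parts = parts[1:]
    return parts


# Release that MDEV-25068 says contains the fix.
FIXED_IN = galera_release("26.4.12")  # (4, 12)


def has_fix(raw: str) -> bool:
    # Tuples compare element by element, so (4, 3) < (4, 12) as intended.
    return galera_release(raw) >= FIXED_IN


if __name__ == "__main__":
    # A plain string comparison gets this wrong: '3' sorts after '1'.
    print("26.4.3" >= "26.4.12")  # True (misleading)
    print(has_fix("26.4.3"))      # False
    print(has_fix("26.4.12"))     # True
```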


Thanks for that! We do use Nessus for scans, and I’m trying to get scan times from our Information Assurance team to see if those times correlate with my crashes.

I was struggling to figure out why it happened on only one node the first time but took out every node in the cluster the second time. I'm sure IA runs scheduled scans on that VLAN, which would hit all three nodes at the same time. And the day before, I had asked for a manual scan of a single node to see whether a config change to an unrelated piece of software had remediated another vulnerability.
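Once IA supplies the scan times, the crash timestamps can be pulled out of each node's MySQL error log and checked against them. The Python sketch below assumes the log format shown at the top of this thread and scan windows expressed in UTC (Nessus schedules may be set in local time, so convert first); the function names, the 15-minute slack and the sample window are hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Sequence, Tuple

# Full timestamps at the start of most error-log lines, e.g.
# 2022-07-14T14:05:42.299411Z 0 [Note] ...
ISO_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z")
# The crash banner carries only a time of day, e.g.
# 14:05:42 UTC - mysqld got signal 6 ;
SIGNAL_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) UTC - mysqld got signal \d+")

Window = Tuple[datetime, datetime]


def crash_times(lines: Iterable[str]) -> List[datetime]:
    """Return UTC datetimes of every 'mysqld got signal N' line.

    The date is borrowed from the most recent full timestamp. If the time of
    day jumps back by more than 12 hours, the crash is assumed to fall after
    midnight and one day is added. Signal lines with no earlier full
    timestamp are skipped.
    """
    last_seen = None
    crashes = []
    for line in lines:
        m = ISO_RE.match(line)
        if m:
            last_seen = datetime.strptime(
                m.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
            continue
        m = SIGNAL_RE.match(line)
        if m and last_seen is not None:
            t = datetime.strptime(
                f"{last_seen:%Y-%m-%d} {m.group(1)}",
                "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
            if t < last_seen - timedelta(hours=12):
                t += timedelta(days=1)  # rolled over midnight
            crashes.append(t)
    return crashes


def correlate(crashes: Sequence[datetime], windows: Sequence[Window],
              slack: timedelta = timedelta(minutes=15)) -> List[Tuple[datetime, bool]]:
    """Pair each crash with whether it falls inside any scan window, +/- slack."""
    return [(t, any(start - slack <= t <= end + slack for start, end in windows))
            for t in crashes]


if __name__ == "__main__":
    # Excerpt from the log at the top of this thread.
    excerpt = [
        "2022-07-14T14:05:42.299411Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation",
        "14:05:42 UTC - mysqld got signal 6 ;",
    ]
    # Hypothetical scan window; replace with the times IA provides (in UTC).
    windows = [(datetime(2022, 7, 14, 14, 0, tzinfo=timezone.utc),
                datetime(2022, 7, 14, 14, 30, tzinfo=timezone.utc))]
    for t, hit in correlate(crash_times(excerpt), windows):
        print(t.isoformat(), "during scan" if hit else "no scan")
```

To use it on a node, pass the opened error log (its location depends on the log_error setting) to crash_times and run the result through correlate for each node. If all three nodes' crashes line up with the same scheduled window, that supports the scan theory.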
