Looks like a bug: too many connections crash PXC

Hi All,
Firstly thanks for walking me through this journey.

This is linked to the thread “SSL issue with async replication between 2 pxc”.
The context has changed, hence I am creating a new thread.

It looks like it’s a bug, going by the error message on line 18.
This happens when the security team scans MySQL for vulnerabilities using the root account: the SSL errors occur and the PXC node on which the scan is running crashes. The log message says it’s a bug, and my my.cnf and the other outputs shared earlier look fine.

Even after multiple login attempts, the mysql service should not crash, correct? Or is it a safety feature in PXC that the service will crash if too many attempts are made?

The scanning tool makes approximately 160 connection attempts, at about 6 connections per second, after which the mysql service crashes.

In the general query log, 99% of the roughly 160 entries mentioned above are just connect-and-quit pairs repeated on each new line of the general log file. The scan tool also checks whether the root and anonymous users can log in without a password.
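For reference, here is a minimal sketch of the connect-and-quit pattern the scanner seems to produce; the node address and ports are assumptions, and it only opens and drops bare TCP connections rather than speaking the full MySQL protocol:

# Sketch: approximate the rapid connect-and-quit pattern seen during the scan.
# HOST and PORTS are placeholders for one PXC node and the ports the scanner
# appears to touch (classic protocol and X Protocol).
import socket
import time

HOST = "10.0.0.11"      # assumption: address of one PXC node
PORTS = [3306, 33060]   # classic MySQL port and X Protocol port
ATTEMPTS = 160          # roughly the number of entries seen in the general log
DELAY = 1 / 6           # roughly 6 connections per second

for i in range(ATTEMPTS):
    port = PORTS[i % len(PORTS)]
    try:
        # Open a TCP connection and close it immediately, similar to the
        # repeated connect/quit entries in the general query log.
        with socket.create_connection((HOST, port), timeout=2):
            pass
    except OSError as exc:
        print(f"attempt {i}, port {port}: {exc}")
    time.sleep(DELAY)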

Does the PXC cluster get overloaded and crash?

Can you please help here?

Handshake failed: wrong version number
Handshake failed: unsupported protocol
Handshake failed: version too low
Handshake failed: unexpected message
Handshake failed: peer did not return a certificate
[libprotobuf ERROR /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/rpmbuild/BUILD/Percona-XtraDB-Cluster-8.0.28/extra/protobuf/protobuf-3.11.4/src/google/protobuf/message_lite.cc:123] Can't parse message of type "Mysqlx.Connection.CapabilitiesSet" because it is missing required fields: (cannot determine missing fields for lite message)
Handshake failed: peer did not return a certificate
[libprotobuf ERROR /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/rpmbuild/BUILD/Percona-XtraDB-Cluster-8.0.28/extra/protobuf/protobuf-3.11.4/src/google/protobuf/message_lite.cc:123] Can't parse message of type "Mysqlx.Prepare.Prepare" because it is missing required fields: (cannot determine missing fields for lite message)
Handshake failed: wrong version number
[libprotobuf ERROR /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/rpmbuild/BUILD/Percona-XtraDB-Cluster-8.0.28/extra/protobuf/protobuf-3.11.4/src/google/protobuf/message_lite.cc:123] Can't parse message of type "Mysqlx.Crud.DropView" because it is missing required fields: (cannot determine missing fields for lite message)
Handshake failed: unknown protocol
Handshake failed: version too low
Handshake failed: no shared cipher
Terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::system_error> >'
  what():  remote_endpoint: Transport endpoint is not connected
2022-10-09T09:26:40.868149Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
09:26:40 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.

Build ID: 5aaeb8aff2f9757ae471361dbf4fa4ba945f6104
Server Version: 8.0.28-19.1 Percona XtraDB Cluster (GPL), Release rel19, Revision f544540, WSREP version 26.4.3, wsrep_26.4.3

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out

Hi Adi,

I’m not sure if this issue is related to security scanning. We see that the cluster is trying an SST that fails with an abort signal. We also see several SSL-related errors: handshake failed.

Could you test by disabling the security scanner on this cluster for a few hours and checking whether the cluster still crashes? If it does crash, check whether SELinux is enabled and test again.
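If it helps, here is a small sketch to report the SELinux mode on a node (the standard getenforce command gives the same answer; the path below only exists when SELinux is enabled):

# Sketch: report the SELinux mode on this node.
# /sys/fs/selinux/enforce exists only when SELinux is enabled;
# it contains "1" for enforcing and "0" for permissive.
from pathlib import Path

def selinux_mode() -> str:
    enforce = Path("/sys/fs/selinux/enforce")
    if not enforce.exists():
        return "disabled (or not built into the kernel)"
    return "enforcing" if enforce.read_text().strip() == "1" else "permissive"

print(f"SELinux mode: {selinux_mode()}")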

Thank you


Hi @Pep_Pla,
Yes, without scanning the cluster services don’t go down. We have checked for 48 hours now: replication works fine on all 3 nodes and the services stay up. But when we initiate the scan, the MySQL service goes down within 3 minutes.
I have also tried disabling SELinux and running the scan; the MySQL service still goes down within 3 minutes of starting the scan.

The team is using the Nessus tool to scan MySQL.

Thanks
Adi


Hi Sir,
Also, the logs provided above continue with a few additional lines; the log below is part of the same log from my first message, with those extra lines appended.

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::system_error> >'
  what():  remote_endpoint: Transport endpoint is not connected
2022-10-06T08:12:56.363065Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
08:12:56 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.

Build ID: 5aaeb8aff2f9757ae471361dbf4fa4ba945f6104
Server Version: 8.0.28-19.1 Percona XtraDB Cluster (GPL), Release rel19, Revision f544540, WSREP version 26.4.3, wsrep_26.4.3

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x217bee1]
/usr/sbin/mysqld(print_fatal_signal(int)+0x323) [0x11a0993]
/usr/sbin/mysqld(handle_fatal_signal+0xc8) [0x11a0a68]
/lib64/libpthread.so.0(+0x12ce0) [0x7fa685434ce0]
/lib64/libc.so.6(gsignal+0x10f) [0x7fa6837e4a9f]
/lib64/libc.so.6(abort+0x127) [0x7fa6837b7e05]
/lib64/libstdc++.so.6(+0x9009b) [0x7fa68418509b]
/lib64/libstdc++.so.6(+0x9653c) [0x7fa68418b53c]
/lib64/libstdc++.so.6(+0x96597) [0x7fa68418b597]
/lib64/libstdc++.so.6(+0x967f8) [0x7fa68418b7f8]
/usr/lib64/galera4/libgalera_smm.so(+0x8c7d9) [0x7fa676c677d9]
/usr/lib64/galera4/libgalera_smm.so(+0x8cf1c) [0x7fa676c67f1c]
/usr/lib64/galera4/libgalera_smm.so(+0x9dff1) [0x7fa676c78ff1]
/usr/lib64/galera4/libgalera_smm.so(+0xa0f72) [0x7fa676c7bf72]
/usr/lib64/galera4/libgalera_smm.so(+0xa8943) [0x7fa676c83943]
/usr/lib64/galera4/libgalera_smm.so(+0x8551d) [0x7fa676c6051d]
/usr/lib64/galera4/libgalera_smm.so(+0x1ac08e) [0x7fa676d8708e]
/usr/lib64/galera4/libgalera_smm.so(+0x1ac1b6) [0x7fa676d871b6]
/lib64/libpthread.so.0(+0x81cf) [0x7fa68542a1cf]
/lib64/libc.so.6(clone+0x43) [0x7fa6837cfdd3]
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.


Hi,

Could you check which ports Nessus is scanning? Is it trying to connect only to 3306, or also to the specific Galera/PXC ports?

Other information would also help: the number of concurrent connections, the interval between connections…
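For reference, a PXC node normally listens on 3306 (classic MySQL protocol), 33060 (X Protocol), 4567 (Galera group communication), 4568 (IST) and 4444 (SST). A rough sketch to check which of these the scanner can reach (the node address is an assumption):

# Sketch: probe the ports a PXC node normally exposes to see which ones
# are reachable from the scanning host. NODE is a placeholder.
import socket

NODE = "10.0.0.11"  # assumption: address of one PXC node
PXC_PORTS = {
    3306:  "MySQL classic protocol",
    33060: "MySQL X Protocol",
    4567:  "Galera group communication",
    4568:  "IST",
    4444:  "SST",
}

for port, role in PXC_PORTS.items():
    try:
        with socket.create_connection((NODE, port), timeout=2):
            state = "open"
    except OSError:
        state = "closed/filtered"
    print(f"{port:>5} ({role}): {state}")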

Thanks,

Pep


Hi @Pep_Pla
I am trying to get the requested information. Meanwhile, I found this reported issue in the MariaDB Jira, which has a major impact; I see the same errors reported there.
https://jira.mariadb.org/browse/MDEV-26861
Thanks
Adi


Hi @Pep_Pla,

This issue was resolved by upgrading the Galera version of the XtraDB Cluster to 4.12. I have documented the steps here.
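For anyone hitting the same crash, a quick way to confirm the provider version after the upgrade is the wsrep_provider_version status variable; a small sketch (connection details are placeholders):

# Sketch: verify the Galera provider version on a node after the upgrade.
# Host, user and password are placeholders for your environment.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host="10.0.0.11", user="monitor", password="***")
cur = conn.cursor()
cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_provider_version'")
print(cur.fetchone())  # ('wsrep_provider_version', '<provider version string>')
cur.close()
conn.close()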

Thanks
Aditya
