Database servers log "[Galera] unserialize error invalid protocol version" then mysqld dies

This happens on every server within the same hour, eventually taking our entire cluster down.

Here’s the full log from one server, starting when the errors begin:

2022-08-02T14:37:33.586089Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 6: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-08-02T14:38:20.975490Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-08-02T14:38:25.986377Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 1: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-08-02T14:38:43.165347Z 0 [Warning] [MY-000000] [Galera] checksum failed, hdr: len=1 has_crc32=0 has_crc32c=0 crc32=1
2022-08-02T14:38:57.100836Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-08-02T14:38:57.104978Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-08-02T14:39:00.665961Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-08-02T14:39:33.968910Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 4: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::system_error> >'
  what():  remote_endpoint: Transport endpoint is not connected
2022-08-02T14:50:43.154269Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
14:50:43 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.

Build ID: 197cca034159ea848cfc7c45f97087bb0d9c0428
Server Version: 8.0.28-19.1 Percona XtraDB Cluster (GPL), Release rel19, Revision f544540, WSREP version 26.4.3, wsrep_26.4.3

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x20ef01d]
/usr/sbin/mysqld(print_fatal_signal(int)+0x323) [0x1182a03]
/usr/sbin/mysqld(handle_fatal_signal+0xc0) [0x1182ad0]
/lib64/libpthread.so.0(+0xf630) [0x7efd2cba9630]
/lib64/libc.so.6(gsignal+0x37) [0x7efd2ae94387]
/lib64/libc.so.6(abort+0x148) [0x7efd2ae95a78]
/lib64/libstdc++.so.6(__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7efd2b7a4a95]
/lib64/libstdc++.so.6(+0x5ea06) [0x7efd2b7a2a06]
/lib64/libstdc++.so.6(+0x5ea33) [0x7efd2b7a2a33]
/lib64/libstdc++.so.6(+0x5ec53) [0x7efd2b7a2c53]
/usr/lib64/galera4/libgalera_smm.so(+0x1dbea) [0x7efd1b46abea]
/usr/lib64/galera4/libgalera_smm.so(+0x94748) [0x7efd1b4e1748]
/usr/lib64/galera4/libgalera_smm.so(+0xac341) [0x7efd1b4f9341]
/usr/lib64/galera4/libgalera_smm.so(+0xa3dab) [0x7efd1b4f0dab]
/usr/lib64/galera4/libgalera_smm.so(+0xa703a) [0x7efd1b4f403a]
/usr/lib64/galera4/libgalera_smm.so(+0xae7ef) [0x7efd1b4fb7ef]
/usr/lib64/galera4/libgalera_smm.so(+0x8c8d0) [0x7efd1b4d98d0]
/usr/lib64/galera4/libgalera_smm.so(+0x1c64ee) [0x7efd1b6134ee]
/usr/lib64/galera4/libgalera_smm.so(+0x1c6612) [0x7efd1b613612]
/lib64/libpthread.so.0(+0x7ea5) [0x7efd2cba1ea5]
/lib64/libc.so.6(clone+0x6d) [0x7efd2af5cb0d]
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.

Our hardware has not changed. I have upgraded Percona XtraDB Cluster from 5.7 to 8 and then ProxySQL from 1.x to 2.3.2.

These errors started happening after upgrading to ProxySQL 2.3.2.

I should note there are NO logged errors on the ProxySQL server. (There used to be errors caused by a scheduler script we had running for 1.x, but I have since removed it.)
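For context, my understanding is that the native Galera support in ProxySQL 2.x replaces that kind of checker script; roughly something like the following on the admin interface (the hostgroup IDs are illustrative placeholders, not our actual values):

    -- ProxySQL 2.x admin interface: native Galera hostgroup handling
    -- instead of a 1.x scheduler script. Hostgroup IDs are placeholders.
    INSERT INTO mysql_galera_hostgroups
      (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup,
       offline_hostgroup, active, max_writers, writer_is_also_reader,
       max_transactions_behind)
    VALUES (10, 12, 11, 13, 1, 1, 0, 100);
    LOAD MYSQL SERVERS TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;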

Any idea what we can change in our config to stop this from happening? Thanks


Many times this is a network-related issue. What does your network health look like when this starts happening?
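If it helps, the cluster’s own counters usually show link trouble before a node gets evicted; something along these lines on each node around the crash window (these are the standard Galera/wsrep status variables, nothing specific to your setup):

    -- Run on each PXC node; values that climb around the crash window
    -- point at network trouble between the nodes.
    SHOW GLOBAL STATUS LIKE 'wsrep_evs_delayed';          -- peers the group communication layer considers delayed
    SHOW GLOBAL STATUS LIKE 'wsrep_evs_repl_latency';     -- group-communication latency statistics
    SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue_avg'; -- backlog of received writesets waiting to apply
    SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';  -- fraction of time replication was paused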


Thanks. We very often have periods of network slowness, and also periods where our VM resources (CPU, disk I/O) become extremely sluggish. Is there any way to mitigate this with config (timeouts)?
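To be concrete, this is the kind of change I had in mind, a minimal sketch assuming the Galera evs.* timeouts are the right knobs and that we are currently on the defaults (the values are examples, not recommendations):

    # my.cnf on each PXC node: relax the group-communication timeouts so that
    # short network or VM stalls are less likely to get a node declared dead.
    # Values are illustrative; inactive_timeout should stay >= suspect_timeout.
    [mysqld]
    wsrep_provider_options="evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M"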

Is this erroring out something that would have changed with upgrading ProxySQL to v2 and Percona XtraDB Cluster to v8?

We never experienced this with ProxySQL 1.x and PXC 5.7.
