Shunned ProxySQL nodes and SST errors lead to node unable to initialize

Hi. Back again with more issues upgrading from Percona XtraDB Cluster 5.7 to 8.x, ProxySQL 1.x to 2.x, and Galera 3.x to 4.x.

After upgrading Percona XtraDB Cluster on our 3 nodes, I at least got to a somewhat working state. I had to rsync the data directory from one node to another to get it to sync/join the cluster, and I assumed that was fine and we were good. But after upgrading ProxySQL to 2.x, which I had to do because MySQL 8 requires it, and reconfiguring it per this guide: Galera Configuration - ProxySQL, I have a fairly frequent issue where a node goes down and is unable to resync with the cluster when the mysqld service is restarted. I then have to rsync the data directory from a working node again, which works but is super annoying.

Here is the relevant part of the logs:

2022-09-19T17:31:09.629157Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address 'IP.62' --datadir '/mnt/data/mysql/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '14665' --mysqld-version '8.0.28-19.1'   '' : 2 (No such file or directory)
2022-09-19T17:31:09.630222Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2022-09-19T17:31:09.630319Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 2 (No such file or directory)
2022-09-19T17:31:09.630571Z 3 [Note] [MY-000000] [Galera] Processing SST received
2022-09-19T17:31:09.630681Z 3 [Note] [MY-000000] [Galera] SST received: 00000000-0000-0000-0000-000000000000:-1
2022-09-19T17:31:09.630774Z 3 [System] [MY-000000] [WSREP] SST completed
2022-09-19T17:31:09.631283Z 2 [Note] [MY-000000] [Galera]  str_proto_ver_: 3 sst_seqno_: -1 cc_seqno: 11760209 req->ist_len(): 74
2022-09-19T17:31:09.631404Z 2 [ERROR] [MY-000000] [Galera] Application received wrong state: 
	Received: 00000000-0000-0000-0000-000000000000
	Required: b392a4b7-a3c8-11e7-b022-632a7cf1c510

The proxysql 2 configuration/runtime does NOT match the guide (link above), and I’m not sure why or how to fix it.

mysql> select hostgroup_id,hostname,port,status,weight,max_connections from runtime_mysql_servers;
+--------------+----------+-------+---------+--------+-----------------+
| hostgroup_id | hostname | port  | status  | weight | max_connections |
+--------------+----------+-------+---------+--------+-----------------+
| 2            | 49.39    | 13306 | SHUNNED | 100    | 1000            |
| 2            | 49.41    | 13306 | SHUNNED | 10     | 1000            |
| 2            | 49.62    | 13306 | ONLINE  | 100    | 1000            |
| 4            | 49.39    | 13306 | ONLINE  | 100    | 1000            |
| 4            | 49.41    | 13306 | ONLINE  | 10     | 1000            |
+--------------+----------+-------+---------+--------+-----------------+
5 rows in set (0.00 sec)

I’m not sure how 2 nodes can be SHUNNED in one hostgroup but ONLINE in another. I’m also not 100% positive this is causing the SST/Galera sync issue, but this is definitely not the behavior described in the guides.

Please let me know if you need more info. Our MySQL config is essentially the same as what I’ve posted before.

Thanks!


ProxySQL and SSTs have nothing to do with each other; these are two different issues. Your SST is failing for some reason, and what you provided above is an incomplete picture of the logs, so it is difficult to determine the cause. Please ensure that ports 3306, 4444, 4567, and 4568 are open between all nodes. I suggest you set pxc_encrypt_cluster_traffic=OFF while working through this, in order to rule out SSL as yet another point of failure/trouble.
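For reference, a minimal sketch of the relevant my.cnf fragment, assuming the default PXC ports; the values here are illustrative, not a drop-in config:

```ini
[mysqld]
# Temporarily disable cluster traffic encryption to rule out SSL
# as a cause of SST failures (re-enable once the cluster is stable).
pxc_encrypt_cluster_traffic=OFF

# Default ports the nodes must be able to reach each other on:
#   3306  MySQL client traffic
#   4444  SST (State Snapshot Transfer)
#   4567  Galera group communication
#   4568  IST (Incremental State Transfer)
```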

Shut down your cluster, all nodes, then bootstrap the first. Ensure it is online, that ProxySQL sees it online, and that you can read/write through a ProxySQL connection.

Then, after you do all of the above, start node2 and let it IST/SST and sync. If this fails, you have a config issue somewhere that is preventing the SST/IST process. If it completes, check ProxySQL and ensure everything is ONLINE; if not, you have a ProxySQL config issue. Check SELECT hostname, connect_error FROM mysql_server_connect_log ORDER BY time_start_us DESC LIMIT 3 and see if there are errors connecting to the node.
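The ProxySQL checks above can be run from the admin interface (port 6032 by default); a sketch, assuming the standard ProxySQL 2.x admin tables:

```sql
-- Most recent connection attempts ProxySQL made to the backends
SELECT hostname, port, time_start_us, connect_error
FROM mysql_server_connect_log
ORDER BY time_start_us DESC
LIMIT 3;

-- Compare configured vs. runtime server state
SELECT hostgroup_id, hostname, port, status FROM runtime_mysql_servers;

-- If you change mysql_servers, push the change to runtime and persist it
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```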

Repeat for node3.
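After each node joins, you can verify its Galera state directly on that node before moving on; a sketch using the standard wsrep status variables:

```sql
-- Run on the node that just (re)joined:
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';        -- expect 2, then 3
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'; -- expect 'Synced'
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';      -- expect 'Primary'
SHOW GLOBAL STATUS LIKE 'wsrep_ready';               -- expect 'ON'
```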


Regarding the ports and encryption: we have already verified/done that.

I didn’t even get to run your tests before one of the nodes went down. At least it didn’t bring the whole cluster down. Here’s the log:

2022-09-20T14:07:15.769340Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 6: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-09-20T14:08:03.060248Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-09-20T14:08:08.078213Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 1: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-09-20T14:08:25.309815Z 0 [Warning] [MY-000000] [Galera] checksum failed, hdr: len=1 has_crc32=0 has_crc32c=0 crc32=1
2022-09-20T14:08:44.658716Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-09-20T14:08:54.056367Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-09-20T14:08:54.061281Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
2022-09-20T14:09:14.687285Z 0 [Warning] [MY-000000] [Galera] unserialize error invalid protocol version 4: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():133
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::system_error> >'
  what():  remote_endpoint: Transport endpoint is not connected
2022-09-20T14:20:37.212537Z 0 [Note] [MY-000000] [WSREP] Initiating SST cancellation
14:20:37 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.

Build ID: 197cca034159ea848cfc7c45f97087bb0d9c0428
Server Version: 8.0.28-19.1 Percona XtraDB Cluster (GPL), Release rel19, Revision f544540, WSREP version 26.4.3, wsrep_26.4.3

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x20ef01d]
/usr/sbin/mysqld(print_fatal_signal(int)+0x323) [0x1182a03]
/usr/sbin/mysqld(handle_fatal_signal+0xc0) [0x1182ad0]
/lib64/libpthread.so.0(+0xf630) [0x7fa95ee10630]
/lib64/libc.so.6(gsignal+0x37) [0x7fa95d0fb387]
/lib64/libc.so.6(abort+0x148) [0x7fa95d0fca78]
/lib64/libstdc++.so.6(__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa95da0ba95]
/lib64/libstdc++.so.6(+0x5ea06) [0x7fa95da09a06]
/lib64/libstdc++.so.6(+0x5ea33) [0x7fa95da09a33]
/lib64/libstdc++.so.6(+0x5ec53) [0x7fa95da09c53]
/usr/lib64/galera4/libgalera_smm.so(+0x1dbea) [0x7fa94d6d1bea]
/usr/lib64/galera4/libgalera_smm.so(+0x94748) [0x7fa94d748748]
/usr/lib64/galera4/libgalera_smm.so(+0xac341) [0x7fa94d760341]
/usr/lib64/galera4/libgalera_smm.so(+0xa3dab) [0x7fa94d757dab]
/usr/lib64/galera4/libgalera_smm.so(+0xa703a) [0x7fa94d75b03a]
/usr/lib64/galera4/libgalera_smm.so(+0xae7ef) [0x7fa94d7627ef]
/usr/lib64/galera4/libgalera_smm.so(+0x8c8d0) [0x7fa94d7408d0]
/usr/lib64/galera4/libgalera_smm.so(+0x1c64ee) [0x7fa94d87a4ee]
/usr/lib64/galera4/libgalera_smm.so(+0x1c6612) [0x7fa94d87a612]
/lib64/libpthread.so.0(+0x7ea5) [0x7fa95ee08ea5]
/lib64/libc.so.6(clone+0x6d) [0x7fa95d1c3b0d]
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.