3 node cluster goes into Donor/Desynced state

jjengel11 · January 4, 2022, 5:37pm

I have a 3-node cluster running Percona XtraDB Cluster 8.0.21-12.1, with each member in a Docker container on a separate host.

The cluster works fine. However, I am adding firewall rules to one of the nodes (node1) so that it accepts MySQL connections only from other cluster members. My rules open TCP ports 3306, 4444, 4567 and 4568 to each member IP.
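The rules described above can be sketched as a small generator that prints iptables commands instead of applying them, so they can be reviewed first. This is a minimal sketch, not the poster's actual ruleset: the function name `galera_tcp_rules` and the peer IPs are hypothetical, and it assumes the containers use host networking so that traffic passes through the host's INPUT chain (with Docker's bridge networking and published ports, filtering normally belongs in the DOCKER-USER chain instead).

```shell
#!/bin/sh
# Hypothetical helper: prints (does NOT apply) iptables rules that accept
# TCP 3306 (MySQL clients), 4444 (SST), 4567 (Galera group communication)
# and 4568 (IST) from each peer IP given as an argument, then drops those
# ports for every other source. ACCEPT rules are printed first because
# iptables evaluates rules in order.
galera_tcp_rules() {
  for peer in "$@"; do
    echo "iptables -A INPUT -p tcp -s $peer -m multiport --dports 3306,4444,4567,4568 -j ACCEPT"
  done
  echo "iptables -A INPUT -p tcp -m multiport --dports 3306,4444,4567,4568 -j DROP"
}
```

On node1, `galera_tcp_rules 10.0.0.12 10.0.0.13` would print the rules for two assumed peer addresses; piping the output to `sh` as root applies them.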

Something must be wrong with my rules, because as soon as I add them to node1 (and only node1), node2 and node3 go into a “Donor/Desynced” state. I can't get node1 back into the cluster, and I see this in the node2/node3 logs:

[Galera] Member 2.0 (node1) requested state transfer from ‘any’, but it is impossible to select State Transfer donor: Resource Temporarily unavailable

I don’t understand why these two nodes changed state, since they can still communicate with each other. I verified this with a remote MySQL connection and with telnet to each port (there is no firewall on these two nodes).

This is the second time I have run into this while applying my firewall rules. If I shut down node1, node2 and node3 stay synced. I am a little confused by this behavior.

Do I have to bootstrap a node to get out of this donor/desynced state?

Thanks for any insight.

matthewb · January 4, 2022, 5:58pm

Hello @jjengel11 ,

4567 is reserved for Galera Cluster Replication traffic. Multicast replication uses both TCP and UDP transport on this port.

Did you allow both TCP and UDP?

Some documentation: Firewall Settings — Galera Cluster Documentation
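Following that advice, the UDP side can be sketched the same way: one extra ACCEPT rule per peer for UDP 4567, which matters when multicast replication is in use. As before, this is a hedged sketch: `galera_udp_rules` is a hypothetical helper that only prints the commands, and the peer IPs are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prints (does NOT apply) an iptables rule per peer
# that accepts UDP on port 4567, the port Galera multicast replication
# uses alongside TCP.
galera_udp_rules() {
  for peer in "$@"; do
    echo "iptables -A INPUT -p udp -s $peer --dport 4567 -j ACCEPT"
  done
}
```

These rules would need to be inserted before any DROP rule covering the same traffic, since iptables stops at the first matching rule.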

jjengel11 · January 4, 2022, 6:01pm

I did not. I will open UDP as well for 4567.

I’m still confused as to why, with only one node dropping out, my cluster ends up in this “donor/desynced” state.

Do I have to bootstrap the cluster to get out of this state?

Thanks

matthewb · January 4, 2022, 6:07pm

With the loss of only 1 node, the other 2 should have maintained quorum. Bootstrap just means “start the very first node”. It is not a state of any particular node. You can run SET GLOBAL wsrep_provider_options="pc.bootstrap=true"; on node2 or node3 (only on 1 of them) to force them to re-establish quorum but that really shouldn’t be necessary. I would ensure that all 3 nodes are correctly online beforehand.
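One way to confirm that all 3 nodes are correctly online is to read the wsrep status variables on each node. A minimal sketch, assuming the `mysql` client can log in non-interactively; the function `wsrep_healthy` is hypothetical and only parses the tab-separated output, treating a node as healthy when it is in the Primary component, sees the expected cluster size, and reports Synced.

```shell
#!/bin/sh
# Hypothetical check: reads "name<TAB>value" lines, as printed by
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"
# prints the three relevant values and exits 0 only if the node is
# Primary, sees $1 members and is Synced.
wsrep_healthy() {
  expected_size=$1
  awk -F '\t' -v want="$expected_size" '
    $1 == "wsrep_cluster_status"      { status = $2 }
    $1 == "wsrep_cluster_size"        { size = $2 }
    $1 == "wsrep_local_state_comment" { state = $2 }
    END {
      printf "status=%s size=%s state=%s\n", status, size, state
      exit !(status == "Primary" && size == want && state == "Synced")
    }'
}
```

Usage on each node: `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_healthy 3`. A node that is Primary but still Donor/Desynced fails the check.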

I do a similar firewall exercise in our PXC Training course and it works just fine.

With all 3 nodes connected, run this on node3 (for example):

   # INPUT matches packets from the peers (-s); OUTPUT matches packets to them (-d)
   iptables -A INPUT -s mysql1 -j DROP; \
   iptables -A INPUT -s mysql2 -j DROP; \
   iptables -A OUTPUT -d mysql1 -j DROP; \
   iptables -A OUTPUT -d mysql2 -j DROP

This cuts node3 off from the other two. The other two will re-form a stable cluster after about 10s, and node3, having lost quorum, drops out of the Primary component.
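Undoing the exercise means deleting the same rules with `-D`. A minimal sketch: the hypothetical helper `partition_rules` prints the partition commands for either action (`-A` to cut the node off, `-D` to remove the rules again), matching inbound packets by source (`-s`) and outbound packets by destination (`-d`). The host names `mysql1` and `mysql2` are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: prints iptables commands that isolate this node from
# the given peers (action -A) or delete exactly those rules again (action -D).
partition_rules() {
  action=$1
  shift
  for peer in "$@"; do
    echo "iptables $action INPUT -s $peer -j DROP"   # packets from the peer
    echo "iptables $action OUTPUT -d $peer -j DROP"  # packets to the peer
  done
}
```

Running `partition_rules -D mysql1 mysql2 | sh` as root on node3 would heal the partition, after which node3 should rejoin via IST or SST.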

jjengel11 · January 4, 2022, 6:22pm

I’ve removed the firewall rules on node1, and node1 still can't complete an IST or SST. I assume this is because of the “donor/desynced” status.

I agree, they should maintain quorum, and removing the firewall rules should bring all 3 nodes back online. However, I keep seeing the "Resource Temporarily unavailable" message on node1.

Running this command:

SET GLOBAL wsrep_provider_options="pc.bootstrap=true";

produces this output:

[Galera] ignoring ‘pc.bootstrap’ in state PRIM

On both node2 and node3, wsrep_cluster_status is Primary, but wsrep_local_state_comment is still Donor/Desynced.

I mentioned bootstrapping because I'm unsure how to get out of this donor/desync state. The nodes report that they are Primary, yet an insert times out:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

After stopping node3, node2 still reports that it is Primary (and still Donor/Desynced).

When I start node3 again, it fails the same way node1 does.

This behavior doesn't make sense to me.

jjengel11 · January 24, 2022, 6:36pm

I was able to get this working. My FQDN was resolving to the OpenStack internal IP address, not the external IP address. Once that was fixed, the firewall rules described in the Percona documentation worked as expected.

Thanks
