I have a 3 node cluster running percona xtradb 8.0.21-12.1 inside a docker container on 3 separate nodes.
The cluster works fine; however, I am adding firewall rules to one of the nodes (node1) to allow MySQL connections from cluster members only. My rule opens TCP ports 3306, 4444, 4567, and 4568 to each member IP.
Something must be wrong with my rule, because as soon as I add the firewall rules to node1 alone, node2 and node3 go into a “Donor/Desynced” state. I am unable to get node1 back into the cluster, and I see this in the node2/3 logs:
[Galera] Member 2.0 (node1) requested state transfer from ‘any’, but it is impossible to select State Transfer donor: Resource Temporarily unavailable
I don’t understand why these 2 nodes did this as they can still communicate. I verified with a remote mysql connection and telnet to each port (no firewall on these 2 nodes).
This is the second time I have run into this while applying my firewall rules. If I shut down node1, node2 and node3 stay synced. I am a little confused by this behavior.
Do I have to bootstrap a node to get out of this donor/desynced state?
Thanks for any insight.
Hello @jjengel11 ,
Port 4567 is reserved for Galera Cluster replication traffic. Multicast replication uses both TCP and UDP transport on this port.
Did you allow both TCP and UDP?
Some documentation: Firewall Settings — Galera Cluster Documentation
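For reference, the accept rules for one node could be sketched roughly like this. It is only a dry-run helper that prints the commands rather than applying them. The function name galera_rules and the 192.0.2.x addresses are made up for illustration, and it assumes a default-deny INPUT policy (or a later DROP rule) does the actual blocking.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints iptables commands instead of running them.
# galera_rules is a hypothetical helper; pass the IPs of the OTHER cluster members.

galera_rules() {
    local ip
    for ip in "$@"; do
        # TCP: 3306 (MySQL clients), 4444 (SST), 4567 (Galera replication), 4568 (IST)
        echo "iptables -A INPUT -s ${ip} -p tcp -m multiport --dports 3306,4444,4567,4568 -j ACCEPT"
        # UDP on 4567 as well, since multicast replication uses both transports
        echo "iptables -A INPUT -s ${ip} -p udp --dport 4567 -j ACCEPT"
    done
}

# Example (placeholder addresses):
#   galera_rules 192.0.2.11 192.0.2.12
```

Each member IP yields one TCP rule and one UDP rule, so the output can be reviewed first and then run as root. Note that the rules match on source IP, so they only help if the peers really connect from those addresses.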
I did not. I will open UDP as well for 4567.
I’m still confused as to why with one node dropping out, my cluster is in this “donor/desynced” state.
Do I have to bootstrap the cluster to get out of this state?
Thanks
With the loss of only 1 node, the other 2 should have maintained quorum. Bootstrap just means “start the very first node”. It is not a state of any particular node. You can run SET GLOBAL wsrep_provider_options="pc.bootstrap=true"; on node2 or node3 (only on one of them) to force them to re-establish quorum, but that really shouldn’t be necessary. I would ensure that all 3 nodes are correctly online beforehand.
I do a similar firewall exercise in our PXC Training course and it works just fine.
With all 3 nodes connected, run this on node3 (for example):
iptables -A INPUT -s mysql1 -j DROP; \
iptables -A INPUT -s mysql2 -j DROP; \
iptables -A OUTPUT -d mysql1 -j DROP; \
iptables -A OUTPUT -d mysql2 -j DROP
This cuts off node 3 from the other 2. The other 2 will reform a stable cluster after about 10s. node3 will go into desync state.
I’ve removed the firewall rules on node1 and I’m still in a state where node1 can’t IST or SST. I am assuming it cannot b/c of the “donor/desynced” status.
I agree: they should maintain quorum, and removing the firewall rule should bring all 3 nodes back online. However, I keep seeing the “Resource Temporarily unavailable” message on node1.
Running this command
wsrep_provider_options="pc.bootstrap=true"
Gets the output of
[Galera] ignoring ‘pc.bootstrap’ in state PRIM
Both node2 and node3 report wsrep_cluster_status | Primary, and both still show wsrep_local_state_comment | Donor/Desynced.
I mentioned bootstrapping because I’m unsure how to get out of this Donor/Desynced state. The nodes say they are Primary, yet an insert times out:
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
After stopping node3, node2 still shows that it is in a Primary state (and still Donor/Desynced).
Starting node3 again now gets the same result as node1.
This behavior is not making sense.
So I was able to get this to work. My FQDN was resolving to the OpenStack internal IP address and not the external IP address. Once that was resolved, the firewall rules as described in the Percona documentation worked as expected.
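In case it helps anyone hitting the same thing, one way to spot this kind of mismatch is to compare what each cluster hostname resolves to on the node against the address the firewall rules were written for. A minimal sketch, assuming a glibc-based Linux where getent is available; check_resolves and the hostname and address in the usage line are hypothetical:

```shell
#!/usr/bin/env bash
# check_resolves HOST EXPECTED_IP
# Hypothetical helper: succeeds only if HOST's first IPv4 address equals EXPECTED_IP.

check_resolves() {
    local host="$1" expected="$2" resolved
    # getent goes through the system resolver (/etc/hosts, DNS, ...);
    # take the first IPv4 address it returns, if any
    resolved="$(getent ahostsv4 "$host" 2>/dev/null | awk '{print $1; exit}')"
    if [ "$resolved" = "$expected" ]; then
        echo "OK: $host -> $resolved"
    else
        echo "MISMATCH: $host -> ${resolved:-unresolved} (expected $expected)"
        return 1
    fi
}

# Example (placeholder values):
#   check_resolves node1.example.com 192.0.2.10
```

A MISMATCH line for any member would suggest the nodes are talking over a different address than the one the rules allow.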
Thanks