My setup is a 7-node Percona XtraDB Cluster 5.7 geo-cluster: 5 nodes are AWS EC2 instances and 2 are vSphere VMs in our on-prem environment. The on-prem nodes can reach the cloud nodes through a Site-to-Site tunnel between our on-prem firewalls and the AWS VPC (the subnets are routed, not NATted).
The cluster is the backend of our internal DNS solution (PowerDNS) and provides a native multi-master configuration for the DNS servers as well. Due to the nature of our infrastructure, DNS records rarely change, so there is little replication traffic. Because of this, our on-prem firewall periodically sends TCP resets to the on-prem nodes for their connections to the AWS nodes on port 4567, which causes daily node evictions and re-joins.
Is it possible to configure some sort of keep-alive functionality to prevent the firewalls from resetting the TCP connections?
Keepalive heartbeat packets are sent to every node, every 1 second by default. Is this not enough to keep your firewall happy?
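If you want to verify or tune that interval, the EVS keepalive period (and the related timeouts that govern eviction) are exposed through `wsrep_provider_options`. A minimal my.cnf sketch; the values shown here are the Galera defaults, adjust to taste:

```ini
[mysqld]
# evs.keepalive_period: how often keepalive probes are sent on port 4567.
# evs.suspect_timeout / evs.inactive_timeout: how long a silent node is
# tolerated before it is suspected and then evicted.
wsrep_provider_options="evs.keepalive_period=PT1S;evs.suspect_timeout=PT5S;evs.inactive_timeout=PT15S"
```

Note that these use ISO 8601 duration syntax (PT1S = 1 second), and changing them requires a node restart to take effect via my.cnf.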
Another option is to simply write something to a dummy table every second. Use something like pt-heartbeat, which UPDATEs one record every second; that creates constant traffic. If your nodes are still being evicted after that, you should check your firewall's configuration.
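A sketch of what such a pt-heartbeat invocation could look like; the host, user, and database names below are placeholders, not values from this thread:

```shell
# Run pt-heartbeat in update mode: it writes one row to the heartbeat
# table every --interval seconds, generating steady replication traffic.
# --create-table creates the heartbeat table on first run if it is missing.
# Host, user, and database are hypothetical placeholders.
pt-heartbeat \
  --update \
  --create-table \
  --database percona \
  --table heartbeat \
  --interval 1 \
  --host node1.example.com \
  --user ptheartbeat --ask-pass \
  --daemonize
```

You would run one updater per writable node (or just one, since Galera replicates the writes cluster-wide); the point here is only to keep the replication links busy, not to measure lag.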
Thanks for your suggestions. Are the keepalive heartbeat packets sent over TCP port 4567? If so, I think I'm on the wrong track to solve my issue.
4567 is reserved for Galera Cluster Replication traffic. Multicast replication uses both TCP and UDP transport on this port.
Yes, all Galera traffic goes over 4567.
Thanks for pointing me to the right documentation. Are the keepalive messages logged? Do I need to start the mysql daemon with a higher verbosity option?
In the meantime, I will go ahead with firewall troubleshooting.
The messages are part of the core Galera protocol and are not logged, AFAIK. What is logged is when the heartbeats are missed, and you’ve seen that already.
I was able to confirm the keepalive packets by running a tcpdump capture on the servers, something like this:
sudo tcpdump -n -i <server_interface> 'dst (<node1_ip> or <node2_ip> or <node3_ip> or <node4_ip> or <node5_ip> or <node6_ip>) and dst port 4567'
TCP resets are logged about once or twice a day for every node IP on the firewall, but I'm still unable to find the root cause.
That said, I'll go ahead and close this thread.
Thank you all!