Issue with replicated XtraDB Cluster across multiple datacenters

zozotk · January 2, 2024, 9:00am

Hello,

I have a master-master XtraDB Cluster configured across multiple datacenters. Everything works, but from time to time one of the nodes goes down because it cannot check the status of replication (possibly a timeout of some sort).
All I can see in the logs is:

2024-01-02T08:58:28.135066Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT2.70677S), skipping check

Is there something I can configure in the cluster to avoid this situation?

Regards,

matthewb · January 2, 2024, 2:29pm

@zozotk,
You can increase the timeout periods/intervals, but this can lead to split-brain and/or complete node disconnects. Have a look at these 3 parameters:

evs.delay_margin: https://galeracluster.com/library/documentation/galera-parameters.html#evs-delay-margin
evs.inactive_check_period
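The warning in the question carries its own arithmetic. The skip threshold is 3 × evs.inactive_check_period (PT1.5S, i.e. 1.5 s, which implies a 0.5 s period), and the gap since the last check was PT2.70677S, well past that threshold. Below is a minimal Python sketch for pulling those numbers out of log lines so you can see how often and by how much the threshold is exceeded. It assumes the warning text matches the line quoted above exactly. The helper names `check_gap` and `parse_seconds` are hypothetical, and only seconds-only ISO 8601 durations are handled.

```python
import re
from typing import Optional, Tuple

# Regex for the Galera EVS warning quoted in the question. The wording is taken
# from that one log line; other Galera versions may phrase it differently.
WARNING_RE = re.compile(
    r"last inactive check more than (PT[\d.]+S) "
    r"\(3\*evs\.inactive_check_period\) ago \((PT[\d.]+S)\), skipping check"
)


def parse_seconds(duration: str) -> float:
    """Convert a seconds-only ISO 8601 duration such as 'PT1.5S' to a float.

    Durations with minutes or hours (e.g. 'PT1M30S') are deliberately rejected.
    """
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(match.group(1))


def check_gap(line: str) -> Optional[Tuple[float, float, float]]:
    """Return (threshold_s, observed_s, implied_period_s) for a matching line, else None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    threshold = parse_seconds(match.group(1))
    observed = parse_seconds(match.group(2))
    # The warning itself states: threshold = 3 * evs.inactive_check_period.
    return threshold, observed, threshold / 3
```

Run over a node's error log, this gives a per-event view of how late each inactivity check was, which you can then correlate with network or load monitoring.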

I would also set up a simple script that pings host A from host B every second to monitor for general network issues. If you are seeing packet loss between your datacenters, you need to talk to your provider and find out why their network is dropping packets.
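A simple once-per-second ping monitor could look like the Python sketch below. It assumes Linux iputils `ping`, where `-c` is the packet count and `-W` is the reply timeout in seconds. The function names are hypothetical. The probe is passed in as a callable so the loss calculation can be tested without a network, or swapped for a different check.

```python
import subprocess
import time
from typing import Callable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send a single ICMP echo using the system `ping` binary.

    Assumes Linux iputils flags: -c is the packet count, -W the reply timeout
    in seconds (macOS/BSD ping interprets -W differently).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(probe: Callable[[], bool], count: int, interval_s: float = 1.0) -> float:
    """Call `probe` `count` times, `interval_s` apart, logging each failure.

    Returns the percentage of failed probes (packet loss).
    """
    lost = 0
    for i in range(count):
        if not probe():
            lost += 1
            print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} probe {i} lost")
        if i < count - 1:
            time.sleep(interval_s)
    return 100.0 * lost / count
```

For example, running `while True: print(monitor(lambda: ping_once("host-a.example"), count=60))` on host B reports the loss percentage roughly once a minute. Here `host-a.example` is a placeholder for the node in the other datacenter. Timestamps of lost probes can then be lined up against the Galera warnings in the error log.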

zozotk · January 2, 2024, 2:31pm

Thank you, will investigate further.
