I have a master-master XtraDB cluster configured across multiple datacenters. Everything works, but from time to time one of the nodes goes down because it cannot check the replication status (possibly a timeout of some sort).
All I can see in the logs is:
2024-01-02T08:58:28.135066Z 0 [Warning] [MY-000000] [Galera] last inactive check more than PT1.5S (3*evs.inactive_check_period) ago (PT2.70677S), skipping check
Can I configure something in the cluster so that I can avoid the above situation?
@zozotk,
You can increase the timeout periods/intervals, but this can lead to split-brain and/or complete node disconnects. Have a look at these 3 parameters:
I would also set up a simple script to ping host A from B every 1s and monitor general network issues. If you are experiencing packet loss between your datacenters, you need to talk with your provider and find out why their networks are bad.
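As a rough starting point, a ping monitor along those lines could look like the sketch below. It is only an illustration, not a tested production tool: it assumes a Linux host with the iputils `ping` binary (where `-W` is a timeout in seconds), and the 500 ms `SLOW_MS` threshold and the function names are made up for this example. It prints a timestamped line only when a ping is lost or slow, so you can line those timestamps up with the Galera warnings in the MySQL log.

```python
import re
import subprocess
import sys
import time
from datetime import datetime, timezone

SLOW_MS = 500.0  # hypothetical threshold; tune it to your inter-DC latency

# Matches the round-trip time in a reply line such as
# "64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.045 ms"
_RTT_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtt_ms(ping_output):
    """Return the round-trip time in ms found in ping output, or None."""
    match = _RTT_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host, timeout_s=1):
    """Send one ICMP echo with the system ping; return RTT in ms or None if lost."""
    try:
        proc = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],  # Linux iputils flags
            capture_output=True,
            text=True,
            timeout=timeout_s + 2,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return parse_rtt_ms(proc.stdout)


def classify(rtt_ms, slow_ms=SLOW_MS):
    """Label a probe result as LOST, SLOW or OK."""
    if rtt_ms is None:
        return "LOST"
    return "SLOW" if rtt_ms >= slow_ms else "OK"


def monitor(host, interval_s=1.0):
    """Ping host roughly every interval_s seconds and print only the problems."""
    while True:
        started = time.monotonic()
        rtt = ping_once(host)
        status = classify(rtt)
        if status != "OK":
            stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
            print(f"{stamp} {status} host={host} rtt_ms={rtt}", flush=True)
        # Sleep for whatever is left of the interval after the probe itself.
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))


if __name__ == "__main__" and len(sys.argv) > 1:
    monitor(sys.argv[1])
```

Run it on host B with the address of host A as the only argument, redirect the output to a file, and leave it running until the next node drop. The timestamps are in UTC, the same as the `Z` timestamps in the log line above.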