Cluster randomly going NON-PRIM and then recovering.

sgonzalez.pa · September 19, 2017, 12:22pm

Hello all, I am having an issue where some or all of the nodes in my cluster lose connection to each other. This used to happen infrequently, about once a week or less, but over the past week or so it has become commonplace; today it has happened 3 times. While debugging, I have ruled out basic network connectivity issues as the culprit. My gut tells me it is related to our write-heavy load and the servers falling out of sync with each other, or something along those lines. Unfortunately I lack the experience to properly assess this and pinpoint the issue. Any help would be greatly appreciated.

The cluster consists of 5 nodes spread across 2 data centers with a GRE tunnel between them; latency over the link is usually in the 5-15 ms range. There are 3 nodes at site A and 2 at site B. We restrict writes to a single node at A and read only from the 3 nodes at A, with B existing as a DR site. An HAProxy instance in front of the cluster handles the load balancing.
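For reference, here is a minimal sketch of the quorum arithmetic for this 3 + 2 layout. It assumes every node carries the default vote weight of 1 and that a partition stays Primary only if it holds more than half of the votes of the previous Primary component. The function and variable names are illustrative only and are not part of any Percona or Galera tooling.

```python
def has_quorum(reachable_nodes: int, previous_primary_size: int) -> bool:
    """Return True if a partition holds a strict majority of the votes.

    Assumes every node has the same vote weight (1).
    """
    return 2 * reachable_nodes > previous_primary_size


def partition_outcome(sites: dict, total: int) -> dict:
    """Map each site to the state it would end up in if the sites were
    cut off from each other but nodes within a site still saw each other."""
    return {
        name: ("Primary" if has_quorum(count, total) else "non-Primary")
        for name, count in sites.items()
    }


# Layout described in this post: 3 nodes at site A, 2 nodes at site B.
SITES = {"A": 3, "B": 2}
TOTAL = sum(SITES.values())

print(partition_outcome(SITES, TOTAL))
# {'A': 'Primary', 'B': 'non-Primary'}
```

Under these assumptions, a clean loss of the tunnel would leave site A Primary and only site B non-Primary, so a tunnel outage on its own would not obviously account for the cases where all nodes go NON-PRIM.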

When first spinning up the nodes at the second site I had some issues due to the added latency, but tweaking some settings in my.cnf seemed to alleviate them. I am unsure what is best to post here in the way of logs or listings to help debug this. All nodes are running CentOS 7 with Percona XtraDB Cluster 5.7.18-29.20.1.el7.x86_64.
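One listing that is commonly useful in threads like this is the output of `SHOW GLOBAL STATUS LIKE 'wsrep_%'` from each node, captured while the problem is occurring. As a rough sketch, the snippet below flags the values that usually matter first. It assumes the status rows have already been loaded into a dict of strings; the function name, the 10% flow-control threshold, and the sample values are hypothetical and do not come from this cluster.

```python
def summarize_wsrep(status: dict, expected_size: int,
                    fc_paused_limit: float = 0.1) -> list:
    """Return human-readable findings for one node's wsrep status values.

    `status` maps variable names to the string values MySQL reports.
    `fc_paused_limit` is an arbitrary illustrative threshold.
    """
    findings = []

    cluster_status = status.get("wsrep_cluster_status", "unknown")
    if cluster_status != "Primary":
        findings.append(f"cluster status is {cluster_status}, not Primary")

    size = int(status.get("wsrep_cluster_size", 0))
    if size != expected_size:
        findings.append(f"cluster size is {size}, expected {expected_size}")

    state = status.get("wsrep_local_state_comment", "unknown")
    if state != "Synced":
        findings.append(f"local state is {state}, not Synced")

    paused = float(status.get("wsrep_flow_control_paused", 0.0))
    if paused > fc_paused_limit:
        findings.append(f"flow control paused {paused:.0%} of the time")

    return findings


# Made-up sample values for a node that has dropped out of the Primary component.
sample = {
    "wsrep_cluster_status": "non-Primary",
    "wsrep_cluster_size": "2",
    "wsrep_local_state_comment": "Initialized",
    "wsrep_flow_control_paused": "0.25",
}

for line in summarize_wsrep(sample, expected_size=5):
    print(line)
```

Running the same check against every node at the moment of an incident would show which side of the tunnel lost the Primary component and whether flow control was already elevated beforehand.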
