Percona Operator for MySQL based on Percona XtraDB Cluster: HA problems

Pavlo_Tkachenko · August 17, 2023, 2:59pm

Description:

Hello, I have 3 workers available. We are running the Percona Operator for MySQL, based on Percona XtraDB Cluster, deployed on Kubernetes via Helm with 3 replicas for HA.
But when we disabled two workers, the cluster stopped responding even though one worker was still available. HAProxy and PXC hung in this state:

xmysql-pxc-db-haproxy-0                          2/2     Terminating   5 (50m ago)   70m
xmysql-pxc-db-haproxy-1                          1/2     Running       8 (11s ago)   69m
xmysql-pxc-db-haproxy-2                          2/2     Terminating   5 (49m ago)   68m
xmysql-pxc-db-pxc-0                              3/3     Terminating   1 (46m ago)   48m
xmysql-pxc-db-pxc-1                              2/3     Running       4 (12m ago)   69m
xmysql-pxc-db-pxc-2                              3/3     Terminating   1 (46m ago)   47m

Is this the correct behavior of the cluster?
Is it possible to make a 3-node cluster keep working after 2 nodes are removed, that is, on a single node, and would this harm the operation of the database?

matthewb · August 17, 2023, 6:55pm

Hey @Pavlo_Tkachenko,
By “disabled” and “Terminating”, I’m thinking this is similar to kill -9? If that is the case, then killing 2 PXC nodes at the same time will certainly lead to this situation. The quorum count of a PXC cluster increases when a node connects and joins. The count only decreases on graceful shutdown of a node.

48m ago, your pxc-1 saw a count of 3/3 and all was good. 2m later, the count is now 1/3. pxc-1 no longer has a majority of nodes functioning, so the cluster goes into a non-primary state. This is quite typical behavior of any cluster (i.e., loss of quorum / loss of majority = cluster shutdown).

If you want to make this single node begin working again, you must force it back online from this unstable state. Connect to MySQL within that pod and run:

SET GLOBAL wsrep_provider_options='pc.bootstrap=true';

This will reset the quorum counter to 1 on this node and it will come back online.

Had you terminated just 1 of the 3, the cluster would have remained online, as 2/3 is a majority. Or, had you gracefully shut down pxc-0 and then pxc-2, each shutdown would have decremented the quorum counter until it reached 1/1, and the last node would have remained online.
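
To make the counting above concrete, here is a minimal Python sketch of the three scenarios (two nodes killed at once, one node killed, two graceful shutdowns). It is only a toy model of the rule as described in this reply, not Galera's actual quorum implementation; the class name ToyQuorum and its methods are invented for illustration.

```python
class ToyQuorum:
    """Toy model of the quorum counting described above (hypothetical, not Galera code)."""

    def __init__(self, nodes):
        self.alive = set(nodes)
        self.expected = len(self.alive)  # the "quorum counter": size of the last agreed membership

    def is_primary(self):
        # A strict majority of the expected members must still be reachable.
        return len(self.alive) * 2 > self.expected

    def graceful_leave(self, node):
        # A clean shutdown is announced to the peers, so the counter is decremented.
        if node in self.alive:
            self.alive.remove(node)
            self.expected -= 1

    def crash(self, nodes):
        # An abrupt loss (kill -9, worker disabled) is not announced.
        self.alive -= set(nodes)
        if self.is_primary():
            # Survivors still hold a majority, so they agree on a smaller membership.
            self.expected = len(self.alive)

    def force_bootstrap(self):
        # Rough analogue of pc.bootstrap=true: reset the counter to the nodes that are left.
        self.expected = len(self.alive)


if __name__ == "__main__":
    nodes = ["pxc-0", "pxc-1", "pxc-2"]

    both_killed = ToyQuorum(nodes)
    both_killed.crash(["pxc-0", "pxc-2"])
    print("two killed at once, primary:", both_killed.is_primary())

    both_killed.force_bootstrap()
    print("after forced bootstrap, primary:", both_killed.is_primary())

    graceful = ToyQuorum(nodes)
    graceful.graceful_leave("pxc-0")
    graceful.graceful_leave("pxc-2")
    print("two graceful shutdowns, primary:", graceful.is_primary())
```

In this model the simultaneous loss of two nodes leaves 1 of an expected 3, which is not a majority, while two graceful shutdowns leave 1 of an expected 1.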

What you experienced is correct and expected behavior to protect the data and the cluster from errant writes and potential split-brain situations.

Pavlo_Tkachenko · August 18, 2023, 5:32am

Thank you for your reply!
