3 node Group Replication cluster is accepting writes with single primary online upon 2 pod failure

Rachit_Saxena · December 19, 2023, 5:14am

Objective: Test the behaviour of the cluster after 2 consecutive pod failures
Cluster detail: 3-node Group Replication cluster
Rejoin disabled (for testing): SET PERSIST group_replication_start_on_boot=OFF;
Platform: Kubernetes
Image used: percona/percona-server:8.0.34-aarch64

Steps:

On a healthy 3-node cluster, we disabled automatic rejoin: SET PERSIST group_replication_start_on_boot=OFF;
Issued the pod delete command: kubectl delete pod mysql-1 mysql-2
Confirmed from the MySQL logs that both the primary (mysql-1) and a secondary (mysql-2) shut down and left the replication group.
The only remaining node, mysql-0, was PRIMARY and ONLINE, and was accepting writes.
PFA screenshot:

[Screenshot 2023-12-19 at 12.02.46 AM]
The same behaviour occurs when we delete mysql-0 and mysql-1: mysql-2 is elected PRIMARY within a few seconds and is then PRIMARY and ONLINE, accepting writes.

Question:

As we understand it, the cluster should switch to read-only mode when a majority of the MySQL nodes are down. Why is the cluster above still writable?
Is there a MySQL configuration setting we need to set explicitly to disable this behaviour?

Rachit_Saxena · December 19, 2023, 10:59am

I was able to identify the reason: the members left the group voluntarily, so the majority was recalculated after each exit.

Reference: https://dev.mysql.com/doc/refman/8.0/en/group-replication-network-partitioning.html

On the other hand, if servers exit the group voluntarily, they instruct the 
group that it should reconfigure itself. In practice, this means that a server 
that is leaving tells others that it is going away. This means that other 
members can reconfigure the group properly, the consistency of the membership 
is maintained and the majority is recalculated. For example, in the above 
scenario of 5 servers where 3 leave at once, if the 3 leaving servers warn
the group that they are leaving, one by one, then the membership is able
to adjust itself from 5 to 2, and at the same time, securing quorum while 
that happens.
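
To make the difference concrete, here is a minimal toy model of the quorum arithmetic described in the quote. It is only a sketch, not Group Replication code: the `Group` class and its `leave` / `crash` methods are hypothetical names made up for illustration, and the model deliberately ignores expulsion timeouts and failure detection.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy model of a group membership view (hypothetical, for illustration)."""
    view: set = field(default_factory=set)       # members in the agreed view
    reachable: set = field(default_factory=set)  # members that are still running

    @classmethod
    def of(cls, *names):
        return cls(view=set(names), reachable=set(names))

    def has_quorum(self):
        # The majority is computed against the agreed view, not the survivors.
        alive = self.view & self.reachable
        return len(alive) > len(self.view) // 2

    def leave(self, name):
        # Voluntary exit: the member announces it, so the view shrinks.
        self.view.discard(name)
        self.reachable.discard(name)

    def crash(self, *names):
        # Involuntary exit: members vanish; this toy model never expels them.
        self.reachable -= set(names)


# Graceful shutdown of two members, one by one: view goes 3 -> 2 -> 1.
graceful = Group.of("mysql-0", "mysql-1", "mysql-2")
graceful.leave("mysql-1")
graceful.leave("mysql-2")
print(graceful.has_quorum())  # True: 1 of 1 is a majority

# Two members disappear at once without announcing it: view stays at 3.
abrupt = Group.of("mysql-0", "mysql-1", "mysql-2")
abrupt.crash("mysql-1", "mysql-2")
print(abrupt.has_quorum())  # False: 1 of 3 is not a majority
```

In the test above the two members announced their exit, so the group shrank to a single member that still held a majority of its own view, which matches the first case.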

Rachit_Saxena · March 4, 2024, 7:53am

@matthewb / team

Is there any MySQL setting or configuration that will set READ_ONLY = ON on the primary once the cluster size drops to 1, even when the other members exited voluntarily?
I want to prevent writes to a single remaining GR node, because if that node goes down, those writes would be lost.
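
To illustrate the behaviour I am after, here is a rough sketch of the decision an external watchdog could make. It is an assumption on my side, not a Group Replication feature: `should_force_read_only` and `min_online` are hypothetical names, the rows are assumed to come from performance_schema.replication_group_members, and the sketch uses super_read_only rather than read_only so that privileged users are blocked as well. Running the statements against the server is left to whatever MySQL client the watchdog uses.

```python
# Query a watchdog could run on the local member to read the current view.
MEMBERS_SQL = (
    "SELECT MEMBER_HOST, MEMBER_STATE "
    "FROM performance_schema.replication_group_members"
)

# Statement the watchdog could issue when too few members remain (assumption:
# setting this manually on the primary is acceptable in this deployment).
FORCE_READ_ONLY_SQL = "SET GLOBAL super_read_only = ON"


def should_force_read_only(members, min_online=2):
    """Return True when fewer than `min_online` members are ONLINE.

    `members` is an iterable of (member_host, member_state) tuples,
    i.e. the rows returned by MEMBERS_SQL.
    """
    online = [host for host, state in members if state == "ONLINE"]
    return len(online) < min_online


# Example with the state observed after deleting mysql-1 and mysql-2.
rows = [("mysql-0", "ONLINE")]
if should_force_read_only(rows):
    print(FORCE_READ_ONLY_SQL)
```

The threshold of 2 is only an example; a watchdog like this would also need a way to switch the node back to read-write once other members rejoin.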
