Momentarily inconsistent Group Replication status during auto-rejoin of a network-partitioned member

Cluster Topology: 3-node MySQL Group Replication
Failure Scenario: the primary is network partitioned from both secondaries.

When the primary gets network partitioned, the other GR members (the two secondaries) elect a new primary among themselves, since they hold the majority. All writes on the old PRIMARY get blocked, but the old PRIMARY (as seen from itself) still reports MEMBER_STATE: ONLINE with MEMBER_ROLE: PRIMARY, and its read_only and super_read_only variables remain OFF. Moreover, when the network partition is removed, for a very short period this old primary rejoins the group still showing itself as PRIMARY. Below are the details:

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 45cdeacc-c64b-11ee-b0e2-0aa877aa3d42 | mysql-0.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
| group_replication_applier | 9e129538-c64a-11ee-b1d7-3613268f30d4 | mysql-1.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
| group_replication_applier | f29f5229-c64a-11ee-b153-be35069292ea | mysql-2.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)

Eventually this gets updated to show the newly elected primary.

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 45cdeacc-c64b-11ee-b0e2-0aa877aa3d42 | mysql-0.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
| group_replication_applier | 9e129538-c64a-11ee-b1d7-3613268f30d4 | mysql-1.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
| group_replication_applier | f29f5229-c64a-11ee-b153-be35069292ea | mysql-2.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)
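
For reference, the read-only flags and the member's own view of its role mentioned above were checked on the old primary with something like this (an illustrative session, not the original output):

mysql> SELECT @@global.read_only, @@global.super_read_only;
mysql> SELECT MEMBER_STATE, MEMBER_ROLE
       FROM performance_schema.replication_group_members
       WHERE MEMBER_ID = @@server_uuid;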

My questions are:

  1. Is this expected?
  2. Would this have any effect on writes when monitored via ProxySQL?
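
For context on question 2: ProxySQL decides write routing from whatever health query it runs against each backend, not from the other members' view of the group. A sketch of the kind of per-member check commonly used with ProxySQL's group-replication hostgroups, assuming the sys.gr_member_routing_candidate_status view from the widely shared addition-to-sys script is installed (our actual monitor configuration may differ):

mysql> SELECT viable_candidate, read_only, transactions_behind
       FROM sys.gr_member_routing_candidate_status;

Checking what this returns on the partitioned member during the outage window should answer question 2 for a given setup.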

The older PRIMARY should have gone into an OFFLINE state. Are you using group_replication_paxos_single_leader? If not, you should enable it. It does require a complete re-bootstrap of the GR cluster. single_leader=ON changes the way decisions are made in GR and helps with situations like yours.

Yes, our use case is single-PRIMARY only.

group_replication_paxos_single_leader is currently disabled on the GR cluster. I will try to reproduce the scenario after enabling this variable.
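
To confirm the current value on each member, something like this should be enough (sketch):

mysql> SELECT @@global.group_replication_paxos_single_leader;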

Additionally, if Group Replication is used with group_replication_paxos_single_leader disabled, should we expect any inconsistencies such as split-brain? Or is the variable aimed only at consensus improvements?

@matthewb, even with group_replication_paxos_single_leader disabled I see the same logs and status.
The older PRIMARY is ONLINE, PRIMARY.

Yes, you need to stop your cluster, enable paxos single leader, bootstrap your cluster, and then re-create your outage scenario. You won’t see any difference until you enable paxos single leader.
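
A sketch of that sequence on a 3-node group, run manually (an operator or init scripts may handle parts of this, so adapt as needed):

-- On every member:
mysql> STOP GROUP_REPLICATION;
mysql> SET PERSIST group_replication_paxos_single_leader = ON;

-- On exactly one member, bootstrap the new group:
mysql> SET GLOBAL group_replication_bootstrap_group = ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group = OFF;

-- On the remaining members:
mysql> START GROUP_REPLICATION;

As far as I recall, the setting only takes effect for the group when the group is bootstrapped with it and the group communication protocol version is 8.0.27 or newer.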

Note: paxos_single_leader is not the same thing as single primary/multi-primary. These are two different concepts.
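
A quick way to see that they are two separate settings (sketch):

mysql> SELECT @@global.group_replication_single_primary_mode,
              @@global.group_replication_paxos_single_leader;

The first controls whether the group has one writable primary or several; the second controls whether XCom uses a single Paxos leader for its consensus messaging.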

Hi @matthewb,
I enabled group_replication_paxos_single_leader=ON and re-bootstrapped the cluster. I still observe that the network-partitioned primary stays in the ONLINE, PRIMARY state throughout the network-partition window.
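
To double-check that the group actually picked the setting up after the re-bootstrap, something like the following should work, assuming I have the performance_schema table and column names right (the table exists from 8.0.27 on):

mysql> SELECT WRITE_CONSENSUS_SINGLE_LEADER_CAPABLE
       FROM performance_schema.replication_group_communication_information;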

If the node is network partitioned, why is it reporting ONLINE in your _members table?

Let me spell this out to clear up the confusion:

Pod-0: This pod is fully network partitioned.
I ran kubectl exec into Pod-0 and queried the GR status (roughly as sketched below), which shows ONLINE, PRIMARY with only a single member row, namely itself.

Pod-1 and Pod-2: These pods hold the majority and have elected Pod-1 as PRIMARY.
Both nodes are ONLINE and are serving the reads and writes.
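
The check on Pod-0 mentioned above was done roughly like this (pod name, namespace, and credentials are illustrative):

$ kubectl exec -it mysql-0 -n mysql -- \
    mysql -uroot -p -e "SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE FROM performance_schema.replication_group_members;"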

As per my original question, and even after bootstrapping the cluster with group_replication_paxos_single_leader=ON, why does Pod-0, when queried from itself, still report ONLINE, PRIMARY?

You have two independent clusters with the same name. Your test is inaccurate. You need to first create a 3-node cluster and then network partition off one of the members. When you do this, node1 will show only itself, in an OFFLINE state, and node2/node3 will be ONLINE with a single primary member.

We do this very same lab in our Group Replication 101 training class, though not with K8S; we use iptables to block all traffic from node1 to node2/node3. When we do this, node1 goes into an OFFLINE state because it is no longer able to talk to node2/node3, and the other nodes elect a new primary.
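
A sketch of that style of partition with iptables, run on node1 (the addresses are placeholders):

# Block all traffic between node1 and the other two members:
iptables -A INPUT  -s 10.0.0.2 -j DROP
iptables -A INPUT  -s 10.0.0.3 -j DROP
iptables -A OUTPUT -d 10.0.0.2 -j DROP
iptables -A OUTPUT -d 10.0.0.3 -j DROP

# To heal the partition, delete the same rules with -D instead of -A.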

Please provide all commands used to recreate your scenario so we can better understand exactly what you are doing.

@matthewb, this is a single GR cluster only. The output below shows the configured cluster status.

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 45cdeacc-c64b-11ee-b0e2-0aa877aa3d42 | mysql-0.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
| group_replication_applier | 9e129538-c64a-11ee-b1d7-3613268f30d4 | mysql-1.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
| group_replication_applier | f29f5229-c64a-11ee-b153-be35069292ea | mysql-2.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+

After network partitioning Pod-0, the following status is seen on each of the pods:

Pod-0:

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 45cdeacc-c64b-11ee-b0e2-0aa877aa3d42 | mysql-0.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+

Pod-1 and Pod-2:

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 9e129538-c64a-11ee-b1d7-3613268f30d4 | mysql-1.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
| group_replication_applier | f29f5229-c64a-11ee-b153-be35069292ea | mysql-2.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+

"You have two independent clusters with the same name"

Can you help me understand how you arrived at this conclusion?

If you have pod0 in PRIMARY state with 1 member, and you have pod1/pod2, also in PRIMARY state, then you have two independent clusters with the same name. That’s the result of whatever testing you are doing here.
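
One way to confirm whether the pods are really in the same group view is to compare the view ID reported by each pod; a sketch (if pod0 has bootstrapped its own group, its VIEW_ID should differ from the one on pod1/pod2):

mysql> SELECT VIEW_ID FROM performance_schema.replication_group_member_stats
       WHERE MEMBER_ID = @@server_uuid;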

For whatever reason, when pod0 is partitioned, it is bootstrapping itself, thereby creating a new cluster. Have you looked at pod0's logs? There is most likely evidence that bootstrapping is taking place.
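
A couple of quick things to check on pod0 for that (names and credentials are illustrative; if the error log does not go to the container's stdout, look at the error-log file instead):

# Was bootstrap turned on for pod0? (It may already be back OFF, so the log is the stronger signal.)
$ kubectl exec mysql-0 -n mysql -- \
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'group_replication_bootstrap_group';"

# Look for bootstrap/group-creation messages in the error log:
$ kubectl logs mysql-0 -n mysql | grep -i -E 'bootstrap|group_replication'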

Once again, please provide the exact, complete commands used to recreate your scenario so we can better understand exactly what you are doing.

Additionally, are you using the Percona Operator, or is this something else?