Momentarily inconsistent Group Replication status during auto-rejoin of a network-partitioned member

Cluster Topology: 3-node MySQL Group Replication
Failure Scenario: the primary is network partitioned from both secondaries.

When the primary gets network partitioned, the other GR members (the two secondaries) elect a new primary among themselves, since they hold the majority. All writes on the old PRIMARY get blocked, but the old PRIMARY (as seen from itself) still reports MEMBER_STATE: ONLINE with MEMBER_ROLE: PRIMARY, and its read_only and super_read_only variables remain OFF. Moreover, when the network partition is removed, for a very short period this old primary rejoins the group still showing itself as PRIMARY. Below are the details:

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 45cdeacc-c64b-11ee-b0e2-0aa877aa3d42 | mysql-0.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
| group_replication_applier | 9e129538-c64a-11ee-b1d7-3613268f30d4 | mysql-1.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
| group_replication_applier | f29f5229-c64a-11ee-b153-be35069292ea | mysql-2.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)

Eventually this gets updated to show the newly elected primary.

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 45cdeacc-c64b-11ee-b0e2-0aa877aa3d42 | mysql-0.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
| group_replication_applier | 9e129538-c64a-11ee-b1d7-3613268f30d4 | mysql-1.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
| group_replication_applier | f29f5229-c64a-11ee-b153-be35069292ea | mysql-2.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
3 rows in set (0.00 sec)
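
For reference, the read-only flags and the member's own view of its role mentioned above were checked on the old primary with something like this (an illustrative session, not the original output):

mysql> SELECT @@global.read_only, @@global.super_read_only;
mysql> SELECT MEMBER_STATE, MEMBER_ROLE
       FROM performance_schema.replication_group_members
       WHERE MEMBER_ID = @@server_uuid;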

My questions are:

  1. Is this expected?
  2. Would this have any effect on writes when monitored via ProxySQL?
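
For context on question 2: ProxySQL decides write routing from whatever health query it runs against each backend, not from the other members' view of the group. A sketch of the kind of per-member check commonly used with ProxySQL's group-replication hostgroups, assuming the sys.gr_member_routing_candidate_status view from the widely shared addition-to-sys script is installed (our actual monitor configuration may differ):

mysql> SELECT viable_candidate, read_only, transactions_behind
       FROM sys.gr_member_routing_candidate_status;

Checking what this returns on the partitioned member during the outage window should answer question 2 for a given setup.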

The older PRIMARY should have gone into an OFFLINE state. Are you using group_replication_paxos_single_leader? If not, you should enable it. It does require a complete re-bootstrap of the GR cluster. single_leader=ON changes the way decisions are made in GR and helps with situations like yours.

Yes, our use case is single-PRIMARY only.

group_replication_paxos_single_leader is currently disabled on the GR cluster. I will try to reproduce the scenario after enabling this variable.
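
To confirm the current value on each member, something like this should be enough (sketch):

mysql> SELECT @@global.group_replication_paxos_single_leader;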

Additionally, if Group Replication is used with group_replication_paxos_single_leader disabled, should we expect any inconsistencies such as split-brain? Or is the variable aimed only at consensus improvements?

@matthewb, even with group_replication_paxos_single_leader disabled I see the same logs and status.
The older PRIMARY is ONLINE, PRIMARY.

Yes, you need to stop your cluster, enable paxos single leader, bootstrap your cluster, and then re-create your outage scenario. You won’t see any difference until you enable paxos single leader.
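
A sketch of that sequence on a 3-node group, run manually (an operator or init scripts may handle parts of this, so adapt as needed):

-- On every member:
mysql> STOP GROUP_REPLICATION;
mysql> SET PERSIST group_replication_paxos_single_leader = ON;

-- On exactly one member, bootstrap the new group:
mysql> SET GLOBAL group_replication_bootstrap_group = ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group = OFF;

-- On the remaining members:
mysql> START GROUP_REPLICATION;

As far as I recall, the setting only takes effect for the group when the group is bootstrapped with it and the group communication protocol version is 8.0.27 or newer.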

Note: paxos_single_leader is not the same thing as single primary/multi-primary. These are two different concepts.
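
A quick way to see that they are two separate settings (sketch):

mysql> SELECT @@global.group_replication_single_primary_mode,
              @@global.group_replication_paxos_single_leader;

The first controls whether the group has one writable primary or several; the second controls whether XCom uses a single Paxos leader for its consensus messaging.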

Hi @matthewb,
I enabled group_replication_paxos_single_leader=ON and re-bootstrapped the cluster. I still observe that the network-partitioned primary stays in the ONLINE, PRIMARY state throughout the network-partition window.
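
To double-check that the group actually picked the setting up after the re-bootstrap, something like the following should work, assuming I have the performance_schema table and column names right (the table exists from 8.0.27 on):

mysql> SELECT WRITE_CONSENSUS_SINGLE_LEADER_CAPABLE
       FROM performance_schema.replication_group_communication_information;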

If the node is network partitioned, why is it reporting ONLINE in your _members table?

Let me spell this out to clear up the confusion:

Pod-0: This pod is fully network partitioned.
I ran kubectl exec into Pod-0 and queried the GR status (roughly as sketched below), which shows ONLINE, PRIMARY with only a single member row, namely itself.

Pod-1 and Pod-2: These pods hold the majority and have elected Pod-1 as PRIMARY.
Both nodes are ONLINE and are serving the reads and writes.
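
The check on Pod-0 mentioned above was done roughly like this (pod name, namespace, and credentials are illustrative):

$ kubectl exec -it mysql-0 -n mysql -- \
    mysql -uroot -p -e "SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE FROM performance_schema.replication_group_members;"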

As per my original question, and even after bootstrapping the cluster with group_replication_paxos_single_leader=ON, why does Pod-0, when queried from itself, still report ONLINE, PRIMARY?

You have two independent clusters with the same name. Your test is inaccurate. You need to first create a 3-node cluster and then network partition off one of the members. When you do this, node1 will show only itself, in an OFFLINE state, and node2/node3 will be ONLINE with a single primary member.

We do this very same lab in our Group Replication 101 training class, though not with K8S; we use iptables to block all traffic from node1 to node2/node3. When we do this, node1 goes into an OFFLINE state because it is no longer able to talk to node2/node3, and the other nodes elect a new primary.
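
A sketch of that style of partition with iptables, run on node1 (the addresses are placeholders):

# Block all traffic between node1 and the other two members:
iptables -A INPUT  -s 10.0.0.2 -j DROP
iptables -A INPUT  -s 10.0.0.3 -j DROP
iptables -A OUTPUT -d 10.0.0.2 -j DROP
iptables -A OUTPUT -d 10.0.0.3 -j DROP

# To heal the partition, delete the same rules with -D instead of -A.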

Please provide all commands used to recreate your scenario so we can better understand exactly what you are doing.

@matthewb, this is a single GR cluster only. The output below shows the configured cluster status.

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 45cdeacc-c64b-11ee-b0e2-0aa877aa3d42 | mysql-0.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
| group_replication_applier | 9e129538-c64a-11ee-b1d7-3613268f30d4 | mysql-1.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
| group_replication_applier | f29f5229-c64a-11ee-b153-be35069292ea | mysql-2.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+

After network partitioning Pod-0, the following status is seen on each of the pods:

Pod-0:

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 45cdeacc-c64b-11ee-b0e2-0aa877aa3d42 | mysql-0.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+

Pod-1 and Pod-2:

mysql> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE';
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST                           | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 9e129538-c64a-11ee-b1d7-3613268f30d4 | mysql-1.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | PRIMARY     | 8.0.34         | XCom                       |
| group_replication_applier | f29f5229-c64a-11ee-b153-be35069292ea | mysql-2.mysql.mysql.svc.cluster.local |        3306 | ONLINE       | SECONDARY   | 8.0.34         | XCom                       |
+---------------------------+--------------------------------------+---------------------------------------+-------------+--------------+-------------+----------------+----------------------------+

"You have two independent clusters with the same name"

Can you help me understand how you arrived at this conclusion?

If you have pod0 in PRIMARY state with 1 member, and you have pod1/pod2, also in PRIMARY state, then you have two independent clusters with the same name. That’s the result of whatever testing you are doing here.
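
One way to confirm whether the pods are really in the same group view is to compare the view ID reported by each pod; a sketch (if pod0 has bootstrapped its own group, its VIEW_ID should differ from the one on pod1/pod2):

mysql> SELECT VIEW_ID FROM performance_schema.replication_group_member_stats
       WHERE MEMBER_ID = @@server_uuid;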

For whatever reason, when pod0 is partitioned, it is bootstrapping itself, thereby creating a new cluster. Have you looked at pod0's logs? There is most likely evidence that bootstrapping is taking place.
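
A couple of quick things to check on pod0 for that (names and credentials are illustrative; if the error log does not go to the container's stdout, look at the error-log file instead):

# Was bootstrap turned on for pod0? (It may already be back OFF, so the log is the stronger signal.)
$ kubectl exec mysql-0 -n mysql -- \
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'group_replication_bootstrap_group';"

# Look for bootstrap/group-creation messages in the error log:
$ kubectl logs mysql-0 -n mysql | grep -i -E 'bootstrap|group_replication'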

Once again, please provide the exact, complete commands used to recreate your scenario so we can better understand exactly what you are doing.

Additionally, are you using the Percona Operator, or is this something else?