Rollback on group replication, no error logged

Baptiste_Mille-Mathi · October 3, 2024, 12:29pm

Hello

mysql version: 5.7.42

We have a simple python application which is working fine in a one-node setup.
To have high-availability we’ve setup a 3 nodes group replication in multi-primary mode, with only one mysql server used for now for doing the queries.
On a regular basis, since then the application receives rollbacks with this message

pymysql.err.OperationalError: (3101, 'Plugin instructed the server to rollback the current transaction.')

We cannot find in the various log files (general, error or slow) a relevant information stating why the operation fails on group replication.

Is there place we could find information ?

matthewb · October 3, 2024, 3:46pm

This isn’t a recommended mode. Are you able to repeat your issue in single-primary?

I would have also recommended you enable paxos_single_leader, but you have not upgraded.

Keep in mind that any bug you discover from this in GR won’t get fixed since you are on a dead MySQL version. You should upgrade first to the latest 8.0 and see if your issue remains.

Baptiste_Mille-Mathi · October 4, 2024, 5:51am

Hi @matthewb

This isn’t a recommended mode.

Sorry but I did not see this, perhaps I overlooked some document, do you have a reference ? We take this decision based on https://dev.mysql.com/doc/refman/5.7/en/group-replication-deploying-instances.html

We tried to run the group_replication with only one node active, and we get the same result.

Upgrade to MySQL 8.0 is on the way, this is just we wanted to provide more availability before migrating,

edit: We can replicate on a dev environment, and switching to single primary mode fixes the issue but this is not our target implementation, in case of failure we want the switchover to happen automatically, so still hearing for advice to debug this.

Best.

matthewb · October 4, 2024, 2:22pm

It’s not in the docs. It’s a best-practices recommendation. When in single primary, only 1 server accepts writes. Coordination and trx certification are more efficient as only one node processes these requests. In multi-primary mode, it is more complicated and performance can suffer.

I suggest you re-read over the Group Replication documentation. I think you might have an incorrect understanding of the difference between single-primary and multi-primary.

Neither mode have anything do to with automatic switchover. If you are in single primary mode, and that node fails, one of the existing secondaries is automatically chosen as the new primary. This new primary disables the read-only flag and begins to accept writes. This is core design of GR.

I’ll assume you are using some sort of router (ie: proxysql or mysql router) that is “group replication aware” meaning that when a new primary is selected, the router automatically updates its rules and your application is none the wiser. This is also core to GR (or any ‘cluster’) architecture design.

Topic		Replies	Views
Concerns regarding the MySQL group replication	4	612	March 4, 2024
Group replication failures and recovery MySQL & MariaDB	3	892	October 18, 2023
Group Replication / Worker failed executing transaction Percona Server for MySQL 8.0	1	3777	March 9, 2023
3 node Group Replication cluster is accepting writes with single primary online upon 2 pod failure MySQL & MariaDB	2	419	March 4, 2024
Backup Error on Xtradb cluster Percona Operator for MySQL mysql	5	1127	March 7, 2021

Rollback on group replication, no error logged

Related topics