Rollback on group replication, no error logged

Hello

mysql version: 5.7.42

We have a simple python application which is working fine in a one-node setup.
To have high-availability we’ve setup a 3 nodes group replication in multi-primary mode, with only one mysql server used for now for doing the queries.
On a regular basis, since then the application receives rollbacks with this message

pymysql.err.OperationalError: (3101, 'Plugin instructed the server to rollback the current transaction.')

We cannot find in the various log files (general, error or slow) a relevant information stating why the operation fails on group replication.

Is there place we could find information ?

This isn’t a recommended mode. Are you able to repeat your issue in single-primary?

I would have also recommended you enable paxos_single_leader, but you have not upgraded.

Keep in mind that any bug you discover from this in GR won’t get fixed since you are on a dead MySQL version. You should upgrade first to the latest 8.0 and see if your issue remains.

Hi @matthewb

This isn’t a recommended mode.

Sorry but I did not see this, perhaps I overlooked some document, do you have a reference ? We take this decision based on https://dev.mysql.com/doc/refman/5.7/en/group-replication-deploying-instances.html

We tried to run the group_replication with only one node active, and we get the same result.

Upgrade to MySQL 8.0 is on the way, this is just we wanted to provide more availability before migrating,

edit: We can replicate on a dev environment, and switching to single primary mode fixes the issue but this is not our target implementation, in case of failure we want the switchover to happen automatically, so still hearing for advice to debug this.

Best.

It’s not in the docs. It’s a best-practices recommendation. When in single primary, only 1 server accepts writes. Coordination and trx certification are more efficient as only one node processes these requests. In multi-primary mode, it is more complicated and performance can suffer.

I suggest you re-read over the Group Replication documentation. I think you might have an incorrect understanding of the difference between single-primary and multi-primary.

Neither mode have anything do to with automatic switchover. If you are in single primary mode, and that node fails, one of the existing secondaries is automatically chosen as the new primary. This new primary disables the read-only flag and begins to accept writes. This is core design of GR.

I’ll assume you are using some sort of router (ie: proxysql or mysql router) that is “group replication aware” meaning that when a new primary is selected, the router automatically updates its rules and your application is none the wiser. This is also core to GR (or any ‘cluster’) architecture design.