wsrep: in pre-commit stage hanging cluster

OK, so we finally got our cluster replicated.

However, we are now suffering constant application hangs because connections stay in the "wsrep: in pre-commit stage" state for up to 2 minutes.

Our hardware is all brand new and under no load: 24GB RAM, quad-core Xeon, SSD (ZFS mirror), 10Gbit network. Latency between the 3 nodes in our cluster is extremely low; a 330GB SST completed in 20 minutes.

I’ve tried tweaking flow control settings, and file limits are all as recommended (or higher).

Running out of options to try.

If I tear down 2 nodes, leaving a single standalone node, performance is fine. As soon as I add a 2nd or 3rd node, the hangs return.

Any help would be much appreciated.

PXC version 5.7.14-8-57
Galera 3.17(r447d194)

We’re currently only writing to a single master. We were originally writing to multiple masters but were experiencing transaction locks and hangs, so we reverted to writing to a single master to try to debug the issue.

  1. Do you have long-running transactions?

  2. Is the DML workload intermixed with DDL (CREATE/DROP/TRUNCATE/ALTER)?

  3. Are there MyISAM tables in the workload?

  1. Perhaps
  2. No
  3. No
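
Since the answer to question 1 was "Perhaps", it may be worth confirming it on the node that takes the writes. The Python sketch below is only an illustration and rests on assumptions that are not in this thread: a DB-API 2.0 cursor that uses the %s parameter style (PyMySQL and mysqlclient do), hypothetical helper names, and an arbitrary 60 second cutoff. It reads information_schema.innodb_trx for old open transactions and information_schema.tables for MyISAM tables.

```python
# Sketch only. Assumes a DB-API 2.0 cursor with the "%s" parameter style.
# The function names and the 60 second default are illustrative choices.

LONG_TRX_SQL = """
SELECT trx_mysql_thread_id,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
       trx_rows_modified
FROM information_schema.innodb_trx
WHERE trx_started < NOW() - INTERVAL %s SECOND
ORDER BY trx_started
"""

MYISAM_SQL = """
SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys')
"""


def find_long_transactions(cursor, min_age_seconds=60):
    """Return InnoDB transactions open longer than min_age_seconds."""
    cursor.execute(LONG_TRX_SQL, (min_age_seconds,))
    return cursor.fetchall()


def find_myisam_tables(cursor):
    """Return (schema, table) pairs for user tables stored in MyISAM."""
    cursor.execute(MYISAM_SQL)
    return cursor.fetchall()
```

Any rows from find_long_transactions can be matched against SHOW PROCESSLIST through trx_mysql_thread_id to see which application connection holds the transaction open.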

I wonder what happens if you try traditional MySQL replication: do you observe async slave lag on the secondary nodes with the same workload?
The "wsrep: in pre-commit stage" state usually means the secondary nodes cannot keep up with applying the replication stream. One possible reason is updates to tables without a primary key, so that is worth checking.
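
Both points above can be checked from a script. The following Python sketch is an illustration under stated assumptions: a DB-API 2.0 cursor that returns rows as tuples (not a dict cursor), and hypothetical helper names. The first helper lists user tables that have no PRIMARY KEY constraint; the second collects the Galera flow-control counters from SHOW GLOBAL STATUS so the nodes can be compared.

```python
# Sketch only. Assumes a DB-API 2.0 cursor that yields tuple rows.
# Helper names are illustrative and not part of any library.

NO_PK_SQL = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys')
  AND c.constraint_name IS NULL
"""

# Status counters related to flow control and the apply queue.
FLOW_KEYS = (
    "wsrep_flow_control_paused",
    "wsrep_flow_control_sent",
    "wsrep_flow_control_recv",
    "wsrep_local_recv_queue_avg",
)


def find_tables_without_pk(cursor):
    """Return (schema, table) pairs for user tables lacking a primary key."""
    cursor.execute(NO_PK_SQL)
    return cursor.fetchall()


def flow_control_snapshot(cursor):
    """Return the flow-control counters of the connected node as floats."""
    cursor.execute("SHOW GLOBAL STATUS LIKE 'wsrep_%'")
    status = {name: value for name, value in cursor.fetchall()}
    return {key: float(status[key]) for key in FLOW_KEYS if key in status}
```

Run flow_control_snapshot against each node in turn. A node whose wsrep_local_recv_queue_avg and wsrep_flow_control_sent stand well above the others is a likely candidate for the slow applier, and wsrep_flow_control_paused shows what fraction of time replication was paused since the last FLUSH STATUS.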