We have a client running three three-node PXC clusters (nine nodes in total) in OpenShift, using operator version 8.0.29-21.1. In this configuration, the primary cluster replicates to the other two clusters simultaneously, similar to the diagram below.
This morning we received an alert that one of the replicas was lagging behind the primary. I went to that replica, ran SHOW REPLICA STATUS, and noticed that Replica_SQL_Running_State was “Waiting for dependent transaction to commit” (see below).
```
mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Replica_SQL_Running_State: Waiting for dependent transaction to commit
```
All nodes in all clusters are using the same configuration.
The replica on the other cluster is not lagging. I then checked the replica_parallel_type and replica_parallel_workers variables; replica_parallel_workers is set to its default value of 4.
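For context on what that state means, assuming replica_parallel_type is LOGICAL_CLOCK (the default since MySQL 8.0.27): each transaction in the binary log carries last_committed and sequence_number values, and the coordinator will not hand a transaction to a worker until every transaction up to its last_committed has committed. While it waits, it reports “Waiting for dependent transaction to commit”. The following is a simplified model of that rule, not MySQL's actual applier code. It assumes strict in-order dispatch, ignores replica_preserve_commit_order, and the names Trx and schedule as well as the transaction durations are hypothetical. It shows how one long-running transaction can hold back everything that depends on it even while workers sit idle.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Trx:
    seq: int             # sequence_number from the binary log
    last_committed: int  # last_committed from the binary log
    duration: int        # hypothetical apply time, in arbitrary ticks


def schedule(trxs, workers):
    """Simplified LOGICAL_CLOCK applier model (an assumption, not MySQL code).

    Returns {seq: (start, finish, waited_for_dependency)}.
    """
    free_at = [0] * workers  # min-heap: tick at which each worker becomes free
    finish = {}              # seq -> tick at which that transaction committed
    result = {}
    clock = 0                # coordinator dispatches strictly in binlog order
    for t in sorted(trxs, key=lambda t: t.seq):
        # Every transaction with seq <= last_committed must have committed first.
        dep_ready = max(
            (f for s, f in finish.items() if s <= t.last_committed), default=0
        )
        worker_ready = heapq.heappop(free_at)
        earliest = max(clock, worker_ready)
        start = max(earliest, dep_ready)
        finish[t.seq] = start + t.duration
        heapq.heappush(free_at, finish[t.seq])
        # True when the coordinator would sit in
        # "Waiting for dependent transaction to commit" before dispatching t.
        result[t.seq] = (start, finish[t.seq], dep_ready > earliest)
        clock = start
    return result


trxs = [
    Trx(1, 0, 100),  # one large, slow transaction (hypothetical)
    Trx(2, 0, 1),    # independent of seq 1
    Trx(3, 0, 1),
    Trx(4, 1, 1),    # last_committed=1: must wait for seq 1 to commit
    Trx(5, 1, 1),
]
result = schedule(trxs, workers=4)
for seq, (start, end, waited) in sorted(result.items()):
    print(seq, start, end, waited)
```

In this model, transactions 4 and 5 cannot start until tick 100, even though three of the four workers are free from tick 1 onward, so adding workers alone would not shorten the wait.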
This replica has been in this state for a while now and behaves as if it were deadlocked, so I'm trying to determine how to fix it without losing any data.