We are trying to migrate to a 5-node Percona XtraDB Cluster. We are replicating from our current environment, which is 2 masters and 2 slaves: one node in the XtraDB Cluster reads the binary log from one of the masters as an async replica. We are currently running only a very light demo instance on the cluster. We are seeing periods where the async replication pushes XtraDB Cluster into a flow control state.
My theory is that a stored procedure runs as a single transaction that modifies a large amount of data, and that this one big transaction is what makes flow control back up. However, I'm having trouble proving that, because I only find out about each stall after it has happened. I am looking for a needle in a hay farm. The stored procedure is just my latest guess as to what is going on.
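One way I could test the big-transaction theory is to scan the master's binary log for unusually large transactions. Below is a rough sketch, not a proven tool: it assumes the text output of `mysqlbinlog`, where each event is preceded by a `# at <offset>` line and transactions are delimited by `BEGIN` and `COMMIT` lines. The function name `large_transactions` and the `min_bytes` threshold are my own placeholders, and the size is only an approximation (the byte distance between the `BEGIN` event and the `COMMIT` event).

```python
import re
from typing import Iterable, List, Tuple

# mysqlbinlog prints the byte offset of every event as "# at <offset>".
AT_RE = re.compile(r"^# at (\d+)")


def large_transactions(lines: Iterable[str], min_bytes: int) -> List[Tuple[int, int, int]]:
    """Return (begin_offset, commit_offset, size) for each transaction
    whose BEGIN-to-COMMIT span in the binlog is at least min_bytes."""
    results = []
    last_pos = None   # offset of the most recent event seen
    begin_pos = None  # offset of the BEGIN event of the open transaction
    for line in lines:
        match = AT_RE.match(line)
        if match:
            last_pos = int(match.group(1))
            continue
        stripped = line.strip()
        if stripped == "BEGIN" or stripped.startswith("BEGIN/*"):
            begin_pos = last_pos
        elif stripped.startswith("COMMIT") and begin_pos is not None:
            size = last_pos - begin_pos
            if size >= min_bytes:
                results.append((begin_pos, last_pos, size))
            begin_pos = None
    return results
```

Feeding this the lines of `mysqlbinlog <binlog file>` with a threshold of a few megabytes should give a short list of offsets to inspect, and their timestamps can then be compared with the times of the stalls.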
I am looking for advice on how to capture precisely the moment flow control kicks in and the transactions that cause it, so that I can fix the problem in my application.
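The kind of capture I have in mind is a small poller that samples the wsrep status counters every second or so and records the moment they move. The sketch below is only an illustration under assumptions: it relies on the Galera status variables `wsrep_flow_control_sent`, `wsrep_flow_control_recv` and `wsrep_local_recv_queue`, it expects a DB-API cursor that returns plain `(name, value)` rows from whichever driver is in use, and the function names `fetch_status` and `flow_control_events` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Counters that increase whenever flow control messages are sent or received.
COUNTERS = ("wsrep_flow_control_sent", "wsrep_flow_control_recv")


def fetch_status(cursor) -> Dict[str, str]:
    """Read the wsrep status variables through a DB-API cursor
    (assumes rows come back as (name, value) tuples)."""
    cursor.execute("SHOW GLOBAL STATUS LIKE 'wsrep%'")
    return {name: value for name, value in cursor.fetchall()}


def flow_control_events(
    samples: Iterable[Tuple[float, Dict[str, str]]]
) -> List[Tuple[float, str, int, int]]:
    """Compare consecutive samples and report every counter increase as
    (timestamp, counter_name, delta, receive_queue_length)."""
    events = []
    previous = None
    for timestamp, status in samples:
        if previous is not None:
            queue = int(status.get("wsrep_local_recv_queue", 0))
            for name in COUNTERS:
                delta = int(status.get(name, 0)) - int(previous.get(name, 0))
                if delta > 0:
                    events.append((timestamp, name, delta, queue))
        previous = status
    return events
```

Run against every node, this should show which node's `wsrep_flow_control_sent` counter grows first, which as I understand it is the node that is falling behind. At that point the poller could also dump `SHOW PROCESSLIST` to record what was being applied.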
I am guessing that any other writes, no matter how quickly they arrive from async replication, would be absorbed by Galera without any trouble.
Any advice on how to pinpoint the transaction or transactions causing the flow control state would be appreciated. We've seen the server halt for up to 20 minutes because of this.