Hi, we have put together a Percona cluster with Galera, and it works great from 5:05 am to 4:50 am the next morning, but every day at 5 am the whole system basically freezes for 5 minutes. The only real change I see is that the first line of the process table says "Queueing Master" rather than the normal "all slaves have read all". Also, all the queries that show yellow and then red before it starts up again say "wsrep in pre-commit".
I'm not sure what's causing this. My guess is that it has to do with replication and a daily quorum vote, but that's just a guess. Does anyone have any ideas?
It may be a large transaction scheduled at that time. Did you check for possible cron jobs? Output from
will be helpful to diagnose what’s happening.
This sounds similar to an issue that I’m currently attempting to resolve - disk-bound operations on even inactive cluster nodes (by which I mean they’re in the cluster but we’re not sending any traffic to them) cause active cluster nodes to wait on wsrep operations.
Are you performing backups or other disk-intensive operations on any cluster node at that time, even if you’re not sending traffic to that node?
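One rough way to check whether a busy node is throttling the rest of the cluster is to compare the Galera flow control counters on each node just before and just after the 5 am window. The sketch below is only an illustration under some assumptions: it assumes you have already collected `SHOW GLOBAL STATUS LIKE 'wsrep_%'` from every node into plain dictionaries (the collection step is not shown), and it assumes `wsrep_flow_control_sent` behaves as a cumulative counter, as I understand it does. The function name `nodes_sending_flow_control` and the node names are made up for the example.

```python
# Hypothetical helper, not part of any Percona or Galera tooling.
# Each snapshot maps a node name to that node's wsrep status variables,
# with values as strings, the way SHOW GLOBAL STATUS returns them.

def nodes_sending_flow_control(before, after):
    """Return {node: number of flow control pauses sent between the snapshots}."""
    deltas = {}
    for node, status_after in after.items():
        status_before = before.get(node, {})
        sent_before = int(status_before.get("wsrep_flow_control_sent", 0))
        sent_after = int(status_after.get("wsrep_flow_control_sent", 0))
        delta = sent_after - sent_before
        # Only report nodes whose counter actually grew during the window.
        if delta > 0:
            deltas[node] = delta
    return deltas


if __name__ == "__main__":
    # Made-up sample values, purely to show the call shape.
    before = {
        "node1": {"wsrep_flow_control_sent": "3"},
        "node2": {"wsrep_flow_control_sent": "10"},
    }
    after = {
        "node1": {"wsrep_flow_control_sent": "3"},
        "node2": {"wsrep_flow_control_sent": "42"},
    }
    print(nodes_sending_flow_control(before, after))
```

If one node's counter jumps during the freeze while the others stay flat, that node is probably the one falling behind, and it would be worth looking at what runs on it at 5 am (a backup, for instance), even if it takes no application traffic.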