I’ve two MySQL servers running version 5.7.24-27 in a master-master replication configuration (all the traffic goes to the “primary” server and the “secondary” node is only here for backup).
To try to improve the replication, I decided to enable “slave_parallel_workers” to 4 and slave_parallel_type to “LOGICAL_CLOCK”. Since then, I observe, sometimes, a replication lag on the primary server (and not the backup master) to a high value (30 minutes, 1 hour, …) and just after the value drops to 0 again, nothing on the secondary node.
I found this article (https://www.percona.com/blog/2013/09/16/possible-reasons-when-mysql-replication-lag-is-flapping-between-0-and-xxxxx/). I’m very close to the first case (even if we didn’t change the server-id), I tried to reset the replication without any successes.
This behavior was not here before activating the slave_parallel_workers. We can have 1,2 or 3 hours between two spikes or only 10 minutes. I can’t find a correlation between a spike and others MySQL metrics.
Does someone already seen something like that?