MySQL group replication - how to find queries causing replication lag?

We have a MySQL Group Replication cluster with 9 servers and are currently experiencing replication lag (i.e., high values in performance_schema.replication_group_member_stats.COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE on several servers).
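One way to track which members are lagging is to poll that counter on a schedule and keep a short history per member. This is a minimal sketch that assumes MySQL 8.0, where replication_group_member_stats has one row per group member. The function name lagging_members and the threshold of 1000 queued transactions are hypothetical choices. A member is flagged when its queue is above the threshold at the latest poll and has not shrunk since the first poll, i.e. the applier is not draining it.

```python
from __future__ import annotations

# Run this periodically (e.g. every 10 s) with any MySQL client and append
# each member's queue length to that member's history list.
MEMBER_STATS_SQL = """
SELECT m.MEMBER_HOST, s.COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
FROM performance_schema.replication_group_member_stats AS s
JOIN performance_schema.replication_group_members AS m USING (MEMBER_ID)
"""


def lagging_members(history: dict[str, list[int]], threshold: int = 1000) -> list[str]:
    """Return members whose applier queue is above `threshold` at the latest
    poll and has not shrunk since the first poll (i.e. it is not draining)."""
    flagged = []
    for member, queue in history.items():
        if queue and queue[-1] > threshold and queue[-1] >= queue[0]:
            flagged.append(member)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical samples: three polls per member.
    history = {
        "db1": [10, 5, 0],
        "db2": [800, 1500, 2400],   # growing: applier cannot keep up
        "db3": [3000, 2000, 1200],  # high but draining
    }
    print(lagging_members(history))  # ['db2']
```

Members that keep showing up across polling windows are the ones to look at more closely.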

How can we determine which queries are causing replication lag on a set of MySQL servers running Group Replication? We noticed that neither slow queries nor frequent queries are necessarily correlated with lag across servers.
We did find that some specific joins were causing lag spikes, and once we removed them the lag went down. However, this is not the case for all joins, so we are looking for a systematic way to identify the queries responsible.
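A possible systematic approach (an assumption on my part, not a built-in MySQL feature) is to sample performance_schema.events_statements_summary_by_digest at the same interval as the applier queue and rank statement digests by how closely their per-interval activity tracks queue growth on a lagging member. With the row-based binary logging that Group Replication requires, apply cost is driven largely by the number of rows changed, so SUM_ROWS_AFFECTED may be a better signal than execution count or latency, which could explain why slow or frequent queries did not line up with the lag. It may also be worth sampling digests on the lagging member itself, since read queries there compete with the applier for resources. The helper names deltas, pearson and rank_digests are hypothetical, and a high correlation only points at candidates to investigate; it does not prove causation.

```python
from __future__ import annotations

import math

# Poll on each member that accepts writes, at the same times as the applier
# queue samples. SUM_ROWS_AFFECTED is cumulative; key by (SCHEMA_NAME, DIGEST)
# if the same statement runs in several schemas. Every digest needs a sample
# at every poll so that all series have the same length.
DIGEST_SQL = """
SELECT SCHEMA_NAME, DIGEST, DIGEST_TEXT, SUM_ROWS_AFFECTED
FROM performance_schema.events_statements_summary_by_digest
"""


def deltas(series: list[float]) -> list[float]:
    """Convert a series of samples into per-interval changes."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_digests(rows_affected: dict[str, list[float]],
                 queue: list[float]) -> list[tuple[str, float]]:
    """Rank digests by the correlation between rows changed per interval
    and growth of the applier queue on a lagging member (highest first)."""
    queue_growth = deltas(queue)
    scores = [(digest, pearson(deltas(samples), queue_growth))
              for digest, samples in rows_affected.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: 5 polls of the queue and of three digests' counters.
    queue = [0, 500, 520, 1500, 1510]
    rows = {
        "A": [0, 50000, 50100, 150000, 150050],  # big bursts of row changes
        "B": [0, 100, 200, 300, 400],            # steady, small writes
        "C": [0, 10, 5000, 5010, 10000],         # bursts out of phase with lag
    }
    # "A" should rank first because its bursts line up with queue growth.
    for digest, score in rank_digests(rows, queue):
        print(digest, round(score, 3))
```

Digests that rank near the top over several lag episodes are good candidates to split into smaller transactions or to reschedule.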

Thanks for any help!


Hi @aurelien_shz,
Do you have PMM (Percona Monitoring and Management) set up and monitoring all 9 servers? Without a tool like this, which lets you correlate query load, CPU/disk/memory usage, and MySQL stats, it will be very difficult to figure out what is causing the issue. Are all 9 servers exactly the same, in both hardware and software configuration?
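For the software side of that question, a quick check is to dump SHOW GLOBAL VARIABLES from every member and diff the results. This is a minimal sketch assuming each server's output has been loaded into a dict of variable name to value; the function name config_drift and the default ignore list (variables expected to differ per server) are hypothetical and should be adjusted. Hardware differences such as disk or CPU are not visible this way and need to be compared separately, for example in PMM.

```python
from __future__ import annotations

# Variables that legitimately differ between members; extend as needed.
DEFAULT_IGNORE = frozenset({
    "hostname", "server_id", "server_uuid", "report_host",
    "gtid_executed", "gtid_purged", "group_replication_local_address",
})


def config_drift(configs: dict[str, dict[str, str]],
                 ignore: frozenset[str] = DEFAULT_IGNORE
                 ) -> dict[str, dict[str, str | None]]:
    """configs: server -> {Variable_name: Value} from SHOW GLOBAL VARIABLES.
    Returns variable -> {server: value} for every variable whose value differs,
    or is missing (None), on at least one server."""
    all_vars: set[str] = set()
    for variables in configs.values():
        all_vars.update(variables)
    drift: dict[str, dict[str, str | None]] = {}
    for name in sorted(all_vars - ignore):
        values = {server: variables.get(name) for server, variables in configs.items()}
        if len(set(values.values())) > 1:
            drift[name] = values
    return drift


if __name__ == "__main__":
    # Hypothetical excerpt of two members' variables.
    configs = {
        "db1": {"server_id": "1", "innodb_flush_log_at_trx_commit": "1",
                "replica_parallel_workers": "4"},
        "db2": {"server_id": "2", "innodb_flush_log_at_trx_commit": "1",
                "replica_parallel_workers": "1"},
    }
    for name, values in config_drift(configs).items():
        print(name, values)
```

A mismatch in applier-related settings, such as the number of parallel applier workers, would be a natural first suspect for a member that lags while others keep up.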

Nine servers is quite a lot, and quite unusual. Have you tried running with fewer members, say 5? Do you see the same issue with fewer members?
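For context on the group-size question: Group Replication needs a majority of members to agree, and 9 is the maximum group size it supports. The arithmetic below (a sketch; the function name quorum is hypothetical) shows what shrinking the group would cost in fault tolerance: 5 members can still lose 2 members at once, compared with 4 for a 9-member group.

```python
from __future__ import annotations


def quorum(members: int) -> tuple[int, int]:
    """Return (majority needed, simultaneous failures tolerated)."""
    majority = members // 2 + 1
    return majority, members - majority


if __name__ == "__main__":
    for n in (3, 5, 7, 9):
        majority, tolerated = quorum(n)
        print(f"{n} members: majority {majority}, tolerates {tolerated} failure(s)")
    # 3 members: majority 2, tolerates 1 failure(s)
    # 5 members: majority 3, tolerates 2 failure(s)
    # 7 members: majority 4, tolerates 3 failure(s)
    # 9 members: majority 5, tolerates 4 failure(s)
```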
