We have a MySQL Group Replication cluster with 9 servers, and we are currently experiencing replication lag (i.e high values in performance_schema.replication_group_member_stats.COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE for several servers).
How can we determine which queries are causing a set of MySQL servers running Group Replication to have replication lag? We noticed that neither slow queries nor frequent queries are necessarily correlated with lag across servers.
How can we systematically identify which queries are causing the lag? We have noticed that some specific joins we had were causing lag spikes and, once we removed them, the lag reduced. This is, however, not the case for all joins.
Thanks for any help!