We’re facing performance issues with master-master GTID replication. The secondary master (read-only) is trying to catch up with the primary master, with no luck. A write-bound workload is producing around 1 GB of binlog per 10 minutes. The secondary master spends a lot of time in “System lock” in “Slave_SQL_Running_State” — according to our monitoring, about 3/4 of the time. More precisely, in performance_schema.threads the slave SQL thread very frequently shows processlist_command = “Connect” together with processlist_state = “System lock”. Slow queries point to the slave thread and also to the trigger user (see picture in attachment). Statistics are fresh, with 300 sample pages.
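For reference, the state described above can be observed with a query along these lines (a sketch against performance_schema as it exists in MySQL/Percona 5.7; `thread/sql/slave_sql` is the SQL thread's instrumented name there):

```sql
-- Inspect the slave SQL thread's current command and state
SELECT thread_id, name, processlist_command, processlist_state
FROM performance_schema.threads
WHERE name = 'thread/sql/slave_sql';
```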
We suspect the storage layer, although this I/O load should be completely fine for the current enterprise SATA SSDs.
Any advice highly appreciated!
Percona Server version: 5.7.32-35 with RBR, Debian Buster.
Hi, have you tried any of the parallel replication options? Other common tricks to make replicas run faster are sync_binlog=0, innodb_flush_log_at_trx_commit=2, and master_info_repository/relay_log_info_repository=FILE.
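A minimal my.cnf sketch of those suggestions for a 5.7 replica (the values are illustrative, not tuned for your workload):

```ini
[mysqld]
# Parallel replication: apply transactions that committed together on the
# master in parallel on the replica
slave_parallel_type         = LOGICAL_CLOCK
slave_parallel_workers      = 8    # illustrative; size to your core count
slave_preserve_commit_order = 1    # keep replica commit order consistent

# Relaxed durability on the replica (trades crash safety for apply speed)
sync_binlog                    = 0
innodb_flush_log_at_trx_commit = 2

# FILE repositories avoid per-transaction updates to the info tables
master_info_repository    = FILE
relay_log_info_repository = FILE
```

Note that slave_preserve_commit_order requires log_slave_updates, which should already be on in a master-master setup.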
I tried the recommended setup ^^ , no change. NVMe disks were also added to the server, with a dedicated location for the logs — again no change. We then changed the hardware completely to an Epyc platform, which should really be the fastest we have. Much better in all respects, but busy hours still result in a lag “hill”, peaking around 2000 s.
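For what it's worth, whether the parallel applier workers are actually being used can be checked on the replica with something like this (5.7 performance_schema tables):

```sql
-- Per-worker applier status: are all workers running, any errors?
SELECT worker_id, service_state, last_error_message
FROM performance_schema.replication_applier_status_by_worker;

-- Overall lag as the replica sees it: check Seconds_Behind_Master
SHOW SLAVE STATUS\G
```

If only one worker ever shows activity, the workload may not parallelize under LOGICAL_CLOCK (e.g. long dependent UPDATE chains), which would be consistent with the lag hill during busy hours.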
As a next step we asked the developers whether there is a chance to reduce the rate of UPDATEs, as I feel we are limited by some component of replication.
Anyway, I’m marking your answer as the solution, as the root cause probably lies in the application design, the queries themselves, or replication speed. Thank you for your time!