Replication lag on MySQL server after increasing the number of inserts on the master

spirit1984 · July 12, 2018, 3:41am

I’ve ran recently to a problem. We have a MySQL 5.7 Innodb replica with RBR replication (8 slave parallel workers). Now, until this Monday, our master executed 2500 inserts per second (in average), and the replica seemed to be fine with that. But recently the number of inserts went to 2800 inserts per second, and the replica does not seem to catch on (at evenings maybe). I checked the slave status. It seems like the IO thread seems to work just fine (there is no problem with getting events from the master), but the SQL threads don’t run that fast. So the question is - how can I improve that - increase of parallel workers does not seem to help at all.

By the way, here is the result of sql show status:

Slave_IO_State: Waiting for master to send event
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.001634
Read_Master_Log_Pos: 268027451
Relay_Log_File: mysql-relay-bin.004672
Relay_Log_Pos: 386106741
Relay_Master_Log_File: mysql-bin.001632
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 386106528
Relay_Log_Space: 2415514767
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 1688
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 75
Master_UUID: 4db1d460-6b9b-11e8-a372-005056956c40
Master_Info_File: /opt/mysql-data/master.info
SQL_Delay: 0
SQL_Remaining_Delay:
Slave_SQL_Running_State: Waiting for dependent transaction to commit
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:

spirit1984 · July 12, 2018, 4:23am

I have a MySQL 5.7 innodb replica with row based replication. I’ve noticed something curious. The master has about 3000 inserts per second, and the replica seems to catch up fine with that. But if I run a long-time query (let’s say for a minute or so) scanning the big table with 300 million rows, the replica starts to lag. I’ve checked the slave status, the IO thread seems to be reading from master just fine, but the slave sql thread seems not to be doing so well. Not sure why is that. Now zabbix shows me, that my select query seems to be causing high disk utilization (since it needs parts of the table that are on disk), I guess that could slow down the reading of relay log file or applying the transactions. Not sure what I have to tune here in order to get rid of this - how come a single select query causes such a heavy replication log (the number of inserts on the replica falls down to 1500 inserts per second instead of 3000).

spirit1984 · July 17, 2018, 6:26am

I finally found the solution myself. First, I increased the number of slave_parallel_workers to 4, but that gave me nothing (even with LOGICAL_CLOCK), because the transactions were highly dependent. However, it turned out, that after I increased on master binlog_group_commit_sync_delay to 10000 (that is, 10 milliseconds), the lag disappeared. This setting is very important, since it is the setting that actually allows the replication servers to execute something in parallel.

lorraine.pocklington · July 17, 2018, 6:54am

Hi there, thank your for posting your solution it may well help someone else. Sorry we didn’t get to you more quickly but really pleased that you found an answer. I’ll highlight your solution to the team here too.

spirit1984 · July 17, 2018, 8:02am

I actually suggest two things. One - someone from the mysql community should really fix the documentation for slave_parallel_workers giving at least some hint for binlog_group_commit_sync_delay. Second - I would like to write something in the blog about our experience, since it was a really curious one.

lorraine.pocklington · July 17, 2018, 8:22am

Well I would love that, it could be very suitable for our new Community Blog [URL]https://www.percona.com/community-blog/[/URL]
Could you email me? lorraine.pocklington@percona.com reaches me. Thank you!!

Topic		Replies	Views
slow slave IO_THREAD Percona Server for MySQL 5.7	1	920	June 25, 2018
Why MySQL 5.7 replication is verry slow on my new server? Other MySQL® Questions	9	3503	February 8, 2021
replication lag Other MySQL® Questions	1	449	October 26, 2007
MySQL replication slave parallel workers only work 1 worker MySQL & MariaDB	19	3852	June 22, 2024
Sporadic MySQL lag with master-master replication and slave_parallel_workers Percona Server for MySQL 5.7	0	642	January 29, 2019

Replication lag on MySQL server after increasing the number of inserts on the master

Related topics