Replication generates huge disk write IO

I noticed that MySQL replication is generating enormous disk write activity:

iotop: 10-20 MB/s

AWS volume monitor - 700-800 IOPS (while only 150 IOPS on master)

And it lasts for hours. Historical graphs show that this repeats every day for 8-10 hours (I think until the slave finishes catching up).

The replication threads are in the states “Reading event from the relay log” and “Waiting for master to send event”.
Replication is running but VERY slowly, and it is lagging behind the master.

There are no active processes in MySQL that could generate this activity, and no backups run at this time. Replication traffic is constant, but not high enough to explain this much write IO.

If I stop the slave SQL thread, the activity drops to zero immediately. After I start the slave again, it goes back up. Restarting mysqld doesn’t help.

What parameters/configs/metrics should I check?

Any help would be appreciated.

Percona Server 5.5 on CentOS
The replicated database is about 500 GB, with about 300 tables.

Hi, I would suggest using pt-stalk (a Percona utility that collects data about MySQL when a problem occurs, including traces and tcpdump output). You can get more information on how to use it here.

Generally you have to specify a trigger condition so that it knows when to start collecting data, e.g. a function, a variable to watch, and a threshold. If you want to collect information right now, without waiting for a trigger to occur, you can simply run pt-stalk --no-stalk and check the result files to figure out where the problem is.
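
To make the trigger idea concrete, here is a rough Python sketch of that kind of check: start collecting once a sampled value has stayed above a threshold for several consecutive samples. It only illustrates the concept and is not pt-stalk's own code; the function name and the sample readings are invented for the example.

```python
def first_trigger_index(samples, threshold, cycles):
    """Return the index of the sample at which collection would start,
    i.e. the first point where `cycles` consecutive samples exceeded
    `threshold`, or None if that never happens."""
    streak = 0
    for i, value in enumerate(samples):
        if value > threshold:
            streak += 1
            if streak >= cycles:
                return i
        else:
            streak = 0  # a single normal sample resets the count
    return None


# Hypothetical per-second readings of some status counter.
readings = [10, 30, 31, 12, 40, 41, 42, 43]
print(first_trigger_index(readings, threshold=25, cycles=3))
```

Requiring several consecutive samples keeps a short spike from starting a collection, which matters on a server that is already short on IO.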

Thank you.
What impact will it have on the system? Any additional workload could crash it.


It should not have much impact on the system, but I would suggest reading the documentation carefully, including all the options, and testing it on a staging server before running it on production.

I ran into the same situation, and more: when the slave process starts, the slave eats up all the memory and all the swap space. Finally the mysqld process runs out of memory, the kernel kills the mysqld worker process, and then MySQL restarts.

Could it be the same case?

No, it’s not the same case.

If you are using row-based replication and some tables are missing a primary key/unique key, then you are probably hitting this bug.
Try to find out whether any of the database tables are missing a PK/UK, and add an auto-increment PK if required.
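
One way to look for such tables is to query information_schema. The sketch below is only an illustration and does not come from the bug report: the helper names and the `id` column name are made up, and it assumes you run the query with whatever MySQL client or driver you already use. Try it on a non-production copy first.

```python
# Query listing base tables that have neither a PRIMARY KEY nor a UNIQUE
# constraint. System schemas are excluded.
MISSING_KEY_SQL = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
  AND c.constraint_name IS NULL
ORDER BY t.table_schema, t.table_name
"""


def quote_ident(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def add_pk_statement(schema, table, column="id"):
    """Build an ALTER TABLE that adds an auto-increment primary key column.

    The default column name `id` is an assumption; pick one that does not
    clash with an existing column.
    """
    return (
        "ALTER TABLE {}.{} ADD COLUMN {} BIGINT UNSIGNED NOT NULL "
        "AUTO_INCREMENT PRIMARY KEY FIRST"
    ).format(quote_ident(schema), quote_ident(table), quote_ident(column))


# Example with a hypothetical schema and table name.
print(add_pk_statement("shop", "orders"))
```

Keep in mind that on 5.5 an ALTER TABLE like this rebuilds the whole table, so on a 500 GB database it is worth scheduling it for a quiet period.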