Percona Master-Master lag issues

I have 2 servers with master-master replication using Percona MySQL 5.6 and I am having replication lag in the second server: db-02, however db-01 never had a lag situation.

What could be causing this lag? It rises sometimes up to >1000 secs of lag and sunddenly it fells to 0 secs. I have checked it with Percona’s pt-heartbeat tool and is real lag.

This are the server configurations, both have 8 cores, 32GB RAM and SSD disks:

DB-01 (Xeon E3-1245 V2 @ 3.40GHz)


[mysqld]
server-id = 1
log_bin = /var/lib/mysql/mysql-bin.log
binlog-ignore-db = mysql
binlog-ignore-db = test
datadir = /var/lib/mysql

expire-logs-days = 10

auto_increment_increment= 2
auto_increment_offset = 1

max_connections=3000

query_cache_size=0
query_cache_type=0

innodb_file_per_table
innodb_file_format=barracuda
innodb_flush_method=O_DIRECT
innodb_log_file_size=128M
innodb_thread_concurrency=0

innodb_flush_log_at_trx_commit=1
sync_binlog=1

# ~90% of memory
innodb_buffer_pool_size=27000M

# 4 * # of CPU cores
innodb_read_io_threads=32
innodb_write_io_threads=32

innodb_io_capacity = 5000

DB-02 (i7-4770 CPU @ 3.40GHz)

// same that db-01 but with this options different:
server-id = 2
auto_increment_offset = 2

The output of SHOW SLAVE STATUS on each server:

DB-01


Slave_IO_State: Waiting for master to send event
Master_Host: db-02
Master_Log_File: mysql-bin.000022
Read_Master_Log_Pos: 3348962
Relay_Log_File: db-01-relay-bin.000042
Relay_Log_Pos: 3349125
Relay_Master_Log_File: mysql-bin.000022
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 3348962
Relay_Log_Space: 3349462
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Seconds_Behind_Master: 0
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:

DB-02


Slave_IO_State: Waiting for master to send event
Master_Host: db-01
Master_Log_File: mysql-bin.000019
Read_Master_Log_Pos: 38905060
Relay_Log_File: db-02-relay-bin.000048
Relay_Log_Pos: 17834447
Relay_Master_Log_File: mysql-bin.000019
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 28196753
Relay_Log_Space: 28542928
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Seconds_Behind_Master: 310
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:

Hi blacksoul;

Likely the lag is being caused by long running DML statements on the master (db-01). For instance a large UPDATE runs on db-01, and then once it finishes and flows through to db-02 the slave lag jumps up once it realizes it has a lot of work to do to catch up. If this is the case, whatever is doing the long running DML probably only hits db01 for whatever reason (might be hard coded there, might always get sent there by whatever load balancing logic you have).

To get around this, generally you would want to try and split up these long running DMLs, i.e. instead of deleting 1 million rows, delete the rows in small chunks so all of it does not flow through to the slave at once.

-Scott

Hi Scott, thanks for your answer. So, I could know that is caused by long running DML statements just logging slow queries on DB-01 and studying them. Right?

Hi blacksoul;

Yeah you can see long running DML in the slow query log as long as they meet the long query time requirement (10 seconds by default). You may also want to enable “log-slow-admin-statements” in-case it happens to be getting caused by one of those.

http://dev.mysql.com/doc/refman/5.5/en/server-options.html#option_mysqld_log-slow-admin-statements

-Scott