Strange Seconds_Behind_Master behavior

Hello, i’ve got one master and two replics from this master (all the same versions 5.7.44-48-log), it’s GTID replication. One replica works fine, and the second one has issue.
It’s “Seconds_Behind_Master” value constantly changes: i mean it’s 0 for several ‘show slave status’ command, and then, after several seconds it’s become much more - 3836 for example. Next call show slave status immediatly shows 0 again.
No errors in log-files, “Slave_IO_Running” and “Slave_SQL_Running” always Yes, time on all servers are the same and ntp synced.
In fact - this replica is far beyond master, but why then it almost always show “Seconds_Behind_Master”: 0? And when it’s 0 “Retrieved_Gtid_Set” and “Executed_Gtid_Set” are the same.
From monitoring tool it’s look like this

Anyone has some clue?

Ok, i stoped this strange replica, made xtrabackup from master (as always), transfer files snd start replica again. It’s all takes about 14 hours. Ok, i started replica again and this strange behavior sill there ^(
One seconds ‘show slave status’ shows

Seconds_Behind_Master: 51211
Slave_SQL_Running_State: Reading event from the relay log

another seconds it show

Seconds_Behind_Master: 0
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

Don’t have any idea, why is ths

Hi @Deniska80,

What was the replica executing while you saw the increased lag?
Did you see while your lag was increasing did you have exec_master_log_pos moving?
Is the binlog size too large (or larger than usual)?
Check contents of binary log using mysqlbinlog command and see what the replica is trying to execute. ( mysqlbinlog --base64-output=decode-rows -vv --start-position=<Exec_Master_Log_Pos> <Relay_Master_Log_File> )

Ref: How to Read Simplified SHOW REPLICA STATUS Output


What was the replica executing while you saw the increased >lag?
Not sure, what i understand you clearly. It executes commands from relay-bin.log, what it fetch from master. No other load on this server, it’s dedicated DB. By the way - it’s hard to catch moment, when it’s not null. I just make 10 consecutive calls of ‘show slave status’ in 2 seconds and only once i saw ‘Seconds behind master’ defferent from 0.

Did you see while your lag was increasing did you have >exec_master_log_pos moving?
Yes, it’s moving allways

Is the binlog size too large (or larger than usual)?
Bin log on master? No, usually size

see what the replica is trying to execute.
looks like it normally executes commands from master, about 500 op/s

One thing, i noted. May be it’s important. At this time this replica has “Master_Log_File: mysql-bin.016728”, while master allready has mysql-bin.017213. That’s OK, replica behind master. But at the same time replica has only two relay-bin.xxxx files (as i understand one is some kind of index, and other constantly gowing, while reach max_binlog_size). Then this files instantly removes and next two files appears. As i remeber from past, when replica try to reach master, there are a lot of relay-bin.xxxx files, coz fetching binlog is faster, than executes it.

May be that’s the reason? Low network speed?

hmmm! 400+ binlogs behind?! Well then I think you surely need to figureout why do we have this case. May be it is low network… not sure if your replication down for long…

[quote=“kedarpercona, post:5, topic:31349”]
400+ binlogs behind?!
my binlog just 100Mb size, so 400 files is just about 8 hours

main question for me is “why sometimes Seconds behind master is 0”. May be it’s 0 in time, while next binlog not fully fetched from master, due to network perfomance… but it’s strange.

Ok, i iperf network perfomance between this replica and master, results are about 20…40Mbit/s. But speed of growing current relay-bin.log is far slower, it’s transfer one 100Mb file about 1 minute. That’s really strange, i think replica must fetch bin logs on full speed.