Seconds behind master

Hi.

I’m running Percona MySQL 5.1.54, and my systems are configured as master/master with slaves hanging off each.

I have two different databases set up like this across eight or so EC2 instances, all m2.xlarge. The data dir is on a single 400 GB EBS volume, formatted XFS with default options plus noatime.

My problem: since setting them up as master/master yesterday, ‘SHOW SLAVE STATUS\G’ shows the Seconds_Behind_Master value fluctuating between a big value and 0.

The big value seems to be growing: yesterday it fluctuated between 0 and 500; today it swings between 0 and 60,000. This is happening on all of the instances, including the slaves.
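I’m watching it with something like this loop, in case that matters:

    while true; do
      mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master
      sleep 1
    done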

SELECT CURTIME(); returns the same time on all instances, so the clocks are in sync.

Does anyone have any ideas? I see nothing in the error log.

I searched and found an article saying to use mk-heartbeat instead of relying on Seconds_Behind_Master, but I couldn’t find much documentation on mk-heartbeat or an example implementation of it.

The Seconds_Behind_Master value just doesn’t make any sense to me, so I’m assuming it’s incorrect, but I worry that it may be a symptom of something else…

If you have a master/master chain of replication, Seconds_Behind_Master can fluctuate because of how it is calculated: it is the difference between the timestamp of the last event the SQL_THREAD executed and the original execution time of the last event the IO_THREAD received. Since you are running master/master in a chain, a given event could have been replicated anywhere from one to seven times before reaching a particular server, and on slow connections that accumulated delay becomes noticeable.

On top of this, the clock difference between master and slave is calculated only once, when the slave’s I/O thread connects, and is then assumed to be constant.
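Roughly speaking, the calculation inside the server looks like this (a simplified sketch, not the literal source code):

    Seconds_Behind_Master ~= (slave clock now)
                           - (timestamp of the event the SQL thread last executed)
                           - (master/slave clock difference, measured once at connect)

So a timestamp that has already passed through several hops, combined with a stale clock-difference value, can swing the result wildly.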

Read more about the details under Seconds_Behind_Master on the SHOW SLAVE STATUS page of the manual.

And I don’t know if you have found this page about mk-heartbeat:
http://www.maatkit.org/doc/mk-heartbeat.html
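From memory of the maatkit docs (double-check the exact option names on that page), the basic pattern is: create a small heartbeat table on the master, run mk-heartbeat with --update against the master so it touches that table once a second, and run it with --monitor against a slave to report how old the replicated heartbeat row is. Something like this, where the database name “test” and the hostnames are just placeholders:

    -- once, on the master
    CREATE TABLE test.heartbeat (
      id int NOT NULL PRIMARY KEY,
      ts datetime NOT NULL
    );

    # keep the heartbeat row updated on the master
    mk-heartbeat -D test --update -h master-host --daemonize

    # on a slave, report replication delay continuously
    mk-heartbeat -D test --monitor -h slave-host

Because the heartbeat row replicates like any other write, the delay it reports doesn’t depend on binlog event timestamps or the one-time clock-difference guess that Seconds_Behind_Master relies on.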

I know this behaviour from replication over a WAN link (Germany → US West Coast).
Do you perhaps have high network latency between different EC2 availability zones?
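A quick way to rule that out is to measure round-trip times between the instances, for example:

    ping -c 10 <other-instance-hostname>

Anything consistently high or spiky between zones would fit what you’re seeing.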