Replication lag for more than 10 hours

mamatha · February 4, 2021, 12:07pm

Hi Team,

node 1 - primary
node 2 - secondary
node 3 - secondary

Node 1 and node 2 are in the same datacenter and always in sync, but node 3 is in a different datacenter and is always lagging many hours behind the primary.

We have also increased the oplog to a fairly large size, about 4TB, because in our case a lot of data gets loaded and modified every minute.

Memory usage and load average are not high, and the nodes run on bare-metal servers. The network team says there is no issue between the two datacenters. After 35 hours of lag, the node goes into the recovering state.

We have done an initial sync twice → the node goes from STARTUP2 to SECONDARY, but it is still lagging by 2 hours and the lag keeps growing. I have also tried restarting the node, but the lag is still increasing.

Can anyone please advise what else I need to do to get this node back in sync with the primary? This is a production environment.

Thanks in advance.

Akira_Kurogane · February 19, 2021, 1:45pm

Hi Mamatha.

It sounds like node 3 has lower capacity, or there is a network bottleneck between it and the node it is syncing from (probably the primary; you can find out easily by looking in rs.status()).
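
To make the rs.status() check concrete, here is a minimal Python sketch that reads a status document and reports, for each secondary, how far it is behind the primary and which member it is syncing from. It assumes the document has the usual `members` entries with `name`, `stateStr`, `optimeDate` and `syncSourceHost`; the host names and timestamps in `sample_status` are made up for illustration. With PyMongo, the real document would typically come from `client.admin.command("replSetGetStatus")`.

```python
from datetime import datetime, timedelta


def replication_lag(status):
    """Return {secondary name: (lag in seconds, sync source)}.

    `status` is a dict shaped like the output of rs.status().
    Lag is measured against the primary's last applied optime.
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    report = {}
    for m in members:
        if m["stateStr"] != "SECONDARY":
            continue
        lag = (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        # syncSourceHost is empty when the member has no sync source
        report[m["name"]] = (lag, m.get("syncSourceHost", ""))
    return report


# Hypothetical sample data, not taken from the poster's cluster
now = datetime(2021, 2, 4, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "node1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "node2:27017", "stateStr": "SECONDARY", "optimeDate": now,
         "syncSourceHost": "node1:27017"},
        {"name": "node3:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(hours=2),
         "syncSourceHost": "node1:27017"},
    ]
}

for name, (lag, source) in replication_lag(sample_status).items():
    print(f"{name}: {lag / 3600:.1f} h behind primary, syncing from {source}")
```

Running this a few minutes apart shows whether the lag on node 3 is growing or holding steady, and the sync source tells you which network path to look at.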

If node 3's disk is not under strain, especially during the WiredTiger checkpoint that happens each minute, then that would point to a network bottleneck.

I haven’t considered CPU as a bottleneck because that’s rare with all the cores modern servers have now. But if node 3 has few cores, say 4 or fewer, then CPU could be a factor too.
