Hello all!
At work, we are dealing with a situation where one of our replicas is out of sync with its primary on a couple of key databases.
We’ve put together a script that uses pt-table-sync to bring the replica back into sync. However, when we run it (in read-only mode using --print, for now), replication lag creeps steadily upwards. In our particular case, replication lag itself can cause “soft” data loss because of the way the data is analysed as it accumulates, so we have to keep lag to a bare minimum (under ~1 minute).
Using --wait slows the rate at which replication lag accumulates, but lag eventually spikes significantly and we have to stop the sync script.
I’ve noticed that pt-table-sync lacks the --check-slave-lag option that pt-table-checksum (for example) has. I’m considering adding similar lag-checking functionality to pt-table-sync, but I’m wondering whether there is a reason it has not been added already. Is there anything about how pt-table-sync works that would stop lag checking like pt-table-checksum’s from working properly?
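To make the idea concrete, here is a rough Python sketch of the behaviour I have in mind. It is not pt-table-sync code (the tool itself is written in Perl), and every name in it (wait_for_lag, throttled_sync, get_lag, apply_chunk) is made up for illustration. The max_lag and check_interval parameters are loosely modelled on pt-table-checksum's --max-lag and --check-interval options. I'm assuming get_lag would wrap something like SHOW SLAVE STATUS on the replica and return Seconds_Behind_Master, with None standing in for NULL.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_lag(
    get_lag: Callable[[], Optional[float]],
    max_lag: float = 1.0,
    check_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until replica lag is at or below max_lag.

    Returns the number of times we had to pause. get_lag is a
    hypothetical callback; None models Seconds_Behind_Master being
    NULL (replication stopped), which is treated as "too lagged".
    """
    pauses = 0
    while True:
        lag = get_lag()
        if lag is not None and lag <= max_lag:
            return pauses
        sleep(check_interval)
        pauses += 1


def throttled_sync(
    chunks: Iterable,
    apply_chunk: Callable[[object], None],
    get_lag: Callable[[], Optional[float]],
    max_lag: float = 1.0,
    check_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply each chunk of sync work only once lag is acceptable.

    Returns the total number of pauses across all chunks.
    """
    total_pauses = 0
    for chunk in chunks:
        # Check lag before every chunk, not just once at startup.
        total_pauses += wait_for_lag(get_lag, max_lag, check_interval, sleep)
        apply_chunk(chunk)
    return total_pauses
```

The lag check runs before every chunk, so a spike part-way through pauses the sync instead of compounding. Whether pt-table-sync's chunking and locking would tolerate pauses like this is exactly what I'm unsure about.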