Mysql8 Replication Is Huge Lag and unable to sync with Master

Hi Team,

I have configured the new slave from the existing Master. Why the replication state is always showing like below. I don’t see any errors in the error log. But slave relay log position is not at all changing for more than 6 hours.

Please help with the solution for the below.

root@gladiator:/home/swdev# mysql --version
mysql Ver 8.0.19 for Linux on x86_64 (MySQL Community Server - GPL)
root@gladiator:/home/swdev# mysql -uroot -p

   SQL_Remaining_Delay: NULL
**Replica_SQL_Running_State: Applying batch of row changes (update)**
       Source_Retry_Count: 86400
              Source_Bind:
  Last_IO_Error_Timestamp:


image

Hi Naresh,

The reason for replication lag can be various which would require some insights:

  1. What restoration method is used while configuring the backup i.e mysqldump or hot backup ?

  2. You need to check statement on that position, to do so execute below statement on master :

    show binlog events in ‘mysql-bin.006740’ from ‘<< from output Exec_source_log_pos>>’;

  3. Looking into the state " Applying batch of row changes" feels like any update/delete is fired on master which is bulky & does full scan.

1 Like

Hi @Naresh9999,
I agree with Jyoti. You need to find out what SQL is being executed at that binlog position. Maybe it’s a 1M+ update or something similar doing many, many updates/deletes? Do you have parallel replication enabled?

1 Like

Thank you @Jyoti_Rajai and @matthewb for the quick response.

Yes, I have checked the binary logs and huge updates on master db.
How we can speed up the replication from master if we have a huge updates from master.

1 Like

Hi @Naresh9999 ,

Strange is it still stuck on same position, it’s been 3 days I guess ?

What is table size have you checked ?

Is parallel replication enabled ? if No, I think now it’s not right time to stop slave and start the parallel replication again. Because it would start doing rollback and it will begin from starting.

Do you have update query, any indexing can be done ?

And one more thing are the resource allocation on master and slave are same ?

1 Like

Hi @Jyoti_Rajai

No I stopped the replication, I am looking for the permanent solution…
We found the table does not have an index, so creating them on master.
Resource on Master and Slave are same.

Table count is around 1M.

Any configuration changes suggest me, please suggest me to configure the parallel replication to avid such a big lags in future?

1 Like

The best solution is to make sure you have indexes with proper configuration. UPDATE will use index for any WHERE clause columns, just like SELECT does.

SET GLOBAL slave_parallel_type = ‘LOGICAL_CLOCK’;
SET GLOBAL slave_parallel_workers = 4; (also update my.cnf to persist)

Make sure InnoDB buffer pools are 80% of OS RAM. Make sure InnoDB read logs are at least 1GB.

Also, try and break up your large batch into smaller batches. This is best practice. Single transactions that update 10M rows are BAD for performance and replication. Instead, run 1000 transactions with less UPDATEs per transaction.

1 Like

Before starting parallel replication, please also add the index on slave server also then only it will process faster. And Hope you made changes what @matthewb has recommended.

1 Like

Hi @matthewb and @Jyoti_Rajai

Thank you so much for the inputs, we are started doing the changes.
I ill let you know once everything is done.
Once again thanks for the help.

2 Likes

Hi @matthewb @Jyoti_Rajai

I have a no idea about GTID based replication, so can you please suggest me on the below.
Which method will be good for replication?

  1. Binlog replication with Mixed binlog format.
  2. Binlog replication with Row binlog format
  3. GTID replication with Row binlog format

Which method is better and safer for parallel replication? If I have huge updates or inserts on Master…

1 Like

Row + GTID

If I have huge updates or inserts on Master

Change the application to do smaller batches. This is best practice. Don’t do large single writes.

1 Like

Hi @matthewb

Thanks for the quick response.
Previously I have tried with BINLOG+ROW based replication with parallel option. So I got a above replication lag issue.
So is the GTID+ROW replication could be better than BINLOG+ROW replication with parallel replication?

and sorry for asking more questions…

One last thing Data consistency, which repl method is good for Data consistency?

1 Like

BINLOG+ROW or GTID+ROW, same thing. All transactions are recorded in binlog files. GTID is just a different way to “point to” a transaction. You could say INSERT INTO … VALUES (1, 2,3) is at “binlog.003212, offset 442344” or “3E11FA47-71CA-11E1-9E33-C80AA9429562:23” Same thing. You will not see any noticeable difference in performance between position-based vs GTID-based. Both versions can do parallel replication with LOGICAL_CLOCK.

One last thing Data consistency

1 Like

Thank you so much @matthewb
I ill work on which you suggested and update you on the same.

1 Like