MySQL8 replication lagging

Hi all,

we upgraded to MySQL8, and we started seeing replication lagging. Here are some of the configs:
binlog-format = MIXED
relay_log_info_repository = TABLE
relay_log_recovery = ON
log_replica_updates
expire_logs_days = 30
read_only = 1

innodb_buffer_pool_size = 360G
innodb_buffer_pool_instances = 32
innodb_buffer_pool_chunk_size = 256M
innodb_page_cleaners = 16
innodb_log_file_size = 4G
innodb_log_files_in_group = 32
innodb_purge_threads = 8
innodb_read_io_threads = 16
innodb_write_io_threads = 16
innodb_log_buffer_size = 64M
innodb_io_capacity = 2000
innodb_open_files = 256k
innodb_flush_method = O_DIRECT
innodb_numa_interleave = 1
skip-innodb-adaptive-hash-index

##########################################
read-only = ON
[xtrabackup]
innodb_read_io_threads = 2
innodb_write_io_threads = 2
##########################################
[mysqldump]
quick
max_allowed_packet = 512M

Any help is appreciated.
Thanks

There’s not a lot of details in your post. How much lag? Just 1-2s? Or ever increasing lag then suddenly catch up? Use a monitoring tool like Percona Monitoring and Management to visualize your lag issues. What type of queries are you running across replication? Mostly INSERT? UPDATE? DELETE? Any temporary tables (which are replicated in MIXED)?

This can be an issue as MIXED will default to STATEMENT unless otherwise needed. You should always be in ROW, especially now that STATEMENT is deprecated.

Please have a read https://www.percona.com/blog/give-love-to-your-ssds-reduce-innodb_io_capacity_max/

Curious as to why you modified this parameter. What benefit are you seeing with a larger chunk size?

For helping with replica lag, you can set sync_binlog=0 and innodb_flush_log_at_trx_commit=0 on the replica.

Hi @matthewb :

Thank you so much for replying ! After upgrade to MySQL8, our DBs are lagging along with traffic, the most busy DB lags the most, and it could lag up to 2 hours or more in some days, it is a mixed combination of INSERTs, UPDATEs and DELETEs.

I do not remember what were the reasons we override the below 2 parameters, but do you have any recommendations?
innodb_io_capacity = 2000
innodb_buffer_pool_chunk_size = 256M

We will definitely try
sync_binlog=0
innodb_flush_log_at_trx_commit=0

Thank you so much

Hi @matthewb ,

we made changes to
innodb_io_capacity = 100
sync_binlog=0
innodb_flush_log_at_trx_commit=0

But it did not seem to make any improvement. Anything else should I try?

You made those changes on the replica, correct? And you changed the source to use ROW? And you have replica_parallel_workers >= 8 and have LOGICAL_CLOCK as the parallel type?

Hi @matthewb ,

we made below changes on the replica/slave:

  • innodb_io_capacity = 100
  • sync_binlog=0
  • innodb_flush_log_at_trx_commit=0 (reduce IO)

We have NOT made any changes to Primary DB yet, this is PROD we need approval.

I will confirm replica_parallel_workers >= 8 ,

But so far no improvement.

Thanks

If 99% of your replication is STATEMENT (which is possible because you used MIXED setting), then you won’t see much improvement on the replica.