A Replication lag on Percona Server 8.0.29-21

Hi,
Our Infrastructure has an InnoDB ClusterSet with two InnoDB clusters. Our ClusterSet has total 10 dedicated servers with 128GB Memory and 40/64 core CPU.

What should I do about it ?

Thanks!
my.cnf

# Tuning Mysql 
skip-external-locking
# skip-name-resolve = ON

key_buffer_size = 128M
max_allowed_packet      = 64M
max_connections  = 7500
back_log = 4096
open_files_limit = 65535
innodb_open_files = 20240
max_connect_errors = 3072
table_open_cache = 4096
table_definition_cache = 4096

innodb_dedicated_server = ON
innodb_buffer_pool_instances = 64
innodb_read_io_threads = 32
innodb_write_io_threads = 32
innodb_thread_concurrency = 0
innodb_flush_log_at_trx_commit = 0
performance_schema = ON
innodb-file-per-table = 1
innodb_io_capacity=20000
innodb_table_locks = 0
innodb_lock_wait_timeout = 0
innodb_deadlock_detect = 0
innodb_rollback_on_timeout=1
innodb_flush_log_at_trx_commit=1

join_buffer_size = 5500144000
read_rnd_buffer_size = 5500144000
sort_buffer_size = 5500144000

sql-mode=""
transaction_isolation="READ-COMMITTED"

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

#
## Disable other storage engines
##
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"

server_id                                                   = XXX
bind-address                                                = 0.0.0.0
gtid_mode                                                   = ON
enforce_gtid_consistency                                    = ON
binlog_checksum                                             = NONE           # Not needed from 8.0.21
master_info_repository                                      = TABLE
relay_log_info_repository                                   = TABLE
log_slave_updates                                           = ON
log_bin                                                     = /mnt/HDD2/mysql/mysql-bin.log
log_bin_index                                               = /mnt/HDD2/mysql/mysql-bin.log.index
relay_log                                                   = /mnt/HDD2/mysql/mysql-relay-bin
relay_log_index                                             = /mnt/HDD2/mysql/mysql-relay-bin.index
expire_logs_days                                            = 7
max_binlog_size                                             = 400M
log_replica_updates                                         = 1
auto-increment-increment                                    = 2
auto-increment-offset                                       = 1
binlog_format                                               = ROW
transaction_write_set_extraction                            = XXHASH64
log-slave-updates                                           = 1
binlog-transaction-dependency-tracking                      = WRITESET
1 Like

You have this parameter twice. The 2nd overrides the first. Remove the 2nd one.

All 3 *_buffer_size are set to 5GB. That is insane. Reduce these to defaults or 2x the default value.

Add sync_binlog=1000 to your replicas.

If you still have lag, then you need to determine which tables are causing the lag.

1 Like

Thanks @matthewb

We pretty much solved the issue after changing innodb_flush_log_at_trx_commit and some other parameters

Our Percona InnoDB ClusterSet is handling 91.23K QPS with 9.72 K QPS on primary cluster’s primary server at pick load.

our new configurations are as follow:
my.cnf

#
# The Percona Server 8.0 configuration file.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

[mysqld]
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
log-error       = /var/log/mysql/error.log

# Tuning Mysql 
skip-external-locking
# skip-name-resolve = ON

key_buffer_size = 64M
max_allowed_packet      = 128M
max_connections  = 7500
back_log = 8192
open_files_limit = 65535
innodb_open_files = 65535
max_connect_errors = 3072
table_open_cache = 16000
table_definition_cache = 16000

innodb_dedicated_server = ON
innodb_buffer_pool_instances = 64
innodb_read_io_threads = 64
innodb_write_io_threads = 64
innodb_thread_concurrency = 40
performance_schema = ON
innodb-file-per-table = 1
innodb_io_capacity=60000
innodb_table_locks = 0
innodb_lock_wait_timeout = 1
innodb_deadlock_detect = 0
innodb_rollback_on_timeout=0
innodb_flush_log_at_trx_commit=2

join_buffer_size = 100M
read_rnd_buffer_size = 100M
sort_buffer_size = 100M

sql-mode=""
transaction_isolation="READ-COMMITTED"
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# New fix
sync_binlog = 1000
innodb_flush_log_at_timeout = 1800
replica_parallel_workers = 40
replica_parallel_type = "LOGICAL_CLOCK"

#
## Disable other storage engines
##
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"

server_id                                                   = 5003
bind-address                                                = 0.0.0.0
gtid_mode                                                   = ON
enforce_gtid_consistency                                    = ON
binlog_checksum                                             = NONE           # Not needed from 8.0.21
master_info_repository                                      = TABLE
relay_log_info_repository                                   = TABLE
log_replica_updates                                         = ON
log_bin                                                     = /var/log/mysql/mysql-bin.log
log_bin_index                                               = /var/log/mysql/mysql-bin.log.index
relay_log                                                   = /var/log/mysql/mysql-relay-bin
relay_log_index                                             =/var/log/mysql/mysql-relay-bin.index
binlog_expire_logs_seconds                                  = 36000
max_binlog_size                                             = 400M
log_replica_updates                                         = 1
auto-increment-increment                                    = 2
auto-increment-offset                                       = 1
binlog_format                                               = ROW
transaction_write_set_extraction                            = XXHASH64
binlog-transaction-dependency-tracking                      = WRITESET

sysctl.conf

fs.aio-max-nr = 1048576
vm.swappiness = 1

still we’re facing occasional replication delay.

1 Like

still we’re facing occasional replication delay.

Then you need to use mysqldbinlog and determine if your lag is the result of bad singular queries or large transactions. If large transaction, try to reduce their size and have smaller more frequent txns.

1 Like

Let me add that this parameter is crazy high:
innodb_io_capacity=60000

Normally you’d like to keep it in default and just adjust innodb_io_capacity_max if necessary (defaults works very well in most of cases) as this parameter can add a lot of IO overhead.

1 Like