PXC 8.0.37: Very high thread activity during replication

Hello!

There is a cluster 8.0.37 of 3 nodes:

Server version: 8.0.37-29.1 Percona XtraDB Cluster (GPL), Release rel29, Revision d29a325, WSREP version 26.1.4.3

In front of it is ProxySQL with balancing rules: read/write.
Write always on 1 node, read from 2 nodes.
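The split is implemented with the usual ProxySQL query rules; the sketch below only illustrates the approach (hostgroup numbers 10 for the writer and 20 for the readers are placeholders, not our exact configuration):

-- ProxySQL admin interface: route SELECT ... FOR UPDATE to the writer, other SELECTs to the readers
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;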

PXC bare-metal servers:

OS: Ubuntu 22.04.5 LTS
CPU: Intel W-2255 (10x3.7 GHz HT)
RAM: 256 GB — 4 × 64 GB DDR4 ECC Reg
Disk:
2 × 480 GB SSD SATA Enterprise - for OS
2000 GB SSD NVMe M.2 - for database

PXC mysqld.cnf:

# Ansible managed

#check_point
# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket                          = /var/run/mysqld/mysqld.sock

[mysqld]
datadir                         = /nvme/mysql
log-error                       = /var/log/mysql/error.log
pid-file                        = /var/run/mysqld/mysqld.pid
server-id                       = 22
socket                          = /var/run/mysqld/mysqld.sock

authentication_policy           = mysql_native_password # mysql_native_password - deprecated

# Binary log expiration period is 28800 seconds, which equals 8 hours
binlog_expire_logs_seconds      = 28800

######## wsrep ###############
# Path to Galera library
wsrep_provider                  = /usr/lib/libgalera_smm.so

# Cluster connection URL contains IPs of nodes
#If no IP is found, this implies that a new cluster needs to be created,
#in order to do that you need to bootstrap this node
wsrep_cluster_address           = gcomm://X.X.X.X,X.X.X.X,X.X.X.X
# In order for Galera to work correctly binlog format should be ROW
# binlog_format                   = ROW # deprecated

# Slave thread to use
wsrep_applier_threads           = 16

wsrep_log_conflicts

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode        = 2

# Node IP address
wsrep_node_address              = X.X.X.X
# Cluster name
wsrep_cluster_name              = pxc-cluster

#If wsrep_node_name is not specified,  then system hostname will be used
#wsrep_node_name=pxc-cluster-node-1

#pxc_strict_mode allowed values: DISABLED,PERMISSIVE,ENFORCING,MASTER
pxc_strict_mode                 = ENFORCING

# SST method
wsrep_sst_method                = xtrabackup-v2
#encrypt
pxc-encrypt-cluster-traffic     = OFF

pxc_maint_mode                  = DISABLED
pxc_maint_transition_period     = 10
wsrep_applier_FK_checks         = ON
wsrep_applier_UK_checks         = OFF
wsrep_auto_increment_control    = ON
wsrep_certify_nonPK             = ON
wsrep_debug                     = NONE
wsrep_dirty_reads               = OFF
wsrep_load_data_splitting       = ON
wsrep_max_ws_rows               = 800000
wsrep_max_ws_size               = 2147483648
wsrep_provider_options          = "pc.weight=1;gmcast.segment=1;socket.ssl_compression=no;gcache.keep_pages_size=4;gcache.page_size=32M;gcache.size=1024M;evs.send_window=512;evs.user_send_window=512;evs.use_aggregate = true;gcs.fc_factor = 0.9;gcs.fc_limit = 5000;gcs.fc_single_primary = no;gcs.max_packet_size = 131072;gcs.fc_debug = 0;evs.inactive_check_period = PT1S;evs.inactive_timeout = PT20S;evs.suspect_timeout = PT5S;evs.stats_report_period = PT1M;"
wsrep_recover                   = OFF
wsrep_replicate_myisam          = OFF
wsrep_restart_replica           = OFF
wsrep_retry_autocommit          = 3
wsrep_sst_receive_address       = AUTO
wsrep_sync_wait                 = 0

##################INNODB and other options#####################
connect_timeout                 = 60
group_concat_max_len            = 2048
innodb_adaptive_flushing_lwm    = 20
innodb_adaptive_hash_index      = OFF
innodb_buffer_pool_instances    = 16
innodb_buffer_pool_size         = 214748364800
innodb_flush_log_at_trx_commit  = 2
innodb_flush_method             = O_DIRECT
innodb_io_capacity              = 2000
innodb_io_capacity_max          = 5000
innodb_print_all_deadlocks      = OFF
innodb_redo_log_capacity        = 5G # innodb_log_file_size deprecated
interactive_timeout             = 1200
log_error_suppression_list      = MY-013360
log_error_verbosity             = 3
long_query_time                 = 1
max_allowed_packet              = 512M
max_connect_errors              = 99999999999999
max_connections                 = 3000
max_prepared_stmt_count         = 131072
range_optimizer_max_mem_size    = 44040191
skip_name_resolve               = 1
slow_query_log                  = ON
slow_query_log_file             = /var/log/mysql/mysql-slow.log
sql_mode                        = 'NO_ENGINE_SUBSTITUTION'
sync_binlog                     = 2
table_open_cache                = 8192
thread_cache_size               = 256
userstat                        = ON
wait_timeout                    = 1200

[sst]
streamfmt                       = xbstream
#compressor='zstd -1 -T2'
#decompressor='zstd -d -T2'

[xtrabackup]
parallel                        = 16
rebuild-threads                 = 16
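For reference, the apply queue and flow-control behaviour that go with these settings (gcs.fc_limit=5000, gcs.fc_factor=0.9) can be checked on each node with the standard wsrep status counters, e.g.:

SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue%';   -- incoming (applier) queue on this node
SHOW GLOBAL STATUS LIKE 'wsrep_local_send_queue%';   -- outgoing replication queue
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';       -- flow-control pauses sent/received
SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';  -- how much parallel apply is possible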

There is a problem when data is written or modified on the writer node.

As a result, a sharp burst of connections is visible on ProxySQL:

This is not typical for us. Traffic on the two reader nodes has also increased. Because the load on them rose significantly, queries have become very slow to execute and a queue of requests from the services builds up.

We looked at the metrics on the MySQL/MySQL Table Details charts:


Here you can see a large number of data changes over time from one of the services.

MySQL/MySQL Instances Compare:


The graphs show interruptions in connections on the two reader nodes.


The graphs show a very high number of threads on the two reader nodes.


The graphs show a very high number of slow queries on the two reader nodes. This is not typical for them; normally there are practically no slow queries over the course of a whole day.


And also graphs for Transaction Handlers and Process States.

There are no problems with resource usage: CPU, Memory or Disk.
node1 (writer):

node2 (reader):

node3 (reader):

And here are the graphs for Galera.
MySQL/PXC/Galera Nodes Compare

And also some interesting graphs from MySQL/MySQL InnoDB Details:
node1:




node2:



Increased activity of InnoDB Log IO and InnoDB FSyncs is visible.


InnoDB Row Lock Wait Load is also very high.

node3:



Increased activity of InnoDB Log IO and InnoDB FSyncs is visible.


InnoDB Row Lock Wait Load is also very high.

Help me understand the situation: what impact did the data changes in the DB have that caused such a large queue during replication and led to such a large number of threads?
And it does not always happen like this. What else can I look at or do?

These graphs indicate a significant write event during the issue, with a peak of 12.24MB/s for wait/io/file/innodb/innodb_log_file. This resulted in excessive flushing to disk, leading to increased I/O activity and contributing to the problem.
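If you want to confirm the redo-log pressure on the nodes themselves, the log I/O and fsync counters can be checked directly (just a sketch; the instrument name matches the graph above):

SHOW GLOBAL STATUS LIKE 'Innodb_log_waits';      -- waits caused by a too-small log buffer
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_fsyncs';  -- fsyncs issued against the redo log
SHOW GLOBAL STATUS LIKE 'Innodb_data_fsyncs';    -- fsyncs issued against data files
SELECT event_name, count_write, sum_number_of_bytes_write
FROM performance_schema.file_summary_by_event_name
WHERE event_name = 'wait/io/file/innodb/innodb_log_file';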

Thanks for the answer!
Is there any recommendation, solution?
Maybe it makes sense to increase innodb_redo_log_capacity (which replaces the deprecated innodb_log_file_size)? The current setting is innodb_redo_log_capacity=5G (innodb_log_file_size reports 50331648 bytes).
For my part, I have asked the developers to reduce the load and to spread the changes in this task out over time and in limited batches.

@shigaev.s I would increase innodb_redo_log_capacity to 8G, and also set innodb_flush_log_at_trx_commit=2, and if you have binlogs enabled, set sync_binlog=1000. These will lessen IO impact.
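If you want to apply these without a restart, all three variables are dynamic in 8.0 (innodb_redo_log_capacity since 8.0.30), roughly:

SET GLOBAL innodb_redo_log_capacity = 8 * 1024 * 1024 * 1024;  -- 8G
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SET GLOBAL sync_binlog = 1000;
-- also mirror the values in the Ansible-managed mysqld.cnf so they survive a restart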


PXC uses synchronous (certification-based) replication, so whenever your writes increase, it impacts the majority of nodes at the same time and disk IOPS increase.

In addition, innodb_doublewrite can be disabled to save disk IOPS.
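For reference, turning it off is a startup setting rather than something to change at runtime, so it would go into the config file and takes effect after a restart, e.g.:

[mysqld]
innodb_doublewrite = OFF   # note: removes InnoDB's protection against partial (torn) page writes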

Thanks, I’ll increase innodb_redo_log_capacity to 8G.
innodb_flush_log_at_trx_commit is already set to 2, and sync_binlog = 2.
I’ll check how it behaves after changing the settings.

Thank you, I’ll look at this parameter.

Be sure to understand the risk of disabling the doublewrite buffer. The only way InnoDB protects itself against partial page writes is by using this buffer. However, you do have a 3-node PXC, and the likelihood that all 3 nodes partially write a page on flush would be very rare.

sync_binlog=2 isn’t saving much. Try 500 or 1000.

I read about the innodb_doublewrite parameter; I won’t disable it.
But I think sync_binlog can be increased.
Thanks.

I agree, but since they have a 3-node setup (in other words, no single point of failure), I recommended this option.

And if they are using a PXC cluster, why do they have the binlog enabled and need to play with **sync_binlog**?

Just like “RAID is not a backup”, a cluster does not protect against disaster. You still need to take regular backups. In order to have point-in-time recovery, binary logs are required.
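As a sketch, a point-in-time restore is a backup restore followed by replaying the binlogs up to just before the incident (file names and timestamps below are placeholders):

mysqlbinlog --start-datetime="2025-01-01 00:00:00" --stop-datetime="2025-01-01 11:59:00" \
    binlog.000101 binlog.000102 | mysql -u root -p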
