Increased I/O Wait and CPU Load after Upgrading MySQL from 5.7 to 8.0

Hello,

I’m dealing with an issue where, immediately after upgrading MySQL from 5.7 to 8.0, there has been a significant increase in I/O wait and CPU load. Please refer to the attached PPM graph. I have a cluster consisting of 3 nodes, with 2 nodes designated for read-only and one node for read-write operations. The CPU load on the read-write node is exceeding the threshold value. Is there any configuration adjustment, different from what we used on 5.7, that could alleviate the load on these nodes?

Thank you for any advice.

These nodes are running as VPS on Proxmox virtualization with configuration:
4x core (Intel(R) Xeon(R) Gold 6138 CPU), 24 GB RAM, Samsung SSD 870 EVO

Below is the my.cnf configuration:

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
skip-name-resolve
bind-address = 0.0.0.0

#key_buffer = 16M
max_allowed_packet = 32M
thread_stack = 2048K
thread_cache_size = 8

log_error = /var/log/mysql/error.log

# Percona settings
server-id=1

innodb_file_per_table=1
innodb_redo_log_capacity=640M
innodb_buffer_pool_size=16G
innodb_buffer_pool_instances=16
innodb_buffer_pool_chunk_size=128M
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=15
innodb_flush_log_at_trx_commit=0

wsrep_sst_method=xtrabackup-v2
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_provider_options="gcache.size = 5G;cert.log_conflicts=YES;gmcast.peer_timeout=PT10S"
pxc-encrypt-cluster-traffic=OFF
wsrep_log_conflicts=ON
wsrep_cluster_name=xxx
wsrep_cluster_address=gcomm://x.x.x.x.x.x.x.x.x.x.x.x.x.x…x
wsrep_slave_threads=8
wsrep_node_name=x.x.x.x
wsrep_node_address=x.x.x.x
wsrep_sst_donor=x.x.x.x
wsrep_retry_autocommit=5

pxc_strict_mode=ENFORCING

sql_mode="NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"
log_timestamps=SYSTEM
group_concat_max_len = 64M

tmp_table_size = 128M
sort_buffer_size = 2M
join_buffer_size = 2M
table_open_cache=2048
max_connections=500
max_heap_table_size = 512M
innodb_lru_scan_depth = 256

binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

[mysqldump]
quick
quote-names
max_allowed_packet = 64M

[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
service_startup_timeout = 2880

# PPM
userstat=ON

Screenshot of the increased load and I/O wait from PPM

The upgrade took place on 03/25, coinciding with a noticeable increase in load and I/O wait.

Hello @cesanek,
It’s PMM, not PPM. 🙂 Anyway, is there anything notable in MySQL’s error log? Have you looked at the Performance Schema waits tables in PMM to see what the I/O is waiting on?
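For reference, roughly the same waits data can also be pulled straight from Performance Schema (enabled by default in 8.0); a minimal sketch of the query I’d start with:

-- Top file I/O wait events since server start (SUM_TIMER_WAIT is in picoseconds)
SELECT EVENT_NAME,
       COUNT_STAR,
       ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_wait_sec
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/io/file/%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;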

I would revert innodb_lru_scan_depth back to the default, and instead set innodb_flush_neighbors=0.
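Both are dynamic variables, so as a rough sketch (1024 is the 8.0 default for innodb_lru_scan_depth) you can try them at runtime first and persist them in my.cnf if they help:

-- Revert LRU scan depth to the default; disable neighbor-page flushing (only useful on spinning disks)
SET GLOBAL innodb_lru_scan_depth = 1024;
SET GLOBAL innodb_flush_neighbors = 0;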

Thanks for your reply! I haven’t found anything relevant in the MySQL error log. I adjusted innodb_lru_scan_depth and innodb_flush_neighbors as you suggested, but unfortunately these changes haven’t had a significant impact on CPU load or I/O waits. The parameters were changed around 14:45, as visible on the graph.

Can you please show that Perf Schema waits graph using the same time range as your CPU graph above?

Sure!

On the Waits (Load) graph, yeah, there’s an overall increase in everything, and your iowait increases as well. Maybe try sync_binlog=1000 or innodb_thread_concurrency=0. Side note: tmp_table_size and max_heap_table_size should be the same, as the lower of the two is the value actually used. The nice=0 I’ve never seen before; I would remove that.
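All of those are dynamic, so roughly something like this (the 512M below simply mirrors your current max_heap_table_size, not a tuned value):

-- Group binlog fsyncs instead of syncing on every commit
SET GLOBAL sync_binlog = 1000;
-- 0 lets InnoDB manage concurrency itself (the 8.0 default)
SET GLOBAL innodb_thread_concurrency = 0;
-- Keep both in-memory temp table limits equal; the lower of the two is what applies
SET GLOBAL tmp_table_size = 512 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 512 * 1024 * 1024;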

My bad, binary logs are enabled by default since MySQL 8.0. We don’t need them because we don’t use master-slave replication, so I disabled binary logging completely (config line below); however, this did not reduce the high CPU load. Following your recommendation, I also tried setting innodb_thread_concurrency=0, but that had no noticeable effect on CPU load either. Do you think there are any other options?

And thank you for pointing out the other incorrect settings.
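For anyone else reading: binary logging can’t be switched off at runtime in 8.0, so disabling it comes down to a line like this under [mysqld] plus a restart:

# Disable the binary log entirely (we don't use async replication from this cluster)
skip-log-bin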

BTW:
I will add that the problem only occurs on the node that the proxy designates as the write node. The cluster is configured with 2 nodes for reading and 1 node for writing only.