Hi there,
We just upgraded a 3 nodes cluster from 5.7 to 8.0 for an write intensive environment. Right after the upgrade, we noticed that the disk IO went from about 40% to 100% all the time.
Seems it is generated from innoDB, but the total data written is the same before and after the disk usage increase
I’m not sure if this is an expected change from 5.7 to 8.0, but this seems to be triggering an issue that was not present in 5.7. A few times a day, the cluster enters into pause due to flow control from one of the servers (a random one) and there is no resume message. The cluster get stuck in pauses status
As soon as we stop mysql in the server that originated the pause, the other 2 servers will get up to date in just a second. We then resume mysql in the offending server, it does IST and all works again for some time until we get stuck again (all the time disk usages are at 100%).
I’m not a MySQL expert but wonder if that 100% disk usage can be the reason for the issue (and what can be causing that high disk usage). The cluster is receiving the exact same workload than before the upgrade.
All 3 servers are running on ubuntu 20.04, 32 cores, 192GB, 3TB SSD disks
All 3 servers have similar config files:
server-id=1
datadir=/var/lib/percona-xtradb-cluster
socket=/var/run/mysqld/mysqld.sock
log-error=/var/log/mysql/error.log
pid-file=/var/run/mysqld/mysqld.pid
log-bin=galera1-bin
log_slave_updates=1
expire_logs_days=7
symbolic-links=0
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
wsrep_cluster_address=gcomm://galera1,galera2,galera3
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_address=192.168.212.162
wsrep_cluster_name=gorilla-cluster
wsrep_node_name=galera1
pxc_strict_mode=ENFORCING
wsrep_sst_method=xtrabackup-v2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=52224M
bind-address=192.168.212.162
innodb_lock_wait_timeout = 120
character_set_server=utf8mb4
collation-server = utf8mb4_unicode_ci
sql_mode=NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
gtid_mode=ON
enforce_gtid_consistency
log_output=file
slow_query_log=ON
long_query_time=0
log_slow_rate_limit=50
log_slow_rate_type=query
log_slow_verbosity=full
log_slow_admin_statements=ON
log_slow_slave_statements=ON
slow_query_log_always_write_time=1
slow_query_log_use_global_control=all
innodb_monitor_enable=all
userstat=1
max_allowed_packet = 64M
innodb_log_file_size = 10G
binlog_space_limit = 70G
wsrep_retry_autocommit=4
skip_name_resolve=ON
thread_pool_size=24
join_buffer_size=8M
wsrep_notify_cmd=/var/www/gorilla/server-scripts/galeracheck.py
innodb_flush_method = O_DIRECT
pxc-encrypt-cluster-traffic=ON
wait_timeout=10800
wsrep_log_conflicts
wsrep_slave_threads=80
wsrep_provider_options="gcache.size=20G;gcs.fc_limit=400;gcs.fc_factor=0.8;gcache.recover=yes"
max_connections=3000
innodb_buffer_pool_instances=51
open_files_limit=20480
innodb_open_files=10240
table_open_cache=10240
Again, this is the config migrated from 5.7 and was working just fine. Any help on the right direction to look at will be greatly appreciated.
Sorry for my English
Sergio