Hi everyone,
I have a MariaDB server with the following setup:
Master
Version: 10.3.39
Memory: 64GB
CPU: 16vCPU
Disk: Network attached 500GB
I also have two slaves connected to this master, both with lower specs:
Slave
Version: 10.3.39
Memory: 16GB
CPU: 8vCPU
Disk: Network attached 500GB
Slave-2
Version: 10.3.39
Memory: 8GB
CPU: 4vCPU
Disk: Network attached 500GB
However, I have an issue on the master: one of our most important processes is single-threaded, and whenever database performance degrades, that whole process is blocked.
As you can see, the red box marks the times when we restarted the MariaDB service, either by rebooting the server or by restarting the service.
However, this has become a recurring issue and we are stuck on how to debug it: the latency spiked again a few days after the last restart.
This is our `.cnf` configuration, along with the other settings that we adjusted on the fly:
```ini
#
# These groups are read by MariaDB server.
# Use it for options that only the server (but not clients) should see
#
# See the examples of server my.cnf files in /usr/share/mysql
# this is read by the standalone daemon and embedded servers
[server]
# this is only for the mysqld standalone daemon
[mysqld]
#
# * Basic Settings
#
user = mysql
pid-file = /run/mysqld/mysqld.pid
socket = /run/mysqld/mysqld.sock
#port = 3306
basedir = /usr
#datadir = /var/lib/mysql
datadir = /mnt/data/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
performance-schema
sql_mode = ""
#skip-external-locking
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address = <REDACTED>
#
# * Fine Tuning
#
#key_buffer_size = 16M
max_allowed_packet = 16M
#thread_stack = 192K
#thread_cache_size = 8
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
#myisam_recover_options = BACKUP
#max_connections = 100
#table_cache = 64
#thread_concurrency = 10
#
# * Query Cache Configuration
#
#query_cache_limit = 1M
query_cache_size = 32M
key_buffer_size = 16M
max_allowed_packet = 100M
thread_stack = 192K
thread_cache_size = 8
max_connections = 5000
open_files_limit = 100000
table_definition_cache = 10000
table_open_cache = 10000
tmp_table_size = 32M
max_heap_table_size = 32M
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
#general_log_file = /var/log/mysql/mysql.log
#general_log = 1
#
# Error log - should be very few entries.
#
log_error = /var/log/mysql/error.log
#
# Enable the slow query log to see queries with especially long duration
#slow_query_log_file = /var/log/mysql/mariadb-slow.log
#long_query_time = 10
#log_slow_rate_limit = 1000
#log_slow_verbosity = query_plan
#log-queries-not-using-indexes
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
#server-id = 1
#log_bin = /var/log/mysql/mysql-bin.log
#expire_logs_days = 10
#max_binlog_size = 100M
#binlog_do_db = include_database_name
#binlog_ignore_db = exclude_database_name
#
# * Security Features
#
# Read the manual, too, if you want chroot!
#chroot = /var/lib/mysql/
#
# For generating SSL certificates you can use for example the GUI tool "tinyca".
#
#ssl-ca = /etc/mysql/cacert.pem
#ssl-cert = /etc/mysql/server-cert.pem
#ssl-key = /etc/mysql/server-key.pem
#
# Accept only connections using the latest and most secure TLS protocol version.
# ..when MariaDB is compiled with OpenSSL:
#ssl-cipher = TLSv1.2
# ..when MariaDB is compiled with YaSSL (default in Debian):
#ssl = on
#
# * Character sets
#
# MySQL/MariaDB default is Latin1, but in Debian we rather default to the full
# utf8 4-byte character set. See also client.cnf
#
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
#
# * Unix socket authentication plugin is built-in since 10.0.22-6
#
# Needed so the root database user can authenticate without a password but
# only when running as the unix root user.
#
# Also available for other users if required.
# See https://mariadb.com/kb/en/unix_socket-authentication-plugin/
# this is only for embedded server
[embedded]
# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
server_id = 1
log-bin = /var/log/mysql/log-bin/mysql-bin.log
expire_logs_days = 7
binlog-format = ROW
sync_binlog = 1
gtid-domain-id = 1
gtid_strict_mode = 1
slave_parallel_threads = 4
# This group is only read by MariaDB-10.3 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.3]
thread_handling = pool-of-threads
thread_pool_size = 128
thread_pool_max_threads = 65536
thread_pool_stall_limit = 30
innodb_read_io_threads = 100
innodb_write_io_threads = 100
innodb_thread_concurrency = 100
innodb_doublewrite = 0
innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 2
innodb_log_file_size = 128M
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = 1
innodb_buffer_pool_size = 48G
innodb_buffer_pool_chunk_size = 375M
innodb_buffer_pool_instances = 64
slow_query_log = 1
long_query_time = 0.1
slow_query_log_file = /var/log/mysql/slow-query.log
use_stat_tables = PREFERABLY
wait_timeout = 300
innodb_max_dirty_pages_pct = 90
innodb_max_dirty_pages_pct_lwm = 10
```
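The file has grown by hand over time, so the same option can end up set twice in one group; in the config above, `max_allowed_packet` appears twice under `[mysqld]` (16M, then 100M). Below is a minimal sketch of how such duplicates could be listed. The helper name `find_duplicate_options` and the sample text are made up for illustration, and it only compares options within a single group, not across `[mysqld]`, `[mariadb]` and `[mariadb-10.3]`.

```python
from collections import defaultdict


def find_duplicate_options(cnf_text):
    """Return {(section, option): [values]} for options set more than once in a section."""
    seen = defaultdict(list)
    section = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        name, _, value = line.partition("=")
        # Dashes and underscores are interchangeable in option names,
        # so normalise them before comparing.
        key = name.strip().replace("-", "_")
        seen[(section, key)].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}


# Illustrative excerpt only, not the full file.
sample = """
[mysqld]
max_allowed_packet = 16M
query_cache_size = 32M
max_allowed_packet = 100M
"""

duplicates = find_duplicate_options(sample)
for (section, option), values in duplicates.items():
    print(f"[{section}] {option} is set {len(values)} times: {values}")
```

When an option is repeated in an option file, the last occurrence is normally the one that takes effect, so the running value can be confirmed with `SHOW VARIABLES LIKE 'max_allowed_packet'`.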
This is the screenshot of the PMM metrics from the time when the latency suddenly increased:
InnoDB stats
We also noticed several things when we ran the `SHOW ENGINE INNODB STATUS` command:
- High Number of Idle Transactions:
  - A large number of transactions (identified by `---TRANSACTION` entries) are listed as “not started,” indicating they are idle.
  - Each transaction has `0 lock struct(s)`, which means it is not holding any locks, and `heap size 1128`, which is the memory allocation for transaction metadata.
- Buffer Pool and Memory:
  - Buffer pool usage seems efficient, with a high buffer pool hit rate of 999/1000, indicating that most data requests are being served from memory rather than disk.
  - However, there is a significant number of free buffers in the buffer pools, indicating that the pool size might be larger than necessary for the workload.
- I/O Threads and Pending Operations:
  - The I/O threads are predominantly in a waiting state, suggesting that I/O operations are not the bottleneck.
  - No pending I/O operations were observed, indicating efficient disk I/O handling.
- Log and Checkpoint Activity:
  - Log sequence numbers and flushed positions are up-to-date, with no pending log flushes or checkpoint writes, indicating efficient log management.
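To put a number on "a large number of idle transactions", the status output can be tallied instead of read by eye. This is a rough sketch that assumes the usual `---TRANSACTION <id>, not started` / `---TRANSACTION <id>, ACTIVE ...` line format; the sample text below is invented for illustration and is not our real output.

```python
import re
from collections import Counter

# Matches the header line of each transaction entry and captures its state.
TRX_LINE = re.compile(r"^---TRANSACTION \d+, (not started|ACTIVE)", re.MULTILINE)


def count_transaction_states(status_text):
    """Count transaction entries per state in SHOW ENGINE INNODB STATUS text."""
    return Counter(m.group(1) for m in TRX_LINE.finditer(status_text))


# Made-up sample in the same shape as the TRANSACTIONS section.
sample = """\
---TRANSACTION 421938475612345, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421938475611111, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 98765432, ACTIVE 12 sec
2 lock struct(s), heap size 1128, 1 row lock(s)
"""

counts = count_transaction_states(sample)
print(counts["not started"], "not started,", counts["ACTIVE"], "active")
```

Running this on status dumps taken before and during a latency spike would show whether the idle count actually changes when the problem occurs.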
We have also tried a few things:
- Increased the buffer pool from 24GB to 48GB (however, the performance degradation still occurs)
- Reduced `wait_timeout` (hoping to reduce the number of "not started" transactions shown in the InnoDB status output)
- Set `innodb_max_dirty_pages_pct = 90` and `innodb_max_dirty_pages_pct_lwm = 10` (both were previously 0, i.e. not set)
- Set `use_stat_tables` to `PREFERABLY`
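Since the buffer pool size was one of the things we changed, here is a quick arithmetic check of the three buffer pool settings from the config. It assumes the rule described in the MySQL 5.7 and MariaDB documentation, namely that `innodb_buffer_pool_size` is rounded up to a multiple of `innodb_buffer_pool_chunk_size` × `innodb_buffer_pool_instances`. I have not verified how 10.3.39 applies this, so treat it as a sketch; the function name is made up.

```python
import math


def effective_buffer_pool_mib(size_mib, chunk_mib, instances):
    """Round the requested pool size up to a multiple of chunk size x instances.

    ASSUMPTION: this mirrors the documented adjustment rule; the server's
    actual behaviour should be checked against @@innodb_buffer_pool_size.
    """
    unit = chunk_mib * instances
    return math.ceil(size_mib / unit) * unit


# Values from our config: 48G pool, 375M chunks, 64 instances.
requested = 48 * 1024
effective = effective_buffer_pool_mib(requested, chunk_mib=375, instances=64)
print(f"requested {requested} MiB, adjusted to {effective} MiB "
      f"({effective / 1024:.1f} GiB)")
```

If the assumption holds, the adjusted size comes out above the 64GB of memory on the master, so it seems worth comparing against `SELECT @@innodb_buffer_pool_size` on the running server.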
Next, we plan to:
- Upgrade MariaDB from 10.3 to 11.4, because multiple threads suggest there is a problem with the SQL optimizer, for example: mysql - MariaDB 10.4 random performance degradation - Stack Overflow
- Run `OPTIMIZE` on the tables that show higher latency and affect our critical process
Thanks!