Hello,
We ran into an issue that causes a noticeable performance degradation of the MySQL server. The change is significant: we saw roughly 3x slower performance than before, and it stays at that low level until a full server restart. As far as we can tell from our tests, the issue is related to establishing and closing MySQL connections to the server (all technical details and tests are below). Any useful information would be much appreciated.
Software: Percona Server for MySQL, versions 5.6.32 and 5.7.20
Environment: initially discovered on a production server (5.6), later reproduced on test servers (5.6 and 5.7)
Recent Changes: no software or hardware changes before the issue appeared
Relevant Files (for 5.7):
/etc/my.cnf:
[client]
port = 3306
socket = /tmp/mysql.sock
[mysqld_safe]
log-error=/var/10.8.107.143/mysql/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
[mysqld]
datadir = /var/10.8.107.143/mysql
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 64M
max_allowed_packet = 32M
relay-log = relay-bin
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 32M
thread_cache_size = 8
query_cache_size = 64M
default-storage-engine = InnoDB
innodb_buffer_pool_size = 640M
innodb_log_file_size = 100M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
log-bin-trust-function-creators = 1
innodb_stats_on_metadata = 0
[mysqldump]
quick
max_allowed_packet = 32M
[mysql]
default-character-set=utf8
[isamchk]
key_buffer = 64M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
!include /var/10.8.107.143/mysql/my.cnf
/var/10.8.107.143/mysql/my.cnf:
[mysqld_safe]
innodb_sched_priority_cleaner = 20
[mysqld]
bind-address = 10.8.107.143
port = 3306
server-id = 2021072113
tmpdir = /var/10.8.107.143/mysql-tmpdir
max_connections = 10000
log-bin = mysql-bin
binlog_format = ROW
slave_type_conversions=ALL_NON_LOSSY
sql_mode=NO_ENGINE_SUBSTITUTION
expire_logs_days=5
userstat = ON
thread_statistics = OFF
performance_schema = ON
event_scheduler = ON
key_buffer_size = 64M
read_rnd_buffer_size = 8M
read_buffer_size = 2M
sort_buffer_size = 2M
max_heap_table_size = 128M
tmp_table_size = 128M
query_cache_type = OFF
query_cache_size = 1M
innodb_file_format = barracuda
innodb_strict_mode = ON
expand_fast_index_creation = ON
innodb_stats_auto_recalc = ON
innodb_stats_persistent_sample_pages = 40
innodb_log_buffer_size = 8M
innodb_log_file_size = 100M
innodb_buffer_pool_size = 44G
innodb_buffer_pool_instances = 8
innodb_buffer_pool_load_at_startup = OFF
innodb_buffer_pool_dump_at_shutdown = OFF
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
innodb_flush_neighbors = 1
innodb_thread_concurrency = 24
innodb_commit_concurrency = 0
table_open_cache = 8000
table_open_cache_instances = 8
metadata_locks_hash_instances = 8
innodb_adaptive_hash_index = ON
innodb_io_capacity = 200
innodb_write_io_threads = 4
innodb_read_io_threads = 4
innodb_purge_threads = 2
innodb_checksum_algorithm = innodb
binlog_checksum = NONE
log_bin_use_v1_row_events = ON
sync_binlog = 0
innodb_data_file_path = ibdata1:10G:autoextend
max_heap_table_size=128M
tmp_table_size=128M
log-slave-updates
replicate-do-db=db1
replicate-do-db=db2
replicate-do-db=db3
replicate-do-db=db4
replicate-do-db=db5
replicate-do-db=db6
replicate-ignore-table = db1.Session_Deposits
replicate-ignore-table = db1.Session_Connect_Attempts
replicate-ignore-table = db1.Active_Calls
replicate-ignore-table = db2.active_nodes
replicate-ignore-table = db2.shared_cookies
innodb_log_file_size=400M
innodb_lock_wait_timeout=300
skip-slave-start
report-host=psh
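A side note on the split configuration above: several [mysqld] options are assigned in both /etc/my.cnf and the included file (e.g. innodb_log_file_size: 100M vs. 400M; innodb_buffer_pool_size: 640M vs. 44G; query_cache_size: 64M vs. 1M; max_heap_table_size and tmp_table_size even appear twice within the included file), and the server keeps the last value it reads. A quick sketch to list such duplicates (dup_keys is our own helper name, not a MySQL tool):

```shell
# dup_keys FILE...: print [mysqld] option names assigned more than once
# across the given files. Note that list-style options (replicate-do-db,
# replicate-ignore-table) legitimately repeat and will show up here too.
dup_keys() {
  awk -F' *= *' '/^\[/ { sec = $0 }
       sec == "[mysqld]" && NF == 2 { print $1 }' "$@" | sort | uniq -d
}
# e.g.: dup_keys /etc/my.cnf /var/10.8.107.143/mysql/my.cnf
```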
There are no errors or warnings in the mysqld logs.
Steps to reproduce (not 100% of the time, but fairly frequently):
- restart mysqld
- check its initial performance. We started a number of parallel mysql clients running a simple select query on an empty table (creating the test DB/table first if absent):

mysql -u root -h 10.8.107.143 -e "create database if not exists ls_performance_test";
mysql -u root -h 10.8.107.143 ls_performance_test -e "create table if not exists ls_performance_test (p int(10) unsigned not null auto_increment, k int(10) unsigned not null, v varchar(100) default null, primary key (p), key (k)) engine innodb default charset=latin1";

for i in $(seq 1 48); do
  sleep 0.5;
  while true; do echo "select * from ls_performance_test where k = 27;"; done | mysql -u root -h 10.8.107.143 ls_performance_test &
done

We then read the "Queries" status variable twice, a couple of minutes apart, and calculated the number of queries processed per minute.
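The throughput check above can be sketched as follows. queries() talks to the live server (host and credentials are the ones used throughout this post); qps() is the pure arithmetic, and both helper names are our own:

```shell
# queries: read the global "Queries" counter from the server.
queries() {
  mysql -u root -h 10.8.107.143 -N -B \
        -e "show global status like 'Queries'" | awk '{print $2}'
}
# qps COUNT_BEFORE COUNT_AFTER INTERVAL_SECONDS: queries per second.
qps() {
  echo $(( ($2 - $1) / $3 ))
}
# Live usage (requires a reachable server):
#   Q1=$(queries); sleep 60; Q2=$(queries)
#   echo "QPS: $(qps "$Q1" "$Q2" 60)"
```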
- stop all clients started above
- run a loop making connects and disconnects:

for i in $(seq 1 30); do
  for j in $(seq 1 1024); do
    timeout 8 mysql -u root -h 10.8.107.143 -A ls_performance_test &
  done;
  sleep 10;
done
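While the connect/disconnect loop runs, it may be worth watching the Threads_created counter: with thread_cache_size = 8, most of the 1024 parallel connects will miss the thread cache and spawn a fresh OS thread. A sketch, where status_value and thread_delta are our own helper names:

```shell
# status_value VARNAME: current value of a global status variable.
status_value() {
  mysql -u root -h 10.8.107.143 -N -B \
        -e "show global status like '$1'" | awk '{print $2}'
}
# thread_delta BEFORE AFTER: threads spawned between the two samples.
thread_delta() {
  echo $(( $2 - $1 ))
}
# Live usage (requires a reachable server):
#   T1=$(status_value Threads_created)
#   ... run the connect/disconnect loop ...
#   T2=$(status_value Threads_created)
#   echo "new threads spawned: $(thread_delta "$T1" "$T2")"
```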
- when everything finishes, check server performance again. Most likely it will be noticeably lower: in one run we saw throughput drop from ~90000 queries per second to ~30000.
Two interesting things we noticed:
- the performance change affects new connections only; connections that already existed were not affected in our tests
- we could not reproduce the issue after raising thread_cache_size to 2048. However, we do not know whether this merely masks the issue (e.g., existing threads stay in the "good" state, making the problem harder to notice) or actually addresses its root cause. Either way, raising it does not look like a proper fix.
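One way to quantify how much the thread cache is being bypassed is the ratio Threads_created / Connections: near 0 means almost every connection reuses a cached thread, near 1 means each connection spawns a fresh OS thread. A sketch of the arithmetic (miss_rate is our own helper name; the inputs come from SHOW GLOBAL STATUS):

```shell
# miss_rate THREADS_CREATED CONNECTIONS: fraction of connections that
# could not reuse a cached thread.
miss_rate() {
  awk -v tc="$1" -v c="$2" 'BEGIN { printf "%.3f\n", tc / c }'
}
# e.g. on a server where Threads_created=30000 and Connections=31000:
#   miss_rate 30000 31000   # -> 0.968
```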