MySQL server performance degradation after multiple connects and disconnects of clients

Hello,

We have run into an issue that causes a noticeable performance degradation of the MySQL server. The change is significant: we saw performance about 3 times slower than before, and it stays at that low level until a full server restart. As far as we can tell from all our tests, the issue is related to establishing and closing MySQL connections to the server (all technical details and tests are below). Any useful information would be much appreciated.

Software: Percona Server for MySQL, versions 5.6.32 and 5.7.20

Environment: initially discovered on a production server (5.6), later reproduced on test servers (5.6 and 5.7)

Recent Changes: no software or hardware changes before the issue appeared

Relevant Files (for 5.7):

/etc/my.cnf:

[client]
port = 3306
socket = /tmp/mysql.sock
[mysqld_safe]
log-error=/var/10.8.107.143/mysql/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
[mysqld]
datadir = /var/10.8.107.143/mysql
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 64M
max_allowed_packet = 32M
relay-log = relay-bin
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 32M
thread_cache_size = 8
query_cache_size = 64M
default-storage-engine = InnoDB
innodb_buffer_pool_size = 640M
innodb_log_file_size = 100M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
log-bin-trust-function-creators = 1
innodb_stats_on_metadata = 0
[mysqldump]
quick
max_allowed_packet = 32M
[mysql]
default-character-set=utf8
[isamchk]
key_buffer = 64M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
!include /var/10.8.107.143/mysql/my.cnf

/var/10.8.107.143/mysql/my.cnf:

[mysqld_safe]
innodb_sched_priority_cleaner = 20
[mysqld]
bind-address = 10.8.107.143
port = 3306
server-id = 2021072113
tmpdir = /var/10.8.107.143/mysql-tmpdir
max_connections = 10000
log-bin = mysql-bin
binlog_format = ROW
slave_type_conversions=ALL_NON_LOSSY
sql_mode=NO_ENGINE_SUBSTITUTION
expire_logs_days=5
userstat = ON
thread_statistics = OFF
performance_schema = ON
event_scheduler = ON
key_buffer_size = 64M
read_rnd_buffer_size = 8M
read_buffer_size = 2M
sort_buffer_size = 2M
max_heap_table_size = 128M
tmp_table_size = 128M
query_cache_type = OFF
query_cache_size = 1M
innodb_file_format = barracuda
innodb_strict_mode = ON
expand_fast_index_creation = ON
innodb_stats_auto_recalc = ON
innodb_stats_persistent_sample_pages = 40
innodb_log_buffer_size = 8M
innodb_log_file_size = 100M
innodb_buffer_pool_size = 44G
innodb_buffer_pool_instances = 8
innodb_buffer_pool_load_at_startup = OFF
innodb_buffer_pool_dump_at_shutdown = OFF
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
innodb_flush_neighbors = 1
innodb_thread_concurrency = 24
innodb_commit_concurrency = 0
table_open_cache = 8000
table_open_cache_instances = 8
metadata_locks_hash_instances = 8
innodb_adaptive_hash_index = ON
innodb_io_capacity = 200
innodb_write_io_threads = 4
innodb_read_io_threads = 4
innodb_purge_threads = 2
innodb_checksum_algorithm = innodb
binlog_checksum = NONE
log_bin_use_v1_row_events = ON
sync_binlog = 0
innodb_data_file_path = ibdata1:10G:autoextend
max_heap_table_size=128M
tmp_table_size=128M
log-slave-updates
replicate-do-db=db1
replicate-do-db=db2
replicate-do-db=db3
replicate-do-db=db4
replicate-do-db=db5
replicate-do-db=db6
replicate-ignore-table = db1.Session_Deposits
replicate-ignore-table = db1.Session_Connect_Attempts
replicate-ignore-table = db1.Active_Calls
replicate-ignore-table = db2.active_nodes
replicate-ignore-table = db2.shared_cookies
innodb_log_file_size=400M
innodb_lock_wait_timeout=300
skip-slave-start
report-host=psh

No errors or warnings in mysqld logs.

Our steps for reproducing it (not reliably in 100% of cases, but fairly frequently):

  • restart mysqld

  • check its initial performance. We started a number of parallel mysql clients running a simple SELECT query against an empty table

(Create the test DB/table if they do not exist):
mysql -u root -h 10.8.107.143 -e "create database if not exists ls_performance_test";
mysql -u root -h 10.8.107.143 ls_performance_test -e "create table if not exists ls_performance_test (p int(10) unsigned not null auto_increment, k int(10) unsigned not null, v varchar(100) default null, primary key (p), key (k)) engine innodb default charset=latin1";

# start 48 parallel clients, each looping the same SELECT on the (empty) test table
for i in $(seq 1 48);
do
sleep 0.5;
while true; do echo "select * from ls_performance_test where k = 27;"; done | mysql -u root -h 10.8.107.143 ls_performance_test &
done

then we checked the value of the status variable "Queries" immediately and again a couple of minutes later, and calculated the number of queries processed per minute, roughly as sketched below
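A minimal sketch of that measurement (the 60-second interval and the awk parsing are just our choice):

# sample the global Queries counter twice, 60 seconds apart
Q1=$(mysql -u root -h 10.8.107.143 -N -e "show global status like 'Queries'" | awk '{print $2}')
sleep 60
Q2=$(mysql -u root -h 10.8.107.143 -N -e "show global status like 'Queries'" | awk '{print $2}')
echo "queries per minute: $((Q2 - Q1))"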

  • stop all clients started above

  • run a loop making connects and disconnects:

# 30 rounds of 1024 short-lived connections; each client is killed by timeout after 8 seconds
for i in $(seq 1 30);
do
for j in $(seq 1 1024);
do
timeout 8 mysql -u root -h 10.8.107.143 -A ls_performance_test &
done;
sleep 10;
done

  • when everything finishes, check server performance again (same test as above). Most likely you will see noticeably lower performance; in one run we saw it drop from ~90000 to ~30000 queries per second

Two interesting things we noticed:

  • the performance change affects new connections only; connections that already existed were not affected in our tests

  • we could not reproduce it after increasing thread_cache_size to 2048. However, we do not know whether this merely masks the issue (e.g., already existing threads remain in the "good" state, so the issue is harder to notice) or actually addresses its root cause. Either way, raising it does not look like a proper fix
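For what it is worth, this is roughly how we watch whether the thread cache keeps up during the reconnect loop (plain status counters, nothing beyond that):

# if Threads_created keeps growing together with Connections, new connections miss the thread cache
mysql -u root -h 10.8.107.143 -e "show global status where Variable_name in ('Connections','Threads_created','Threads_cached','Threads_running')"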

I would disable userstat; that's a heavy-hitting metric. innodb_checksum_algorithm=innodb is slow; switch to crc32. If you are on SSD, set innodb_flush_neighbors=0. innodb_log_file_size also looks pretty low: are you really never writing more than 100M/hr? Set event_scheduler=OFF unless you are actually using it.
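Roughly, the [mysqld] settings I have in mind (adjust to your setup; the flush_neighbors change only makes sense on SSD):

[mysqld]
userstat = OFF
event_scheduler = OFF
innodb_checksum_algorithm = crc32
# only if the data directory is on SSD:
# innodb_flush_neighbors = 0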

Thank you for your suggestions. We tried userstat = OFF, innodb_checksum_algorithm = crc32, and event_scheduler = OFF; however, the issue is still there. We did not touch innodb_flush_neighbors (we are using HDDs) or innodb_log_file_size (we do not have any write activity there during our tests).

One more thing we noticed: when we ran the performance test (48 parallel mysql clients with a simple query on an empty table) while the server was okay, the clients showed a random mix of states in the MySQL processlist (Sleep, Sending data, etc.). When the same test was run with the server in the bad state, the majority of them were shown in the state "statistics". According to the documentation this state usually means slow disk operations, yet disk activity was close to 0 during these periods (0% disk load, 0 KB/s disk reads, 0-3 KB/s disk writes).
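For reference, we counted the states with a query along these lines:

# count client threads per processlist state
mysql -u root -h 10.8.107.143 -e "select state, count(*) from information_schema.processlist group by state order by count(*) desc"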

Can you also try query_cache_size=0? I suggest installing PMM so you can see the metrics and identify the problem more easily.
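That is, something like this in the [mysqld] section (alongside the query_cache_type = OFF you already have):

[mysqld]
query_cache_type = OFF
query_cache_size = 0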

Thank you. query_cache_size=0 did not change anything. Regarding PMM: we already have it. Is there anything specific we should look at? Basically, it shows the expected picture: the number of queries (and related metrics) is about 2 times lower after these multiple reconnects. The process-state graphs show the change described above, an increased number of "statistics" states (from ~4 to ~30 in our last test). Perhaps the only new information we found there is "Client thread activity": the average number of running threads is ~12 during "good" periods and ~40 during "bad" periods under the same test conditions.
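For completeness, the running-threads figure can also be pulled straight from the server (we assume PMM's "Client thread activity" graph is based on this counter):

mysql -u root -h 10.8.107.143 -e "show global status like 'Threads_running'"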
