Percona MySQL 5.7.19 Performance Decline When Running for a Week.

We recently migrated our MySQL 5.5 Community Edition servers to Percona MySQL 5.7.19. The migration went smoothly. The new servers have the following specifications:

  • 54 core CPUs
  • 256 GB memory
  • Mirroring (RAID 1+0) - 894GB of usable space
  • CentOS 6.9

Percona MySQL 5.7.19 was installed on the new servers, which were slaved to the servers running MySQL 5.5. There were no issues during the migration.

All our web properties are running perfectly, except for two that share a code base and have similar data patterns; they started showing a performance decline after two weeks, which we detected using New Relic. These sites run on PHP. We collected some data from PMM (see attachments; they show before- and after-restart graphs).

Our current temporary workaround is restarting the mysql service. Note that the InnoDB buffer pool is not saved and restored. The servers' performance recovers immediately after the restart.
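As a side note, since the restart workaround leaves us with a cold buffer pool, the 5.7 buffer pool dump/load feature could soften that. A minimal sketch, assuming stock MySQL/Percona Server 5.7 behavior:

-- Dump the buffer pool contents automatically at shutdown (dynamic variable):
SET GLOBAL innodb_buffer_pool_dump_at_shutdown = ON;
-- innodb_buffer_pool_load_at_startup = ON is not dynamic and would have to go in my.cnf.
-- A dump/load can also be triggered manually:
SET GLOBAL innodb_buffer_pool_dump_now = ON;
SET GLOBAL innodb_buffer_pool_load_now = ON;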

Other changes were made during that period:

  • We installed pmm-client on each server (we love PMM - kudos to the PMM team :).
  • Some tables were migrated from latin1 to utf8mb4; we ran OPTIMIZE TABLE on them after the data migration (a sketch of the conversion follows below this list).
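The per-table conversion was along these lines (a sketch; the table name is a hypothetical placeholder, not from our actual schema):

-- Convert a table from latin1 to utf8mb4 and rebuild it afterwards.
ALTER TABLE orders CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
OPTIMIZE TABLE orders;
-- For InnoDB, OPTIMIZE TABLE maps to a table rebuild plus an ANALYZE.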

This is the servers’ my.cnf file:

[mysqld]
init-file=/etc/memory_table_startup.sql
default_time_zone=US/Eastern
validate_password_policy=LOW
validate_password_length=0
sql_mode = "ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
server-id = 203
auto_increment_increment = 3
auto_increment_offset = 1
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci

user=mysql

performance_schema
skip-name-resolve
autocommit=on
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
binlog_cache_size=2M
expire_logs_days=8
replicate-same-server-id = 0
max_connections=1600
max_connect_errors=999999999
key_buffer_size=16M
max_allowed_packet=16M
table_open_cache = 10000
#sort_buffer_size = 2M
read_buffer_size = 512K
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 8M
thread_cache_size = 50
query_cache_size = 0
query_cache_type = 0
query_cache_limit= 400k
query_cache_min_res_unit= 1400
long_query_time = 1
slow_query_log_file=/var/log/mysql/slow_query.log
slow_query_log=1
log_queries_not_using_indexes = 0
tmp_table_size=128M
max_heap_table_size=128M
join_buffer_size = 32M
innodb_file_per_table
innodb_buffer_pool_size = 158640840K
innodb_buffer_pool_instances=41
innodb_log_file_size=2G
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=1

# write performance recommendations based on 5.5 performance tuning

innodb_flush_method=O_DIRECT
binlog_cache_size=1M
sync_binlog = 1
slave_load_tmpdir=/var/lib/mysql
tmpdir=/var/lib/mysql
innodb_read_io_threads=8
innodb_write_io_threads=8
innodb_io_capacity=2700

log-bin = mysql-bin
log-slave-updates
binlog-format=STATEMENT

# Percona-specific config

userstat=1
thread_statistics=1
query_response_time_stats = on
performance-schema-consumer-events-statements-history-long=ON

[mysql.server]
user=mysql
basedir=/usr

[client]
user=mysql
socket=/var/lib/mysql/mysql.sock
#default-character-set=utf8mb4

We also tried the following variable changes suggested by MySQLTuner, but reverted them because there was no improvement:
tmp_table_size=256M
max_heap_table_size=256M
thread_pool_size=36
join_buffer_size=32

Before this last restart, innodb_buffer_pool_instances was set to 8.

Do you have any suggestions on what else I can try?

Thank you,
Alex Malberty
BabyCenter, LLC

Any ideas or advice? Anything I should try?

Hi there AlexM, just wanted to let you know I’ve seen your question and I’m checking whether anyone is available with suggestions. Sometimes, if it’s very specific to your environment, it’s a bit of a tricky one to catch on this open-source forum, but let’s see if anyone has any general pointers.

Hi Alex,

Thanks for reaching out. Can you please share the MySQL InnoDB Metrics and System Overview dashboards too? If possible, please use the Share dashboard → Snapshot option so we can zoom in/out on the metrics.

If possible, please send the System Summary and MySQL Summary of the affected host. You can check it here:

[URL]Grafana

I can see the connections reaching the maximum allowed on the MySQL Connections graph, so the server might be exhausting its connections, and the restart is freeing them. This usually indicates poor connection handling by the application.
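A quick way to confirm that on the server itself is to compare the counters below against max_connections (standard MySQL status variables; just a sketch):

SHOW GLOBAL STATUS LIKE 'Max_used_connections';  -- high-water mark since startup
SHOW GLOBAL STATUS LIKE 'Threads_connected';     -- connections open right now
SHOW GLOBAL STATUS LIKE 'Aborted_connects';      -- failed connection attempts
SHOW GLOBAL VARIABLES LIKE 'max_connections';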

Finally, let me ask a few questions:

  • Are you receiving writes on this server? Is this the master server now? Is this still a slave?
  • How does the performance degradation manifest? Slow queries? Resource consumption?

Hello Vinicius,

Thanks for your reply.

Here are the snapshots you requested:

[url]Percona – The Database Performance Experts
[url]Percona – The Database Performance Experts

System and MySQL summary:
[url]Percona – The Database Performance Experts

Regarding the maxed-out connections, those were due to a network problem we had, where connections did not die properly. We had observed the performance problem before that network problem occurred.

This server is an active master in a master-master configuration. The application writes to and reads from this server. Each master has one additional slave, which is used as a read-only server. On those two read-only servers we see the degradation too, but at a slower pace. Let me know if you want any snapshots from those two servers as well.

In my opinion, the degradation shows up as slow queries, not resource consumption.

Again, thank you very much in advance. We really appreciate any suggestions.

Alex.

Aside from Vinicius’ suggestions, one thing that popped out of the graphs is the table scans and temporary tables. They look like they might be related. I would suggest looking through Query Analytics in PMM to find these queries. You will need to use the slow log file as the QAN source instead of performance_schema, set log_slow_verbosity to full and long_query_time to 0, and, most importantly, set log_slow_rate_limit so that only a subset of the queries is sampled.
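A minimal sketch of those settings, assuming Percona Server 5.7 (the rate-limit value is only illustrative; tune it to your traffic):

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0;           -- consider every query for logging
SET GLOBAL log_slow_verbosity = 'full';   -- extended slow-log details (Percona Server feature)
SET GLOBAL log_slow_rate_limit = 100;     -- log roughly 1 in 100 (illustrative value)
SET GLOBAL log_slow_rate_type = 'query';  -- sample per query rather than per session

These can also go into my.cnf if you want them to survive a restart.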

This might be of interest as well (check whether you have transparent huge pages enabled): [url]MySQL Bugs: #84003: MySQL Consumes All Available System Memory || OS Swapping

Hello Vinicius,

Happy new year!

Did you find anything abnormal in the information I sent you? We upgraded our servers to Percona 5.7.23. Interestingly, the affected servers took longer to show degradation, but it eventually happened again and we had to restart them.
I am also going to make the slow query log changes Revin suggested to see if I can gather more data.
Regarding transparent huge pages, we are not running these servers as VMs, which seem to be affected the most by that problem. I don’t see any memory issues on the servers when they show degradation.

Thank you again,
Alex.

This problem is solved. It turns out the issue was the information_schema.thread_statistics table. We use PMM and had enabled thread_statistics=1. The PHP application created a lot of threads, which caused this table to grow too big over time.

This is the post that helped us find the problem:
[URL]https://www.percona.com/blog/2017/07/11/thread_statistics-and-high-memory-usage/[/URL]
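For anyone hitting the same thing, a quick way to check and mitigate it (a sketch, assuming Percona Server's user statistics feature as described in the linked post):

-- See how large the accumulated thread statistics have grown:
SELECT COUNT(*) FROM information_schema.thread_statistics;
-- Disable per-thread statistics collection (userstat table/index stats can stay on):
SET GLOBAL thread_statistics = OFF;
-- Or clear the accumulated data without disabling the feature:
FLUSH THREAD_STATISTICS;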