MySQL Galera: Partition Maintenance Very Slow During Heavy Load — Fast After Restart

Hi,
I have been using dedicated MySQL servers configured in a Galera cluster for my application data; the application uses many tables. The server specs are:
Model: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Memory: 157GB
Dual socket with 72 cores.
I have dedicated 100GB to the buffer pool.
My tables are partitioned by RANGE. When I ALTER TABLE to drop/add partitions, the operation takes a lot of time, which heavily impacts the application's access to the DB because these operations lock the tables.
But when the server is restarted and the buffer pool is empty, these operations complete quickly: old partitions are dropped and new ones are created.
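For reference, the operations in question look roughly like this (table and partition names here are just placeholders, not my real schema):

ALTER TABLE app_events DROP PARTITION p20240101;   -- drop the oldest daily partition
ALTER TABLE app_events ADD PARTITION (
    PARTITION p20240110 VALUES LESS THAN (TO_DAYS('2024-01-11'))   -- add the next day
);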
Below is my main MySQL configuration:
innodb_read_io_threads = 20
innodb_write_io_threads = 20
innodb_purge_threads = 20
innodb_thread_concurrency = 40
innodb_flush_neighbors = 0
innodb_io_capacity = 1000000
innodb_io_capacity_max = 10000000
innodb_doublewrite = 0
innodb_buffer_pool_size = 100G
innodb_flush_method = O_DIRECT
innodb_page_cleaners = 15
innodb_buffer_pool_instances = 15
#innodb_force_recovery=5
log_timestamps = SYSTEM
innodb_max_dirty_pages_pct=10
innodb_lru_scan_depth = 256
transaction-isolation = READ-UNCOMMITTED
performance_schema=0

Can anyone help with how this issue can be tackled? What additional configuration can be set to make the partition operations faster when the DB is in use and the buffer pool is full? I can provide additional logs/configs.

Thank you.

Hello @mrehman786

My tables are partitioned by RANGE. When I ALTER TABLE to drop/add partitions, the operation takes a lot of time, which heavily impacts the application's access to the DB because these operations lock the tables.
But when the server is restarted and the buffer pool is empty, these operations complete quickly: old partitions are dropped and new ones are created.

The first thing I would suggest is collecting and reviewing pt-stalk samples when you see the slowness while dropping/adding partitions,
and reviewing OS stats like I/O and CPU usage and MySQL stats like the SHOW ENGINE INNODB STATUS and processlist output.

In SHOW ENGINE INNODB STATUS, look for buffer pool usage, pending I/O (reads, writes), etc.
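For example, something like this shows the relevant counters:

SHOW ENGINE INNODB STATUS\G
-- Buffer pool usage: data, free, and dirty pages
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';
-- Pending reads/writes/fsyncs at the InnoDB I/O layer
SHOW GLOBAL STATUS LIKE 'Innodb_data_pending%';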

pt-stalk sample collection:

PTDEST=/tmp/pt/collected/$(hostname)/
mkdir -p "${PTDEST}/samples";
cd /tmp/pt;
wget percona.com/get/pt-stalk;
chmod +x pt-stalk;
sudo ./pt-stalk --no-stalk --iterations=2 --sleep=30 --dest="${PTDEST}" --user=root --password=<mysql-root-pass>;

The objective is to look for OS resource saturation and, on the MySQL side, at the processlist, whether the current buffer pool size is sufficient, locking, I/O reads/writes, etc., to narrow down the issue.
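On the pure MySQL side, a quick first pass while the ALTER is running could be something like this (plain SQL, works even with performance_schema disabled):

SHOW FULL PROCESSLIST;
-- Open InnoDB transactions, how long they have been running, and what they hold
SELECT trx_mysql_thread_id, trx_state, trx_started, trx_rows_locked, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started;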

Thank you for your reply.
I have observed the I/O stats through Prometheus/Grafana and the iostat tool, but everything looks normal. I haven't used this pt-stalk tool specifically. I will configure it and let you know its stats.
Thank you

These values are wildly out of spec. Please read over these two blogs:

innodb_max_dirty_pages_pct=10 is incredibly low and will result in more aggressive disk I/O due to rapid flushing. The default for this parameter is 90 (75 back in 5.7). Please set it back to something like 75.
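innodb_max_dirty_pages_pct is dynamic, so you can test a saner value without a restart, for example:

SET GLOBAL innodb_max_dirty_pages_pct = 75;
-- Also update my.cnf so the value survives the next restart.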


Thank you for your reply
I have tested multiple values for innodb_max_dirty_pages_pct, but none helped.
The problem is with the InnoDB buffer pool memory: if the buffer pool is full, the ALTER TABLE add/drop partition takes a lot of time, but after a restart it works well.

Also, I have dedicated SSDs configured in RAID 0 for the datadir; that's why I set the innodb_io_capacity and innodb_io_capacity_max variables high. But I have tested with reduced values of 20000 and 40000 as well. So far nothing has helped.

Update
I have tested with a different configuration for the InnoDB buffer pool size.
Since I have a Galera cluster, I set innodb_buffer_pool_size to 25G on one node. As all nodes in the cluster are primaries, when I execute the partition management script that creates daily partitions on that node, it takes far less time than on the node with 100GB of buffer pool.

Just because you have 10000 IOPS does not mean "set innodb_io_capacity=10000". If you read the blogs I linked above, you'll see that doing this can cause performance degradation.
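Both of these are dynamic, so you can step them down and measure; something like the following as a starting point (example values only, not a recommendation for your exact hardware):

SET GLOBAL innodb_io_capacity     = 2000;
SET GLOBAL innodb_io_capacity_max = 4000;
-- Tune from here based on observed flushing behaviour, not the raw device IOPS figure.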

That makes perfect sense. When you drop tables/partitions, InnoDB has to scan through the buffer pool's linked lists (LRU) and remove the associated pages. When the pool is empty (i.e., right after a restart), there's nothing to scan. You also have many buffer pool instances, which can cause pages from the same table to land in different instances, so there is more internal locking/concurrency to manage.

You might consider lowering the number of buffer pool instances to 4, or even 2, and seeing if that helps when the pool is not empty.
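Note that innodb_buffer_pool_instances is not dynamic, so you can only check the current value from SQL; the change itself goes in my.cnf followed by a (rolling) restart across the cluster:

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_instances';
-- Change innodb_buffer_pool_instances in my.cnf (e.g. to 4 or 2) and restart the node to apply.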

I have set the pool instances to 4, but it has negatively impacted performance: it increased the time for the ALTER operations instead of reducing it.

Currently I have adopted a procedure where I dynamically reduce the total innodb_buffer_pool_size from 100G to 20G, perform the partition operation (it quickly creates the partitions), and then increase innodb_buffer_pool_size back to 100G.
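In SQL terms, the workaround looks roughly like this (sizes are the ones mentioned above):

SET GLOBAL innodb_buffer_pool_size = 20 * 1024 * 1024 * 1024;    -- shrink to 20G
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';      -- wait until the resize reports completion
-- ... run the partition management script (ALTER TABLE ... ADD/DROP PARTITION) ...
SET GLOBAL innodb_buffer_pool_size = 100 * 1024 * 1024 * 1024;   -- grow back to 100G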
I'm not sure if this is the correct way to deal with it; I'm trying to figure out other options. Any help would be highly appreciated.