MySQL Galera: Partition Maintenance Very Slow During Heavy Load — Fast After Restart

Hi,
I have been using dedicated MySQL servers configured in a Galera cluster for my application data; the application uses many tables. The server specs are:
Model: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Memory: 157GB
Dual socket with 72 cores.
I have dedicated 100GB to the buffer pool.
My tables are partitioned by RANGE. When I ALTER TABLE to drop/add partitions, the operation takes a lot of time, which heavily impacts the application's access to the DB because these operations lock the tables.
But when the server is restarted and the buffer pool is empty, these operations complete quickly: old partitions are dropped and new ones are created.
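For reference, the operations in question look roughly like this (table and partition names here are just placeholders, not my real schema):

ALTER TABLE app_events DROP PARTITION p20240101;   -- drop the oldest daily partition
ALTER TABLE app_events ADD PARTITION (
    PARTITION p20240110 VALUES LESS THAN (TO_DAYS('2024-01-11'))   -- add the next day
);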
Below is my main MySQL configuration:
innodb_read_io_threads = 20
innodb_write_io_threads = 20
innodb_purge_threads = 20
innodb_thread_concurrency = 40
innodb_flush_neighbors = 0
innodb_io_capacity = 1000000
innodb_io_capacity_max = 10000000
innodb_doublewrite = 0
innodb_buffer_pool_size = 100G
innodb_flush_method = O_DIRECT
innodb_page_cleaners = 15
innodb_buffer_pool_instances = 15
#innodb_force_recovery=5
log_timestamps = SYSTEM
innodb_max_dirty_pages_pct=10
innodb_lru_scan_depth = 256
transaction-isolation = READ-UNCOMMITTED
performance_schema=0

Can anyone help with how this issue can be tackled? What additional configuration can be set to make the partition operations faster when the DB is in use and the buffer pool is full? I can provide additional logs/configs.

Thank you.

Hello @mrehman786

My tables are partitioned by RANGE. When I ALTER TABLE to drop/add partitions, the operation takes a lot of time, which heavily impacts the application's access to the DB because these operations lock the tables.
But when the server is restarted and the buffer pool is empty, these operations complete quickly: old partitions are dropped and new ones are created.

The first thing I would suggest is collecting and reviewing pt-stalk samples when you see the slowness while dropping/adding partitions,
and reviewing OS stats like I/O and CPU usage and MySQL stats like the SHOW ENGINE INNODB STATUS and processlist output.

In SHOW ENGINE INNODB STATUS, look for buffer pool usage, pending I/O (reads, writes), etc.
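For example, something like this shows the relevant counters:

SHOW ENGINE INNODB STATUS\G
-- Buffer pool usage: data, free, and dirty pages
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';
-- Pending reads/writes/fsyncs at the InnoDB I/O layer
SHOW GLOBAL STATUS LIKE 'Innodb_data_pending%';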

pt-stalk sample collection:

PTDEST=/tmp/pt/collected/$(hostname)/
mkdir -p "${PTDEST}/samples";
cd /tmp/pt;
wget percona.com/get/pt-stalk;
chmod +x pt-stalk;
sudo ./pt-stalk --no-stalk --iterations=2 --sleep=30 --dest="${PTDEST}" --user=root --password=<mysql-root-pass>;

The objective is to look for OS resource saturation and, on the MySQL side, at the processlist, whether the current buffer pool size is sufficient, locking, I/O reads/writes, etc., to narrow down the issue.
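On the pure MySQL side, a quick first pass while the ALTER is running could be something like this (plain SQL, works even with performance_schema disabled):

SHOW FULL PROCESSLIST;
-- Open InnoDB transactions, how long they have been running, and what they hold
SELECT trx_mysql_thread_id, trx_state, trx_started, trx_rows_locked, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started;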

Thank you for your reply.
I have observed the I/O stats through Prometheus/Grafana and the iostat tool, but everything looks normal. I haven't used this pt-stalk tool specifically. I will configure it and let you know its stats.
Thank you

These values are wildly out of spec. Please read over these two blogs:

innodb_max_dirty_pages_pct=10 is incredibly low and will result in more aggressive disk I/O due to rapid flushing. The default for this parameter is 90 (75 back in 5.7). Please set it back to something like 75.
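innodb_max_dirty_pages_pct is dynamic, so you can test a saner value without a restart, for example:

SET GLOBAL innodb_max_dirty_pages_pct = 75;
-- Also update my.cnf so the value survives the next restart.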


Thank you for your reply
I have tested multiple values for innodb_max_dirty_pages_pct, but none helped.
The problem is with the InnoDB buffer pool memory: if the buffer pool is full, the ALTER TABLE add/drop partition takes a lot of time, but after a restart it works well.

Also, I have dedicated SSDs configured in RAID 0 for the datadir; that's why I set the innodb_io_capacity and innodb_io_capacity_max variables high. But I have tested with reduced values of 20000 and 40000 as well. So far nothing has helped.

Update
I have tested with a different configuration for the InnoDB buffer pool size.
Since I have a Galera cluster, I set innodb_buffer_pool_size to 25G on one node. As all nodes in the cluster are primaries, when I execute the partition management script that creates daily partitions on that node, it takes far less time than on the node with 100GB of buffer pool.

Just because you have 10000 IOPS does not mean "set innodb_io_capacity=10000". If you read the blogs I linked above, you'll see that doing this can cause performance degradation.
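Both of these are dynamic, so you can step them down and measure; something like the following as a starting point (example values only, not a recommendation for your exact hardware):

SET GLOBAL innodb_io_capacity     = 2000;
SET GLOBAL innodb_io_capacity_max = 4000;
-- Tune from here based on observed flushing behaviour, not the raw device IOPS figure.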

That makes perfect sense. When you drop tables/partitions, InnoDB has to scan through the buffer pool's linked lists (LRU) and remove the associated pages. When the pool is empty (i.e., right after a restart), there's nothing to scan. You also have many buffer pool instances, which can cause pages from the same table to land in different instances, so there is more internal locking/concurrency to manage.

You might consider lowering the number of buffer pool instances to 4, or even 2, and seeing if that helps when the pool is not empty.
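Note that innodb_buffer_pool_instances is not dynamic, so you can only check the current value from SQL; the change itself goes in my.cnf followed by a (rolling) restart across the cluster:

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_instances';
-- Change innodb_buffer_pool_instances in my.cnf (e.g. to 4 or 2) and restart the node to apply.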

I have set the pool instances to 4, but it has negatively impacted performance: it increased the time for the ALTER operations instead of reducing it.

Currently I have adopted a procedure where I dynamically reduce the total innodb_buffer_pool_size from 100G to 20G, perform the partition operation (it quickly creates the partitions), and then increase innodb_buffer_pool_size back to 100G.
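In SQL terms, the workaround looks roughly like this (sizes are the ones mentioned above):

SET GLOBAL innodb_buffer_pool_size = 20 * 1024 * 1024 * 1024;    -- shrink to 20G
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';      -- wait until the resize reports completion
-- ... run the partition management script (ALTER TABLE ... ADD/DROP PARTITION) ...
SET GLOBAL innodb_buffer_pool_size = 100 * 1024 * 1024 * 1024;   -- grow back to 100G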
I'm not sure if this is the correct way to deal with it; I'm trying to figure out other options. Any help would be highly appreciated.