This may end up being a more general MySQL memory-management question, but we recently had problems with a client whose 3-node Percona XtraDB Cluster (PXC) 5.7.35 setup was regularly running out of memory and, although we seem to have solved it, we didn't quite understand why.
The 3 PXC nodes were running on dedicated servers with 8 cores (16 vCPU) and 128GB RAM (16GB swap). We'd set innodb_buffer_pool_size to 96GB (75% of system RAM) and innodb_buffer_pool_instances to 11 (based on some previous advice we'd had to divide the buffer pool into approx 8GB chunks). Our application hits the DB hard and we don't do any smart query routing, so one node always takes all the traffic and the other 2 are just there as hot standbys.
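For reference, the relevant my.cnf fragment on the problem nodes would have looked roughly like this (values reconstructed from the description above, not the client's actual file):

```ini
[mysqld]
# 75% of the 128GB system RAM
innodb_buffer_pool_size      = 96G
# split the pool into ~8GB chunks
innodb_buffer_pool_instances = 11
```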
With the above settings the PXC node would typically sail well past its 96GB buffer pool size (in resident memory usage) within a few hours, maybe a day at most, heading right up to approx 126GB before it was typically OOM-killed by the OS. We tried to do some analysis but struggled to understand why this was happening.
The client had another identical environment with the only difference being that innodb_buffer_pool_instances was set to 8 and, under the same workload, this seemed stable with the PXC node topping out at about 108GB resident memory usage.
We changed the innodb_buffer_pool_instances to 8 on the problem environment and lo and behold, that seemed to do the trick.
It's possible there were some other differences we weren't made aware of but, assuming it was just down to having three fewer buffer pool instances, can anyone help us understand why this was leading to runaway memory usage?
Our general advice to clients deploying our system with PXC is to set innodb_buffer_pool_size to 80% of system RAM (even though we'd used 75% in the above example) and innodb_buffer_pool_instances to divide that buffer pool into 8GB chunks. Is that perhaps unwise on servers above a certain amount of RAM?
Yes, using 75-80% of total server memory for the buffer pool is the usual rule of thumb. The BP can use an extra 5-10% on top of the allocated memory due to overhead and extra structures, i.e., a BP of 96 GB can use up to around 105 GB.
Apart from the memory reserved for the buffer pool, MySQL will also need some memory for connections, temporary tables, other buffers, the mysqld process itself, etc.
The problem is likely that something else is using all of the extra memory.
To check where memory is used, you can set up memory instrumentation as explained here: MySQL :: MySQL 8.0 Reference Manual :: Memory Summary Tables
Memory instrumentation is not exact, but it gives an estimate that can be used to check, more or less, what is consuming the extra memory.
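As a sketch of how that looks in practice on 5.7 (where most memory instruments are disabled by default; instruments enabled at runtime only count allocations made after the change):

```sql
-- Enable the memory instruments at runtime.
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'memory/%';

-- Top allocators, via the sys schema helper view:
SELECT event_name, current_alloc
  FROM sys.memory_global_by_current_bytes
 LIMIT 10;

-- Or query performance_schema directly:
SELECT EVENT_NAME,
       CURRENT_NUMBER_OF_BYTES_USED / 1024 / 1024 AS current_mb
  FROM performance_schema.memory_summary_global_by_event_name
 ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC
 LIMIT 10;
```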
I would also suggest you install PMM, which will collect all MySQL-specific information in real time. The next time there is an OOM crash, you can check memory consumption, the number of temp tables, etc. right before the crash.
PMM is free and open source and you can check a free demo here: Grafana
Install instructions are here: Install Percona Monitoring and Management
For now, my suggestion would be to decrease the BP size a bit (2-3 GB will do) while you check the number and size of temporary tables and other things that might be spiking memory usage.
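Since 5.7, innodb_buffer_pool_size is dynamic, so the reduction can be done online without a restart. A sketch (the new size is rounded to a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances, i.e. 1 GB here with the default 128 MB chunk size and 8 instances):

```sql
-- Shrink the buffer pool from 96G to 93G online (no restart needed).
SET GLOBAL innodb_buffer_pool_size = 93 * 1024 * 1024 * 1024;

-- Watch the resize progress:
SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';
```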
Thank you Carlos.
Yes, we did suggest the client set up PMM, but I don't think they did it until after we "solved" the problem - still, it will be interesting to see what it turns up. I also recently discovered the memory instrumentation stuff and was looking at switching on the memory counters, but again it was too late to catch the problem actually happening.
I'd love to know what part the number of buffer pool instances played in it - perhaps there was a tipping point.
Appreciate the response and will report back if we discover anything.
Hi again Neil,
The problem is likely not related to the number of buffer pool instances, but to the total amount of memory used. According to our CTO's blog post, more buffer pool instances is better for certain workloads, but that might not always be the case: How Many innodb_buffer_pool_instances Do You Need in MySQL 8? - Percona Database Performance Blog. For a 96 GB BP, switching from 11 to 8 BP instances should not make any noticeable difference in total memory consumption, just decrease contention.
Just in case, check that you don't have high values for the temporary table variables (tmp_table_size, max_heap_table_size), and that you don't create too many of them.
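A quick way to check both, sketched below; keep in mind the effective in-memory limit is the smaller of the two variables, and every connection can allocate up to that limit for each temp table it needs:

```sql
-- Per-session ceilings for in-memory temporary tables:
SHOW GLOBAL VARIABLES LIKE 'tmp_table_size';
SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';

-- How many temp tables the workload creates,
-- and how many spill to disk:
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
```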
If it’s not that, keep digging with memory instrumentation and PMM.