In our company, we are using the Percona XtraDB Cluster operator to provide one MySQL cluster for each of our customers, for internal reasons not worth diving into right now.
As our tests approached the final stages of development, we set Kubernetes resource limits to cap the RAM each PXC pod can use, but we had to remove this configuration because the PXC pods don’t seem to respect any memory boundary. Once the OOM killer terminates a PXC pod, usually in the middle of a regular workload, the cluster can’t come back online on its own, which is why we have turned the limit off for the time being.
Since our understanding of RAM usage in MySQL is limited, PXC also deploys Galera replication, and Kubernetes memory management could be a factor here as well, we are reaching out to ask what configuration could be applied to mysqld and/or the other processes so they never allocate more RAM than they should, avoiding the OOM kill in the first place.
Currently we configure the PXC pods through my.cnf, but to no avail. Any time we restore a database with huge INSERT INTO statements, or run similar operations, the pod inevitably allocates however much memory it wants… Also, once the memory is allocated the pods never release it, which may be a Kubernetes behavior after a memory spike.
This is a huge problem: since there will be a large number of clusters in our environment, we need to keep each pod’s memory usage reasonably low, as CPU/RAM is the bulk of our cost structure. Even though the exact memory limit is an arbitrary value, the fact that we can’t even set the 1GB from the example is the problem we are trying to solve.
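For reference, this is roughly what we set in the cluster CR (a minimal sketch based on the operator’s example cr.yaml; the cluster name and anything not mentioned above are assumptions):

```yaml
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: cluster1            # name assumed for illustration
spec:
  pxc:
    size: 3
    resources:
      requests:
        memory: 1G          # the 1GB value from the example configuration
        cpu: 600m
      limits:
        memory: 1G          # the limit the pods end up exceeding
```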
I did, yes. Keeping to the example configuration, I set them both to 1GB of RAM. The OOM kill occurs once the pod goes beyond that limit. We see this happening any time a mysqldump is used to restore a database.
All right, thanks for confirming. This is indeed suspicious. Would you mind filing a bug report here: https://jira.percona.com/projects/K8SPXC ? If not, I can do it for you. Thank you!
Can you show your my.cnf? If you are using huge INSERT INTO statements, that may force Galera to allocate memory to accommodate your transaction. How huge are the INSERT INTO statements?
@vadimtk I’m attaching the my.cnf file below, and I ran more tests to gather information that I hope helps in some capacity.
From one of the mysqldump files we have, I isolated one table that usually triggers the memory spike and restored only that table. The SQL file (plain text) is ~1.3 GB split across 1255 INSERT INTO queries, roughly 1.04 MB per query (a simple division). This specific table does not have any blob columns; it’s all text.
Using the configuration at the end, I imported the 1.3 GB SQL file. The result was a database with a total size of 1.72 GB (308.55 MB of index, 1.42 GB of data). The pod’s RAM usage now sits permanently at 1180 MB, 18% above the would-be limit of 1GB from the example CR configuration.
Here is the file (mainly the default values we see in PXC):
@davi I think this is the amount of memory that Galera requires to process the mysqldump. It would be better if replication could respect memory limits, but it will allocate whatever memory is needed to handle the transactions.
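One way to keep each replicated transaction small during a dump restore (a sketch using standard mysqldump options; host and database names are placeholders) is to cap the size of the generated INSERT statements:

```shell
# Build extended INSERTs of at most ~1 MB each, so the write set Galera
# has to buffer per statement stays small.
mysqldump --net-buffer-length=1048576 --single-transaction \
  -h mysql-host -u root -p mydb > mydb.sql

# Or, at the cost of a much slower restore, one row per INSERT:
mysqldump --skip-extended-insert -h mysql-host -u root -p mydb > mydb.sql
```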
So, I set my cluster to pause, then removed the PVC of the pod that was OOMKilled and started the cluster again; it works like a charm. But I still don’t understand the root cause of this.
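Roughly what I did (a sketch; the cluster and PVC names are assumed, and deleting the PVC forces the pod to rejoin via a full SST):

```shell
# Pause the cluster (spec.pause is a field of the PerconaXtraDBCluster CR)
kubectl patch pxc cluster1 --type=merge -p '{"spec":{"pause":true}}'

# Remove the PVC of the pod that was OOMKilled (name assumed)
kubectl delete pvc datadir-cluster1-pxc-1

# Start the cluster again; the affected pod re-syncs its data via SST
kubectl patch pxc cluster1 --type=merge -p '{"spec":{"pause":false}}'
```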
The settings above are not optimized for memory usage. tmp_table_size and max_heap_table_size allow each connection to use up to 1G of memory, so the pod can easily run into an OOM. I would suggest keeping them at ~100M max; you can start with 32M.
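Applied through the operator, that suggestion would look something like this (a sketch showing only the two variables discussed here, assuming the my.cnf is passed via spec.pxc.configuration):

```yaml
spec:
  pxc:
    configuration: |
      [mysqld]
      # start small, as suggested; raise toward ~100M only if needed
      tmp_table_size = 32M
      max_heap_table_size = 32M
```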
Thanks for the suggestion about tmp and heap. But what’s wrong with innodb_buffer_pool_chunk_size? In my case:
innodb_buffer_pool_size (6GB) = innodb_buffer_pool_chunk_size (1GB) * innodb_buffer_pool_instances (6)
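In my.cnf terms that arithmetic reads as the sketch below (MySQL requires innodb_buffer_pool_size to be a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances and rounds it up otherwise):

```ini
[mysqld]
# 6 instances * 1G chunks = 6G total, satisfying the multiple-of rule
innodb_buffer_pool_size = 6G
innodb_buffer_pool_chunk_size = 1G
innodb_buffer_pool_instances = 6
```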
In our experience, after one year dealing with the OOMKills, the best solution has been to remove the memory limit so that no enforcement ever happens.
As far as we know, there is no way to know in advance how much RAM a PXC pod will eventually allocate. Across all our clusters, each with a different my.cnf configuration (e.g. varying buffer sizes), some PXC pods stabilize at some “random” RAM usage, while others keep going far beyond what the MySQL documentation suggests; for example, innodb_buffer_pool_size below 1Gi while the allocated RAM sits above 2Gi.
Anyhow, the operator is amazing. Apart from this ever-increasing RAM usage we have no complaints, and not a single operational problem after one year and more than 200 PXC clusters in production (the OOMKills aside, obviously)…
It seems my team has stumbled into the same issue. Our PXC pods steadily increase their memory consumption and then stabilize above our initial memory request.
Have there been any updates as to what we can configure on pxc to properly handle this?