Memory management and RAM OOM kill for PXC pods

Hello,

In our company, we are using Percona XtraDB Cluster operator to provide one MySQL cluster for each one of our customers due to internal reasons not worth diving into right now.

As our tests approach the final stages of development, we set Kubernetes resource limits to cap the RAM each PXC pod would be able to use but we had to remove this specific configuration because it appears the PXC pods can’t/won’t respect any memory boundaries. Once an OOM kills any PXC pod, usually in the middle of a regular workload, the cluster can’t come back online on it’s own and is precisely why we just turned the limit off for the time being.

As our undestanding of RAM usage in MySQL is limited and PXC also deploys the Galera replication, not to mention the RAM management done by Kubernetes could be a factor here, we are reaching out to see what could be done to enforce the resources as some configuration to mysqld and/or other programs so they never even allocate more RAM than they should, avoiding any OOM kill in the first place.

Currently, we are using my.cnf to configure the PXC pod, but to no avail. Any time we try to restore a database with huge INSERT INTO queries or other operations like that the pod inevitably allocates however much memory it wants to… Also, once allocated, the pods never deallocate it, hence possibly being a Kubernetes behavior after some memory spike.

This is a huge problem since, as there will be a whole lot of clusters in our environment, we need to set a reasonably low memory usage to each pod as CPU/RAM is the bulk of our cost structure. Even considering the memory limit is an arbitrary value, the fact that we can’t even set 1GB as per the example is the problem we are trying to figure a solution for.

Regards from Brazil, DAVI

2 Likes

Hi Davi!
Did you set both pxc.resources.requests.memory and pxc.resources.limits.memory ?

1 Like

I did, yes.Keeping to the example configuration, I set them both to 1GB of RAM.The OOM kill occurs once the pod goes beyond that limit. We see this happening anytime a mysqldump is used to restore a database.

1 Like

All right, thanks for confirming. This is indeed suspicious. Would you mind filing a bug report here https://jira.percona.com/projects/K8SPXC ? If not i can do it for yourself. Obrigado!

1 Like

can you show your my.cnf ?
If you are using huge INSERT INTO that may force Galera to allocate memory to accommodate your transaction. How huge is huge INSERT INTO?

1 Like

@DGB I’ve opened a report, the link is: https://jira.percona.com/browse/K8SPXC-441

@vadimtk I’m attaching the my.cnf file below and I did more tests to gather information I hope helps in some capacity.

From one of the mysqldump files we have, I’ve isolated one table that usually triggers the memory spike and I’ve restored only said table. The SQL file (plaintext) has ~1.3G divided into 1255 INSERT INTO queires, yielding roughly 1.04MB per query (a simple division). This specific table does not have any blob column, it’s all text.

Using the configuration at the end, I imported the 1.3GB SQL. The result was a database with total size amounting to 1.72 GB (308.55 MB of index, 1.42 MB of data). And now the RAM is perennially at 1180 MB, 18% above the would be limit of 1GB in the example CR configuration.

Here is the file (mainly the default values we see in PXC):

$ cat /etc/percona-xtradb-cluster.conf.d/init.cnf 
[mysqld]
default-time-zone='-03:00'
default_authentication_plugin=mysql_native_password
lower_case_table_names=1
partial_revokes=on
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
tls_version=TLSv1,TLSv1.1,TLSv1.2
innodb_buffer_pool_size = 640M
innodb_flush_method = O_DIRECT
join_buffer_size = 256K
key_buffer_size = 8M
max_allowed_packet = 24M
max_connections = 20
read_buffer_size = 128K
read_rnd_buffer_size = 256K
sort_buffer_size = 256K
thread_cache_size = 10
thread_stack = 256K
tmp_table_size = 16M
log_bin_trust_function_creators=1
wsrep_provider_options="gcache.size=1G; gcache.recover=yes"



1 Like

@davi I think this is amount of memory that Galera requires to process mysqldump. It would be better if replication could respect memory limits, but it will allocate memory needed to handle transactions.

1 Like

Hi, @vadimtk

We are facing a similar issue as @davi , trying to create some “resource manager” for our pods.

Does Galera does not explicitly limit about resource allocation? We found some performance tips, but no “hard limits”.

The memory sees to increase and do not release, and the most consuming query (like a big insert or update) sets the watermark (~1.5GB RAM)

Thanks

1 Like

Hi guys

Same problem((
A few DB, summary about 6 Gb size. My config:

max_allowed_packet               = 16M
    join_buffer_size                 = 64M
    sort_buffer_size                 = 1M
    tmp_table_size                   = 1G
    max_heap_table_size              = 1G
    innodb_log_file_size             =  1G 
    innodb_buffer_pool_instances     = 6
    innodb_buffer_pool_size          = 6G 
    innodb_buffer_pool_chunk_size    = 1G
    innodb_stats_on_metadata         = OFF
    slow_query_log                   = ON
    long_query_time                  = 3
    wsrep_slave_threads              = 2
    innodb_flush_log_at_trx_commit  = 0
    innodb_flush_method             = O_DIRECT
    innodb_file_per_table           = 1
    wsrep_provider_options="pc.wait_prim_timeout=PT3600S;gcache.size=1G;gcache.recover=yes"

K8S limits:

  resources:
    limits:
      memory: 24Gi
      cpu: 6000m
    requests:
      memory: 18Gi
      cpu: 6000m

pxc operator 1.9.1
and as result:

  pxc:
    Container ID:  docker://9341b8305d19c081c6fdc43fc9771b0e1f40454df20c66990a463a61846dad63
    Image:         percona/percona-xtradb-cluster:8.0.23-14.1
    Image ID:      docker-pullable://percona/percona-xtradb-cluster@sha256:8109f7ca4fc465ba862c08021df12e77b65d384395078e31e270d14b77810d79
    Ports:         3306/TCP, 4444/TCP, 4567/TCP, 4568/TCP, 33062/TCP, 33060/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /var/lib/mysql/pxc-entrypoint.sh
    Args:
      mysqld
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 06 Dec 2021 10:45:59 +0200
      Finished:     Mon, 06 Dec 2021 10:46:33 +0200
    Ready:          False
    Restart Count:  5
    Limits:
      cpu:     6
      memory:  24Gi
    Requests:
      cpu:      6
      memory:   18Gi

Is there any way to find root of this issue?

2 Likes

So, I’ve set my cluster on pause, than removed PVC on pod that was OOMkilled and start cluster again - works like a charm. But I still don’t understand main reason of this

1 Like

Hi @Oleksandr_Slubskyi

    tmp_table_size                   = 1G
    max_heap_table_size              = 1G
 innodb_buffer_pool_chunk_size    = 1G

Above Settings are not optimized for memory usage. tmp_table_size and max_heap_table_size allow each connection to use max 1G memory so it can easily run into OOM. I would suggest keeping it ~100M max, you can start with 32M

https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_tmp_table_size
https://dev.mysql.com/doc/refman/5.7/en/innodb-buffer-pool-resize.html#innodb-buffer-pool-chunk-size

2 Likes

Thanks for suggesting about tmp and heap. But whats wrong with innodb_buffer_pool_chunk_size ? In my case
innodb_buffer_pool_size (6Gb) = innodb_buffer_pool_chunk_size (1Gb) * innodb_buffer_pool_instances (6)

1 Like

@Oleksandr_Slubskyi
innodb_buffer_pool_chunk_size = 1G , this should be fine i just highlight the usecase.

but tmp_table_size = 1G and max_heap_table_size= 1G, for sure you should set lower value like 32M/64M

2 Likes

@lalit.choudhary got it, thanks!

1 Like

In our experience after one year dealing with the OOMKill the best solution is to remove the memory limit so no enforcement ever happens.

As far as we know, there is no way to preemptively know how much RAM the PXC pod will eventually allocate. Also, in all our clusters with varying my.cfg configurations to each (e.g. varying buffer sizes), some PXC pods stabilize in some “random” RAM usage, others keeps going way further than the MySQL documentation suggests — as in innodb_buffer_pool_size below 1Gi and the allocated RAM above 2Gi.

Anyhow, the operator is amazing, besides this ever increasing RAM usage we don’t have any complain and not single operational problem after one year and more than 200 PXC clusters in production (OOMKill aside obviously)…

3 Likes

Thank you all for your inputs.

Try workaround: [K8SPXC-441] Memory management and RAM OOM kill for PXC pods - Percona JIRA

We also working on memory usage improvements ref: [K8SPXC-907] Provide a way to use jemalloc for mysqld - Percona JIRA

1 Like