PMM client getting OOM-killed after a few days in an Everest-deployed cluster

I am running a 3-node PXC cluster using Everest. After a few days I get this error:

Reason: OOMKilled - exit code: 137

It happens only for the pmm-client container, and only on the first node.
Any help on how to address this? Or can someone point me to how to debug it?

OOM means there is not enough memory on that node. Can you increase the amount of memory? Use PMM to track memory usage and see which process is using too much.
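To confirm which container is actually being killed, here is a minimal sketch (not an official Percona or Everest tool). It assumes you save the output of `kubectl get pod <pod> -n <namespace> -o json` and pass it in as a string. The function name is made up for illustration. The fields it reads (`status.containerStatuses[].lastState.terminated`) are the standard Kubernetes pod status fields.

```python
import json


def oom_killed_containers(pod_json: str):
    """Return (name, restart_count, exit_code) for each container whose
    last termination reason was OOMKilled.

    pod_json is the JSON text of a single pod, as printed by
    `kubectl get pod <pod> -o json`. The function name is hypothetical.
    """
    pod = json.loads(pod_json)
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState is empty ({}) for containers that never restarted.
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            hits.append(
                (
                    status["name"],
                    status.get("restartCount", 0),
                    terminated.get("exitCode"),
                )
            )
    return hits
```

If only pmm-client shows up in the result, the sidecar's own memory limit is the likely thing to look at rather than the node's total memory.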

This is a 32 GB instance, and there are no other significant pods scheduled on the node except for this 4 CPU, 7 GB pxc-0 database.

I did check in PMM; sharing the screenshots.

My question is: can the resources for the PMM client alone be increased?
Also, I have no idea why the pxc-0 memory is hitting the limit so quickly either. I was doing some imports into tables, but as you can see the storage used is only about 10 GB, so it is not a big database.
Later I turned off PMM monitoring because the restarts kept happening. When I turn it back on, the restarts start again.

I hit the same issue previously. I uninstalled Everest and did a full reinstall; it was fine for a week, and then this happened again after a data load of a few GB.

Hi @beta-auction,

Can you share:

  • K8s platform and version
  • Everest version
  • PMM server version
  • PXC version

How did you create the cluster in Everest? I mean the resources and the number of nodes. Did you set any custom database settings?

I am using an RKE2 cluster, version 1.30.2.
Everest version: 1.1.0 (updated to 1.1.1, and still the same issue after turning monitoring off and then on)
PMM version: 2.42 (latest version)
PXC version: 8.0.36-28.1 (latest available in Everest)

The cluster is created on AWS with 3 DB nodes (all m6a.2xlarge). The Percona cluster is the 3-node medium configuration as offered in the Everest UI.

The DB settings I have are:

max_connections = 2000
thread_cache_size = 48
thread_pool_size = 8
table_open_cache_instances = 8
wait_timeout = 900
interactive_timeout = 900
default_time_zone = '+00:00'

Apart from the usual DB operations for my apps, I had done a few imports (each less than a few GB). As you can see in the screenshots above, the maximum DB size is less than 10 GB. A few tables have around a million rows.

I notice similar behavior with an OVH cluster.

Everest Version: 1.2.0

pod description:

Init Containers:
    Image ID:
    Port:          <none>
    Host Port:     <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 07 Oct 2024 08:03:34 +0200
      Finished:     Mon, 07 Oct 2024 08:03:37 +0200
    Ready:          True
    Restart Count:  0
      cpu:     50m
      memory:  50M
      cpu:        50m
      memory:     50M
    Environment:  <none>
      /var/lib/mysql from datadir (rw)
      /var/run/secrets/ from kube-api-access-4p64t (ro)
    Image:          percona/pmm-client:2
    Image ID:
    Ports:          7777/TCP, 30100/TCP, 30101/TCP, 30102/TCP, 30103/TCP, 30104/TCP, 30105/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 07 Oct 2024 08:27:14 +0200
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 07 Oct 2024 08:03:39 +0200
      Finished:     Mon, 07 Oct 2024 08:27:12 +0200
    Ready:          True
    Restart Count:  1
      cpu:     100m
      memory:  107374182400m
      cpu:     95m
      memory:  101994987520m
    Liveness:  http-get http://:7777/local/Status delay=60s timeout=5s period=10s #success=1 #failure=3
    Environment Variables from:
      <DB_NAME>-env-vars-pxc  Secret  Optional: true
      PMM_SERVER:                     <PMM_URL>
      CLIENT_PORT_LISTEN:             7777
      CLIENT_PORT_MIN:                30100
      CLIENT_PORT_MAX:                30105
      POD_NAME:                       <POD_NAME> (
      POD_NAMESPASE:                  <NAMESPACE> (v1:metadata.namespace)
      PMM_AGENT_SERVER_USERNAME:      api_key
      PMM_AGENT_SERVER_PASSWORD:      <set to the key 'pmmserverkey' in secret 'internal-<DB_NAME>'>  Optional: false
      PMM_AGENT_LISTEN_PORT:          7777
      PMM_AGENT_PORTS_MIN:            30100
      PMM_AGENT_PORTS_MAX:            30105
      PMM_AGENT_CONFIG_FILE:          /usr/local/percona/pmm2/config/pmm-agent.yaml
      PMM_AGENT_SETUP:                1
      PMM_AGENT_SETUP_FORCE:          1
      PMM_AGENT_SETUP_NODE_TYPE:      container
      DB_TYPE:                        mysql
      DB_USER:                        monitor
      DB_PASSWORD:                    <set to the key 'monitor' in secret 'internal-<DB_NAME>'>  Optional: false
      DB_ARGS:                        --query-source=perfschema
      DB_CLUSTER:                     pxc
      DB_HOST:                        localhost
      DB_PORT:                        33062
      CLUSTER_NAME:                   <CLUSTERNAME>
      PMM_AGENT_PRERUN_SCRIPT:        /var/lib/mysql/
      PMM_AGENT_SIDECAR:              true
      PMM_AGENT_PATHS_TEMPDIR:        /tmp
      /var/lib/mysql from datadir (rw)
      /var/run/secrets/ from kube-api-access-4p64t (ro)
    Image:         percona/percona-xtradb-cluster:8.0.36-28.1
    Image ID:
    Ports:         3306/TCP, 4444/TCP, 4567/TCP, 4568/TCP, 33062/TCP, 33060/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 07 Oct 2024 08:03:40 +0200
    Ready:          True
    Restart Count:  0
      cpu:     1500m
      memory:  5G
      cpu:      1500m
      memory:   5G
    Liveness:   exec [/var/lib/mysql/] delay=300s timeout=450s period=10s #success=1 #failure=3
    Readiness:  exec [/var/lib/mysql/] delay=15s timeout=450s period=30s #success=1 #failure=5
    Environment Variables from:
      <DB_NAME>-env-vars-pxc  Secret  Optional: true
      PXC_SERVICE:                    <DB_NAME>-pxc-unready
      MONITOR_HOST:                   %
      MYSQL_ROOT_PASSWORD:            <set to the key 'root' in secret 'internal-<DB_NAME>'>        Optional: false
      XTRABACKUP_PASSWORD:            <set to the key 'xtrabackup' in secret 'internal-<DB_NAME>'>  Optional: false
      MONITOR_PASSWORD:               <set to the key 'monitor' in secret 'internal-<DB_NAME>'>     Optional: false
      CLUSTER_HASH:                   1541532
      OPERATOR_ADMIN_PASSWORD:        <set to the key 'operator' in secret 'internal-<DB_NAME>'>  Optional: false
      LIVENESS_CHECK_TIMEOUT:         450
      DEFAULT_AUTHENTICATION_PLUGIN:  caching_sha2_password
      /etc/my.cnf.d from auto-config (rw)
      /etc/mysql/init-file from mysql-init-file (rw)
      /etc/mysql/mysql-users-secret from mysql-users-secret-file (rw)
      /etc/mysql/ssl from ssl (rw)
      /etc/mysql/ssl-internal from ssl-internal (rw)
      /etc/mysql/vault-keyring-secret from vault-keyring-secret (rw)
      /etc/percona-xtradb-cluster.conf.d from config (rw)
      /tmp from tmp (rw)
      /var/lib/mysql from datadir (rw)
      /var/run/secrets/ from kube-api-access-4p64t (ro)
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 

The PMM agent gets OOM-killed when I try to import a table structure.

The RAM request also seems rather tight for the agent, doesn’t it?
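To put the numbers from the pod description into familiar units: a memory value ending in `m` is a Kubernetes quantity in thousandths of a byte, so `107374182400m` is far smaller than it looks. Below is a rough sketch of the conversion, assuming only the common Kubernetes suffixes. The helper names are made up for illustration and this is not part of any Percona tooling.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes (case sensitive):
# "m" is milli (1/1000), "M" is mega (10**6), "Mi" is mebi (2**20).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def quantity_to_bytes(quantity: str) -> Decimal:
    """Convert a Kubernetes memory quantity string to bytes."""
    quantity = quantity.strip()
    # Check two-character suffixes first so "Mi" is not read as "M" or "m".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)  # plain number of bytes


def quantity_to_mib(quantity: str) -> float:
    """Convert a Kubernetes memory quantity string to MiB."""
    return float(quantity_to_bytes(quantity) / (Decimal(2) ** 20))


for q in ("107374182400m", "101994987520m", "50M", "5G"):
    print(q, "=", round(quantity_to_mib(q), 1), "MiB")
```

By this conversion, the two pmm-client memory values in the description work out to roughly 102 MiB and 97 MiB, next to 5G for the pxc container.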

Any luck resolving this? I see the same thing on Everest v1.3.
Pods are left behind after being OOM-killed.

Unfortunately not.
@Tomislav_Plavcic, should we put this in a separate thread for clarity?

Just thinking out loud here: maybe the PMM client resources should be configurable, or the client should be split out into its own pod, similar to the HAProxy pod.

All of these are out-of-memory cases. At this point I have just turned off monitoring.