PMM client getting OOMKilled after a few days in an Everest-deployed cluster

I am running a 3-node PXC cluster deployed with Everest. After a few days I get this error:

Terminated
Reason: OOMKilled - exit code: 137

for the pmm-client container alone, and it is happening only on the first node.
Any help on how to address this, or can someone point me to how to debug it?

OOMKilled means the process ran out of memory, either because the node is under memory pressure or because the container hit its own memory limit. Can you increase the amount of memory? Use PMM to track memory usage and see which process is consuming too much.
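
If it helps with debugging, here is a quick way to confirm which container is being killed and how close each one runs to its limit (a sketch; <namespace> and <pxc-pod> are placeholders, and kubectl top assumes metrics-server is installed):

# Per-container CPU/memory usage (requires metrics-server)
kubectl top pod <pxc-pod> -n <namespace> --containers

# Last terminated state and configured limits of each container
kubectl describe pod <pxc-pod> -n <namespace> | grep -B 2 -A 4 'Last State'

# Recent OOM/restart events in the namespace
kubectl get events -n <namespace> --sort-by=.lastTimestamp | grep -iE 'oom|killed|backoff'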

This is a 32 GB instance, and there are no other significant pods scheduled on that node except this 4 CPU / 7 GB pxc-0 DB pod.

I did check in PMM; sharing the screenshots here.

My question here would be: can the resources for the pmm-client alone be increased?
Also, I have no idea why pxc-0 memory is hitting the limit so quickly either (a way to check this is sketched below). I was doing some imports into tables, but as you can see the storage is only about 10 GB, so it is not a big database.
Later I turned off PMM monitoring, since the restarts kept happening. When I turn it back on, it starts happening again.

Previously I got the same issue; I uninstalled Everest and did a full install again. It was fine for a week, and then this happened again after a data load of a few GB.
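
Regarding why pxc-0 memory climbs so quickly: MySQL 8.0's sys schema can show where mysqld's memory is actually going. A rough check from inside the pxc container might look like this (a sketch; <namespace> and <pxc-pod> are placeholders, and it assumes MYSQL_ROOT_PASSWORD is set in the pxc container's environment, as the pod description later in this thread suggests):

# Total memory mysqld has allocated according to performance_schema
kubectl exec -n <namespace> <pxc-pod> -c pxc -- sh -c \
  'mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT * FROM sys.memory_global_total;"'

# Top allocation areas (buffer pool, session buffers, temp tables, ...)
kubectl exec -n <namespace> <pxc-pod> -c pxc -- sh -c \
  'mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT event_name, current_alloc FROM sys.memory_global_by_current_bytes LIMIT 10;"'

If these totals stay well below the container limit while the container's usage keeps growing, the pressure is coming from something other than mysqld's tracked allocations.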

Hi @beta-auction ,

Can you share:

  • K8s platform and version
  • Everest version
  • PMM server version
  • PXC version

How do you create the cluster in Everest? I mean the resources and nodes, and do you set any custom database settings?

Thank you!

I am using an RKE2 cluster, version 1.30.2.
Everest 1.1.0 (updated to 1.1.1; still the same issue after turning monitoring off and then on).
PMM is 2.42 (latest version).
PXC version: 8.0.36-28.1 (latest available in Everest).

The cluster is created on AWS with 3 DB nodes (all m6a.2xlarge). The Percona cluster uses the 3-node medium config as specified in the Everest UI.

The DB settings I have are:

[mysqld]
max_connections = 2000
thread_cache_size = 48
thread_pool_size = 8
table_open_cache_instances = 8
wait_timeout = 900
interactive_timeout = 900
default_time_zone = '+00:00'
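
As an aside on this config: max_connections = 2000 is high for a 7 GB container. Using approximate MySQL 8.0 defaults for the per-session buffers (these are assumptions, not measured values), a back-of-envelope worst case looks like:

  thread_stack          ~1 MB
  sort_buffer_size      ~256 KB
  join_buffer_size      ~256 KB
  read_buffer_size      ~128 KB
  read_rnd_buffer_size  ~256 KB
  binlog_cache_size     ~32 KB
  -----------------------------
  ~2 MB per connection

  2000 connections x ~2 MB ≈ 4 GB on top of the buffer pool

Only connections that actually allocate those buffers count, so real usage is normally far lower, but a burst of busy connections during an import could plausibly push the pxc container toward its limit.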

Apart from the usual DB operations for my apps, I had done a few imports (each less than a few GB). And as you can see in the pics above, the max DB size is less than 10 GB. A few tables have around a million rows.

I notice similar behavior with an OVH cluster.

Everest Version: 1.2.0

pod description:

Init Containers:
  pxc-init:
    [...]
    Image:         docker.io/percona/percona-xtradb-cluster-operator:1.15.0
    Image ID:      docker.io/percona/percona-xtradb-cluster-operator@sha256:6f7d8d4e472b8c4d166573cc7bb714bbb0fdf1535142b6138c62fdecbf881df9
    Port:          <none>
    Host Port:     <none>
    Command:
      /pxc-init-entrypoint.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 07 Oct 2024 08:03:34 +0200
      Finished:     Mon, 07 Oct 2024 08:03:37 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     50m
      memory:  50M
    Requests:
      cpu:        50m
      memory:     50M
    Environment:  <none>
    Mounts:
      /var/lib/mysql from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4p64t (ro)
Containers:
  pmm-client:
    [...]
    Image:          percona/pmm-client:2
    Image ID:       docker.io/percona/pmm-client@sha256:18dea613445566c9037134335a74f0ff2f93c5612054d4f83dfd4e0e89e2bbc6
    Ports:          7777/TCP, 30100/TCP, 30101/TCP, 30102/TCP, 30103/TCP, 30104/TCP, 30105/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 07 Oct 2024 08:27:14 +0200
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 07 Oct 2024 08:03:39 +0200
      Finished:     Mon, 07 Oct 2024 08:27:12 +0200
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     100m
      memory:  107374182400m
    Requests:
      cpu:     95m
      memory:  101994987520m
    Liveness:  http-get http://:7777/local/Status delay=60s timeout=5s period=10s #success=1 #failure=3
    Environment Variables from:
      <DB_NAME>-env-vars-pxc  Secret  Optional: true
    Environment:
      PMM_SERVER:                     <PMM_URL>
      CLIENT_PORT_LISTEN:             7777
      CLIENT_PORT_MIN:                30100
      CLIENT_PORT_MAX:                30105
      POD_NAME:                       <POD_NAME> (v1:metadata.name)
      POD_NAMESPASE:                  <NAMESPACE> (v1:metadata.namespace)
      PMM_AGENT_SERVER_ADDRESS:       <PMM_URL>
      PMM_AGENT_SERVER_USERNAME:      api_key
      PMM_AGENT_SERVER_PASSWORD:      <set to the key 'pmmserverkey' in secret 'internal-<DB_NAME>'>  Optional: false
      PMM_AGENT_LISTEN_PORT:          7777
      PMM_AGENT_PORTS_MIN:            30100
      PMM_AGENT_PORTS_MAX:            30105
      PMM_AGENT_CONFIG_FILE:          /usr/local/percona/pmm2/config/pmm-agent.yaml
      PMM_AGENT_SERVER_INSECURE_TLS:  1
      PMM_AGENT_LISTEN_ADDRESS:       0.0.0.0
      PMM_AGENT_SETUP_METRICS_MODE:   push
      PMM_AGENT_SETUP:                1
      PMM_AGENT_SETUP_FORCE:          1
      PMM_AGENT_SETUP_NODE_TYPE:      container
      PMM_AGENT_SETUP_NODE_NAME:      $(POD_NAMESPASE)-$(POD_NAME)
      DB_TYPE:                        mysql
      DB_USER:                        monitor
      DB_PASSWORD:                    <set to the key 'monitor' in secret 'internal-<DB_NAME>'>  Optional: false
      DB_ARGS:                        --query-source=perfschema
      DB_CLUSTER:                     pxc
      DB_HOST:                        localhost
      DB_PORT:                        33062
      CLUSTER_NAME:                   <CLUSTERNAME>
      PMM_ADMIN_CUSTOM_PARAMS:        
      PMM_AGENT_PRERUN_SCRIPT:        /var/lib/mysql/pmm-prerun.sh
      PMM_AGENT_SIDECAR:              true
      PMM_AGENT_SIDECAR_SLEEP:        5
      PMM_AGENT_PATHS_TEMPDIR:        /tmp
    Mounts:
      /var/lib/mysql from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4p64t (ro)
  pxc:
    [...]
    Image:         percona/percona-xtradb-cluster:8.0.36-28.1
    Image ID:      docker.io/percona/percona-xtradb-cluster@sha256:b5cc4034ccfb0186d6a734cb749ae17f013b027e9e64746b2c876e8beef379b3
    Ports:         3306/TCP, 4444/TCP, 4567/TCP, 4568/TCP, 33062/TCP, 33060/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /var/lib/mysql/pxc-entrypoint.sh
    Args:
      mysqld
    State:          Running
      Started:      Mon, 07 Oct 2024 08:03:40 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  5G
    Requests:
      cpu:      1500m
      memory:   5G
    Liveness:   exec [/var/lib/mysql/liveness-check.sh] delay=300s timeout=450s period=10s #success=1 #failure=3
    Readiness:  exec [/var/lib/mysql/readiness-check.sh] delay=15s timeout=450s period=30s #success=1 #failure=5
    Environment Variables from:
      <DB_NAME>-env-vars-pxc  Secret  Optional: true
    Environment:
      PXC_SERVICE:                    <DB_NAME>-pxc-unready
      MONITOR_HOST:                   %
      MYSQL_ROOT_PASSWORD:            <set to the key 'root' in secret 'internal-<DB_NAME>'>        Optional: false
      XTRABACKUP_PASSWORD:            <set to the key 'xtrabackup' in secret 'internal-<DB_NAME>'>  Optional: false
      MONITOR_PASSWORD:               <set to the key 'monitor' in secret 'internal-<DB_NAME>'>     Optional: false
      CLUSTER_HASH:                   1541532
      OPERATOR_ADMIN_PASSWORD:        <set to the key 'operator' in secret 'internal-<DB_NAME>'>  Optional: false
      LIVENESS_CHECK_TIMEOUT:         450
      READINESS_CHECK_TIMEOUT:        450
      DEFAULT_AUTHENTICATION_PLUGIN:  caching_sha2_password
    Mounts:
      /etc/my.cnf.d from auto-config (rw)
      /etc/mysql/init-file from mysql-init-file (rw)
      /etc/mysql/mysql-users-secret from mysql-users-secret-file (rw)
      /etc/mysql/ssl from ssl (rw)
      /etc/mysql/ssl-internal from ssl-internal (rw)
      /etc/mysql/vault-keyring-secret from vault-keyring-secret (rw)
      /etc/percona-xtradb-cluster.conf.d from config (rw)
      /tmp from tmp (rw)
      /var/lib/mysql from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4p64t (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 

The pmm agent gets OOM’d when I try to import a table structure.

The RAM request also seems rather tight for the agent, doesn’t it?
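
For reference, 107374182400m is Kubernetes "milli" notation applied to bytes: it works out to about 102 MiB (exactly 0.1 Gi), so the agent and its exporters really do have only ~100 MiB of headroom. If you want to test whether more memory stops the restarts, the PXC operator CR exposes resources for the PMM sidecar. A sketch, assuming the spec.pmm.resources field path from the operator's cr.yaml (note that Everest reconciles its clusters and may revert a manual patch, so treat this as an experiment only):

kubectl patch pxc <cluster-name> -n <namespace> --type merge -p '
spec:
  pmm:
    resources:
      requests:
        cpu: 200m
        memory: 300M
      limits:
        cpu: 500m
        memory: 1G
'

If the agent stays healthy with the larger limit, that points at the default request/limit simply being too small for this workload.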