Pmm-client container OOMKilled due to invalid memory limit generated by everest-1.7.0

Description:
When deploying a Percona XtraDB Cluster using everest-1.7.0 with pmm-client monitoring enabled, the pmm-client container is OOMKilled (Exit Code 137) within about one minute of startup. The generated Pod spec sets an invalid memory limit of 214643507200m: because Kubernetes interprets the trailing m as a milli-unit suffix, this resolves to only about 205 MiB rather than the hundreds of gigabytes the raw number suggests, and the pmm-client container is killed as soon as it exceeds that value.
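To make the unit interpretation concrete, here is a minimal sketch (plain Go against the upstream k8s.io/apimachinery/pkg/api/resource package, not Everest code) that parses the limit and request strings from the generated Pod spec and prints the byte values Kubernetes actually enforces:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// The limit string taken from the generated Pod spec below.
	// The trailing "m" is the Kubernetes milli-unit suffix, so this is
	// 214643507200 / 1000 bytes, not 214643507200 bytes.
	limit := resource.MustParse("214643507200m")

	// The request string from the same Pod spec.
	request := resource.MustParse("199168Ki")

	// Value() rounds up to whole base units (bytes for memory).
	fmt.Printf("limit   %s -> %d bytes (~%.1f MiB)\n",
		limit.String(), limit.Value(), float64(limit.Value())/(1<<20))
	fmt.Printf("request %s -> %d bytes (~%.1f MiB)\n",
		request.String(), request.Value(), float64(request.Value())/(1<<20))
}
```

On my reading of the quantity semantics this prints roughly 204.7 MiB for the limit and 194.5 MiB for the request, which is consistent with a container being OOM-killed shortly after startup.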


🔍 Steps to Reproduce

  1. Use everest-1.7.0 to deploy a PXC cluster with pmm-client enabled.
  2. Wait for pods to start.
  3. Observe that the pxc-0 pod’s pmm-client container exits with code 137 (see the client-go sketch below for checking this programmatically).
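For step 3, here is a hedged sketch of checking the termination state with client-go instead of kubectl; the pod name db-cluster-pxc-0, the container name pmm-client, and the kubeconfig path are assumptions derived from the spec further down and may differ in your cluster:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed kubeconfig location; adjust for your environment.
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Pod and namespace names follow the DatabaseCluster spec below;
	// the exact pod name generated by the operator may differ.
	pod, err := client.CoreV1().Pods("prod-namespace").
		Get(context.Background(), "db-cluster-pxc-0", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	for _, cs := range pod.Status.ContainerStatuses {
		if cs.Name != "pmm-client" {
			continue
		}
		// LastTerminationState records the previous exit, e.g. 137 / OOMKilled.
		if t := cs.LastTerminationState.Terminated; t != nil {
			fmt.Printf("pmm-client last exit: code=%d reason=%s\n", t.ExitCode, t.Reason)
		}
		fmt.Printf("restart count: %d\n", cs.RestartCount)
	}
}
```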

📄 Observed Pod Spec (Fragment)

```yaml
resources:
  limits:
    cpu: 240m
    memory: 214643507200m
  requests:
    cpu: 228m
    memory: 199168Ki
```

❌ Actual Behavior

  • The pmm-client container is terminated with Exit Code 137 (OOMKilled).
  • The effective memory limit (~205 MiB once the milli-unit suffix is applied) is far too low for pmm-client, so the kernel OOM killer terminates the container repeatedly.

✅ Expected Behavior

  • Reasonable and consistent resource limits and requests for the pmm-client, such as:

```yaml
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi
```
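As an illustration of the difference (a sketch against the upstream k8s.io/api and k8s.io/apimachinery types, not a statement about how Everest builds its specs), the snippet below constructs the proposed requirements from canonical Mi strings, which serialize cleanly, and contrasts that with a milli-scaled memory quantity, which serializes with the confusing m suffix seen above:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// The requirements proposed above, built from canonical strings.
	reqs := corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("500m"),
			corev1.ResourceMemory: resource.MustParse("512Mi"),
		},
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("250m"),
			corev1.ResourceMemory: resource.MustParse("256Mi"),
		},
	}
	fmt.Println("memory limit:", reqs.Limits.Memory().String()) // 512Mi

	// For contrast: a memory quantity constructed on the milli scale
	// serializes with the "m" suffix, which is one way a value like
	// 214643507200m can appear in a Pod spec.
	odd := resource.NewMilliQuantity(214643507200, resource.DecimalSI)
	fmt.Println("milli-scaled:", odd.String()) // 214643507200m
}
```

The point is only that memory quantities are far less error-prone when emitted in binary-suffix form (Mi/Gi) or whole bytes rather than in milli-units.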

🛠️ Environment

  • Everest version: 1.7.0
  • PMM client image: percona/pmm-client:2

Here is the Everest spec:

```yaml
apiVersion: everest.percona.com/v1alpha1
kind: DatabaseCluster
metadata:
  name: db-cluster
  namespace: prod-namespace
  labels:
    clusterName: db-cluster
    monitoringConfigName: pmm
    podSchedulingPolicyName: default-mysql-policy
  finalizers:
    - everest.percona.com/upstream-cluster-cleanup
    - foregroundDeletion
spec:
  engine:
    type: pxc
    version: 8.0.39-30.1
    replicas: 3
    resources:
      cpu: "6"
      memory: 16G
    storage:
      class: local-storage-class
      size: 400Gi
    userSecretsName: db-secrets
    config: |
      [mysqld]
      max_connections = 802
      innodb_buffer_pool_size = 5512928528
      innodb_flush_log_at_trx_commit = 2
  proxy:
    type: haproxy
    replicas: 3
    resources:
      cpu: 500m
      memory: 200M
    expose:
      type: external
      ipSourceRanges:
        - 10.0.0.0/32
  monitoring:
    monitoringConfigName: pmm
    resources: {}
  backup:
    pitr:
      enabled: false
    schedules:
      - name: daily-backup
        enabled: true
        schedule: 30 22 * * *
        retentionCopies: 2
        backupStorageName: backup-storage
  podSchedulingPolicyName: default-mysql-policy
```

Hi @pstekunov, thanks for reporting this. We already had this reported on GitHub: Constant OOM of core Everest components on large-scale nodes · Issue #1274 · percona/everest, and we created a Jira ticket for internal tracking of this issue.
The fix will be released with Everest v1.8.0 later this week.