High memory usage of PostgreSQL leader pod

Description:

Hello!

We are using the Percona PostgreSQL operator to instantiate databases in a Kubernetes cluster.

Lately, we have received alerts about the leader pod being evicted because of memory pressure on a node.

After some investigation, we found that the pod's memory usage was steadily increasing (and, conversely, the node's available memory was decreasing), leading to this eviction.

How can we investigate this issue?
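So far we have only looked at node-level metrics. Below is a sketch of the checks we plan to run next, assuming cgroup v2 on the nodes, the Crunchy-style container name `database`, and one of our actual pod names (to be adjusted):

# Per-container usage as reported by the kubelet / metrics-server
kubectl top pod -n artifactory --containers | grep artifactory-pg-db

# Cgroup-level breakdown inside the database container (cgroup v2):
# "anon" is process heap/work memory, "file" is page cache (reclaimable)
kubectl exec -n artifactory artifactory-pg-db-jfrog-platform-sgtv-0 -c database -- \
  sh -c 'grep -E "^(anon|file|shmem) " /sys/fs/cgroup/memory.stat'

# Largest memory contexts of the current backend (PostgreSQL 14+)
kubectl exec -n artifactory artifactory-pg-db-jfrog-platform-sgtv-0 -c database -- \
  psql -U postgres -c 'SELECT name, total_bytes FROM pg_backend_memory_contexts ORDER BY total_bytes DESC LIMIT 10;'

This should at least tell us whether the growth is page cache (reclaimed under pressure, mostly harmless) or anonymous memory actually held by PostgreSQL processes.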

Steps to Reproduce:

Create a three-node cluster using the `perconapgclusters.pgv2.percona.com` CRD:

apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  annotations:
    argocd.argoproj.io/tracking-id: artifactory:pgv2.percona.com/PerconaPGCluster:artifactory/artifactory-pg-db
    current-primary: artifactory-pg-db
    freelens.app/resource-version: v2
    pgv2.percona.com/patroni-version: 4.1.0
    postgres-operator.crunchydata.com/trigger-switchover: Tue Mar 31 10:19:18 AM CEST
      2026
  labels:
    app.kubernetes.io/instance: artifactory
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: pg-db
    app.kubernetes.io/version: 2.8.2
    argocd.argoproj.io/instance: artifactory
    crunchy-pgha-scope: artifactory-pg-db
    deployment-name: artifactory-pg-db
    helm.sh/chart: pg-db-2.8.2
    name: artifactory-pg-db
    pg-cluster: artifactory-pg-db
    pgo-version: 2.8.2
    pgouser: admin
  name: artifactory-pg-db
  namespace: artifactory
spec:
  backups:
    enabled: true
    pgbackrest:
      global:
        archive-push-queue-max: 5G
        repo1-retention-full: "1"
        repo1-retention-full-type: count
      image: docker.io/percona/percona-pgbackrest:2.57.0-1
      manual:
        options:
        - --type=full
        - --annotation="percona.com/backup-name"="artifactory-pg-db-repo1-full-pknd9"
        repoName: repo1
      metadata:
        labels:
          pgv2.percona.com/version: 2.5.0
      repoHost:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/data: pgbackrest
                topologyKey: kubernetes.io/hostname
              weight: 1
        priorityClassName: low
      repos:
      - name: repo1
        schedules:
          full: 0 0 * * *
        volume:
          volumeClaimSpec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 500Gi
            storageClassName: portworx-pso-fb-v3
    trackLatestRestorableTime: true
  crVersion: 2.8.2
  extensions:
    builtin:
      pg_audit: true
      pg_stat_monitor: true
  image: docker.io/percona/percona-distribution-postgresql:16.11-2
  imagePullPolicy: Always
  instances:
  - affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                postgres-operator.crunchydata.com/data: postgres
                postgres-operator.crunchydata.com/instance-set: jfrog-platform
            topologyKey: kubernetes.io/hostname
          weight: 1
    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 500Gi
      storageClassName: portworx-pso-fb-v3
    metadata:
      labels:
        pgv2.percona.com/version: 2.5.0
    name: jfrog-platform
    replicas: 3
    walVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 800Gi
      storageClassName: portworx-pso-fb-v3
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          max_connections: 200
        pg_hba:
        - local   all all trust
        - host    all all 10.10.0.0/16 md5
    leaderLeaseDurationSeconds: 30
    port: 8008
    switchover:
      enabled: true
      targetInstance: artifactory-pg-db-jfrog-platform-sgtv
      type: Switchover
    syncPeriodSeconds: 10
  pause: false
  pmm:
    enabled: false
    image: docker.io/percona/pmm-client:3.4.1
    querySource: pgstatmonitor
    secret: artifactory-pg-db-pmm-secret
    serverHost: monitoring-service
  port: 5432
  postgresVersion: 16
  proxy:
    pgBouncer:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: artifactory-pg-db
                  postgres-operator.crunchydata.com/role: pgbouncer
              topologyKey: kubernetes.io/hostname
            weight: 1
      exposeSuperusers: true
      image: docker.io/percona/percona-pgbouncer:1.25.0-1
      metadata:
        labels:
          pgv2.percona.com/version: 2.5.0
      port: 5432
      replicas: 3
  standby:
    enabled: false
  unmanaged: false
  users:
  - databases:
    - artifactory
    name: artifactory
    options: SUPERUSER
    secretName: artifactory-db-secret
  - databases:
    - xray
    name: xray
    options: SUPERUSER
    secretName: xray-db-secret

Version:

Kubernetes 1.34.3

Percona PostgreSQL operator 2.8.2

Percona distribution PostgreSQL 16.11-2

Logs:

n/a

Expected Result:

The database memory usage should not increase over time.

Actual Result:

The database memory usage increased to the point that the pod was evicted.

Additional Information:

n/a

@an-toine I assume your workload patterns didn’t change much? I’ll deploy a cluster with a similar configuration to yours to see if I observe the same pattern.

I just remembered that pg_stat_monitor can introduce significant memory overhead. Have you tried disabling it to see if that helps?

There was no recent workload change that could explain this new behavior.
We have not updated the client application, and usage has remained stable over time.

I confirm the `pg_stat_monitor` extension is enabled; we could try disabling it to check whether the situation improves:

➜  ~ k get perconapgclusters.pgv2.percona.com -o yaml artifactory-pg-db | grep -B 5 pg_stat_monitor
    trackLatestRestorableTime: true
  crVersion: 2.8.2
  extensions:
    builtin:
      pg_audit: true
      pg_stat_monitor: true
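If we proceed, I assume the change is just flipping the builtin extension flag in the CR, something like this (a sketch; I am assuming the operator then removes the library from shared_preload_libraries and restarts the pods on reconcile):

spec:
  extensions:
    builtin:
      pg_audit: true
      pg_stat_monitor: false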

Antoine

@an-toine did you have a chance to disable it? Any updates?

To investigate further, we chose to enable PMM for this instance: will disabling this extension impact the data PMM collects?
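One thing I noticed while preparing this: our CR currently sets `querySource: pgstatmonitor` for PMM, so if we disable the extension I assume we would also have to point PMM at pg_stat_statements instead, roughly like below (the exact value is an assumption on my part, to be checked against the operator docs):

spec:
  pmm:
    enabled: true
    querySource: pgstatements  # assumed value for the pg_stat_statements source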

As an aside, I noticed we were not setting CPU and memory requests/limits for this instance, and I was wondering whether that could affect the memory usage in any way? For reference, this is roughly what we would add, assuming the pgv2 CR exposes the same `instances[].resources` block as upstream Crunchy PGO (see the sketch below; the sizes are placeholders, not recommendations).
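
spec:
  instances:
  - name: jfrog-platform
    replicas: 3
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        memory: 8Gi

With requests set, the kubelet ranks pods for eviction by how far they exceed their requests, and a memory limit turns unbounded growth into an OOM kill of the container rather than a node-pressure eviction, so at minimum the failure mode should become more predictable.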