Unable to set up PMM using Helm

Description:

Hello,
I am trying to set up PMM on my Kubernetes cluster using Helm, but the pod is not moving into the Ready state.

Steps to Reproduce:

I followed the instructions here to set up PMM:

https://docs.percona.com/percona-monitoring-and-management/setting-up/server/helm.html
helm install pmm \
--set secret.create=false --set secret.name=pmm-secret \
--set-string pmmEnv.DISABLE_UPDATES="1" \
--set service.type="LoadBalancer" \
percona/pmm
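
Note that because the command passes --set secret.create=false --set secret.name=pmm-secret, the chart expects that secret to already exist. Something along these lines should create it first (the PMM_ADMIN_PASSWORD key follows the chart's documentation; treat the exact key name as an assumption):

# create the admin-password secret the chart will reference
kubectl create secret generic pmm-secret \
  --from-literal=PMM_ADMIN_PASSWORD='<your-admin-password>'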

Version: PMM 2.x

NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
pmm     default         1               2024-10-08 23:33:23.81268372 -0500 CDT  deployed        pmm-1.3.16      2.43.1

Logs:

NAME    READY   STATUS    RESTARTS   AGE
pmm-0   0/1     Running   0          51m

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  52m                   default-scheduler  Successfully assigned default/pmm-0 to fdc00datah603l
  Normal   Pulled     51m                   kubelet            Container image "percona/pmm-server:2.43.1" already present on machine
  Normal   Created    51m                   kubelet            Created container pmm
  Normal   Started    51m                   kubelet            Started container pmm
  Warning  Unhealthy  51m                   kubelet            Readiness probe failed: Get "http://192.168.133.83:80/v1/readyz": dial tcp 192.168.133.83:80: connect: connection refused
  Warning  Unhealthy  114s (x641 over 51m)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
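
To dig in, the container logs and the internal process states can be checked (supervisorctl being available inside the pmm-server container is an assumption on my part):

# stream the supervisord output the container writes to stdout
kubectl logs pmm-0

# list the state of each internal PMM service
kubectl exec -it pmm-0 -- supervisorctl status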

Looking at the pod logs, it looked like the issue was PostgreSQL failing to start:

2024-10-09 04:33:28,291 INFO success: pmm-update-perform-init entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,291 INFO success: clickhouse entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,291 INFO success: grafana entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: victoriametrics entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: vmalert entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: alertmanager entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: vmproxy entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,304 INFO exited: qan-api2 (exit status 1; not expected)
2024-10-09 04:33:28,436 INFO success: pmm-managed entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,639 INFO spawned: 'postgresql' with pid 54
2024-10-09 04:33:28,697 INFO exited: postgresql (exit status 2; not expected)
2024-10-09 04:33:29,428 INFO spawned: 'qan-api2' with pid 59
2024-10-09 04:33:29,553 INFO exited: qan-api2 (exit status 1; not expected)
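
The individual services also write their own logs under /srv/logs inside the container, which can be listed with:

# enumerate the per-service logs inside the pmm-server container
kubectl exec pmm-0 -- ls -l /srv/logs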

PV and PVC:

NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          VOLUMEATTRIBUTESCLASS   AGE
pmm-storage-pmm-0   Bound    pvc-ad6786af-5fd4-4c6c-b86b-dc8294aea1d2   10Gi       RWO            managed-nfs-storage   <unset>                 57m

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS          VOLUMEATTRIBUTESCLASS   REASON   AGE
pvc-ad6786af-5fd4-4c6c-b86b-dc8294aea1d2   10Gi       RWO            Delete           Bound    default/pmm-storage-pmm-0   managed-nfs-storage   <unset>                          58m
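
To rule out the storage side, the StorageClass and the bound claim can be inspected like this:

# show the provisioner and parameters behind the claim
kubectl get sc managed-nfs-storage -o yaml

# check for provisioning or mount events on the claim
kubectl describe pvc pmm-storage-pmm-0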

Any suggestions on what could be wrong here?

On checking the logs further, I noticed this error in /srv/logs/postgresql14.log:

postgres: could not access the server configuration file "/srv/postgres14/postgresql.conf": No such file or directory
postgres: could not access the server configuration file "/srv/postgres14/postgresql.conf": No such file or directory
postgres: could not access the server configuration file "/srv/postgres14/postgresql.conf": No such file or directory

On checking this location, I found that the postgres14 directory was empty:

[screenshot: empty /srv/postgres14 directory]
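
An empty data directory on an NFS-backed volume usually points at the volume rather than at PMM itself, so it is worth verifying that the mount is present and writable (the write-test file name below is arbitrary):

# confirm /srv is actually backed by the PVC and is writable
kubectl exec pmm-0 -- df -h /srv
kubectl exec pmm-0 -- touch /srv/write-test
kubectl exec pmm-0 -- rm /srv/write-test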

Any help is appreciated, thanks!

What version of Helm and K8s are you using?

Hello, I am on Kubernetes v1.29.4 and Helm v3.16.2.

It seemed like the issue was caused by a misconfigured NFS storage class. This is working now.
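
For anyone who runs into the same thing: the StorageClass has to match the NFS provisioner actually deployed in the cluster. As a minimal sketch, assuming the common nfs-subdir-external-provisioner (the provisioner string and parameters below are illustrative and must match your own deployment):

# illustrative StorageClass for nfs-subdir-external-provisioner
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner  # must match the running provisioner
parameters:
  archiveOnDelete: "false"
EOF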
