Description:
Hello,
I am trying to set up PMM on my Kubernetes cluster using Helm, but the pod never reaches the Ready state.
Steps to Reproduce:
I followed the setup instructions here:
https://docs.percona.com/percona-monitoring-and-management/setting-up/server/helm.html
helm install pmm \
--set secret.create=false --set secret.name=pmm-secret \
--set-string pmmEnv.DISABLE_UPDATES="1" \
--set service.type="LoadBalancer" \
percona/pmm
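Since the chart is installed with secret.create=false, the pmm-secret referenced above was created beforehand, roughly like this (the password value below is a placeholder, and PMM_ADMIN_PASSWORD is the key name I understand the chart expects):
kubectl create secret generic pmm-secret --from-literal=PMM_ADMIN_PASSWORD=<placeholder>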
Version:
PMM 2.x
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
pmm default 1 2024-10-08 23:33:23.81268372 -0500 CDT deployed pmm-1.3.16 2.43.1
Logs:
NAME READY STATUS RESTARTS AGE
pmm-0 0/1 Running 0 51m
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 52m default-scheduler Successfully assigned default/pmm-0 to fdc00datah603l
Normal Pulled 51m kubelet Container image "percona/pmm-server:2.43.1" already present on machine
Normal Created 51m kubelet Created container pmm
Normal Started 51m kubelet Started container pmm
Warning Unhealthy 51m kubelet Readiness probe failed: Get "http://192.168.133.83:80/v1/readyz": dial tcp 192.168.133.83:80: connect: connection refused
Warning Unhealthy 114s (x641 over 51m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
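Since the readiness probe keeps returning 500, I can also check the state of the individual services inside the container. My understanding is that the PMM server image runs its services under supervisord, so something like this should show which process is failing (assuming supervisorctl is available in the image):
kubectl exec -it pmm-0 -- supervisorctl status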
Looking at the pod logs, the issue appears to be that PostgreSQL is not starting:
2024-10-09 04:33:28,291 INFO success: pmm-update-perform-init entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,291 INFO success: clickhouse entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,291 INFO success: grafana entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: victoriametrics entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: vmalert entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: alertmanager entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,292 INFO success: vmproxy entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,304 INFO exited: qan-api2 (exit status 1; not expected)
2024-10-09 04:33:28,436 INFO success: pmm-managed entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-10-09 04:33:28,639 INFO spawned: 'postgresql' with pid 54
2024-10-09 04:33:28,697 INFO exited: postgresql (exit status 2; not expected)
2024-10-09 04:33:29,428 INFO spawned: 'qan-api2' with pid 59
2024-10-09 04:33:29,553 INFO exited: qan-api2 (exit status 1; not expected)
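If it helps, I can pull the PostgreSQL log from inside the container as well. I believe the server keeps its logs under /srv/logs/, though the exact file name may differ by version:
kubectl exec pmm-0 -- ls /srv/logs/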
PV and PVC:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
pmm-storage-pmm-0 Bound pvc-ad6786af-5fd4-4c6c-b86b-dc8294aea1d2 10Gi RWO managed-nfs-storage <unset> 57m
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
pvc-ad6786af-5fd4-4c6c-b86b-dc8294aea1d2 10Gi RWO Delete Bound default/pmm-storage-pmm-0 managed-nfs-storage <unset> 58m
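The volume is provisioned by an NFS-backed storage class, so in case ownership or permissions on the data directory are relevant, I can also check what the pod sees on the data mount (assuming the chart mounts the PVC at /srv):
kubectl exec pmm-0 -- ls -la /srv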
Any suggestions on what could be wrong here?