Description:
I have installed the latest PMM Helm chart (v1.3.11) and am having trouble reaching the PMM service from other namespaces. The pod seems to be running fine, but there is a readiness error in the pod event log, and I don’t know whether it is related to the reachability issue.
Steps to Reproduce:
Install the PMM Helm chart on an EKS cluster (v1.29) using these values:
service:
  type: ClusterIP
nodeSelector:
  USAGE: MONITORING
Version:
Helm chart: 1.3.11
PMM image: 2.41.0
Logs:
pmm-agent log:
Failed to register pmm-agent on PMM Server: Post "https://monitoring-service.pmm.svc.cluster.local:443/v1/management/Node/Register": dial tcp 172.20.44.58:443: i/o timeou
pod event log:
Events:
Type     Reason     Age   From               Message
----     ------     ----  ----               -------
Normal   Scheduled  78s   default-scheduler  Successfully assigned pmm/pmm-0 to ip-10-0-5-134.eu-central-1.compute.internal
Normal   Pulled     68s   kubelet            Container image "percona/pmm-server:2.41.1" already present on machine
Normal   Created    68s   kubelet            Created container pmm
Normal   Started    68s   kubelet            Started container pmm
Warning  Unhealthy  67s   kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
curl inside the PMM Server pod against the readiness endpoint:
bash-5.1# curl -Iv http://127.0.0.1/v1/readyz
* Trying 127.0.0.1:80...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> HEAD /v1/readyz HTTP/1.1
> Host: 127.0.0.1
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 501 Not Implemented
HTTP/1.1 501 Not Implemented
< Server: nginx
Server: nginx
< Date: Wed, 28 Feb 2024 19:45:46 GMT
Date: Wed, 28 Feb 2024 19:45:46 GMT
< Content-Type: application/json
Content-Type: application/json
< Content-Length: 87
Content-Length: 87
< Connection: keep-alive
Connection: keep-alive
< Strict-Transport-Security: max-age=63072000; includeSubdomains;
Strict-Transport-Security: max-age=63072000; includeSubdomains;
<
* Connection #0 to host 127.0.0.1 left intact
Expected Result:
PMM Server should be reachable from other namespaces.
Actual Result:
pmm-agent can’t be registered.
nurlan
February 28, 2024, 10:40pm
2
Hi @Abdelnasser_FRIED
Abdelnasser_FRIED:
HEAD /v1/readyz HTTP/1.1
The HEAD HTTP method is not supported by PMM; please send a GET request instead.
Could you share the logs from the /srv/logs/ directory on the PMM Server pod?
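For reference, curl sends GET by default, and the -I flag in the quoted command is what switches it to HEAD. A minimal sketch of the same check with GET, assuming it is run inside the PMM Server container; readyz_url and readyz_status are hypothetical helper names, not part of PMM:

```shell
# Hypothetical helper: build the readiness URL from a base URL,
# dropping one trailing slash if present.
readyz_url() {
  printf '%s/v1/readyz\n' "${1%/}"
}

# Hypothetical helper: send a GET request (no -I, so no HEAD) and
# print only the HTTP status code returned by the endpoint.
readyz_status() {
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 "$(readyz_url "$1")"
}

# Usage inside the PMM Server container:
#   readyz_status http://127.0.0.1
```

If the server is ready, this should print the same status code that the kubelet probe receives for its own GET request.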
Hello @nurlan, and thank you for the reply.
Here is pmm-managed.log:
bash-5.1# tail -f /srv/logs/pmm-managed.log
time="2024-02-29T06:33:53.205+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=81a8c8f0-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:33:53.207+00:00" level=info msg="RPC /server.Server/Readiness done in 1.702188ms." request=81a8c8f0-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:03.206+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=879eb8e4-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:03.208+00:00" level=info msg="RPC /server.Server/Readiness done in 1.826162ms." request=879eb8e4-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:13.205+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=8d948af2-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:13.207+00:00" level=info msg="RPC /server.Server/Readiness done in 2.054546ms." request=8d948af2-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:23.205+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=938a633a-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:23.207+00:00" level=info msg="RPC /server.Server/Readiness done in 1.932654ms." request=938a633a-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:33.205+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=99804568-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:33.207+00:00" level=info msg="RPC /server.Server/Readiness done in 1.887562ms." request=99804568-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:43.206+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=9f763a29-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:43.208+00:00" level=info msg="RPC /server.Server/Readiness done in 2.27617ms." request=9f763a29-d6cc-11ee-9478-922a5a178739
Nginx logs:
10.0.5.134 - - [29/Feb/2024:06:32:23 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:32:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:32:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:32:33 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:32:37 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.48.46 - - [29/Feb/2024:06:32:38 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:32:38 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:32:43 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:32:52 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:32:53 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:32:53 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:32:53 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:03 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:33:07 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.48.46 - - [29/Feb/2024:06:33:08 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:33:08 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:13 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:33:22 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:23 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:33:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:33:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:33 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:33:37 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.48.46 - - [29/Feb/2024:06:33:38 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:33:39 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:43 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:33:52 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:53 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:33:53 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:33:54 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:34:03 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:34:07 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.48.46 - - [29/Feb/2024:06:34:08 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:34:09 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:34:13 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:34:22 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:34:23 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:34:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:34:24 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:34:33 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
Are there specific service logs I should show you? I think this is a networking issue.
nurlan
March 8, 2024, 3:37pm
4
As far as I can see, everything is fine on the PMM Server side, so you are probably right that it’s a networking issue in K8s. Can you curl PMM Server from the PMM Client?
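One way to run that check, as a rough sketch: from a shell in the PMM Client pod, request the readiness endpoint through the Service name that appears in the pmm-agent error. The helper names are hypothetical, and -k is there on the assumption that PMM Server is still using a self-signed certificate:

```shell
# Hypothetical helper: build the in-cluster DNS name of a Service
# from its name ($1) and namespace ($2).
svc_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# Hypothetical helper: GET the readiness endpoint over HTTPS on port 443
# and print only the HTTP status code. A hang followed by a timeout error
# here would match the "dial tcp ... :443" failure seen by pmm-agent.
check_pmm_server() {
  host="$(svc_fqdn "$1" "$2")"
  curl -k -sS -o /dev/null -w '%{http_code}\n' --connect-timeout 5 \
    "https://${host}:443/v1/readyz"
}

# Usage from the PMM Client pod (service and namespace taken from the error above):
#   check_pmm_server monitoring-service pmm
```

A status code means the Service is reachable from that namespace; a connect timeout points at something between the two pods, such as a network policy or security group.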