The server is not reachable from other namespaces, pmm-agent getting i/o timeout

Description:

I have installed the latest PMM Helm chart v1.3.11, and I am having trouble reaching the PMM service from other namespaces. The pod seems to be running fine, but there is a readiness error in the pod event log, and I don’t know if this is related to the reachability issue.

Steps to Reproduce:

Install the PMM Helm chart on an EKS cluster v1.29 using these values:

service:
  type: ClusterIP
nodeSelector:
  USAGE: MONITORING
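
The install itself was along these lines (the release name is an assumption; the namespace pmm matches the pod events below, and the repo URL is Percona’s standard helm-charts repo):

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
helm install pmm percona/pmm --version 1.3.11 -n pmm --create-namespace -f values.yaml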

Version:

Helm chart: 1.3.11
PMM image: 2.41.0

Logs:

pmm-agent log:
Failed to register pmm-agent on PMM Server: Post "https://monitoring-service.pmm.svc.cluster.local:443/v1/management/Node/Register": dial tcp 172.20.44.58:443: i/o timeout
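
Since the Service is a ClusterIP, the same timeout can be reproduced from any other namespace with a throwaway curl pod (pod name, image, namespace, and --max-time are illustrative):

kubectl run curl-test -n default --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -kv --max-time 5 https://monitoring-service.pmm.svc.cluster.local:443/v1/readyz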

pod event log:

Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  78s   default-scheduler  Successfully assigned pmm/pmm-0 to ip-10-0-5-134.eu-central-1.compute.internal
  Normal   Pulled     68s   kubelet            Container image "percona/pmm-server:2.41.1" already present on machine
  Normal   Created    68s   kubelet            Created container pmm
  Normal   Started    68s   kubelet            Started container pmm
  Warning  Unhealthy  67s   kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
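
The probe definition can be dumped to confirm exactly what the kubelet calls (pod and namespace taken from the events above):

kubectl get pod pmm-0 -n pmm -o jsonpath='{.spec.containers[*].readinessProbe}'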

curl inside the PMM Server pod against the readiness endpoint:

bash-5.1# curl -Iv http://127.0.0.1/v1/readyz
*   Trying 127.0.0.1:80...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> HEAD /v1/readyz HTTP/1.1
> Host: 127.0.0.1
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 501 Not Implemented
HTTP/1.1 501 Not Implemented
< Server: nginx
Server: nginx
< Date: Wed, 28 Feb 2024 19:45:46 GMT
Date: Wed, 28 Feb 2024 19:45:46 GMT
< Content-Type: application/json
Content-Type: application/json
< Content-Length: 87
Content-Length: 87
< Connection: keep-alive
Connection: keep-alive
< Strict-Transport-Security: max-age=63072000; includeSubdomains;
Strict-Transport-Security: max-age=63072000; includeSubdomains;

<
* Connection #0 to host 127.0.0.1 left intact

Expected Result:

PMM Server should be reachable from other namespaces.

Actual Result:

pmm-agent can’t be registered.

Hi @Abdelnasser_FRIED

The HEAD HTTP method is not supported by PMM; please send a GET request instead.
Could you share logs from the /srv/logs/ directory on the PMM Server pod?
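
For example, inside the PMM Server pod:

curl -s http://127.0.0.1/v1/readyz
tail -n 100 /srv/logs/pmm-managed.log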

Hello @nurlan and thank you for the reply,

Here is the pmm-managed.log:

bash-5.1# tail -f /srv/logs/pmm-managed.log
time="2024-02-29T06:33:53.205+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=81a8c8f0-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:33:53.207+00:00" level=info msg="RPC /server.Server/Readiness done in 1.702188ms." request=81a8c8f0-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:03.206+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=879eb8e4-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:03.208+00:00" level=info msg="RPC /server.Server/Readiness done in 1.826162ms." request=879eb8e4-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:13.205+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=8d948af2-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:13.207+00:00" level=info msg="RPC /server.Server/Readiness done in 2.054546ms." request=8d948af2-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:23.205+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=938a633a-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:23.207+00:00" level=info msg="RPC /server.Server/Readiness done in 1.932654ms." request=938a633a-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:33.205+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=99804568-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:33.207+00:00" level=info msg="RPC /server.Server/Readiness done in 1.887562ms." request=99804568-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:43.206+00:00" level=info msg="Starting RPC /server.Server/Readiness ..." request=9f763a29-d6cc-11ee-9478-922a5a178739
time="2024-02-29T06:34:43.208+00:00" level=info msg="RPC /server.Server/Readiness done in 2.27617ms." request=9f763a29-d6cc-11ee-9478-922a5a178739

Nginx logs:

10.0.5.134 - - [29/Feb/2024:06:32:23 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:32:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:32:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:32:33 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:32:37 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.48.46 - - [29/Feb/2024:06:32:38 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:32:38 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:32:43 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:32:52 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:32:53 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:32:53 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:32:53 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:03 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:33:07 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.48.46 - - [29/Feb/2024:06:33:08 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:33:08 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:13 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:33:22 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:23 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:33:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:33:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:33 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:33:37 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.48.46 - - [29/Feb/2024:06:33:38 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:33:39 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:43 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:33:52 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:33:53 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:33:53 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:33:54 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:34:03 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:34:07 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.48.46 - - [29/Feb/2024:06:34:08 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:34:09 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:34:13 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.49.102 - - [29/Feb/2024:06:34:22 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:34:23 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"
10.0.48.46 - - [29/Feb/2024:06:34:23 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.50.71 - - [29/Feb/2024:06:34:24 +0000] "GET / HTTP/1.1" 302 138 "-" "ELB-HealthChecker/2.0" "-"
10.0.5.134 - - [29/Feb/2024:06:34:33 +0000] "GET /v1/readyz HTTP/1.1" 200 2 "-" "kube-probe/1.29+" "-"

Are there specific service logs I can share? I think this is a networking issue.

As far as I can see, everything is fine from the PMM Server side; you are probably right and it’s a networking issue in K8s. Can you curl the PMM Server from the PMM Client?
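
Something like this from the client pod should show whether the Service is reachable at all (pod and namespace names are placeholders, and it assumes curl is available in the client image):

kubectl exec -n <client-namespace> <pmm-client-pod> -- \
  curl -kv --max-time 5 https://monitoring-service.pmm.svc.cluster.local:443/v1/readyz

It is also worth checking that the Service has endpoints and that no NetworkPolicy blocks cross-namespace traffic:

kubectl get endpoints -n pmm monitoring-service
kubectl get networkpolicy --all-namespaces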