
Agent - Server connectivity

Ghan (Entrant, Contributor)
Hello all,
In PMM v2, I know that pmm-admin check-network was removed and that the list and status commands replace it. But I have a connectivity problem: I can successfully add an agent to the server, but I cannot get any statistics about the agent, such as CPU or RAM. I'm sure there is a network problem, and I can see it in the output of pmm-admin list. I have to pin the problem down before I can solve it. How can I find more details about the network communication between agent and server? There are two firewalls between the server and the agent, and I need clues to work through with my network engineer.

Here are the logs of pmm-admin list and pmm-admin status --debug
[[email protected] log]# pmm-admin list
Post https://pmmserver:443/v1/inventory/Agents/List: read tcp pmmagent:49816->pmmserver:443: read: connection reset by peer


[[email protected] log]# pmm-admin status --debug
DEBUG 2020-04-09 07:58:57.212962979Z: POST /local/Status HTTP/1.1
Host: 127.0.0.1:7777
User-Agent: Go-http-client/1.1
Content-Length: 3
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{}

DEBUG 2020-04-09 07:58:57.216275012Z: HTTP/1.1 200 OK
Content-Length: 316
Content-Type: application/json
Date: Thu, 09 Apr 2020 07:58:57 GMT
Grpc-Metadata-Content-Type: application/grpc

{"agent_id":"/agent_id/049f902e-8bf4-44c7-9774-7968f31dc8da","runs_on_node_id":"/node_id/fa42848f-6b6d-453d-95a4-9b087c08e1b5","server_info":{"url":"https://admin:[email protected]:443/","insecure_tls":true,"connected":true,"version":"2.3.0"},"config_filepath":"/usr/local/percona/pmm2/config/pmm-agent.yaml"}
DEBUG 2020-04-09 07:58:57.217122422Z: POST /local/Status HTTP/1.1
Host: 127.0.0.1:7777
User-Agent: Go-http-client/1.1
Content-Length: 26
Accept: application/json
Content-Type: application/json
Accept-Encoding: gzip

{"get_network_info":true}

DEBUG 2020-04-09 08:04:57.722574234Z: HTTP/1.1 503 Service Unavailable
Connection: close
Content-Length: 75
Content-Type: application/json
Date: Thu, 09 Apr 2020 08:04:57 GMT

{"error":"transport is closing","code":14,"message":"transport is closing"}
DEBUG 2020-04-09 08:04:57.72281479Z: Result: <nil>
DEBUG 2020-04-09 08:04:57.72289094Z: Error: &agent_local.StatusDefault{_statusCode:503, Payload:(*agent_local.StatusDefaultBody)(0xc0000f8000)}
transport is closing


Here is the messages log:
Apr  9 11:04:57 linuxmachine pmm-agent: ERRO[2020-04-09T11:04:57.720+03:00] Can't get network info: failed to receive message: rpc error: code = Unavailable desc = transport is closing  component=local-server
Apr  9 11:04:57 linuxmachine pmm-agent: INFO[2020-04-09T11:04:57.720+03:00] Done.                                         component=actions-runner
Apr  9 11:04:57 linuxmachine pmm-agent: INFO[2020-04-09T11:04:57.720+03:00] Stopped.                                      component=local-server/JSON
Apr  9 11:04:57 linuxmachine pmm-agent: INFO[2020-04-09T11:04:57.720+03:00] Done.                                         component=supervisor
Apr  9 11:04:57 linuxmachine pmm-agent: INFO[2020-04-09T11:04:57.721+03:00] Done.                                         component=client
Apr  9 11:04:58 linuxmachine pmm-agent: INFO[2020-04-09T11:04:58.221+03:00] Done.                                         component=local-server
Apr  9 11:04:58 linuxmachine pmm-agent: INFO[2020-04-09T11:04:58.221+03:00] Starting...                                   component=client
Apr  9 11:04:58 linuxmachine pmm-agent: INFO[2020-04-09T11:04:58.221+03:00] Connecting to https://admin:***@pmmserver:443/ ...  component=client
Apr  9 11:04:58 linuxmachine pmm-agent: INFO[2020-04-09T11:04:58.221+03:00] Starting local API server on http://127.0.0.1:7777/ ...  component=local-server/JSON
Apr  9 11:04:58 linuxmachine pmm-agent: INFO[2020-04-09T11:04:58.224+03:00] Started.                                      component=local-server/JSON
Apr  9 11:04:58 linuxmachine pmm-agent: INFO[2020-04-09T11:04:58.235+03:00] Connected to pmmserver:443.                component=client
Apr  9 11:04:58 linuxmachine pmm-agent: INFO[2020-04-09T11:04:58.235+03:00] Establishing two-way communication channel ...  component=client
Apr  9 11:04:58 linuxmachine pmm-agent: INFO[2020-04-09T11:04:58.241+03:00] Two-way communication channel established in 6.14482ms. Estimated clock drift: -1.494974ms.  component=client
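
The debug output above shows that the local status call is a plain JSON POST to http://127.0.0.1:7777/local/Status, so it can be repeated from a script while firewall changes are being tested. What follows is only a rough Python sketch built from that output: the function names are invented for illustration, and the response fields (agent_id, server_info.connected, server_info.version) are assumed to look like the sample response shown in the log. An explicit timeout is set because the get_network_info request in the log took about six minutes to fail.

```python
import json
import urllib.request


def summarize_status(status: dict) -> str:
    """Turn a /local/Status response body into a one-line summary."""
    server = status.get("server_info", {})
    state = "connected" if server.get("connected") else "NOT connected"
    return (
        f"agent {status.get('agent_id', '?')}: {state}, "
        f"server version {server.get('version', '?')}"
    )


def fetch_status(base_url: str = "http://127.0.0.1:7777",
                 network_info: bool = False,
                 timeout: float = 10.0) -> dict:
    """POST to the local status endpoint seen in the debug log.

    The request bodies ({} and {"get_network_info": true}) are copied
    from that log. An HTTP error such as the 503 in the log surfaces
    as urllib.error.HTTPError.
    """
    body = json.dumps({"get_network_info": True} if network_info else {})
    request = urllib.request.Request(
        base_url + "/local/Status",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# On the client host one might run:
#   print(summarize_status(fetch_status()))
```

Note that "connected": true only describes the agent's own channel to the server; it says nothing about whether the server can reach the exporters on the client.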


Answers

  • steve.hoffman (Percona Staff)
    OK, there are actually several communication paths you need to be aware of to get it all working. I'll do my best to list them and then give you a few things to look at to get it resolved.
    First is the local API on the client, which binds to localhost on port 7777. This is what pmm-admin uses to talk to pmm-agent, as in the POST /local/Status requests in your debug output.
    Second is the normal client --> server communication. It defaults to HTTPS and uses whatever port you set up your pmm-server container to listen on (typically 443). QAN uses this same communication channel.
    Third is the exporters, which are server --> client. They run locally on the client side and typically bind to ports 4200x, where x can be 2, 4, 6, 8, ... depending on the number of exporters you run on a single machine. On a Linux host running MySQL, you'd likely get the Linux server exporter bound to 42002 and the MySQL exporter on 42004.

    Here are a few diagrams that help illustrate what I described above.

    To get it all up and running, you'll need to make sure all of your firewalls allow communication initiated in the right direction: 443 from client to server, and 42002 and 42004 from server to client. A huge help here is the Prometheus targets page (https://pmmserver/prometheus/targets), which shows timeout messages and the like; I expect you'll see many errors there. Most of the time it's a matter of opening up the firewall and enjoying the stream of data.

    There are also cases where the initial registration of the client with the server goes wrong. We attempt to detect the right interface, but on more complex systems with multiple network adapters we can register an unroutable IP with the PMM server. Then, when it tries to retrieve information from the exporter, the PMM server believes it should contact, say, 10.0.0.3:42002, but that is a private, unroutable interface that the PMM server couldn't talk to even if the firewall rules were in place. In this instance you'll need to unregister and re-register the client and pass the node-address parameter.

    From what you described above, I'm going to guess that QAN works, although if you didn't add the MySQL or Postgres exporter there may be nothing to see there. I think the issue will turn out to be opening up ports 4200x from the server to the client in your firewall(s). Once that's done, you'll see the errors on the Prometheus targets page disappear, and within a few minutes the scrapes will populate the graphs!
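
The direction-specific firewall checks described in this answer can be scripted when the result needs to be handed to a network engineer. The following is a minimal Python sketch and not part of PMM: the helper name check_tcp is made up for illustration, and the host names and ports in the usage note are simply the ones mentioned in this thread.

```python
import socket
from typing import Tuple


def check_tcp(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, str]:
    """Try one TCP connection and report whether the handshake completed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer at all: often a firewall silently dropping packets.
        return False, "timed out"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port.
        return False, "refused"
    except OSError as exc:
        # Name resolution failures, unreachable networks, resets, etc.
        return False, f"failed: {exc}"
```

Run it on the PMM server against the client's exporter ports, for example check_tcp("pmmagent", 42002) and check_tcp("pmmagent", 42004), and on the client against the server, check_tcp("pmmserver", 443). Keep in mind that a completed handshake only proves that small packets get through in that direction; it cannot reveal a problem that affects large packets only.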
  • Ghan (Entrant, Contributor)
    Hi Steve!
    Thank you for this detailed reply. I actually found the problem. Even with every port open between pmmserver and the agent, I still had this problem. It turned out to be the MTU (packet size)! In my environment the MTU is 9000 (higher than usual because of the project plan). The agent can query the server and register itself without any problem. But when the server queries the agent's exporters, the packets get bigger and the network layer won't let them pass. So we had the network team change that setting, and now everything is working :)
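
One plausible reading of this fix is that small messages such as registration fit in small packets, while the larger exporter responses fill packets up to the hosts' MTU of 9000 and are dropped by a hop that only accepts smaller ones. The arithmetic behind a full-size test packet can be sketched in Python. This is an illustration under stated assumptions: IPv4 with no IP or TCP options, made-up helper names, and a ping command in the Linux iputils syntax (-M do forbids fragmentation, -s sets the payload size).

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_icmp_payload(mtu: int) -> int:
    """Largest ping payload that still fits in one packet of this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def max_tcp_segment(mtu: int) -> int:
    """Largest TCP payload per packet (the MSS) for this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ping_probe_command(host: str, mtu: int) -> list:
    """Build (but do not run) a ping that sends one unfragmentable full-MTU packet."""
    return ["ping", "-c", "1", "-M", "do",
            "-s", str(max_icmp_payload(mtu)), host]


for mtu in (1500, 9000):
    print(mtu, max_icmp_payload(mtu), max_tcp_segment(mtu),
          " ".join(ping_probe_command("pmmserver", mtu)))
```

For an MTU of 9000 the probe payload is 8972 bytes, and for the common 1500 it is 1472. If the 1472-byte probe gets a reply between the two hosts but the 8972-byte one does not, some hop on the path does not carry the larger packets, which is the kind of evidence a network team can act on.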
