We have installed the latest PMM on one of our MySQL servers. Everything seemed fine, but we are not getting any graphs. pmm-admin list shows everything is OK, but pmm-admin check-network shows the status DOWN.
Here are the outputs; could you please help me out?
[09:13:44 root@prod-mysql-01.fra1 ~]$ pmm-admin list
pmm-admin 1.17.3
System Time
NTP Server (0.pool.ntp.org) | unable to get ntp time: %!s(<nil>)
PMM Server | 2020-04-06 13:14:02 +0000 GMT
PMM Client | 2020-04-06 09:14:16 -0400 EDT
PMM Client to PMM Server Time Drift | OK
Connection: Client --> Server
-------------------- -------
SERVER SERVICE STATUS
-------------------- -------
Consul API OK
Prometheus API OK
Query Analytics API DOWN
Connection: Client <-- Server
-------------- ------------------------------------- ----------------- ------- ---------- ---------
SERVICE TYPE NAME REMOTE ENDPOINT STATUS HTTPS/TLS PASSWORD
-------------- ------------------------------------- ----------------- ------- ---------- ---------
linux:metrics prod-mysql-01.fra1.admarketplace.net 10.69.1.10:42000 DOWN YES YES
mysql:metrics prod-mysql-01.fra1.admarketplace.net 10.69.1.10:42002 DOWN YES YES
When an endpoint is down it may indicate that the corresponding service is stopped (run ‘pmm-admin list’ to verify). If it’s running, check out the logs /var/log/pmm-*.log
When all endpoints are down but ‘pmm-admin list’ shows they are up and no errors in the logs, check the firewall settings whether this system allows incoming connections from server to address:port in question.
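The hint above points at the /var/log/pmm-*.log files. As a minimal sketch for scanning them in one pass, the Python below assumes the logs are plain text and that problem lines contain words such as error, refused or timeout. That keyword list (PROBLEM_RE) is an assumption, not a documented PMM log format, so adjust it to what your logs actually contain.

```python
import glob
import re
from typing import Iterator, Tuple

# Default pattern taken from the check-network hint above.
DEFAULT_PATTERN = "/var/log/pmm-*.log"

# Assumption: lines worth reading contain one of these words (case-insensitive).
PROBLEM_RE = re.compile(r"\b(error|fatal|panic|refused|timeout)\b", re.IGNORECASE)


def scan_pmm_logs(pattern: str = DEFAULT_PATTERN) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every matching line in every matching log file."""
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a log contains invalid bytes.
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if PROBLEM_RE.search(line):
                    yield path, lineno, line.rstrip("\n")
```

Usage, run as root on the client: loop over scan_pmm_logs() and print each path, line number and line. An empty result does not prove the exporters are healthy; it only means no line matched the keywords.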
Is it possible that the server you're monitoring has multiple interfaces, one of which the PMM server cannot reach for routing reasons (a management interface or something similar)? When you register your pmm-client with the pmm-server, we guess which node address is the right one to use, and sometimes we guess wrong. The PMM server then tries to talk back to an interface that is not routable from server --> client. That would show up as errors here: http://prod-perconamm-01.lga1.admarketplace.net/prometheus/targets (look for "context deadline exceeded" messages in the Error column). If so, you may need to use the pmm-admin remove command to drop the node and re-add it with a different node address.

It's also possible that firewall rules are preventing the PMM server from making TCP connections to ports 42000 and 42002. You could check that from the PMM server with a simple telnet test to one of those two ports, using 10.69.1.10 as the destination IP.
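The same information as the targets page can be read as JSON, which is handy when no browser is available. This is a minimal sketch that assumes the PMM server exposes the standard Prometheus HTTP API endpoint /api/v1/targets under the same /prometheus/ prefix as the targets page. TARGETS_URL below is built on that assumption, and a password-protected server would also need credentials, which this sketch does not handle.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: Prometheus API served under the same /prometheus/ prefix as the targets page.
TARGETS_URL = "http://prod-perconamm-01.lga1.admarketplace.net/prometheus/api/v1/targets"


def unhealthy_targets(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return the scrape URL and last error of every active target whose health is not 'up'."""
    result = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            result.append({
                "scrapeUrl": target.get("scrapeUrl", ""),
                "lastError": target.get("lastError", ""),
            })
    return result


def fetch_targets(url: str = TARGETS_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """Download and decode the targets JSON (no authentication is sent)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Usage: print each entry of unhealthy_targets(fetch_targets()). A lastError of "context deadline exceeded" means the scrape timed out, which fits the routing or firewall explanations above rather than a stopped exporter.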
Yes, that strongly suggests that the PMM server cannot talk to the client node. First check routing: can the PMM server ping the client's IP from a command prompt? If the server can see the client's IP, try telnet from the machine hosting the PMM server (‘telnet 10.69.1.10 42002’); if this fails, there's likely a firewall between the pmm-server host and the pmm-client host. If all of that succeeds, we'd have to go inside the pmm-server container to run some tests, and I'd shift my attention to SELinux or something like that. Has this worked for other systems in your network, or is this the first system you're attempting to monitor?
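If telnet is not installed on the PMM server host, the same TCP test can be done with a few lines of Python. This is a minimal sketch that only checks whether a TCP handshake completes; like telnet, it cannot tell a firewall drop from a host that is down, and it does not test TLS or the exporter password.

```python
import socket
import sys


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks and DNS failures.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example, run on the PMM server host: python3 check_port.py 10.69.1.10 42002
    host, port = sys.argv[1], int(sys.argv[2])
    print(f"{host}:{port}", "open" if tcp_port_open(host, port) else "closed or filtered")
```

A refusal returns almost immediately, while a silently dropped packet (the typical firewall case) only fails after the full timeout, so the time it takes to fail is itself a hint.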
We have opened the ports and it's working now. Thanks for the help.
[12:30:39 root@prod-mysql-01.fra1 ~]$ pmm-admin list
pmm-admin 1.17.3
PMM Server | prod-perconamm-01.lga1.admarketplace.net (password-protected)
Client Name | prod-mysql-01.fra1.admarketplace.net
Client Address | 10.69.1.10
Service Manager | linux-systemd
-------------- ------------------------------------- ----------- -------- -------------------------------------------- ---------------
SERVICE TYPE NAME LOCAL PORT RUNNING DATA SOURCE OPTIONS
-------------- ------------------------------------- ----------- -------- -------------------------------------------- ---------------
linux:metrics prod-mysql-01.fra1.admarketplace.net 42000 YES -
mysql:metrics prod-mysql-01.fra1.admarketplace.net 42002 YES pmm_app:***@unix(/var/lib/mysql/mysql.sock) tablestats=OFF
[12:30:42 root@prod-mysql-01.fra1 ~]$ pmm-admin check-network
PMM Network Status
Server Address | prod-perconamm-01.lga1.admarketplace.net
Client Address | 10.69.1.10
* System Time
NTP Server (0.pool.ntp.org) | unable to get ntp time: %!s(<nil>)
PMM Server | 2020-04-06 16:30:47 +0000 GMT
PMM Client | 2020-04-06 12:31:02 -0400 EDT
PMM Client to PMM Server Time Drift | OK
* Connection: Client --> Server
-------------------- -------
SERVER SERVICE STATUS
-------------------- -------
Consul API OK
Prometheus API OK
Query Analytics API DOWN
Connection duration | 195.003963ms
Request duration | -99.663039ms
Full round trip | 95.340924ms
* Connection: Client <-- Server
-------------- ------------------------------------- ----------------- ------- ---------- ---------
SERVICE TYPE NAME REMOTE ENDPOINT STATUS HTTPS/TLS PASSWORD
-------------- ------------------------------------- ----------------- ------- ---------- ---------
linux:metrics prod-mysql-01.fra1.admarketplace.net 10.69.1.10:42000 OK YES YES
mysql:metrics prod-mysql-01.fra1.admarketplace.net 10.69.1.10:42002 OK YES YES
[12:31:03 root@prod-mysql-01.fra1 ~]$
From the PMM server host (not inside the container), can you telnet to 10.69.1.10 on port 42002? I'd at least like to rule out TCP as part of the problem. Are there multiple interfaces on the 10.69.1.10 server, and is that the IP the PMM server should be using to talk to the PMM client? That error means either that you cannot reach the port from the PMM server (a network issue) or that pmm-managed isn't running on the client end (check whether pmm-managed is running, or restart pmm-agent).
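One quick way to approach the multiple-interfaces question from the client side is to ask the kernel which local address it would use as the source when talking to the PMM server, and compare it with the Client Address shown by pmm-admin list (10.69.1.10 here). This is a minimal sketch of the common UDP-connect technique: connect() on a UDP socket only selects a route and source address, and no packet is sent. Port 443 is an arbitrary choice, and the result reflects client-to-server routing only, so a match does not prove the server can route back.

```python
import socket
import sys


def local_address_towards(remote_host: str, remote_port: int = 443) -> str:
    """Return the local IP the kernel would use as source address towards remote_host."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # UDP connect() sends nothing; it only fixes the route and local address.
        s.connect((remote_host, remote_port))
        return s.getsockname()[0]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example, run on the client: python3 source_addr.py prod-perconamm-01.lga1.admarketplace.net
    print(local_address_towards(sys.argv[1]))
```

If the printed address differs from the registered Client Address, the node may have been registered on an interface the PMM server cannot reach.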