Client doesn't show up in Grafana

My client doesn't show up in Grafana. The PMM server runs on an AWS machine with ports 80 and 443 open.
Does it need any other ports open for incoming connections from the PMM client?
[root@XYZ sbin]# pmm-admin ping
OK, PMM server is alive.

PMM Server | XX.YY.ZZ.AA
Client Name | ABC
Client Address | AA.BB.CC.DD

The following traffic needs to be allowed:

  • from the PMM client (any port) to the PMM server, ports 80 and 443
  • from the PMM server (any port) to the PMM client, port 42000 for linux:metrics
  • from the PMM server (any port) to the PMM client, port 42002 for mysql:metrics
  • from the PMM server (any port) to the PMM client, port 42003 for mongodb:metrics
  • from the PMM server (any port) to the PMM client, port 42004 for proxysql:metrics
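The server-to-client ports listed above can be probed with a short script run on the PMM server. This is only a sketch: the names `PMM_CLIENT_PORTS`, `port_is_open` and `check_client` are made up for illustration, and a successful TCP connection shows that the port is reachable, not that the exporter behind it answers scrapes correctly.

```python
import socket

# Port numbers come from the list above; the dict itself is a hypothetical helper.
PMM_CLIENT_PORTS = {
    42000: "linux:metrics",
    42002: "mysql:metrics",
    42003: "mongodb:metrics",
    42004: "proxysql:metrics",
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_client(host, ports=PMM_CLIENT_PORTS):
    """Map each service name to whether its port accepts TCP connections."""
    return {name: port_is_open(host, port) for port, name in ports.items()}
```

For example, `check_client("AA.BB.CC.DD")` run on the server returns a dict such as `{"linux:metrics": True, ...}`; services that are not configured on the client will naturally report `False`.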

Hi,
I am having the same problem. I am pretty sure the clients and the server are in the same network and the traffic is allowed; I have checked with nmap that the ports are open.
I can see the servers in Query Analytics, but they are not shown in Metrics Monitor. I am having trouble with just some clients; most work fine.

pmm-admin 1.0.7
Go Version 1.7.4

Hi lux,

You can check connectivity with the 'pmm-admin check-network' command (run it on the problem host).
It checks connectivity from server to client and from client to server.

Hi,
Thanks for the quick reply. check-network shows the status is down, and I found these in prometheus.log on the server. Do you know what causes them?

time="2017-01-10T12:23:57Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="client", job="mysql"} => 0 @[1484051037.723] source="scrape.go:500"
time="2017-01-10T12:23:57Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="client", job="mysql"} => 0.256849543 @[1484051037.723] source="scrape.go:503"

I checked the timestamp on the client, and it is correct.
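The value after `@` in those log lines is a Unix timestamp in seconds, so it can be converted and compared with the `time=` field of the same line. A small sketch (the function name `sample_time_utc` is made up for illustration):

```python
from datetime import datetime, timezone


def sample_time_utc(epoch_seconds):
    """Format a Unix timestamp (seconds) as a UTC time string, dropping fractions."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(sample_time_utc(1484051037.723))  # 2017-01-10T12:23:57Z
```

The result equals the `time=` value in the log, so the discarded sample carries the same time the server logged. That is consistent with the client clock not being the cause, although it does not by itself explain the "out of order" error.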

This is the message from http://server/prometheus/targets:

Get http://ip:42000/metrics: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
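The bytes in that "malformed HTTP response" look like the start of a TLS record rather than HTTP, which is what a plain `http://` scrape would see if the exporter on port 42000 answers in TLS. The sketch below decodes the record header using the content types and alert levels defined by the TLS specification (RFC 5246); the function name `describe_tls_record` is made up for illustration, and this is a reading of the bytes, not a confirmed diagnosis.

```python
import struct

# TLS record content types and alert levels, as defined in RFC 5246.
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
ALERT_LEVELS = {1: "warning", 2: "fatal"}


def describe_tls_record(data):
    """Decode the 5-byte TLS record header and, for alerts, the alert level."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes for a TLS record header")
    # Header layout: 1 byte type, 2 bytes version, 2 bytes big-endian length.
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    info = {
        "content_type": CONTENT_TYPES.get(content_type, "unknown"),
        "version": (major, minor),
        "length": length,
    }
    if content_type == 21 and len(data) >= 6:
        info["alert_level"] = ALERT_LEVELS.get(data[5], "unknown")
    return info


print(describe_tls_record(b"\x15\x03\x01\x00\x02\x02"))
# {'content_type': 'alert', 'version': (3, 1), 'length': 2, 'alert_level': 'fatal'}
```

Version `(3, 1)` is the wire encoding of TLS 1.0, so the reply reads as a fatal TLS alert sent in response to a request that was not TLS.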

Can you check whether you have two clients with the same name/hostname?

Well, I checked http://server/consul/ and there is no duplicate server/host name. prometheus/targets shows the IP addresses, which are unique as well. I have tried deregistering the client from Consul and running "pmm-admin repair" on the client, but the problem is still the same. Am I missing some required packages or something?
It is a bit strange that only a couple of clients don't work while all the others work fine. They are all in the same network, so there shouldn't be different firewall rules between them.

  1. I checked all pmm-* logs; there are no errors or complaints other than "tls: first record does not look like a TLS handshake", which seems common/normal judging from others' replies.
  2. pmm-admin check-network shows that client to server works fine, but the server to client status is down.
  3. I ran nmap from the server against the client on ports 42000 and 42002; both show as open.

I am running out of ideas about what else to check…


Can you check the following on the broken client:

  • Are the versions of PMM server and PMM client the same?
  • Can you show the output of the 'pmm-admin check-network' command? (You can replace the IPs with 1.1.1.1 and 2.2.2.2.)

Thanks for the tips. It finally works.

The server version was 1.0.5 and the client that didn't work was 1.0.7. After upgrading the server to 1.0.7, I can see all clients now. The clients that did work with server 1.0.5 were running 1.0.6, so I didn't think the version mattered that much. Anyway, thanks for helping!
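Since the fix here was bringing the server up to the client's version, a quick comparison of the two version strings is a cheap first check. The helpers below are a hypothetical sketch; the version strings are assumed to be read by hand (for example from the "pmm-admin 1.0.7" line shown earlier) and typed in.

```python
def parse_version(text):
    """Turn a dotted version such as '1.0.7' into a tuple of ints: (1, 0, 7)."""
    return tuple(int(part) for part in text.strip().split("."))


def client_is_newer(server_version, client_version):
    """Return True if the client version is strictly newer than the server's."""
    return parse_version(client_version) > parse_version(server_version)


print(client_is_newer("1.0.5", "1.0.7"))  # True
print(client_is_newer("1.0.7", "1.0.7"))  # False
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "1.0.10" sorting before "1.0.7".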

I am having a similar issue, except both my client and server instances are running the same version.

pmm-admin check-network
PMM Network Status

Server Address | REDACTED
Client Address | REDACTED

* System Time
Server | 2017-01-31 15:57:20 -0500 EST
Client | 2017-01-31 15:57:20 -0500 EST
Time Drift | OK


* Connection: Client --> Server
-------------------- -------
SERVER SERVICE       STATUS
-------------------- -------
Consul API           OK
Prometheus API       OK
Query Analytics API  OK

Connection duration | 616.433µs
Request duration | 1.107953ms
Full round trip | 1.724386ms


* Connection: Client <-- Server
-------------- -------------------- ---------------------- ------- ---------- ---------
SERVICE TYPE   NAME                 REMOTE ENDPOINT        STATUS  HTTPS/TLS  PASSWORD
-------------- -------------------- ---------------------- ------- ---------- ---------
linux:metrics  REDACTED             REDACTED:42000         DOWN    YES        YES
mysql:metrics  REDACTED             REDACTED:42002         DOWN    YES        YES

When an endpoint is down it may indicate that the corresponding service is stopped (run 'pmm-admin list' to verify).
If it's running, check out the logs /var/log/pmm-*.log

When all endpoints are down but 'pmm-admin list' shows they are up and no errors in the logs,
check the firewall settings whether this system allows incoming connections from server to address:port in question.

Also you can check the endpoint status by the URL: http://REDACTED/prometheus/targets
pmm-admin 1.0.7

PMM Server | REDACTED (password-protected)
Client Name | REDACTED
Client Address | REDACTED
Service Manager | linux-systemd

-------------- -------------------- ----------- -------- ---------------------------------------- ---------------------
SERVICE TYPE   NAME                 LOCAL PORT  RUNNING  DATA SOURCE                              OPTIONS
-------------- -------------------- ----------- -------- ---------------------------------------- ---------------------
mysql:queries  REDACTED             -           YES      pmm:***@unix(/var/lib/mysql/mysql.sock)  query_source=slowlog
linux:metrics  REDACTED             42000       YES      -
mysql:metrics  REDACTED             42002       YES      pmm:***@unix(/var/lib/mysql/mysql.sock)  tablestats=OFF
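When several clients need checking, the "Client <-- Server" rows of a saved check-network report can be filtered for endpoints marked DOWN. This is a sketch that assumes the six-column row layout seen in the paste above and client names without spaces; `down_endpoints` is a made-up helper, not part of pmm-admin.

```python
def down_endpoints(report):
    """Return (service type, remote endpoint) pairs whose STATUS column is DOWN."""
    found = []
    for line in report.splitlines():
        fields = line.split()
        # Data rows have six fields: type, name, endpoint, status, TLS, password.
        if len(fields) == 6 and ":" in fields[0] and fields[3] == "DOWN":
            found.append((fields[0], fields[2]))
    return found


sample = """\
SERVICE TYPE NAME REMOTE ENDPOINT STATUS HTTPS/TLS PASSWORD
linux:metrics REDACTED REDACTED:42000 DOWN YES YES
mysql:metrics REDACTED REDACTED:42002 DOWN YES YES
"""
print(down_endpoints(sample))
# [('linux:metrics', 'REDACTED:42000'), ('mysql:metrics', 'REDACTED:42002')]
```

The report text could come from redirecting the command output to a file; header and separator rows are skipped because they do not match the six-field pattern with a `type:kind` first column.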

I ran cURL commands from within the Docker container against the endpoints without issue:

root@2105dab46d05:/opt# curl --insecure https://REDACTED:42000
<html>
<head><title>Node Exporter</title></head>
<body>
<h1>Node Exporter</h1>
<p><a href="/metrics">Metrics</a></p>
</body>
</html>root@2105dab46d05:/opt

For the client-side logs:

/var/log/pmm-mysql-queries-0.log

2017/01/31 16:04:08.182228 WARNING data-sender Waiting for API to ack qan_1485892380005193039: read tcp REDACTED:48005->REDACTED:80: i/o timeout

The other two logs just have the TLS handshake errors.

Are there any further avenues of exploration?

Hi TerranUp16,

Do you have two networks (like private and public)?