pmm-server was unable to connect to pmm-client to collect linux:metrics

pmm-server was unable to connect to pmm-client to collect linux:metrics. The following is the output of pmm-admin check-network. I checked pmm-admin list, and linux:metrics shows as running. I also verified the firewall: there is no firewall issue between pmm-server and pmm-client on port 42000.

• Connection: Client ← Server

SERVICE TYPE   NAME              REMOTE ENDPOINT            STATUS  HTTPS/TLS  PASSWORD
linux:metrics  pmm-client        client_ip_address::42000   DOWN    YES        -
mysql:metrics  dbcrpmysqlsbxha2  client_ip_address:42002    OK      YES        -

SERVICE TYPE   NAME                   LOCAL PORT  RUNNING  DATA SOURCE  OPTIONS
linux:metrics  pmm_client ip address  42000       YES      -
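
(For reference, the tables above come from these two commands, run on the pmm-client:)

pmm-admin check-network
pmm-admin list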

Thanks,
Vishnu

Hi Vishnu,

Let's check node_exporter.
Can you run the following command on the pmm client itself and on the pmm server?

wget https://pmm-client-ip:42000/metrics --no-check-certificate

I am interested in the wget command output (not the URL content).


Thanks Mykola for your help!

I tried the command (wget https://10.49.xx.xx:42000/metrics --no-check-certificate) from the client and the server, and I am getting a Connection refused error.

The following is the error message from the pmm-server targets GUI (http://10.49.xx.xx/prometheus/targets):

State: Down
Error: context deadline exceeded

The following is the error from pmm-linux-metrics-42000.log:

time="2017-03-29T12:09:44-07:00" level=info msg="Starting node_exporter (version=1.1.1, branch=master, revision=2d78e22000779d63c714011e4fb30c65623b9c77)" source="node_exporter.go:170"
time="2017-03-29T12:09:44-07:00" level=info msg="Build context (go=go1.7.4, user=, date=)" source="node_exporter.go:171"
time="2017-03-29T12:09:44-07:00" level=info msg="Enabled collectors:" source="node_exporter.go:190"
time="2017-03-29T12:09:44-07:00" level=info msg=" - stat" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - vmstat" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - filesystem" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - meminfo" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - netdev" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - netstat" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - uname" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - diskstats" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - filefd" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - loadavg" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg=" - time" source="node_exporter.go:192"
time="2017-03-29T12:09:44-07:00" level=info msg="HTTPS/TLS is enabled" source="node_exporter.go:235"
time="2017-03-29T12:09:44-07:00" level=info msg="Listening on 10.49.xx.xx:42000" source="node_exporter.go:238"
2017/03/29 12:09:55 http: TLS handshake error from 10.49.xx.xx:44816: tls: first record does not look like a TLS handshake

Thanks,
Vishnu


So it looks like a firewall or network configuration issue.
Prometheus (on the PMM Server side) fetches the https://pmm-client-ip:42000/metrics URL every second.
This URL should be accessible from the PMM Server.
Can you open ports 42000 and 42002 for the PMM Server?
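
One quick check from the PMM Server side (a sketch; pmm-client-ip is a placeholder for the client address) is to request the metrics URL with a one-second budget, roughly mimicking the scrape:

# run on the PMM Server; --max-time 1 roughly matches the 1-second scrape budget
curl -sk --max-time 1 -o /dev/null -w "HTTP %{http_code} in %{time_total}s\n" https://pmm-client-ip:42000/metrics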


It doesn't seem to be a firewall issue, because I am able to telnet from pmm-server to pmm-client on the port:

[root@dbcrpmysqlsbxha3 log]# telnet dbcrpmysqlsbxha3 42000
Trying 10.49.80.46...
Connected to dbcrpmysqlsbxha2.

[root@dbcrpmysqlsbxha3 log]# telnet dbcrpmysqlsbxha2 42002
Trying 10.49.80.46...
Connected to dbcrpmysqlsbxha2.

What is the context deadline exceeded error, and where can I see error logs for more information?

Thanks,
Vishnu


Hm,

Is the following command working fine on the pmm client?


wget https://pmm-client-ip:42000/metrics --no-check-certificate 

Yes, it seems to be working fine. See the output of the wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate command:

[root@dbcrpmysqlsbxha2 ~]# wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate
--2017-03-30 10:22:37-- https://dbcrpmysqlsbxha2:42000/metrics
Resolving dbcrpmysqlsbxha2... 10.49.80.46
Connecting to dbcrpmysqlsbxha2|10.49.80.46|:42000... connected.
WARNING: cannot verify dbcrpmysqlsbxha2's certificate, issued by "/O=PMM Client":
Unable to locally verify the issuer's authority.
WARNING: certificate common name "" doesn't match requested host name "dbcrpmysqlsbxha2".
HTTP request sent, awaiting response... 200 OK
Length: 17784427 (17M) [text/plain]
Saving to: "metrics"

100%[===========================================================================================================================================================>] 17,784,427 6.35M/s in 2.7s

2017-03-30 10:24:06 (6.35 MB/s) - "metrics" saved [17784427/17784427]

Thanks,
Vishnu


In addition to this, how do I configure the SMTP options to set up mail notification alerts? I am using the options below and am unable to get notifications. Please let me know where I need to modify the SMTP configuration below.

#################################### SMTP / Emailing ##########################
[smtp]
;enabled = true
;host = localhost:25
;user = ivishnu7@gmail.com
;password =
;cert_file =
;key_file =
;skip_verify = false
;from_address = admin@grafana.localhost
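
(For reference, the leading semicolons in grafana.ini mark commented-out lines, so the settings above are not active. A rough working version, with placeholder values, would look like the following; Grafana needs a restart to pick it up:)

[smtp]
enabled = true
host = localhost:25
user = ivishnu7@gmail.com
password = your_smtp_password
skip_verify = false
from_address = admin@grafana.localhost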


About SMTP: I created a separate topic, https://www.percona.com/forums/questions-discussions/percona-monitoring-and-management/47988-configure-smtp

About network connectivity: is the same wget command working fine on the PMM Server machine?


Yes, the wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate command is working fine on pmm-server also:

[root@dbcrpmysqlsbxha3 ~]# wget https://dbcrpmysqlsbxha2:42000/metrics --no-check-certificate
--2017-03-31 10:10:09-- https://dbcrpmysqlsbxha2:42000/metrics
Resolving dbcrpmysqlsbxha2... 10.49.80.46
Connecting to dbcrpmysqlsbxha2|10.49.80.46|:42000... connected.
WARNING: cannot verify dbcrpmysqlsbxha2's certificate, issued by "/O=PMM Client":
Unable to locally verify the issuer's authority.
WARNING: certificate common name "" doesn't match requested host name "dbcrpmysqlsbxha2".
HTTP request sent, awaiting response... 200 OK
Length: 17784140 (17M) [text/plain]
Saving to: "metrics.1"

100%[===========================================================================================================================================================>] 17,784,140 2.41M/s in 7.0s

2017-03-31 10:11:37 (2.41 MB/s) - "metrics.1" saved [17784140/17784140]

[root@dbcrpmysqlsbxha3 ~]# telnet dbcrpmysqlsbxha2 42000
Trying 10.49.80.46...
Connected to dbcrpmysqlsbxha2.


Can you check the https://dbcrpmysqlsbxha2:42000/metrics target status on the targets page (http://pmm-server-ip/prometheus/targets)?
It should be "UP".


Thanks Mykola, the issue is resolved now. The problem was with the interval/timeout for pmm-server connecting to pmm-client. Previously it was 1 sec; now I have changed it to 5 minutes. After that, everything seems fine.

These are my present configuration values in /etc/prometheus.yml:

scrape_interval: 5s
scrape_timeout: 5s
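
For context, those two keys sit under the global section of prometheus.yml (they can also be overridden per scrape job); a minimal sketch:

global:
  scrape_interval: 5s   # how often Prometheus scrapes each target
  scrape_timeout: 5s    # how long a single scrape may run before it is cancelled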


It is recommended to keep 1s, because debugging internal database performance requires high-resolution data.
If you want to keep it, you need to find out which monitoring query is slow and disable that query via mysqld_exporter options.
Sometimes the mysqld_exporter queries are fine; in that case you need to place the servers on one physical network, or a high-performance network without delays.


Hi Mykola,

How should I know the exact reason for the context deadline exceeded error? As you mentioned, I tried with a scrape_interval of 1s only, but I am seeing the context deadline exceeded error for some monitored servers, not for all of them. Presently I am monitoring 5 MySQL instances from a remote PMM server with 16 GB RAM and 4 CPUs. I am able to collect Linux and MySQL metrics for 2 instances without any issue, but I am not able to collect Linux metrics for the remaining 3 instances; they show context deadline exceeded errors on the prometheus/targets page. How can I control Linux metrics monitoring more efficiently? There is no firewall issue.


Hi Vishnu,

context deadline exceeded means that mysqld_exporter takes longer than Prometheus expects (it cannot finish its work within 1 second),
so mysqld_exporter places sustained additional load on the database server.
mysqld_exporter runs many queries against the database, so we can disable some checks to speed it up.
Usually the longest query is 'tablestat'; it is possible to disable it with the following commands:

pmm-admin remove mysql:metrics
pmm-admin add mysql:metrics --disable-tablestats

Other options are also available.
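
A rough way to confirm the change helps is to time the mysqld_exporter endpoint before and after (assuming mysql:metrics listens on port 42002, as in the check-network output above; pmm-client-ip is a placeholder):

time curl -s -o /dev/null --insecure https://pmm-client-ip:42002/metrics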


Thanks Mykola for the update. Here you mentioned checks only for MySQL (--disable-tablestats, --disable-userstats, --disable-processlist, --disable-binlogstats). We already disabled tablestats in my environment to improve performance, and everything seems good. Is there any way to filter Linux metrics like the MySQL metrics? Because I am seeing context deadline exceeded errors only for Linux metrics.

Thanks to the PMM team for providing a beautiful monitoring tool for MySQL.

Thanks,
Vishnu


Usually, node_exporter (Linux metrics) works fine; we don't have any options for it.

can you share the output of the following command?

time curl -s -o /dev/null --insecure https://pmm-client-ip:42000/metrics

Please run it on the pmm-client and on the pmm-server (I want to compare the response times).


Hello,

If pmm-server was unable to connect to pmm-client to collect linux:metrics, please go through this link: https://www.percona.com/doc/percona-monitoring-and-management/2.x/manage/client-linux-metrics.html

Thanks,

Anita


I also faced the same type of issue, which might be due to some latency between the client node and the server node. It was resolved by increasing the value of scrape_timeout to 5s.
