Grafana graphs work only sporadically

I have installed PMM 1.0.5 and added some hosts with linux:metrics and mysql:metrics, but Grafana shows the graphs only sporadically.

Whenever I have no graphs, the client status is DOWN:


SERVICE TYPE   NAME    REMOTE ENDPOINT        STATUS
-------------- ------- ---------------------- -------
linux:metrics  db-1-1  144.76.15.145:42000    DOWN
mysql:metrics  db-1-1  144.76.15.145:42002    DOWN

I run pmm-server as follows to prevent sync issues:

docker run -d \
-p 44444:443 \
-e METRICS_RESOLUTION=5s \
--volumes-from pmm-data \
--name pmm-server \
-v /usr/share/pmm:/etc/nginx/ssl \
--restart always \
percona/pmm-server:1.0.5

But I don’t think this is the problem:

Connection duration | 323.078µs
Request duration | 4.35313ms
Full round trip | 4.676208ms
PING db-1-1 (144.76.15.145) 56(84) bytes of data.
64 bytes from db-1-1 (x.x.x.x): icmp_seq=1 ttl=60 time=0.311 ms
64 bytes from db-1-1 (x.x.x.x): icmp_seq=2 ttl=60 time=0.333 ms
64 bytes from db-1-1 (x.x.x.x): icmp_seq=3 ttl=60 time=0.684 ms
Server side: Wed Oct 26 10:56:51 UTC 2016
Client Side: Wed Oct 26 12:56:51 CEST 2016
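The two timestamps above describe the same instant: CEST is UTC+2, so 12:56:51 CEST is 10:56:51 UTC and the clocks agree. Formatted local times are easy to misread across time zones, so a minimal sketch of a skew check compares Unix epoch seconds instead. The helper names and the 2-second default tolerance are illustrative assumptions, not part of PMM.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute difference in seconds
# between two Unix epoch timestamps.
clock_skew() {
  diff=$(( $1 - $2 ))
  # Take the absolute value so argument order does not matter.
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  echo "$diff"
}

# Hypothetical helper: succeed if two epoch timestamps are within
# a tolerance in seconds (default 2).
clocks_in_sync() {
  tolerance=${3:-2}
  [ "$(clock_skew "$1" "$2")" -le "$tolerance" ]
}
```

For example, clocks_in_sync "$(date +%s)" "$(docker exec pmm-server date +%s)" && echo "in sync" compares the host and the container. Epoch seconds do not depend on the time zone setting, so this separates real clock drift from a mere time zone difference.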

If I look in /var/log/prometheus.log, there are no error entries. Sometimes there are no entries at all for several hours.

Even at times when I get no data, I can still do this:

root@756c3afeceba:/opt# curl http://db-1-1:42002
<html>
<head><title>MySQL 3-in-1 exporter</title></head>
<body>
<h1>MySQL 3-in-1 exporter</h1>
<li><a href="/metrics-hr">high-res metrics</a></li>
<li><a href="/metrics-mr">medium-res metrics</a></li>
<li><a href="/metrics-lr">low-res metrics</a></li>
</body>
</html>
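The landing page above only shows that the exporter is listening; the metrics themselves are served on the three paths it lists. A minimal sketch that fetches each path and reports UP or DOWN, run from inside the container as above (where curl is available). The helper names and the 10-second timeout are illustrative assumptions, not part of PMM.

```shell
#!/bin/sh
# Hypothetical helper: print UP if the URL answers with a successful
# HTTP response within the timeout ($2 seconds, default 10), else DOWN.
probe() {
  # -s: silent, -f: treat HTTP errors (>= 400) as failures,
  # -o /dev/null: discard the body, --max-time: cap the whole request.
  if curl -sf -o /dev/null --max-time "${2:-10}" "$1"; then
    echo "UP"
  else
    echo "DOWN"
  fi
}

# Check every resolution endpoint listed on the exporter's landing page.
check_mysql_exporter() {
  base=$1
  for path in /metrics-hr /metrics-mr /metrics-lr; do
    echo "$base$path $(probe "$base$path")"
  done
}
```

For example, check_mysql_exporter http://db-1-1:42002 prints one line per path, which shows whether the data endpoints fail even while the landing page still loads.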

I have no idea why data is sometimes available and often not.

You can watch how the container performs with docker stats pmm-server. Maybe the monitoring server does not have enough resources.

Hi,

Thanks for the fast answer.

The physical machine has more than enough resources. Here is the output from docker stats:

CONTAINER    CPU %   MEM USAGE / LIMIT   MEM %   NET I/O               BLOCK I/O             PIDS
pmm-server   3.13%   0 B / 0 B           0.00%   172.2 MB / 224.5 MB   116.3 MB / 294.9 GB   0

I think it’s another problem.

I found that I needed to bump up the memory in the Docker container with:

METRICS_MEMORY=2097152

I have been having this (or a similar) issue, and it seems that Prometheus runs out of memory for ingesting the incoming metrics.
Usually, when it dies, it really dies, and I have to restart the Docker container.

I have been bumping up the memory with this command:

METRICS_MEMORY=2097152

on the docker run command line. The default is fairly small, which worked fine until I added one server too many or there was a peak in events.
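To choose a value, assume METRICS_MEMORY is given in kilobytes (KiB); that reading matches 2097152 being exactly 2 GiB. Under the same assumption, the 4124672 used later in this thread is 4028 MiB, slightly under 4 GiB (4194304 KiB). The helper names below are hypothetical; the sketch only does the unit conversion.

```shell
#!/bin/sh
# Hypothetical helpers for sizing METRICS_MEMORY,
# ASSUMING the value is interpreted in KiB.

# Convert whole GiB to KiB: 1 GiB = 1024 * 1024 KiB.
gib_to_kib() {
  echo $(( $1 * 1024 * 1024 ))
}

# Convert KiB back to whole MiB (integer division) as a sanity check.
kib_to_mib() {
  echo $(( $1 / 1024 ))
}
```

For example, -e METRICS_MEMORY=$(gib_to_kib 4) on the docker run command line would expand to 4194304.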

You might also take a look at the Prometheus graphs in Grafana.

My running server looks like this when I run docker stats pmm-server:

CONTAINER    CPU %    MEM USAGE / LIMIT    MEM %    NET I/O               BLOCK I/O
pmm-server   71.24%   6.77 GB / 7.934 GB   85.33%   6.932 GB / 379.2 MB   3.766 GB / 18.15 GB

I tested it again, adding monitored hosts to the PMM server slowly.

This worked for the first 9 servers, but only 3 minutes after adding the 10th, monitoring stopped completely.

Again, there are no log entries in /var/log/prometheus.log, and ALL metrics services on ALL servers are in the running state, but they have no connectivity with the server:

SERVICE TYPE   NAME    REMOTE ENDPOINT        STATUS
-------------- ------- ---------------------- -------
linux:metrics  db-1-2  x.x.x.x:42000          DOWN
mysql:metrics  db-1-2  x.x.x.x:42002          DOWN

PMM is a nice and helpful tool, but it does not seem very robust, and we are not able to work with it at the moment.

What can I do?

What are your OS and Docker versions?

When connectivity is lost, does curl http://db-1-2:42000/metrics work from inside the container?

Distributor ID: Debian
Description: Debian GNU/Linux 8.6 (jessie)
Release: 8.6
Codename: jessie
Docker version 1.12.2, build bb80604

If I run curl http://db-1-2:42000/metrics from inside the container, I get:

process_cpu_seconds_total 23.42
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.370112e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.47749268343e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.7041792e+07
...

Since my last post, data has come through 3 times without any change to the infrastructure.

It’s strange.

When pmm-admin check-network shows DOWN for an endpoint, does curl from inside the container work for that same endpoint at the same moment? I don’t think that is possible.
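Because the failures are intermittent, a single curl proves little. A minimal sketch that polls one endpoint repeatedly from inside the container and logs timestamped UP/DOWN results, so the log can be lined up against pmm-admin check-network output taken at the same time. The function name, the default count of 60, and the 5-second interval are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical poller: probe URL $1 a total of $2 times (default 60),
# waiting $3 seconds (default 5) between probes, and print a UTC
# timestamp with each result.
watch_endpoint() {
  url=$1
  count=${2:-60}
  interval=${3:-5}
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s silent, -f fail on HTTP errors, 5-second cap per request.
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      status=UP
    else
      status=DOWN
    fi
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $url $status"
    i=$(( i + 1 ))
    # Skip the sleep after the final probe.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, watch_endpoint http://db-1-2:42000/metrics 120 5 > /tmp/db-1-2.log covers about ten minutes; any DOWN lines can then be compared with what check-network reported at the same times.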

Thanks, weber and geoiii,

I now run the container as follows:


docker run -d \
-p 44444:443 \
-e METRICS_RESOLUTION=5s \
-e METRICS_MEMORY=4124672 \
--volumes-from pmm-data \
--name pmm-server \
-v /usr/share/pmm:/etc/nginx/ssl \
-v /etc/localtime:/etc/localtime:ro \
--restart always \
percona/pmm-server:1.0.5

Sometimes the time in the Docker container changes, so I sync the container's time zone with the host's (by mounting /etc/localtime read-only) and start the container with more memory.

At the moment all is working fine.

The stats are now more plausible, but the memory usage looks a little strange: there are no values. But it works :)

CONTAINER    CPU %    MEM USAGE / LIMIT   MEM %   NET I/O               BLOCK I/O             PIDS
pmm-server   10.72%   0 B / 0 B           0.00%   77.36 MB / 3.013 MB   105.8 MB / 1.107 GB   0

Oh, I misread: I thought the post about METRICS_MEMORY was yours as well :)