Empty data on home-dashboard

Hi, I’m new to PMM. I just set up monitoring for two MySQL servers, following the same installation process for both. One machine works well, but the other has no data at all for its whole row on the home-dashboard page. On other pages like “mysql-overview”, data is complete for both MySQL instances.

I’ve run “check-network” and everything is OK, with latencies <10ms. I also checked the “pmm-*.log” files on the client machine, as well as the log files on the PMM server inside the Docker container. The only suspicious thing is in the “prometheus.log” file in the server container, as follows:

time="2018-03-05T10:41:45Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=42 source="scrape.go:534"
time="2018-03-05T10:41:45Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="db-dedicated.ff.digital", job="mysql"} => 1 @[1520246504.445] source="scrape.go:587"
time="2018-03-05T10:41:45Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 1.441987055 @[1520246504.445] source="scrape.go:590"
time="2018-03-05T10:41:45Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 1.441987055 @[1520246504.445] source="scrape.go:593"
time="2018-03-05T10:41:45Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 1.441987055 @[1520246504.445] source="scrape.go:596"
time="2018-03-05T10:41:46Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=42 source="scrape.go:534"
time="2018-03-05T10:41:46Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="db-dedicated.ff.digital", job="mysql"} => 1 @[1520246506.364] source="scrape.go:587"
time="2018-03-05T10:41:46Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.625042525 @[1520246506.364] source="scrape.go:590"
time="2018-03-05T10:41:46Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.625042525 @[1520246506.364] source="scrape.go:593"
time="2018-03-05T10:41:46Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.625042525 @[1520246506.364] source="scrape.go:596"
time="2018-03-05T10:41:51Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=42 source="scrape.go:534"
time="2018-03-05T10:41:51Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="db-dedicated.ff.digital", job="mysql"} => 1 @[1520246511.373] source="scrape.go:587"
time="2018-03-05T10:41:51Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.45708211 @[1520246511.373] source="scrape.go:590"
time="2018-03-05T10:41:51Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.45708211 @[1520246511.373] source="scrape.go:593"
time="2018-03-05T10:41:51Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.45708211 @[1520246511.373] source="scrape.go:596"

But when I checked the system time on all MySQL servers, the PMM server Docker container, and the Docker host machine, they were all correct. Does anyone have an idea?

AFAIK, the only difference between the two MySQL servers is the network. One is within the same internal network as the PMM server, while the other (the one with no data) is not, and it has a slightly different firewall config. But I’m sure all required ports are open.
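One way to cross-check the clocks directly from the log is to compare each line's logged time with the sample timestamp in square brackets, which is in Unix epoch seconds. The snippet below is only a rough sketch: `sample_skew_seconds` is a hypothetical helper, not part of PMM or Prometheus, and the regular expression assumes the exact line layout shown in the warnings above.

```python
import re
from datetime import datetime, timezone

# Assumes the layout seen in prometheus.log: time="..." ... @[epoch.millis] ...
LINE_RE = re.compile(r'time="(?P<logged>[^"]+)".*@\[(?P<sample>\d+(?:\.\d+)?)\]')


def sample_skew_seconds(line):
    """Return (logged time - sample timestamp) in seconds, or None if no sample is present."""
    # Some forum exports encode "@" as the HTML entity "&#64;"; undo that first.
    match = LINE_RE.search(line.replace("&#64;", "@"))
    if match is None:
        return None
    logged = datetime.strptime(match.group("logged"), "%Y-%m-%dT%H:%M:%SZ")
    logged = logged.replace(tzinfo=timezone.utc)
    sample = datetime.fromtimestamp(float(match.group("sample")), tz=timezone.utc)
    return (logged - sample).total_seconds()


# First "Scrape health sample discarded" line from the log above.
line = (
    'time="2018-03-05T10:41:45Z" level=warning msg="Scrape health sample discarded" '
    'error="sample timestamp out of order" '
    'sample=up{instance="db-dedicated.ff.digital", job="mysql"} => 1 '
    '@[1520246504.445] source="scrape.go:587"'
)
skew = sample_skew_seconds(line)
print(f"skew: {skew:.3f}s")
```

For that line the difference is well under one second, which is consistent with the clocks being in sync and suggests the out-of-order warnings may have another cause.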

Hi there, sorry for the delay. Are you still having this problem?

Can I just check that you ran through the troubleshooting advice at https://www.percona.com/blog/2018/01…t-pmm-metrics/ ?
There’s also a section on checking the network here: https://www.percona.com/doc/percona-monitoring-and-management/faq.html

Meanwhile I will see if I can find any other information for you.

Myles, it looks like a problem with date and time, so it is worth checking the time sync on your server and client.
Also, please check the https://YOURSERVER/prometheus/targets page. Are there any errors?
https://YOURSERVER/graph/dashboard/db/prometheus can also provide some information about the problem.

Thank you, but I’m afraid that does not solve the problem. I followed the troubleshooting steps and everything seems fine.

Hi Roma,

I re-checked the servers and they both have their time in sync.
All Prometheus targets are UP, and the “last scrape” values for the 2 “linux” targets are within 3 seconds.
The Prometheus dashboard shows it is using a very low percentage of resources (<10% CPU and memory, etc.), so everything looks good.

I can also see the following when I visit https://<PMM_client_ip>:42000/metrics, which means the client is providing Linux-related metrics correctly.

I think I found my own answer, and it looks like a bug. The following is the problematic part of the page I see at http:///graph/dashboard/db/home-dashboard?orgId=1, and it is the only page I found with a problem. Nothing else is wrong. When I click on the big “no value” box, it takes me to http:///graph/dashboard/db/system-overview?from=now-12h&to=now&var-interval=$__auto_interval&var-host=All&orgId=1, which displays no values either.
What I noticed is “var-host=All” in the second URL, which means no specific host is selected. So I believe the second page is behaving as intended.
Finally, I solved my problem by changing the “client_name” value in pmm.yml on the client machine, as in the attached screenshot. “db-0.fake.domain” is the old name, and I changed it to “db-dedicated”.

Conclusion: I believe it is a bug. When client_name (the host) contains a dot (.), the dashboard landing page does not recognise it as the “var-host” value when fetching data from the backend, so it displays “no value” and provides a wrong link.
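As a rough illustration of the workaround, assuming the problem is triggered only by a dot in “client_name”, a dot-free name can be derived from the fully qualified hostname before writing it to pmm.yml. `short_client_name` is a hypothetical helper for illustration and is not part of PMM.

```python
def short_client_name(hostname):
    """Return the part of a hostname before the first dot.

    Hypothetical helper: it only produces a dot-free candidate value for
    client_name; it does not read or modify pmm.yml.
    """
    name = hostname.strip().split(".", 1)[0]
    if not name:
        raise ValueError(f"cannot derive a client name from {hostname!r}")
    return name


# Names taken from the posts above.
print(short_client_name("db-0.fake.domain"))  # db-0
print(short_client_name("db-dedicated"))      # db-dedicated (unchanged)
```

Any dot-free name should do for this purpose; taking the first label is just one convenient choice, and it will collide if two clients share the same short hostname.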

Same problem here with a client with a dot (.) in its name.

Hi there, thanks for this input, I am just going to bring it to the attention of the PMM team so they can check it out.
Appreciated!

Yeah, it’s a bug. I also tested it, and it works fine as long as the “client_name” in the pmm.yml file on the PMM client machine does not contain a dot (“.”), as an FQDN would. Re-install everything with this fix and it should work.

Hi all, thanks for all your input. This has been registered as a bug, and I’ll update as soon as there’s a status change.