Hi, I’m new to PMM. I just set up monitoring for two MySQL servers, following the same installation process for both. One machine works well, but the other shows no data at all in its entire row on the home-dashboard page. On other pages such as “mysql-overview”, data is complete for both MySQL instances.
I ran “check-network” and everything is OK, with latencies <10ms. I also checked the “pmm-*.log” files on the client machine, as well as the log files on the PMM server inside the Docker container. The only suspicious thing is in the “prometheus.log” file in the server container, as follows:
[HTML]
time="2018-03-05T10:41:45Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=42 source="scrape.go:534"
time="2018-03-05T10:41:45Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="db-dedicated.ff.digital", job="mysql"} => 1 @[1520246504.445] source="scrape.go:587"
time="2018-03-05T10:41:45Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 1.441987055 @[1520246504.445] source="scrape.go:590"
time="2018-03-05T10:41:45Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 1.441987055 @[1520246504.445] source="scrape.go:593"
time="2018-03-05T10:41:45Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 1.441987055 @[1520246504.445] source="scrape.go:596"
time="2018-03-05T10:41:46Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=42 source="scrape.go:534"
time="2018-03-05T10:41:46Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="db-dedicated.ff.digital", job="mysql"} => 1 @[1520246506.364] source="scrape.go:587"
time="2018-03-05T10:41:46Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.625042525 @[1520246506.364] source="scrape.go:590"
time="2018-03-05T10:41:46Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.625042525 @[1520246506.364] source="scrape.go:593"
time="2018-03-05T10:41:46Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.625042525 @[1520246506.364] source="scrape.go:596"
time="2018-03-05T10:41:51Z" level=warning msg="Error on ingesting out-of-order samples" numDropped=42 source="scrape.go:534"
time="2018-03-05T10:41:51Z" level=warning msg="Scrape health sample discarded" error="sample timestamp out of order" sample=up{instance="db-dedicated.ff.digital", job="mysql"} => 1 @[1520246511.373] source="scrape.go:587"
time="2018-03-05T10:41:51Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.45708211 @[1520246511.373] source="scrape.go:590"
time="2018-03-05T10:41:51Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.45708211 @[1520246511.373] source="scrape.go:593"
time="2018-03-05T10:41:51Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db-dedicated.ff.digital", job="mysql"} => 0.45708211 @[1520246511.373] source="scrape.go:596"
[/HTML]
But when I checked the system time on all MySQL servers, in the PMM server Docker container and on the Docker host machine, they were all correct. Does anyone have an idea?
AFAIK, the only difference between the two MySQL servers is the network. One is in the same internal network as the PMM server, while the other (the one with no data) is not, and has a slightly different firewall config. But I’m sure all required ports are open.
I re-checked the servers and they both have time in sync.
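One way to double-check the clock question from the log itself: each discarded sample carries its own timestamp after the `@`, which can be compared with the `time=` field on the same line. Here is a minimal Python sketch, assuming the value inside `@[...]` is Unix epoch seconds; the helper name `skew_seconds` is made up for illustration.

```python
from datetime import datetime, timezone

def skew_seconds(log_time: str, sample_epoch: float) -> float:
    """Return (log line time - sample time) in seconds."""
    # The log's time= field has whole-second resolution and is in UTC ("Z").
    logged = datetime.strptime(log_time, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Assumption: the sample timestamp is Unix epoch seconds.
    sampled = datetime.fromtimestamp(sample_epoch, tz=timezone.utc)
    return (logged - sampled).total_seconds()

# Values copied from the first "Scrape health sample discarded" line in the log above.
print(skew_seconds("2018-03-05T10:41:45Z", 1520246504.445))
```

For that line the difference comes out at well under a second, which is consistent with the clocks being in sync and suggests the warnings are not caused by clock skew.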
All Prometheus targets are UP, and “last scrape” for the 2 “linux” targets is within 3 seconds.
The Prometheus graph shows it’s using a very low percentage of resources (<10% CPU and memory, etc.); everything looks good.
I can also see metrics output when I visit https://<PMM_client_ip>:42000/metrics, which means the client is providing Linux-related metrics correctly.
I think I found my own answer, and it looks like a bug. The problematic part is on the page at http:///graph/dashboard/db/home-dashboard?orgId=1, which is the only page where I found a problem. Nothing else is wrong. When I click on the big “no value” box, it takes me to http:///graph/dashboard/db/system-overview?from=now-12h&to=now&var-interval=$__auto_interval&var-host=All&orgId=1, which also displays no value.
What I noticed is “var-host=All” in the second URL, which means no specific host is selected. So I believe the second page is behaving as intended.
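To make the “var-host=All” observation easy to repeat for other links, here is a small Python sketch that pulls the `var-*` template variables out of a dashboard URL. It relies only on the standard library; the function name `dashboard_vars` is hypothetical, and the URL is the one quoted above with its host part left empty as in the post.

```python
from urllib.parse import urlsplit, parse_qs

def dashboard_vars(url: str) -> dict:
    """Return the template variables (query keys starting with "var-") of a dashboard URL."""
    query = parse_qs(urlsplit(url).query)
    # Strip the "var-" prefix so the keys read as plain variable names.
    return {key[len("var-"):]: values for key, values in query.items() if key.startswith("var-")}

url = ("http:///graph/dashboard/db/system-overview"
       "?from=now-12h&to=now&var-interval=$__auto_interval&var-host=All&orgId=1")
print(dashboard_vars(url))
```

If the link from the home dashboard had carried the host through, the `host` entry would hold the client name rather than `All`.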
Finally, I solved my problem by changing the “client_name” value in pmm.yml on the client machine, as in the attached screenshot. “db-0.fake.domain” is the old name, and I changed it to “db-dedicated”.
Conclusion: I believe it is a bug. When client_name (the host) contains a dot (.), the dashboard landing page does not recognise it as the “var-host” value when fetching data from the backend, so it displays “no value” and provides a wrong link.
Yeah, it’s a bug. I also tested it, and it works fine as long as the “client_name” in the pmm.yml file on the PMM client machine does not contain a dot “.”, as an FQDN would. Re-install everything with this fix and it should work.
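For anyone wanting to check several clients for this before hitting the problem, a rough Python sketch follows. It assumes pmm.yml stores the value on a single line of the form `client_name: <value>`; that layout and both helper names are assumptions for illustration, not something confirmed in this thread.

```python
import re

def client_name_from_config(text: str):
    """Return the client_name value from pmm.yml-style text, or None if it is absent."""
    # Assumption: the value sits on one line as "client_name: <value>".
    match = re.search(r"^\s*client_name\s*:\s*(\S+)", text, flags=re.MULTILINE)
    return match.group(1).strip("'\"") if match else None

def has_dot(name: str) -> bool:
    """True if the name contains a dot, the case reported as broken above."""
    return "." in name

sample = "client_name: db-0.fake.domain\n"  # the old name from the post above
name = client_name_from_config(sample)
print(name, "-> contains a dot" if has_dot(name) else "-> ok")
```

In practice you would read the text from the pmm.yml file on each client and rename any client whose name is flagged.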