PMM does not display metrics when either agent or target MongoDB is down

Hi,

I am facing an issue: when either the pmm-client or the monitored MongoDB is down, the corresponding metrics for that host and MongoDB are not displayed.
The MongoDB dashboards do not display any metrics from before the pmm-client or MongoDB went down until we start the pmm-client or MongoDB again.

Thanks,
Tanveer

Hi Tanveer,

Can you click on “Time range” in the top right corner and choose a longer time interval?

Hi Mykola,

Thanks for your response!!!
I have tried that, but the host itself does not appear in the drop-down list.
Once I recover the database and restart the agent, it shows all the metrics.
I am attaching the screenshots.
Any advice on this would be very helpful.

Thanks,
Tanveer

PMM_ISSUE.docx (431 KB)

Hi tanveermadan, you need to ensure that your time range is wide enough to include the period when the exporter was still up.
Prometheus knows to scrape your MongoDB exporter (i.e. pmm-admin list shows that the client is running), and will attempt to do so until you explicitly tell it to stop, usually via:

pmm-admin remove mongodb ...
pmm-admin repair

So if the exporter is still running, Prometheus is able to retrieve some metrics and save them to the database. Grafana then has the host available to show statistics (which, if MongoDB was down, would be all zeros / flat lines), and the host also appears in the Host selector box.

If the exporter is stopped or the server turned off, Prometheus attempts to scrape but is unable to collect any values at all, so the time series for your MongoDB host simply won’t exist for that period: it doesn’t write blanks to the database; rather, no writes at all are recorded for your MongoDB instance.

This is why, if your exporter is down, your time range needs to be wide enough to include a period when the server was actually up and running. Otherwise Grafana will ask Prometheus “show me all MongoDB instances up during this time range”, and if yours wasn’t running, it won’t be in the list.
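To make the point above concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described, not how Prometheus or Grafana are implemented: the `scrape` and `hosts_in_range` functions, the host names `db-a` and `db-b`, and the integer timestamps are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Toy store: host name -> list of (timestamp, value) samples.
# Assumption: a failed scrape records nothing at all (no blanks, no zeros).
Store = Dict[str, List[Tuple[int, float]]]


def scrape(store: Store, host: str, ts: int, exporter_up: bool, value: float = 0.0) -> None:
    """Append a sample only if the (hypothetical) exporter answered."""
    if exporter_up:
        store.setdefault(host, []).append((ts, value))


def hosts_in_range(store: Store, start: int, end: int) -> List[str]:
    """Return hosts that have at least one sample with start <= ts <= end."""
    return sorted(
        host
        for host, samples in store.items()
        if any(start <= ts <= end for ts, _ in samples)
    )


store: Store = {}
for ts in range(10):
    scrape(store, "db-a", ts, exporter_up=True, value=1.0)
    # db-b's exporter is stopped from ts=5 onwards, so nothing is written.
    scrape(store, "db-b", ts, exporter_up=(ts < 5), value=1.0)

print(hosts_in_range(store, 6, 9))  # ['db-a']
print(hosts_in_range(store, 3, 9))  # ['db-a', 'db-b']
```

In this sketch `db-b` only appears once the range reaches back to a time when its exporter was still answering, which mirrors what the Host selector does.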

I hope this helps!

Hi Michael Coburn ,

Thanks for your response

I shut down the pmm-client from 08/15 14:52PM until 08/15 12:00PM on the host xsj-db04.
Then I tried to view metrics from when the pmm-client was up, i.e. before 08/14 14:52PM, but I am still unable to see them.
But once I brought the pmm-client back up, I could see metrics for the host and MongoDB.
I have attached the screenshots with the details of the test I did.

Thanks,
Tanveer

Hi tanveermadan, this is interesting. I am escalating this internally as I don’t have an immediate resolution for you. In the meantime, can you please look at this blog post and share the information collected with us? If you prefer, you can email it to me: michael.coburn@percona.com
https://www.percona.com/blog/2016/11/01/percona-monitoring-and-management-pmm-information-script/

Hi tanveermadan , I received your pmm-info.sh content, thank you.

One thing occurred to me this morning when reading this thread again - the dashboards are configured to evaluate the Template variable $host only on dashboard load. This means that adjusting the time range isn’t sufficient to force hosts to be re-discovered. So my request is: can you run the test again?

1. Stop the client at T0
2. Check Metrics Monitor and confirm that host does not appear in the list, by ensuring you’re using a time range that starts after the client was down. Doing a change of time range puts this range into the URL, so simply reloading the page should make Metrics Monitor re-evaluate $host. You should not see the node.
3. Expand the time range to be before the client was shut down. Force the browser to reload. You should see the MongoDB node again in the $host dropdown. And the graphs should show metrics up to the point when you stopped the client, then be empty.

Please let us know if this is your experience. I’m including an image below of how $host gets populated - note Refresh is defined as “On Dashboard Load” rather than “On Time Range Change”.
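The difference between the two Refresh settings can be sketched with a small Python toy model. This is an illustration under stated assumptions, not Grafana's code: the `HostVariable` class, the `discover` function, the host names and the integer timestamps are all hypothetical.

```python
from typing import Callable, List


def discover(start: int, end: int) -> List[str]:
    """Hypothetical discovery: hosts with samples overlapping [start, end].

    Assumption: both hosts started reporting at t=0; db-b stopped after t=4.
    """
    last_seen = {"db-a": 9, "db-b": 4}
    return sorted(h for h, last in last_seen.items() if end >= 0 and last >= start)


class HostVariable:
    """Toy stand-in for a dashboard template variable such as $host."""

    def __init__(self, discover_fn: Callable[[int, int], List[str]],
                 refresh_on_time_change: bool) -> None:
        self.discover_fn = discover_fn
        self.refresh_on_time_change = refresh_on_time_change
        self.options: List[str] = []

    def load_dashboard(self, start: int, end: int) -> None:
        # A (re)load always re-evaluates the variable.
        self.options = self.discover_fn(start, end)

    def change_time_range(self, start: int, end: int) -> None:
        # Only re-evaluates if configured to refresh on time range change.
        if self.refresh_on_time_change:
            self.options = self.discover_fn(start, end)


on_load = HostVariable(discover, refresh_on_time_change=False)
on_load.load_dashboard(6, 9)
on_load.change_time_range(0, 9)
print(on_load.options)   # ['db-a'] (stale until the page is reloaded)
on_load.load_dashboard(0, 9)
print(on_load.options)   # ['db-a', 'db-b']

on_change = HostVariable(discover, refresh_on_time_change=True)
on_change.load_dashboard(6, 9)
on_change.change_time_range(0, 9)
print(on_change.options)  # ['db-a', 'db-b']
```

With the "on load" behaviour the stopped host only reappears after a reload with the wider range, which is what the test steps above are meant to confirm.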

Hi Michael Coburn ,

Thanks for your response !!!
I tested the steps you gave and saw exactly the behavior you described.
Reloading the page with a time range covering when the client was up shows the host and the metrics related to that host.
To solve this, should I set Refresh to “On Time Range Change”?

Thanks,
Tanveer

PMM_ISSUE_2.docx (433 KB)