Memory Usage graph reports error when >68 hosts are selected

We have 58 MySQL instances running on 16 MySQL VMs; the PMM server itself is monitored as well, of course.

I have tried to access data directly in Prometheus and I see all the metrics for all nodes without any trouble, hence the issue seems related to Grafana.
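For anyone who wants to repeat the direct check against Prometheus, here is a minimal sketch that builds a request URL for the Prometheus instant-query HTTP endpoint (`/api/v1/query`). The base URL and the host names are placeholders, not values from this thread, and on a PMM server Prometheus may be exposed under a different address or path.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only; replace with your own.
PROMETHEUS_BASE = "http://pmm-server.example:9090"
HOSTS = ["db01", "db02", "pmm-server"]


def host_regex(hosts: list[str]) -> str:
    """Join host names into a regex alternation for use with =~.

    Assumes the names contain no regex metacharacters.
    """
    return "|".join(hosts)


def instant_query_url(base: str, expr: str) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


expr = f'node_memory_MemTotal{{instance=~"{host_regex(HOSTS)}"}}'
url = instant_query_url(PROMETHEUS_BASE, expr)
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return a JSON document whose `data.result` array has one entry per matching instance.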

While selecting and deselecting hosts from the list, I found that the problem appears at “around” 68 hosts. Depending on the set, sometimes 67 hosts are enough to trigger it, other times it takes 69. So it seems to be related not to the number of hosts as such, but rather to the volume of data being accessed.

Thanks

  1. Do you see any JavaScript errors in the browser's JavaScript console?
  2. Can you decrease the resolution to 1/10? To do that:
  • log in to Grafana as the Admin user
  • click on the graph title
  • click “Edit”
  • choose the “Metrics” tab
  • change the Resolution value

See the screenshot.

Sorry for the very late reply, been very busy during the past days.
I have tried all the possible resolutions for the Memory trend graph, but I still get the error when >~68 hosts are selected from the list.

Can you open “Developer Tools” -> “Console” in your browser and reload the page?
Do you see any JavaScript errors?

I have the following JS messages on the Console:

TypeError: Cannot read property 'result' of undefined
at eval (datasource.js?bust=1488533087258:4)
at f (boot.b8fccc44.js:52)
at Function.Pg (boot.b8fccc44.js:53)
at eval (datasource.js?bust=1488533087258:4)
at g (boot.b8fccc44.js:42)
at boot.b8fccc44.js:42
at o.$eval (boot.b8fccc44.js:42)
at o.$digest (boot.b8fccc44.js:42)
at o.$apply (boot.b8fccc44.js:42)
at boot.b8fccc44.js:42
(anonymous) @ boot.b8fccc44.js:42
(anonymous) @ boot.b8fccc44.js:41
g @ boot.b8fccc44.js:42
(anonymous) @ boot.b8fccc44.js:42
$eval @ boot.b8fccc44.js:42
$digest @ boot.b8fccc44.js:42
$apply @ boot.b8fccc44.js:42
(anonymous) @ boot.b8fccc44.js:42
f @ boot.b8fccc44.js:40
(anonymous) @ boot.b8fccc44.js:41
boot.b8fccc44.js:46

Panel data error: TypeError: Cannot read property 'result' of undefined
at eval (datasource.js?bust=1488533087258:4)
at f (boot.b8fccc44.js:52)
at Function.Pg (boot.b8fccc44.js:53)
at eval (datasource.js?bust=1488533087258:4)
at g (boot.b8fccc44.js:42)
at boot.b8fccc44.js:42
at o.$eval (boot.b8fccc44.js:42)
at o.$digest (boot.b8fccc44.js:42)
at o.$apply (boot.b8fccc44.js:42)
at boot.b8fccc44.js:42

Can you replace the Query?

To do that:

  • log in to Grafana as the Admin user
  • click on the graph title
  • click “Edit”
  • choose the “Metrics” tab
  • replace the whole Query value with:
100 - 100 * node_memory_MemAvailable{instance=~"$host"} / (node_memory_MemTotal{instance=~"$host"} + 0.1)

I have replaced the query with the new one you sent me. In this case I see the graph only for the host “pmm-server” when All is ticked in the hosts list.

Am I doing something wrong?

To be clearer, here is a screenshot:

Hm.
The screenshot looks very strange; the new query works fine on my machine.

Can you try changing “Resolution” and “Query” again, then press the “save Dashboard” button (it looks like a floppy disk) and refresh the page?

I tried various combinations of changing query and resolution, with no luck so far.
Selecting a subset of hosts (excluding pmm-server) returns essentially no data points, while with the original query I get a graph displayed. Please check the attached screenshot.

can you try one more Query?

100 - 100 * (node_memory_MemFree{instance=~"$host"} + node_memory_Buffers{instance=~"$host"} + node_memory_Cached{instance=~"$host"}) / (node_memory_MemTotal{instance=~"$host"} + 0.1)

Yes, this fixes the problem, thanks! Should I do anything to make it persistent?

Unfortunately, such changes will be overwritten on a PMM update.

Can you check one more query? It is very similar to the current default.
If it works fine, I will commit it.

(node_memory_MemTotal{instance=~"$host"} - (node_memory_MemAvailable{instance=~"$host"} or (node_memory_MemFree{instance=~"$host"} + node_memory_Buffers{instance=~"$host"} + node_memory_Cached{instance=~"$host"}))) / ( node_memory_MemTotal{instance=~"$host"} + 0.1 ) * 100

The last query you sent me returns no data points for any host (including pmm-server).

the latest idea…

100 - 100 * (node_memory_MemAvailable{instance=~"$host"} or (node_memory_MemFree{instance=~"$host"} + node_memory_Buffers{instance=~"$host"} + node_memory_Cached{instance=~"$host"})) / (node_memory_MemTotal{instance=~"$host"} + 0.1)
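To make the intent of this query easier to follow, here is a small Python sketch of what it computes for a single instance. It is a simplification: in PromQL, `or` works on whole vectors matched by label set, while the hypothetical function `memory_used_percent` below handles one host at a time, with `None` standing for a host that does not export `node_memory_MemAvailable`.

```python
from typing import Optional


def memory_used_percent(
    mem_total: float,
    mem_available: Optional[float],
    mem_free: float,
    buffers: float,
    cached: float,
) -> float:
    """Mirror the query above for one instance.

    mem_available=None models a host without node_memory_MemAvailable,
    which is the case the `or (...)` branch is there to cover.
    """
    if mem_available is not None:
        available = mem_available
    else:
        # Fallback: approximate available memory from free + buffers + cached.
        available = mem_free + buffers + cached
    # The "+ 0.1" in the denominator is copied from the query as written.
    return 100 - 100 * available / (mem_total + 0.1)


# Illustrative numbers only (same unit for every argument).
print(memory_used_percent(1000, 250, 100, 50, 50))   # MemAvailable present
print(memory_used_percent(1000, None, 100, 50, 50))  # fallback branch
```

If some hosts do not export `node_memory_MemAvailable`, that would also be consistent with the earlier MemAvailable-only query showing data for pmm-server alone; the thread does not confirm this.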

This one works flawlessly

thank you!

I created a pull request: https://github.com/percona/grafana-dashboards/pull/16

Hi Mykola,
today I had the chance to upgrade all of our PMM server deployments to version 1.1.4, but I see that the fix for this issue hasn’t been merged.
The last query you sent me works as expected even with a high number of nodes:


100 - 100 * (node_memory_MemAvailable{instance=~"$host"} or (node_memory_MemFree{instance=~"$host"} + node_memory_Buffers{instance=~"$host"} + node_memory_Cached{instance=~"$host"})) / (node_memory_MemTotal{instance=~"$host"} + 0.1)

But by default the Grafana dashboard has this instead:


(node_memory_MemTotal{instance=~"$host"} - (node_memory_MemAvailable{instance=~"$host"} or (node_memory_MemFree{instance=~"$host"} + node_memory_Buffers{instance=~"$host"} + node_memory_Cached{instance=~"$host"}))) / node_memory_MemTotal{instance=~"$host"} * 100

which breaks the graph.
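The thread never identifies why the default query fails with many hosts. One hypothesis that fits the reports is request size: the default query references `$host` six times and the working one five times, so the expanded query text is shorter. The sketch below compares the two, assuming Grafana expands the multi-value `$host` variable into a regex alternation such as `(a|b|c)`. The host names are made up, and the `expand` helper is only an approximation of that substitution.

```python
# Queries copied from this thread; "$host" is the Grafana template variable.
DEFAULT_QUERY = (
    '(node_memory_MemTotal{instance=~"$host"} - '
    '(node_memory_MemAvailable{instance=~"$host"} or '
    '(node_memory_MemFree{instance=~"$host"} + '
    'node_memory_Buffers{instance=~"$host"} + '
    'node_memory_Cached{instance=~"$host"}))) / '
    'node_memory_MemTotal{instance=~"$host"} * 100'
)
FIXED_QUERY = (
    '100 - 100 * (node_memory_MemAvailable{instance=~"$host"} or '
    '(node_memory_MemFree{instance=~"$host"} + '
    'node_memory_Buffers{instance=~"$host"} + '
    'node_memory_Cached{instance=~"$host"})) / '
    '(node_memory_MemTotal{instance=~"$host"} + 0.1)'
)


def expand(query: str, hosts: list[str]) -> str:
    """Substitute $host with a regex alternation (assumed expansion format)."""
    return query.replace("$host", "(" + "|".join(hosts) + ")")


# Hypothetical host names; the real ones are not given in the thread.
hosts = [f"mysql-vm{n:02d}-inst{m}" for n in range(1, 17) for m in range(1, 5)]

for name, query in (("default", DEFAULT_QUERY), ("fixed", FIXED_QUERY)):
    expanded = expand(query, hosts)
    print(name, "uses $host", query.count("$host"), "times;",
          "expanded length:", len(expanded), "characters")
```

Because the expanded length depends on the length of each selected host name and not only on how many are selected, a size limit would also be compatible with the threshold moving between 67 and 69 hosts. This remains an unverified explanation.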

Could you please help?

Thanks

Hi,

I asked the PMM team to prioritize this pull request, so it has now been reviewed and merged (fixed).
The new code (with the fix) will be available in the next release, 1.1.5.

Thanks, much appreciated.