We often see gaps in the PMM MongoDB graphs when the load on the monitored MongoDB instances is higher than usual:
[screenshot: PMM MongoDB dashboard with gaps in the graphs]
At the same time, the metrics coming from node_exporter have no gaps and show some increase in CPU usage (not awfully high):
[screenshot: node_exporter CPU usage graph for the same period, no gaps]
There is some increase in CPU usage, but no real pressure from the CPU or memory side.
We are also running Zabbix agents in parallel on the same instances (Zabbix has been our monitoring solution up until now), and the Zabbix graphs do not show any gaps whatsoever:
[screenshots: Zabbix graphs for the same period, no gaps]
Time in Zabbix is UTC and PMM is UTC+3.
Since the Zabbix agent is able to report metrics for the same period, it makes me think the problem is on the pmm-client side, which is not able to collect/send metrics under higher load. It's like we lose metrics when we need them most.
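To narrow this down, here is a rough diagnostic sketch I would try on one of the affected instances during a load spike. The exporter port (42002) and the Prometheus job name (`mongodb`) are assumptions based on common PMM defaults and may differ in your setup; `pmm-admin list` shows the actual ports and services.

```shell
#!/bin/sh
# Hypothetical diagnostic sketch -- port 42002 and job name "mongodb"
# are assumptions; verify them against the output of `pmm-admin list`.

# 1. Confirm which exporters pmm-client is actually running, and on
#    which ports.
pmm-admin list

# 2. Time a single scrape of mongodb_exporter while the load is high.
#    If this approaches the configured scrape interval/timeout,
#    Prometheus marks the scrape as failed and the graph shows a gap.
time curl -s -o /dev/null http://localhost:42002/metrics
```

On the PMM server side, Prometheus records its own view of this: querying `up{job="mongodb"}` (0 means a failed scrape) and `scrape_duration_seconds{job="mongodb"}` over the gap period should show whether the scrapes were timing out rather than the data being lost after collection.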
I also checked CPU/memory on the pmm-server, and it does not look constrained on either front.
I like PMM, but this issue defeats its purpose for us. Any help is highly appreciated.
Thanks