My graphs in Grafana sometimes work and sometimes do not

seagen · September 27, 2016, 10:55pm

I installed PMM 1.0.4 yesterday. At first, with only one or two test hosts, the graphs worked. They stopped working after I added several more hosts to pmm-server one by one.

pmm-admin list

RUNNING is YES for all services.

pmm-admin check-network --no-emoji

Client → Server: all OK
Client ← Server: all PROBLEM (I already checked the firewall; it is stopped.)

The following is one of my graphs.

Any advice would be appreciated, thanks so much. BTW, PMM is so cool :P

[Screenshot of the graph, photoid=46015]

weber · September 28, 2016, 12:44am

Watch this page to see endpoint status: http://server/prometheus/targets
You can also check the log file by entering the container with "docker exec -ti pmm-server bash" and then running "vi /var/log/prometheus.log".

This usually happens when there is network latency between the server and the clients.

Another thing you can test is whether 1s resolution is too much for the system resources of the monitoring server (where the container runs) and for the network latency.
You can try 5s and see if it works better (link: Percona Monitoring and Management).

seagen · September 28, 2016, 3:36am

I found the following error in prometheus.log when graphs were not working.

time="2016-09-28T08:13:57Z" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=78816 maxChunksToPersist=524288 maxToleratedMemChunks=288358 memoryChunks=300294 source="storage.go:707"
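
Read at face value, the fields of this log line hint at why scrapes were skipped: memoryChunks is above maxToleratedMemChunks, while chunksToPersist is well below maxChunksToPersist. Below is a minimal shell sketch of that comparison. The `field` helper is hypothetical and only for illustration. That exceeding either limit is what triggers throttling is an assumption drawn from the field names, not something confirmed in this thread.

```shell
# The log line quoted above, with plain ASCII quotes.
line='time="2016-09-28T08:13:57Z" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=78816 maxChunksToPersist=524288 maxToleratedMemChunks=288358 memoryChunks=300294 source="storage.go:707"'

# field NAME: print the numeric value of " NAME=<digits>" found in $line.
# (Hypothetical helper, for illustration only.)
field() {
  printf '%s\n' "$line" | sed -n "s/.* $1=\([0-9][0-9]*\).*/\1/p"
}

mem=$(field memoryChunks)
max_mem=$(field maxToleratedMemChunks)
persist=$(field chunksToPersist)
max_persist=$(field maxChunksToPersist)

# Assumption: throttling starts when either counter exceeds its limit.
if [ "$mem" -gt "$max_mem" ]; then
  echo "memory chunks over limit: $mem > $max_mem"
fi
if [ "$persist" -gt "$max_persist" ]; then
  echo "persist queue over limit: $persist > $max_persist"
fi
# prints: memory chunks over limit: 300294 > 288358
```

With these numbers only the in-memory chunk count is over its limit, which points to memory pressure rather than to a backlog of chunks waiting to be written to disk.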

This afternoon I also tried changing settings in prometheus.yml, such as scrape_interval and scrape_timeout, and then restarted pmm-server, but the issue above is still there.

As you suggested, I added the option. The output is below (docker create hits the same issue).

docker run -d -p 80:80 -m METRICS_RESOLUTION=5s --volumes-from pmm-data2 --name pmm-server2 --restart always percona/pmm-server:1.0.4

docker: invalid size: 'METRICS_RESOLUTION=5s'.
See 'docker run --help'.
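
The error comes from the flag rather than the value: in docker run, -m is short for --memory, so Docker tried to read METRICS_RESOLUTION=5s as a memory size. Environment variables are passed with -e (--env). Below is a sketch of the corrected command, assuming the option is meant to be set as an environment variable on the container. The container and volume names are the ones from the command above. The `start_pmm_server` wrapper is hypothetical and is there only so the command can be printed and reviewed before it is run.

```shell
# start_pmm_server [PREFIX...]: run the corrected docker command,
# optionally prefixed by another command such as `echo` for a dry run.
# (Hypothetical wrapper, for illustration only.)
start_pmm_server() {
  "$@" docker run -d -p 80:80 \
    -e METRICS_RESOLUTION=5s \
    --volumes-from pmm-data2 \
    --name pmm-server2 \
    --restart always \
    percona/pmm-server:1.0.4
}

# Dry run: print the command instead of executing it.
start_pmm_server echo

# Real run on the Docker host (uncomment once the output looks right):
# start_pmm_server
```

If a container named pmm-server2 already exists from an earlier attempt, it would need to be removed first, or docker run will refuse to reuse the name.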

weber · September 28, 2016, 5:45am

How many endpoints do you have in Prometheus? Or time series? (See the Prometheus dashboard.)
Maybe the 256M of memory dedicated to Prometheus is not enough (link: Percona Monitoring and Management).

seagen · September 28, 2016, 10:07pm

Morning Roman…thanks so much for your nice hints.

Yesterday I removed all hosts from pmm-server and then added 5 new hosts back. So far all graphs are working. As you suggested, I went to prometheus/targets and found that all endpoints are UP except one (42002/metrics-lr), which is DOWN with the error "context deadline exceeded".

My PMM server is a virtual machine with 4G RAM and 2 cores. mysqld used to run on it, but I stopped it yesterday. I am not sure whether that is enough resources for Prometheus.

Also, what do these metrics mean: metrics-hr, metrics-mr and metrics-lr?

weber · September 29, 2016, 1:37am

metrics-hr - 1s resolution metrics
metrics-mr - 5s resolution metrics
metrics-lr - 60s resolution metrics

metrics-lr includes global variables and more intensive stats such as table stats, user stats, etc.

"pmm-admin add mysql --help" has the following flags:
--disable-binlogstats   disable binlog statistics
--disable-processlist   disable process state metrics
--disable-tablestats    disable table statistics (disabled automatically with 10000+ tables)
--disable-userstats     disable user statistics

How many tables do you have? You can check with: SELECT COUNT(*) FROM information_schema.tables;

For 5 hosts I recommend bumping Prometheus memory to 1024M, since you say the VM has 4G.

seagen · September 29, 2016, 3:18am

There are almost 5000 tables in the instance.

Aye, I have already started learning Prometheus, which is such a huge and powerful system…

weber · September 29, 2016, 3:23pm

Looks like 5000 tables is still too many to return the various metrics for each one. Disabling table stats (re-adding mysql:metrics with --disable-tablestats) should bring the mysql-lr job up.
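
As a rough sketch, re-adding the service could look like the following on the client host. It assumes that pmm-admin rm mysql:metrics removes the existing service and that no extra connection flags (such as MySQL credentials) are needed, and neither assumption is confirmed in this thread. The `readd_mysql_metrics` wrapper is hypothetical and exists only so the two commands can be printed before they are run.

```shell
# readd_mysql_metrics [PREFIX...]: remove and re-add mysql:metrics with
# table stats disabled, optionally prefixed by `echo` for a dry run.
# (Hypothetical wrapper; add your own credential flags if required.)
readd_mysql_metrics() {
  "$@" pmm-admin rm mysql:metrics &&
  "$@" pmm-admin add mysql:metrics --disable-tablestats
}

# Dry run: print both commands instead of executing them.
readd_mysql_metrics echo

# Real run on the monitored host (uncomment once the output looks right):
# readd_mysql_metrics
```

Afterwards, pmm-admin list and the prometheus/targets page mentioned earlier show whether the metrics-lr endpoint has come up.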

seagen · September 29, 2016, 9:49pm

You are right, Roman. metrics-lr is now up with --disable-tablestats. Thanks a lot!

weber · September 30, 2016, 1:09am

Thanks for checking. I think we should lower the table count at which table stats are disabled automatically.

seagen · September 30, 2016, 2:11am

That would be nice.

Roman, another issue has come up, this time with the MongoDB graphs. I remember that the first time I added a MongoDB server, all graphs were working. Today I added the mongo server to the PMM server again, but not all of the graphs work: command operations sec, document operations, getLastError-xxx, oplog insert time, Memory fault … show no graph.

Then I went to http://server/prometheus/graph and ran the metric queries manually, and I could get values. Please give some advice.

Thanks.

weber · September 30, 2016, 4:19am

If you added the mongodb instance w/o the nodetype, replset flags, etc., then you should see the graphs only on the Standalone instance dashboard. We plan to make nodetype and replset auto-discovered so that this is not needed.

seagen · September 30, 2016, 4:34am

I already added --replset repset --nodetype mongod --uri mongodb://xxxx

Before, I could get all graphs on the ReplSet dashboard. Right now the graphs mentioned above are empty on both the Standalone instance and the Replica set dashboards.
