I installed PMM 1.0.4 yesterday. At first, when there were only one or two test hosts, the graphs were working. But after I added several more hosts to pmm-server one by one, they stopped working.
pmm-admin list
RUNNING is YES for all services.
pmm-admin check-network --no-emoji
Client → Server: all OK
Client ← Server: all PROBLEM (I already checked the firewall, which is stopped.)
The following is one of my graphs.
Please give me any advice, thanks so much. BTW, PMM is so cool:P
Watch this page [url]http://server/prometheus/targets[/url] to see endpoint status.
Also you can check the log file by entering the container with "docker exec -ti pmm-server bash", then "vi /var/log/prometheus.log".
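For a quick look without an interactive shell, a one-liner like this should also work (assuming the same log path as above):

docker exec -t pmm-server tail -n 100 /var/log/prometheus.log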
This usually happens when there is network latency between the server and clients.
Another thing you can test is whether 1s resolution is too much for the system resources of the monitoring server (where the container runs) and the network latency.
You can try 5s and see if it works better (see the Percona Monitoring and Management docs).
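If you created the container the documented way (a pmm-data volume container plus the pmm-server container), a rough sketch of changing the resolution is to recreate pmm-server with the METRICS_RESOLUTION environment variable (variable name as I recall it, so double-check against the docs):

docker stop pmm-server
docker rm pmm-server
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server -e METRICS_RESOLUTION=5s --restart always percona/pmm-server:1.0.4

In that layout the pmm-data container keeps your metrics history, so removing only pmm-server is safe.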
I found the following error in prometheus.log when graphs were not working.
time="2016-09-28T08:13:57Z" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=78816 maxChunksToPersist=524288 maxToleratedMemChunks=288358 memoryChunks=300294 source="storage.go:707"
This afternoon I also tried changing settings in prometheus.yml, such as scrape_interval and scrape_timeout, and then restarted pmm-server, but the above issue is still there.
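For reference, those settings live in the global section of prometheus.yml; what I edited looks roughly like this (the values shown are just examples, not necessarily what I used):

global:
  scrape_interval: 5s
  scrape_timeout: 4s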
As you mentioned, I added the option; the output is the following… (docker create hit the same issue)
How many endpoints do you have in Prometheus? And how many time series? (See the Prometheus dashboard.)
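If the dashboard doesn't show it, one quick check is to run this query in the Prometheus expression browser at http://server/prometheus/graph (this metric comes from the Prometheus 1.x local storage engine that PMM 1.0.x ships, so treat the name as an assumption if you are on a different version):

prometheus_local_storage_memory_series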
Maybe the 256M of memory dedicated to Prometheus is not enough (see the Percona Monitoring and Management docs).
Yesterday I removed all hosts from pmm-server and then added 5 new hosts back. So far all graphs are working. As you suggested, I went to prometheus/targets and found that all endpoints are in the UP state except one (42002/metrics-lr), which is DOWN with the error "context deadline exceeded".
My PMM server is a virtual machine with 4G RAM and 2 cores. There used to be a mysqld running on it, which I stopped yesterday. I am not sure whether there are enough resources for Prometheus.
Also, what do these metrics mean: metrics-hr, metrics-mr, and metrics-lr?
The suffixes stand for high-, medium-, and low-resolution scrape jobs. metrics-lr includes global variables and more intensive stats like table stats, user stats, etc.
"pmm-admin add mysql --help" has the following flags:
--disable-binlogstats      disable binlog statistics
--disable-processlist      disable process state metrics
--disable-tablestats       disable table statistics (disabled automatically with 10000+ tables)
--disable-userstats        disable user statistics
How many tables do you have? SELECT COUNT(*) FROM information_schema.tables
For 5 hosts I recommend bumping Prometheus memory to 1024M, since you say the VM has 4G.
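Same container-recreate recipe as the resolution change above, but with the memory variable (if I recall the name right, METRICS_MEMORY is given in KB, so 1024M = 1048576):

docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server -e METRICS_MEMORY=1048576 --restart always percona/pmm-server:1.0.4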
Looks like 5000 tables is still a lot when returning various metrics for each one. Disabling table stats (re-adding mysql:metrics with --disable-tablestats) should bring the mysql-lr job back up.
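Roughly like this, with the caveat that I'm writing the remove/add pair from memory, so verify against pmm-admin --help and add back any connection options you originally used:

pmm-admin remove mysql:metrics
pmm-admin add mysql:metrics --disable-tablestats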
Roman, another issue has come up, this time with the MongoDB graphs. I remember that the first time I added one mongodb server, all graphs were working… Today I tried again to add the mongo server to pmm-server, but not all of the graphs are working; for example, Command Operations Sec, Document Operations, getLastError-xxx, Oplog Insert Time, Memory Fault… show no graph.
If you added the mongodb instance without the nodetype, replset, etc. flags, then you should see the graphs only on the Standalone instance dashboard. We plan to make nodetype and replset auto-discovered so this won't be needed.
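Until then, something along these lines should work for a replica set member (flag names as I recall them in 1.0.x, and rs0 is a placeholder for your replica set name, so check "pmm-admin add mongodb --help" for the exact spelling):

pmm-admin remove mongodb:metrics
pmm-admin add mongodb --nodetype mongod --replset rs0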