Prometheus high CPU

The attached screenshots show the host load conditions:

[screenshots: host CPU load graphs]

What problems are you seeing (apart from the high CPU numbers)?
Do you see any issues in the Prometheus log?
You can open it with the following command:

docker exec -it pmm-server less /var/log/prometheus.log
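If the log is long, a plain grep over the same file can confirm whether a particular error repeats (standard grep, nothing PMM-specific):

docker exec -it pmm-server grep -c "sample timestamp out of order" /var/log/prometheus.log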

We are seeing the same issue since moving to 1.2.0. We removed the pmm-data volume and started fresh two days ago (7-18, roughly 12pm). Prometheus CPU usage jumped, and disk I/O and load have been climbing steadily since install time.
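A quick way to sample the container's CPU and block I/O, for anyone comparing numbers (plain Docker, nothing PMM-specific):

docker stats --no-stream pmm-server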

prometheus.log is filled with the following:

time="2017-07-20T18:40:10Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db2", job="mysql"} => 0.882654579 @[1500576009.988] source="scrape.go:590"
time="2017-07-20T18:40:10Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db2", job="mysql"} => 0.882654579 @[1500576009.988] source="scrape.go:593"
time="2017-07-20T18:40:10Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db2", job="mysql"} => 0.882654579 @[1500576009.988] source="scrape.go:596"
time="2017-07-20T18:40:12Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=1032 memoryChunks=37175 source="storage.go:1867" urgencyScore=0.803
time="2017-07-20T18:40:12Z" level=info msg="Completed initial partial maintenance sweep through 763 in-memory fingerprints in 25.691535331s." source="storage.go:1398"
time="2017-07-20T18:40:14Z" level=info msg="Storage has left rushed mode." chunksToPersist=1002 memoryChunks=37242 source="storage.go:1857" urgencyScore=0.569
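To see whether the out-of-order warnings come from one exporter or from all of them, grouping them by the instance label works (a sketch using only grep/sort/uniq over the same log file):

docker exec -i pmm-server sh -c 'grep "out of order" /var/log/prometheus.log | grep -o "instance=\"[^\"]*\"" | sort | uniq -c'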

Time is synced between the hosts and within Docker, except that the Docker container is on UTC.

nms1:~ : date
Thu Jul 20 12:32:37 PDT 2017

nms1:~ : ssh db1 date
Thu Jul 20 12:32:37 PDT 2017

nms1:~ : ssh db2 date
Thu Jul 20 12:32:37 PDT 2017

nms1:~ : sudo docker exec -it pmm-server date
Thu Jul 20 19:32:38 UTC 2017
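A timezone-independent cross-check is to compare Unix epoch seconds, which are unaffected by the container's TZ setting:

date +%s && sudo docker exec pmm-server date +%s

If the two numbers match to within a second or two, the clocks really are in sync and the UTC difference above is cosmetic.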

The issue is resolved. I removed all client services (pmm-admin rm --all), removed pmm-server and pmm-data, then re-added pmm-server with the -e METRICS_RESOLUTION=5s -e METRICS_MEMORY=786432 options, and added back all clients the same as before. Adding those two options seems to have done the trick: load and disk I/O on the Prometheus server are fine and steady, and there is no more storage rushed mode. Still seeing "sample timestamp out of order" and "sample discarded" messages, though.
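For anyone else landing here, the full server start would look roughly like this (a sketch based on the standard pmm-server docker run from the Percona docs; the image tag and port mapping are assumptions to adjust for your setup). Note that METRICS_MEMORY is specified in kilobytes, so 786432 is 768 MB:

docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server -e METRICS_RESOLUTION=5s -e METRICS_MEMORY=786432 --restart always percona/pmm-server:1.2.0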

cloud-admin@sbwell.com, thank you for your feedback. We are going to increase the METRICS_MEMORY default value soon.

Is there any update on the "sample timestamp out of order" log messages?

Thanks.

We are seeing excessive CPU use from Prometheus, and there are gaps in the data. Where is the log hidden in the current version (1.9.1)? There is no prometheus.log in /var/log or anywhere else on the filesystem.
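In the meantime, two generic ways to hunt for it without assuming anything about the newer layout (plain find plus docker logs):

docker exec -it pmm-server find / -name 'prometheus*' -path '*log*' 2>/dev/null
docker logs pmm-server 2>&1 | grep -i prometheus | tail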