Prometheus high CPU

The attached screenshots show the host load conditions:

[screenshots: host CPU load graphs]

What problems are you seeing (apart from the high CPU numbers)?
Do you see any issues in the Prometheus log?
You can open it with the following command:

docker exec -it pmm-server less /var/log/prometheus.log
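If the log is long, a plain grep over the same file can confirm whether a particular error repeats (standard grep, nothing PMM-specific):

docker exec -it pmm-server grep -c "sample timestamp out of order" /var/log/prometheus.log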

We are seeing the same issue since moving to 1.2.0. We removed the pmm-data volume and started fresh two days ago (7-18, roughly 12pm). Prometheus CPU usage jumped, and disk I/O and load have been climbing steadily since install time.
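A quick way to sample the container's CPU and block I/O, for anyone comparing numbers (plain Docker, nothing PMM-specific):

docker stats --no-stream pmm-server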

prometheus.log is filled with the following:

time="2017-07-20T18:40:10Z" level=warning msg="Scrape duration sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db2", job="mysql"} => 0.882654579 @[1500576009.988] source="scrape.go:590"
time="2017-07-20T18:40:10Z" level=warning msg="Scrape sample count sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db2", job="mysql"} => 0.882654579 @[1500576009.988] source="scrape.go:593"
time="2017-07-20T18:40:10Z" level=warning msg="Scrape sample count post-relabeling sample discarded" error="sample timestamp out of order" sample=scrape_duration_seconds{instance="db2", job="mysql"} => 0.882654579 @[1500576009.988] source="scrape.go:596"
time="2017-07-20T18:40:12Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=1032 memoryChunks=37175 source="storage.go:1867" urgencyScore=0.803
time="2017-07-20T18:40:12Z" level=info msg="Completed initial partial maintenance sweep through 763 in-memory fingerprints in 25.691535331s." source="storage.go:1398"
time="2017-07-20T18:40:14Z" level=info msg="Storage has left rushed mode." chunksToPersist=1002 memoryChunks=37242 source="storage.go:1857" urgencyScore=0.569
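To see whether the out-of-order warnings come from one exporter or from all of them, grouping them by the instance label works (a sketch using only grep/sort/uniq over the same log file):

docker exec -i pmm-server sh -c 'grep "out of order" /var/log/prometheus.log | grep -o "instance=\"[^\"]*\"" | sort | uniq -c'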

Time is synced between the hosts and within Docker, except that the Docker container is on UTC.

nms1:~ : date
Thu Jul 20 12:32:37 PDT 2017

nms1:~ : ssh db1 date
Thu Jul 20 12:32:37 PDT 2017

nms1:~ : ssh db2 date
Thu Jul 20 12:32:37 PDT 2017

nms1:~ : sudo docker exec -it pmm-server date
Thu Jul 20 19:32:38 UTC 2017
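A timezone-independent cross-check is to compare Unix epoch seconds, which are unaffected by the container's TZ setting:

date +%s && sudo docker exec pmm-server date +%s

If the two numbers match to within a second or two, the clocks really are in sync and the UTC difference above is cosmetic.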

The issue is resolved. I removed all client services (pmm-admin rm --all), removed pmm-server and pmm-data, then re-added pmm-server with the -e METRICS_RESOLUTION=5s -e METRICS_MEMORY=786432 options, and added back all clients the same as before. Adding those two options seems to have done the trick: load and disk I/O on the Prometheus server are fine and steady, and there is no more storage rushed mode. Still seeing "sample timestamp out of order" and "sample discarded" messages, though.
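For anyone else landing here, the full server start would look roughly like this (a sketch based on the standard pmm-server docker run from the Percona docs; the image tag and port mapping are assumptions to adjust for your setup). Note that METRICS_MEMORY is specified in kilobytes, so 786432 is 768 MB:

docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server -e METRICS_RESOLUTION=5s -e METRICS_MEMORY=786432 --restart always percona/pmm-server:1.2.0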

cloud-admin@sbwell.com, thank you for your feedback. We are going to increase the METRICS_MEMORY default value soon.

Is there any update on the "sample timestamp out of order" log messages?

Thanks.

We are seeing excessive CPU use from Prometheus, and there are gaps in the data. Where is the log hidden in the current version (1.9.1)? There is no prometheus.log in /var/log or anywhere else on the filesystem.
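In the meantime, two generic ways to hunt for it without assuming anything about the newer layout (plain find plus docker logs):

docker exec -it pmm-server find / -name 'prometheus*' -path '*log*' 2>/dev/null
docker logs pmm-server 2>&1 | grep -i prometheus | tail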