Prometheus crashed on PMM 1.14.0

Gajendra · September 10, 2018, 5:58am

Hi,
I recently upgraded PMM Server from 1.10.0 to 1.14.0, and it had been working fine for the last 4 days. Today I tried to view the last 7 days of data, and Prometheus suddenly consumed all available memory on the server and crashed.

Please find the Prometheus log below:

time="2018-09-10T09:48:33Z" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:633"
time="2018-09-10T09:48:33Z" level=info msg="Done checkpointing in-memory metrics and chunks in 16.105302ms." source="persistence.go:665"
time="2018-09-10T09:50:07Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=0 memoryChunks=516221 source="storage.go:1872" urgencyScore=1
time="2018-09-10T09:50:09Z" level=info msg="Completed full maintenance sweep through 2009 in-memory fingerprints in 33m52.365850069s." source="storage.go:1400"
time="2018-09-10T09:50:20Z" level=info msg="Storage has left rushed mode." chunksToPersist=0 memoryChunks=516221 source="storage.go:1862" urgencyScore=0
time="2018-09-10T09:50:30Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=0 memoryChunks=516221 source="storage.go:1872" urgencyScore=1
time="2018-09-10T09:50:32Z" level=info msg="Completed full maintenance sweep through 2030 in-memory fingerprints in 12.646472451s." source="storage.go:1400"
time="2018-09-10T09:50:37Z" level=error msg="Storage needs throttling. Scrapes and rule evaluations will be skipped." chunksToPersist=0 memoryChunks=516221 source="storage.go:1011" urgencyScore=1
time="2018-09-10T09:50:44Z" level=info msg="Storage has left rushed mode." chunksToPersist=0 memoryChunks=516221 source="storage.go:1862" urgencyScore=0
time="2018-09-10T09:50:54Z" level=warning msg="Storage has entered rushed mode." chunksToPersist=0 memoryChunks=516221 source="storage.go:1872" urgencyScore=1
time="2018-09-10T09:51:09Z" level=info msg="Starting prometheus (version=1.8.2, branch=v1.8.2, revision=5211b96d4d1291c3dd1a569f711d3b301b635ecb)" source="main.go:87"
time="2018-09-10T09:51:09Z" level=info msg="Build context (go=go1.10.1, user=Jenkins, date=20180828-08:18:03OURCE)" source="main.go:88"
time="2018-09-10T09:51:09Z" level=info msg="Host details (Linux 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 c9292d3484e1 (none))" source="main.go:89"
time="2018-09-10T09:51:09Z" level=info msg="Loading configuration file /etc/prometheus1.yml" source="main.go:254"
time="2018-09-10T09:51:09Z" level=info msg="Listening on :9094" source="web.go:341"
time="2018-09-10T09:51:10Z" level=info msg="Loading series map and head chunks..." source="storage.go:428"
time="2018-09-10T09:51:10Z" level=warning msg="Persistence layer appears dirty." source="persistence.go:846"
time="2018-09-10T09:51:10Z" level=warning msg="Starting crash recovery. Prometheus is inoperational until complete." source="crashrecovery.go:40"
time="2018-09-10T09:51:10Z" level=warning msg="To avoid crash recovery in the future, shut down Prometheus with SIGTERM or a HTTP POST to /-/quit." source="crashrecovery.go:41"
time="2018-09-10T09:51:10Z" level=info msg="Scanning files." source="crashrecovery.go:55"
time="2018-09-10T09:51:12Z" level=warning msg="Fingerprint 11bbc841c4918c8e assumed archived but couldn't be found in archived index." source="crashrecovery.go:394"
