I’ve been running a PMM Docker instance for over 5 months now, and the 80GB disk has filled up.
The container was initially started with a QUERY/METRICS retention of 1 year, which was later reduced to 3 months, and then to 2 months a week ago. According to my findings on the forums here, that should have freed up some of the storage, but it didn’t, so now I’m at <4% free space.
Can someone advise on how to clean up stale data older than a certain point in time and free up disk space?
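For context, lowering the retention on a PMM 1.x server generally means re-creating the pmm-server container with a new METRICS_RETENTION value (the data itself lives in the pmm-data container, so it survives the re-create). A minimal sketch; the container names, image tag and retention value here are illustrative, not the exact commands from my setup:

```bash
# Sketch only - names, tag and retention value are illustrative.
docker stop pmm-server && docker rm pmm-server
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server \
    -e METRICS_RETENTION=1440h \
    --restart always \
    percona/pmm-server:1.7.0
```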
Problem #1 with adding more storage is that RackSpace gave me a 500GB volume with 30MB/s sequential write performance, while pretending there’s nothing wrong with that.
Problem #2 is that I can communicate with RackSpace only through a proxy, so it’s a ‘human factor’ that I have to deal with, and it’d make my life so much easier to get this sorted out in software.
Hi milosb, interesting - this isn’t how I expected the Prometheus data directory to behave. Which version of PMM are you currently running? From that I can figure out your Prometheus version.
I’ll also attempt to change the retention period in a test environment and report back my findings. Thanks for bringing this up!
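A quick way to answer both questions from the host, assuming the container is named pmm-server and the prometheus binary is on the container’s PATH, would be something like:

```bash
# The image tag usually reveals the PMM Server version.
docker ps --filter name=pmm-server --format '{{.Image}}'
# Prometheus reports its own version; the flag is -version on Prometheus 1.x
# and --version on 2.x.
docker exec -t pmm-server prometheus --version
```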
A few probably important things I forgot to mention:
PMM is monitoring 4 servers in total: 3 Galera cluster nodes and a replication slave (that shouldn’t produce too much data, right? Even considering the point below)
The settings are default, except for innodb_monitor_enable='all' and log_slow_verbosity='full' (see the snippet after this list)
There’s a custom dashboard with alerts configured, twelve of them to be exact, and all except two (the DDL statement counts) are generic.
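The two non-default settings above amount to something like this (log_slow_verbosity is a Percona Server variable; both can also be set under [mysqld] in my.cnf):

```bash
# Illustrative only: apply the two non-default settings at runtime...
mysql -e "SET GLOBAL innodb_monitor_enable = 'all';"
mysql -e "SET GLOBAL log_slow_verbosity = 'full';"
# ...and verify what the servers are currently running with.
mysql -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('innodb_monitor_enable','log_slow_verbosity');"
```

Both settings increase the volume of data collected (extra InnoDB metrics, and a more verbose slow log for QAN).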
Michael Coburn,
Back when I was trying to trigger the clean-up by reducing the metrics retention to 3, 2, and then 1 month, the versions were 1.6.1 and 1.7.0. The funny thing is that the clean-up worked on the staging PMM instance: 6 hours after I reduced QUERY/METRICS_RETENTION to 720h, nearly 30GB was freed up on an 80GB disk.
With that in mind, it just doesn’t make sense that this isn’t working on the production server: I keep shortening the retention period, while PMM’s disk usage keeps growing. And I’m sure it’s not something else eating up the space, like a rogue log file… so maybe it’s just how much data I’m collecting with the settings above.
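One way to confirm the shorter retention actually took effect would be to look at the flag the running Prometheus was started with; a sketch, assuming the container is named pmm-server:

```bash
# The retention value appears on Prometheus's command line:
# -storage.local.retention on Prometheus 1.x, --storage.tsdb.retention on 2.x.
docker top pmm-server | grep -i prometheus
```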
Anyhow, my storage space problem is (temporarily?) solved - I’ve resorted to the painfully slow process of moving the Docker data to the add-on volume (at 30MB/s), to see how it behaves afterwards. The IOPS performance of the volume, which I initially overlooked, was indeed that of an SSD, although capped at ~3k… just like the (sequential) throughput is.
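Those caps can be sanity-checked with something along these lines (the mount point is an assumption, and fio may need to be installed first):

```bash
# Sequential write throughput of the add-on volume (bypassing the page cache).
dd if=/dev/zero of=/mnt/newvol/ddtest bs=1M count=1024 oflag=direct
# Random-write IOPS, 4k blocks, 30-second run.
fio --name=iopstest --filename=/mnt/newvol/fiotest --size=1G --rw=randwrite \
    --bs=4k --ioengine=libaio --iodepth=32 --direct=1 --runtime=30 --time_based
rm -f /mnt/newvol/ddtest /mnt/newvol/fiotest
```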
Hi milosb, I’m glad you found a workaround! I am curious where the actual disk usage is coming from; I strongly suspect Prometheus, but if you could help us verify that, it would be appreciated. This command should work, just change the Docker container name as needed. Note that my usage is very low since I just spun up this example instance this morning.
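A sketch along these lines (the container name and the standard pmm-data paths are assumptions) should show where the space is going:

```bash
# Per-directory usage of the main PMM Server data locations.
docker exec -t pmm-server \
    du -sh /opt/prometheus/data /opt/consul-data /var/lib/mysql /var/lib/grafana
```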
Before the migration, I’m pretty sure /opt/prometheus/data was stuck at >70GB. Being a newbie with Docker, I ended up moving the /var/lib/docker contents to the new disk using 'tar'.
The result: the 100GB sparse file (/var/lib/docker/devicemapper/devicemapper/data) is no longer sparse. So I also need to figure out whether running 'fallocate -p /var/lib/docker/devicemapper/devicemapper/data' after stopping the docker service will make the file sparse again without doing any damage.
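For what it’s worth, the fallocate mode for re-sparsifying an existing file is -d/--dig-holes, which punches holes wherever the file already contains runs of zero bytes; -p/--punch-hole needs an explicit offset and length. A sketch, untested on this particular setup:

```bash
# Sketch only - stop Docker before touching the devicemapper loop file,
# and have a backup/snapshot available just in case.
systemctl stop docker
fallocate --verbose --dig-holes /var/lib/docker/devicemapper/devicemapper/data
# Compare apparent vs. actual size to confirm the holes came back.
du -h --apparent-size /var/lib/docker/devicemapper/devicemapper/data
du -h /var/lib/docker/devicemapper/devicemapper/data
systemctl start docker
# For future moves, GNU tar's -S/--sparse (or cp --sparse=always, rsync --sparse)
# preserves sparse files in the first place.
```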