Need Advice on Disk Space Cleanup for PMM Server 2.x (Docker)

Hi PMM Community,

We are running PMM Server 2.x in Docker on RHEL and noticed that the PMM data volume has grown significantly over time.

Current environment:

  • PMM Server: 2.x (Docker deployment)
  • Docker image: percona/pmm-server:2
  • Uptime: approximately 2 years
  • Docker volume size (from docker system df -v):

Local Volumes space usage:

VOLUME NAME   LINKS   SIZE
pmm-data      1       110.1GB

The container writable layer is only about 5.5 GB, so most of the storage consumption appears to be inside the persistent pmm-data volume.

Before performing any cleanup, we would like to understand the recommended and supported way to reclaim disk space safely.

Questions:

  1. What is the recommended procedure to identify which PMM component is consuming the most storage (VictoriaMetrics, ClickHouse/QAN, logs, etc.)?
  2. Is there a supported method to reduce storage usage without losing all historical monitoring data?
  3. Can retention policies be adjusted retroactively to reclaim existing space?
  4. Are there any PMM maintenance commands or cleanup procedures recommended by Percona?
  5. Has anyone successfully reduced a large PMM data volume while preserving recent monitoring history?

Any guidance or best practices would be greatly appreciated.

Thank you.

@Ly_Kimmeng

What is the recommended procedure to identify which PMM component is consuming the most storage (VictoriaMetrics, ClickHouse/QAN, logs, etc.)?

Running the command below inside the PMM Server container shows which component is taking up the most storage space.

**du -sh /srv/* | sort -hr**
93M	/srv/grafana
87M	/srv/postgres14
27M	/srv/clickhouse
13M	/srv/victoriametrics
6.3M	/srv/logs
52K	/srv/nginx
8.0K	/srv/prometheus
8.0K	/srv/pmm-agent
8.0K	/srv/nomad
8.0K	/srv/alerting
4.0K	/srv/pmm-encryption.key
4.0K	/srv/pmm-distribution
4.0K	/srv/backup
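
Since /srv lives inside the PMM Server container, the command has to run there rather than on the host. Here is a minimal sketch of a wrapper, assuming the container is named pmm-server (that name is an assumption, check docker ps for yours) and that the function name is just illustrative:

```shell
# Container name is an assumption; override with PMM_CONTAINER=<name>.
PMM_CONTAINER="${PMM_CONTAINER:-pmm-server}"

# Run du inside the container via sh -c so that the /srv/* glob and the
# pipe are evaluated in the container, not on the Docker host.
pmm_srv_usage() {
    docker exec "$PMM_CONTAINER" sh -c 'du -sh /srv/* 2>/dev/null | sort -hr'
}
```

Calling pmm_srv_usage on the Docker host should then print the per-directory sizes, largest first.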

Is there a supported method to reduce storage usage without losing all historical monitoring data?
Can retention policies be adjusted retroactively to reclaim existing space?

If the /srv/logs directory is consuming space, the log files there can be truncated. For the other PMM components, however, any cleanup may result in the loss of historical metrics.
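
For the logs case, something like the following could work. It is only a sketch: the container name pmm-server and both function names are assumptions, and it relies on the coreutils truncate tool being present in the image.

```shell
# Container name is an assumption; override with PMM_CONTAINER=<name>.
PMM_CONTAINER="${PMM_CONTAINER:-pmm-server}"

# Show the ten largest entries (files and directories) under /srv/logs.
pmm_largest_logs() {
    docker exec "$PMM_CONTAINER" sh -c 'du -ah /srv/logs | sort -hr | head -n 10'
}

# Empty a single log file in place. Truncating keeps the file itself,
# so a service that has it open can keep writing to it.
pmm_truncate_log() {
    docker exec "$PMM_CONTAINER" truncate -s 0 "$1"
}
```

Run pmm_largest_logs first, then pass the full path of a file you are sure you no longer need to pmm_truncate_log. The previous contents of that file are lost.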

The documentation below gives more detailed guidance on dealing with high PMM disk usage. You can change the retention policy, or limit or disable tablestat, if that is feasible for you.

VictoriaMetrics: downsampling feature

Limiting or disabling tablestat, especially if there are many tables to monitor

Metric history/Data retention

Are there any PMM maintenance commands or cleanup procedures recommended by Percona?

If you could clarify exactly which component is taking up space, we may be able to better guide you on the appropriate solution. Please share the output of the initial command we shared above.

Has anyone successfully reduced a large PMM data volume while preserving recent monitoring history?

Yes, the procedure I shared above will help lower the high disk usage to some extent. But if many clients (say 500 or more) connect to a single PMM Server, disk consumption will be considerably higher than in a setup with only a few PMM clients/database hosts. How many PMM clients/agents connect to this PMM Server?

Also, some changes introduced in PMM ≥ 2.41.0 can cause rapid growth of the asynchronous_metric_log table. By the way, which PMM version are you using?
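
To see whether that table (or any other) accounts for the ClickHouse usage, you can ask ClickHouse for its largest tables. This is a rough sketch, assuming the container is named pmm-server and that clickhouse-client can connect with its defaults inside it. The function name is made up for illustration. It only reads from system.parts and changes nothing.

```shell
# Container name is an assumption; override with PMM_CONTAINER=<name>.
PMM_CONTAINER="${PMM_CONTAINER:-pmm-server}"

# List the ten largest ClickHouse tables by on-disk size of active parts.
clickhouse_table_sizes() {
    docker exec "$PMM_CONTAINER" clickhouse-client --query "
        SELECT database, table,
               formatReadableSize(sum(bytes_on_disk)) AS size
        FROM system.parts
        WHERE active
        GROUP BY database, table
        ORDER BY sum(bytes_on_disk) DESC
        LIMIT 10"
}
```

If asynchronous_metric_log in the system database is near the top of the output, it is a likely contributor. Inactive parts that are still waiting for cleanup are not counted here, so the total may be lower than what du reports for /srv/clickhouse.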

Hi Anil,

Thank you for your guidance.

We are currently running PMM 2.44.0.

The storage usage on our PMM server is:

71G     /srv/clickhouse
31G     /srv/victoriametrics
849M    /srv/logs
528M    /srv/grafana
180M    /srv/postgres14

It appears that ClickHouse and VictoriaMetrics are consuming most of the storage.

We have already tried adjusting the PMM data retention settings, but we have not observed any significant reduction in disk usage.

Could you please advise:

  1. Is the large ClickHouse size expected for PMM 2.44.0, or could it be related to excessive QAN data retention?
  2. Are there any recommended cleanup or optimization procedures for ClickHouse that can safely reclaim space?
  3. Could PMM 2.44.0 be affected by the asynchronous_metric_log growth issue you mentioned? If so, how can we verify whether this table is contributing to the storage consumption?
  4. Does reducing retention only affect newly collected data, or should older data be automatically purged and disk space reclaimed?
  5. Is there a manual process to force retention cleanup or compaction for ClickHouse and VictoriaMetrics?
  6. Are there any known issues in PMM 2.44.0 where retention settings do not immediately reclaim disk space?
  7. We currently monitor approximately PMM clients/agents. Is this storage growth considered normal for that scale?

Any recommendations to reduce storage usage while preserving recent monitoring history would be greatly appreciated.

Best regards,
Kimmeng