PMM 3.6.0 – ClickHouse background merge loop on system.metric_log causing 100% CPU and repeated MEMORY_LIMIT_EXCEEDED

Henryx · March 13, 2026, 10:57am

Hello,

I am experiencing a severe performance issue with PMM 3.6.0 where the server becomes extremely slow and CPU usage reaches 100% continuously.

Investigation shows that ClickHouse repeatedly attempts background merges on the system.metric_log table, but every merge attempt fails with a MEMORY_LIMIT_EXCEEDED exception. The failed merge is immediately retried, resulting in an infinite loop that consumes CPU.

This effectively makes the PMM instance almost unusable.

Environment

PMM version: 3.6.0
ClickHouse version bundled with PMM: 25.3.6.56
Deployment: containerized environment (Podman)

No unusual workload was running when the issue appeared.

Observed behaviour

ClickHouse repeatedly schedules merges on the system.metric_log table:

system.metric_log (MergerMutator): Selected parts for merge
system.metric_log (MergerMutator): Merged parts

However, the merge then fails with:

Code: 241. DB::Exception: (total) memory limit exceeded
would use 5.84 GiB (attempt to allocate chunk of ~4 MiB)
current RSS: ~1.3 GiB
maximum: 5.84 GiB

The error occurs during the merge execution phase:

While executing MergeTreeSequentialSource
while reading from part ... in table system.metric_log

Immediately after the failure, ClickHouse schedules the same merge again, which fails in the same way. This cycle repeats continuously.

As a result:

ClickHouse background threads consume full CPU
PMM UI becomes very slow
The system remains in a constant retry loop

We have been facing this memory issue since we upgraded the PMM server and clients from 3.4.0 to 3.6.0, and it reappears after redeploying the server containers.

Are you aware of this issue?

Thanks in advance for any help

anil.joshi · March 15, 2026, 9:34am

@Henryx

I see some fixes proposed for v3.7.0 that should address these problems. Meanwhile, you can test the changes mentioned below to see if they help.

https://perconadev.atlassian.net/issues?jql=textfields%20~%20"clickhouse%20memory*"&selectedIssue=PMM-14722

When using ClickHouse with less than 16GB of RAM, we recommend the following:

Lower the size of the mark cache in the config.xml. It can be set as low as 500 MB, but it cannot be set to zero.

Lower the number of query processing threads down to 1.

Lower the max_block_size to 8192. Values as low as 1024 can still be practical.

Lower max_download_threads to 1.

Set input_format_parallel_parsing and output_format_parallel_formatting to 0.

Disable writing to log tables, because it keeps the background merge task reserving RAM to merge those tables. Disable asynchronous_metric_log, metric_log, text_log, and trace_log.
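
As a rough sketch, the recommendations above could be expressed as two ClickHouse override files along these lines. The file names and the config.d / users.d locations are assumptions based on the standard ClickHouse layout; where the PMM container actually keeps its ClickHouse configuration, and whether overrides survive an upgrade, would need to be checked for your deployment.

```xml
<!-- Hypothetical file: /etc/clickhouse-server/config.d/low-memory.xml (server-level settings) -->
<clickhouse>
    <!-- Mark cache lowered to about 500 MB (500 * 1024 * 1024 bytes); it cannot be 0 -->
    <mark_cache_size>524288000</mark_cache_size>

    <!-- remove="1" drops the section from the merged config, which disables writing to that log table -->
    <asynchronous_metric_log remove="1"/>
    <metric_log remove="1"/>
    <text_log remove="1"/>
    <trace_log remove="1"/>
</clickhouse>
```

```xml
<!-- Hypothetical file: /etc/clickhouse-server/users.d/low-memory.xml (query-level settings) -->
<clickhouse>
    <profiles>
        <default>
            <!-- Query processing threads -->
            <max_threads>1</max_threads>
            <!-- Values as low as 1024 can still be practical -->
            <max_block_size>8192</max_block_size>
            <max_download_threads>1</max_download_threads>
            <input_format_parallel_parsing>0</input_format_parallel_parsing>
            <output_format_parallel_formatting>0</output_format_parallel_formatting>
        </default>
    </profiles>
</clickhouse>
```

Removing the log sections stops new writes after ClickHouse restarts, but it does not delete parts already stored in system.metric_log, so the existing table may still need to be dealt with separately.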

A similar memory-limit issue is discussed in https://perconadev.atlassian.net/browse/PMM-14788 and in the related pull request, "PMM-14788 Increase memory resources for ClickHouse" by ademidoff (Pull Request #779, percona/percona-helm-charts on GitHub). That pull request made two changes:

Increased the limits for ClickHouse server resources to 8Gi memory and 4 CPU, up from 4Gi and 2 respectively.

Removed the explicit requests for memory and CPU, which previously reserved 1Gi memory and 500m CPU.

Did you set any resource quota for Docker/Podman? What are the OS resources (CPU/memory)?
