We have been using pmm-server (1.x) for two years in a single Docker container with these specs:
8GB of RAM
23 clients configured (almost 200 databases)
All the customers have configured pmm-agent with mysql.
Everything worked perfectly.
But then we upgraded our container to pmm-server 2.11. After all the configuration was done, our Docker container dies within 3 minutes.
After some research, we see that in the same scenario we can only configure 6 clients (pmm-admin). If we configure more than that, the container eats all the memory and Docker kills it (OOM).
Is there any option in version 2 that I am missing?
We are planning to roll back to the previous version of PMM unless we can keep the same configuration with version 2.
Attached are our docker stats with 6 pmm-clients configured:
Prometheus will be replaced by VictoriaMetrics for metrics storage in the next pmm-server release.
It will significantly reduce memory usage.
pmm-server 2.12 is planned to be released in two weeks.
I have also linked your post to the ticket related to checking memory consumption with many nodes.
Your consumption sounds really high. I’m monitoring 10 nodes and using ~750MB on my pmm-server container, and our pmmdemo site is monitoring 30 nodes and only using about 4.2GB. We have a general recommendation here on sizing, but it’s definitely padded to be safe.
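To put those figures side by side, here is a rough back-of-the-envelope sketch. It assumes 4.2GB is taken as 4200MB and 8GB as 8192MB, and that the 6-client container really does use the whole 8GB before it gets killed. The helper name per_node_mb is just made up for illustration.

```shell
# Rough per-node memory in MB: total memory (MB) divided by node count.
# per_node_mb is a hypothetical helper, not part of PMM or Docker.
per_node_mb() {
  awk -v total="$1" -v nodes="$2" 'BEGIN { printf "%.0f\n", total / nodes }'
}

per_node_mb 750 10    # 10 nodes at ~750MB           -> 75
per_node_mb 4200 30   # pmmdemo, 30 nodes at ~4.2GB  -> 140
per_node_mb 8192 6    # this thread, 6 clients, 8GB  -> 1365
```

Memory does not scale linearly with node count in practice, so treat this only as an order-of-magnitude comparison: roughly ten times more per node than the other two setups.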
I’m curious to know what’s consuming all the memory, could you do the following and let us know:
Get a bash prompt inside the Docker container: docker exec -it pmm-server bash
Then see what the main consumer of memory is: ps -aux
That would at least give a place to start checking logs for something running haywire.
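If the full process list is too long to read through, something like the following might help narrow it down. It is only a sketch: top_mem_consumers is a made-up helper name, and it assumes the usual ps aux column layout where RSS (resident memory, in KiB) is the sixth column.

```shell
# Print the ps header plus the N processes with the largest RSS.
# Reads `ps aux` output on stdin; assumes RSS is column 6.
# top_mem_consumers is a hypothetical helper, not a PMM or Docker command.
top_mem_consumers() {
  n="${1:-5}"
  input="$(cat)"
  # Keep the header line as-is.
  printf '%s\n' "$input" | head -n 1
  # Sort the remaining lines numerically by column 6, largest first.
  printf '%s\n' "$input" | tail -n +2 | sort -k6,6nr | head -n "$n"
}

# Example, run on the Docker host (container name as used above):
#   docker exec pmm-server ps aux | top_mem_consumers 5
```

The process at the top of that output is the one whose logs are worth checking first.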
My immediate questions would be around the following. Scrape interval: do you have things collecting at the highest resolution? Have you installed any additional Grafana plugins? How did you transition to PMM2 from PMM1: a whole new server, or side-by-side? What is your retention period, and how active are users in consuming the graphs?
Those are all the things I can think of off the top of my head that would be the big offenders. VictoriaMetrics will certainly reduce the footprint, but it sounds like there’s something else going wrong.
We created a new container following the PMM manual installation. We did not install any additional component or plugin in Grafana, only the components that come with the container.
Here is the screen capture of ps -aux:
It seems Prometheus is eating all the memory in this new version. Only two users can access this server, and usage is very low: maybe one access a day.