PMM 3.3.0 to 3.3.1 upgrade problem when restarting

Linux Ubuntu 24.04.2 LTS

Docker version 27.5.1

A standard PMM 3.3.0 installation was performed and worked well, BUT

We tried to upgrade to 3.3.1 via the UI. The process starts, then stays forever on “Restarting PMM…”

docker image ls shows pmm-server:3.3.1, so that part is OK.

docker container ls shows pmm-server:3.3.1 up for 32 minutes (unhealthy)!

docker container logs pmm-server shows multiple lines like:

INFO spawned: ‘grafana’ with pid xxxxx
INFO exited: grafana (exit status 1; not expected)

then:

INFO gave up: grafana entered FATAL state, too many start retries too quickly
INFO exited: pmm-init (exit status 2; not expected)
INFO spawned: ‘pmm-init’ with pid 3024
INFO success: pmm-init entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
INFO exited: pmm-managed (exit status 1; not expected)
INFO spawned: ‘pmm-managed’ with pid 3366
INFO success: pmm-managed entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
INFO exited: pmm-init (exit status 2; not expected)
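To see at a glance which services keep failing in output like the above, the supervisor lines can be counted per program. This is only a rough sketch: the regular expressions are based solely on the lines quoted in this post, and `summarize_supervisor_log` is a made-up helper name, not part of PMM or supervisord.

```python
import re
from collections import Counter

# Patterns derived only from the log lines quoted in this thread (an assumption,
# not an official description of the supervisord log format).
EXITED = re.compile(
    r"INFO exited: (?P<name>[\w-]+) \(exit status (?P<status>\d+); not expected\)"
)
GAVE_UP = re.compile(r"INFO gave up: (?P<name>[\w-]+) entered FATAL state")


def summarize_supervisor_log(text):
    """Return (unexpected-exit count per program, set of programs in FATAL state)."""
    exits = Counter(m.group("name") for m in EXITED.finditer(text))
    fatal = {m.group("name") for m in GAVE_UP.finditer(text)}
    return exits, fatal


# Sample built from the lines in the post above.
SAMPLE = """\
INFO gave up: grafana entered FATAL state, too many start retries too quickly
INFO exited: pmm-init (exit status 2; not expected)
INFO spawned: 'pmm-init' with pid 3024
INFO exited: pmm-managed (exit status 1; not expected)
INFO spawned: 'pmm-managed' with pid 3366
INFO exited: pmm-init (exit status 2; not expected)
"""

if __name__ == "__main__":
    exits, fatal = summarize_supervisor_log(SAMPLE)
    print("unexpected exits:", dict(exits))
    print("gave up on:", sorted(fatal))
```

The text to analyse could be the saved output of `docker container logs pmm-server`; the sample above only reuses the lines already shown.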

It seems the upgrade from the UI does not work…

What can we do?

Francis

Inspecting pmm-server docker logs

Hi, can you share logs from /srv/logs/pmm-init.log, /srv/logs/pmm-managed.log, /srv/logs/postgresql14.log and /srv/logs/grafana.log? Thank you
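If it helps, the four files can be copied out of the container with `docker cp`. The sketch below assumes the container is named `pmm-server` (as in the `docker container logs pmm-server` command above) and that the files should land in the current directory; `build_copy_commands` and `copy_logs` are hypothetical helper names.

```python
import subprocess

# Log files requested in this thread.
LOG_PATHS = [
    "/srv/logs/pmm-init.log",
    "/srv/logs/pmm-managed.log",
    "/srv/logs/postgresql14.log",
    "/srv/logs/grafana.log",
]


def build_copy_commands(container="pmm-server", dest_dir=".", paths=LOG_PATHS):
    """Build one `docker cp` argument list per log file, without running anything."""
    return [["docker", "cp", f"{container}:{path}", dest_dir] for path in paths]


def copy_logs(container="pmm-server", dest_dir="."):
    """Run the copy commands; raises CalledProcessError if a `docker cp` fails."""
    for argv in build_copy_commands(container, dest_dir):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands that copy_logs() would execute.
    for argv in build_copy_commands():
        print(" ".join(argv))
```

Running the script only prints the commands; calling `copy_logs()` actually executes them and needs Docker access on the host.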

Here are the logs.

postgresql14.log (42.0 KB)

pmm-managed.log (8.5 MB)

pmm-init.log (109.8 KB)

The grafana log is too big, so I truncated it to the last recurrent errors. If you need the complete file, please tell me how to send it to you (compressed, the log is only 2 MB, but uploading it is not allowed here).

grafana.log (3.0 MB)

In order to try something, I stopped the pmm-server container and restarted it.

The UI said ‘upgrade in progress’ for 1 minute, then it started normally and all seems OK!

Now I have to upgrade the PMM clients (I hope with no problems :slight_smile: )

As I will have to upgrade the same pmm-server in production (3.3.0 to 3.3.1) and the current server is only a test server, can you please still examine the logs to see if something explains the failure?

Thanks

Francis

Seems like

2025-08-07 09:45:47.280 UTC [644] FATAL:  lock file "postmaster.pid" already exists
2025-08-07 09:45:47.280 UTC [644] HINT:  Is another postmaster (PID 14) running in data directory "/srv/postgres14"?

caused the problem. Removing this lock file should fix it.
It seems some race condition happened during initialization, and that’s why this lock file wasn’t removed.
It’s great that you could fix your problem by restarting PMM.
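To check for the same symptom before the production upgrade, the PostgreSQL log can be scanned for this FATAL/HINT pair. This is a minimal sketch, assuming the log lines look exactly like the two quoted above; `find_lock_conflicts` is a hypothetical helper name, not something shipped with PMM.

```python
import re

# FATAL line PostgreSQL writes when the lock file is already present.
LOCK_FATAL = re.compile(r'FATAL:\s+lock file "(?P<lock>[^"]+)" already exists')
# HINT line naming the PID recorded in the lock file and the data directory.
LOCK_HINT = re.compile(
    r'HINT:\s+Is another postmaster \(PID (?P<pid>\d+)\) '
    r'running in data directory "(?P<datadir>[^"]+)"\?'
)


def find_lock_conflicts(log_text):
    """Return one dict per lock-file conflict found in the log text."""
    conflicts = []
    current = None  # conflict still waiting for its HINT line
    for line in log_text.splitlines():
        fatal = LOCK_FATAL.search(line)
        if fatal:
            current = {"lock": fatal.group("lock"), "pid": None, "datadir": None}
            conflicts.append(current)
            continue
        hint = LOCK_HINT.search(line)
        if hint and current is not None:
            current["pid"] = int(hint.group("pid"))
            current["datadir"] = hint.group("datadir")
            current = None
    return conflicts


# The two lines quoted in this reply.
SAMPLE = (
    '2025-08-07 09:45:47.280 UTC [644] FATAL:  lock file "postmaster.pid" already exists\n'
    '2025-08-07 09:45:47.280 UTC [644] HINT:  Is another postmaster (PID 14) '
    'running in data directory "/srv/postgres14"?\n'
)

if __name__ == "__main__":
    for conflict in find_lock_conflicts(SAMPLE):
        print(conflict)
```

If a conflict is reported, the lock file should only be removed after confirming that no PostgreSQL process with the reported PID is still running in that data directory.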