PMM 2.28.0 -> 2.34.0 upgrade issues

Hello folks. I found that the PMM version we are using is pretty old. I am running it as a Docker container on an EC2 instance. I tried to update it, but I faced 2 issues:

  1. It's a cosmetic issue: the PMM UI shows that my 2.28.0 version is the latest and there is nothing to update.
  2. Using the troubleshooting doc, I forced the upgrade button with the Alt (Option on Mac) key. But the upgrade process was stuck in an infinite loop. Here is the log of the update:
ProjectName: pmm-update
Version: 2.34.0
PMMVersion: 2.34.0
Timestamp: 2023-01-13 13:47:14 (UTC)
FullCommit: a7f5d228d0a79172ecaed3760f3b6c8f04f80709
Starting "yum --verbose info installed pmm-update" ...
Loading "changelog" plugin
Loading "fastestmirror" plugin
Loading "ovl" plugin
Config time: 0.015
rpmdb time: 0.000
ovl: Copying up (0) files from OverlayFS lower layer
Yum version: 3.4.3
Installed Packages
Name        : pmm-update
Arch        : noarch
Version     : 2.34.0
Release     : 67.2301131347.a7f5d22.el7
Size        : 2.1 M
Repo        : installed
From repo   : pmm2-server
Committer   : Michal Kralik <michal.kralik@percona.com>
Committime  : Thu Dec  8 12:00:00 2022
Buildtime   : Fri Jan 13 13:47:15 2023
Install time: Tue Feb 14 09:01:17 2023
Installed by: System <unset>
Changed by  : System <unset>
Summary     : Tool for updating packages and OS configuration for PMM Server
URL         : https://github.com/percona/pmm
License     : AGPLv3
Description : Tool for updating packages and OS configuration for PMM Server

Starting "yum update --assumeyes pmm-update" ...
Loaded plugins: changelog, fastestmirror, ovl
Loading mirror speeds from cached hostfile
 * base: mirror.centos.org
 * epel: dl.fedoraproject.org
 * extras: mirror.centos.org
 * updates: mirror.centos.org
No packages marked for update
Starting "yum --verbose info installed pmm-update" ...
Loading "changelog" plugin
Loading "fastestmirror" plugin
Loading "ovl" plugin
Config time: 0.015
rpmdb time: 0.000
ovl: Copying up (0) files from OverlayFS lower layer
Yum version: 3.4.3
Installed Packages
Name        : pmm-update
Arch        : noarch
Version     : 2.34.0
Release     : 67.2301131347.a7f5d22.el7
Size        : 2.1 M
Repo        : installed
From repo   : pmm2-server
Committer   : Michal Kralik <michal.kralik@percona.com>
Committime  : Thu Dec  8 12:00:00 2022
Buildtime   : Fri Jan 13 13:47:15 2023
Install time: Tue Feb 14 09:01:17 2023
Installed by: System <unset>
Changed by  : System <unset>
Summary     : Tool for updating packages and OS configuration for PMM Server
URL         : https://github.com/percona/pmm
License     : AGPLv3
Description : Tool for updating packages and OS configuration for PMM Server

time="2023-02-14T09:02:44Z" level=info msg="pmm-update:\nbefore update = {Version:2.34.0 FullVersion:2.34.0-67.2301131347.a7f5d22.el7 BuildTime:2023-01-13 13:47:15 +0000 UTC Repo:pmm2-server}\n after update = {Version:2.34.0 FullVersion:2.34.0-67.2301131347.a7f5d22.el7 BuildTime:2023-01-13 13:47:15 +0000 UTC Repo:pmm2-server}"
time="2023-02-14T09:02:44Z" level=info msg="Version did not change."
Starting "ansible-playbook --flush-cache /usr/share/pmm-update/ansible/playbook/tasks/update.yml" ...
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [detect /srv/pmm-distribution] ********************************************
ok: [localhost]

TASK [detect containers] *******************************************************
ok: [localhost]

TASK [force container] *********************************************************
skipping: [localhost]

TASK [Remove percona-dashboard without architecture] ***************************
ok: [localhost]

TASK [Update percona-dashboards package] ***************************************
fatal: [localhost]: FAILED! => {"changed": true, "changes": {"installed": [], "updated": [["percona-grafana", "9.2.5-97.2301101419.46ded50.el7.x86_64 from pmm2-server"]]}, "msg": "Error unpacking rpm package percona-grafana-9.2.5-97.2301101419.46ded50.el7.x86_64\npercona-grafana-8.3.5-95.2205101657.f666e10.el7.x86_64 was supposed to be removed but is not!\n", "rc": 1, "results": ["All packages providing percona-dashboards are up to date", "Loaded plugins: changelog, fastestmirror, ovl\nLoading mirror speeds from cached hostfile\n * base: mirror.centos.org\n * epel: dl.fedoraproject.org\n * extras: mirror.centos.org\n * updates: mirror.centos.org\nPackage percona-dashboards-2.34.0-19.2301090609.8d63972.el7.x86_64 already installed and latest version\nResolving Dependencies\n--> Running transaction check\n---> Package percona-grafana.x86_64 0:8.3.5-95.2205101657.f666e10.el7 will be updated\n---> Package percona-grafana.x86_64 0:9.2.5-97.2301101419.46ded50.el7 will be an update\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package          Arch    Version                            Repository    Size\n================================================================================\nUpdating:\n percona-grafana  x86_64  9.2.5-97.2301101419.46ded50.el7    pmm2-server   80 M\n\nTransaction Summary\n================================================================================\nUpgrade  1 Package\n\nTotal download size: 80 M\nDownloading packages:\nDelta RPMs disabled because /usr/bin/applydeltarpm not installed.\nRunning transaction check\nRunning transaction test\nTransaction test succeeded\nRunning transaction\n  Updating   : percona-grafana-9.2.5-97.2301101419.46ded50.el7.x86_64       1/2 \nerror: unpacking of archive failed on file /etc/grafana/ldap.toml: cpio: rename\n  Verifying  : percona-grafana-8.3.5-95.2205101657.f666e10.el7.x86_64       1/2 \n  Verifying  : percona-grafana-9.2.5-97.2301101419.46ded50.el7.x86_64       2/2 \n\nFailed:\n  percona-grafana.x86_64 0:8.3.5-95.2205101657.f666e10.el7                      \n  percona-grafana.x86_64 0:9.2.5-97.2301101419.46ded50.el7                      \n\nComplete!\n"]}

PLAY RECAP *********************************************************************
localhost                  : ok=4    changed=0    unreachable=0    failed=1    skipped=1    rescued=0    ignored=0   

time="2023-02-14T09:03:08Z" level=fatal msg="RunPlaybook failed: exit status 2"
ProjectName: pmm-update

I also tried the method where I remove the old 2.28.0 Docker container (pmm-server) and just run a modern one. It showed an unhealthy status and the UI showed ERROR: 500. As I understand it, PostgreSQL couldn't start. An empty deployment without pmm-data works fine, but that's not a good way for PROD.
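For reference, the replace-the-container steps I used were roughly the following (just a sketch; the pmm-data/pmm-server names and the 443 port mapping are from my setup, adjust as needed):

docker stop pmm-server
docker rm pmm-server                  # remove the old 2.28.0 container; the pmm-data volume container stays
docker pull percona/pmm-server:2.34.0
docker run -d -p 443:443 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:2.34.0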

Any ideas/suggestions on how to resolve it?

Hi @Stateros thank you for posting to the Percona forums

Apologies for the silly question, but did you try restarting the container again and waiting? Does it remain in a broken state or does it eventually recover?

Could you share the contents of the log file /srv/logs/postgresql.log,
as well as the output of docker exec -it pmm-server supervisorctl status?

Another upgrade option could be to go version by version to get from 2.28 → 2.34. At least this way we could see if there is a version in between that is causing the issue.
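Roughly, that would look something like the loop below (only a sketch; the --volumes-from pmm-data volume container and the 443 port mapping are assumptions, match them to your actual docker run):

for v in 2.29.0 2.30.0 2.31.0 2.32.0 2.33.0 2.34.0; do
  docker stop pmm-server && docker rm pmm-server
  docker run -d -p 443:443 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:$v
  # wait until the new container reports healthy before moving on to the next version
  until [ "$(docker inspect -f '{{.State.Health.Status}}' pmm-server)" = "healthy" ]; do sleep 10; done
done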

But you are correct, this shouldn't happen; it should be smooth between versions and leave you with an upgraded PMM Server and no data loss. Let's see if we can identify a smooth upgrade path for you :slight_smile:

Hello @Michael_Coburn, sure, I did restart it a few times, installed from scratch, and stopped/started it. Every time pmm-data has data from the 2.28.0 version, the new pmm-server container is broken.

  1. ~# docker exec -it pmm-server supervisorctl status
    alertmanager RUNNING pid 46, uptime 5 days, 0:11:49
    clickhouse RUNNING pid 30, uptime 5 days, 0:11:49
    dbaas-controller STOPPED Not started
    grafana RUNNING pid 38, uptime 5 days, 0:11:49
    nginx RUNNING pid 92, uptime 5 days, 0:11:49
    pmm-agent RUNNING pid 66, uptime 5 days, 0:11:49
    pmm-managed RUNNING pid 58, uptime 5 days, 0:11:49
    pmm-update-perform RUNNING pid 1815934, uptime 0:00:12
    pmm-update-perform-init EXITED Feb 09 03:47 PM
    postgresql RUNNING pid 29, uptime 5 days, 0:11:49
    prometheus STOPPED Not started
    qan-api2 RUNNING pid 145, uptime 5 days, 0:11:48
    victoriametrics RUNNING pid 40, uptime 5 days, 0:11:49
    vmalert RUNNING pid 41, uptime 5 days, 0:11:49

  2. ~# cat /srv/logs/postgresql.log
    2023-02-09 15:47:20.613 UTC [29] LOG: listening on IPv4 address “127.0.0.1”, port 5432
    2023-02-09 15:47:20.613 UTC [29] LOG: could not bind IPv6 address “::1”: Cannot assign requested address
    2023-02-09 15:47:20.613 UTC [29] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
    2023-02-09 15:47:20.620 UTC [29] LOG: listening on Unix socket “/var/run/postgresql/.s.PGSQL.5432”
    2023-02-09 15:47:20.628 UTC [29] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
    2023-02-09 15:47:20.821 UTC [82] LOG: database system was shut down at 2022-05-10 17:18:56 UTC
    2023-02-09 15:47:20.842 UTC [29] LOG: database system is ready to accept connections
    2023-02-09 15:47:21.045 UTC [97] FATAL: role “pmm-managed” does not exist
    2023-02-09 15:47:21.413 UTC [98] ERROR: syntax error at or near “$1” at character 34
    2023-02-09 15:47:21.413 UTC [98] STATEMENT: GRANT ALL PRIVILEGES ON DATABASE $1 TO $2
    2023-02-09 15:47:22.471 UTC [160] ERROR: relation “schema_migrations” does not exist at character 16
    2023-02-09 15:47:22.471 UTC [160] STATEMENT: SELECT id FROM schema_migrations ORDER BY id DESC LIMIT 1

Hmmm… that gives the appearance that the Postgres DB isn't there… but it says it's running!!!

The “is another postmaster already running” caught my eye… but no role pmm-managed? Can you check the permissions on the /srv/postgres14 directory (docker exec -it pmm-server ls -l /srv/postgres14)? The owner and group should be postgres.
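If the ownership does turn out to be wrong, something along these lines should put it back and restart the service (just a sketch; chown-ing the whole data directory to postgres:postgres is an assumption based on how the stock image lays it out):

docker exec -it pmm-server chown -R postgres:postgres /srv/postgres14
docker exec -it pmm-server supervisorctl restart postgresql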

It might also be worth checking pmm-managed.log for any errors during the upgrade process (docker exec -it pmm-server cat /srv/logs/pmm-managed.log).

Hello @steve.hoffman

there is no /srv/postgres14 in the running Docker container "pmm-server", but /srv/postgres exists and the permissions look correct:

ll /srv/postgres/
total 120
drwx------ 6 postgres postgres 4096 Feb 9 15:47 base
drwx------ 2 postgres postgres 4096 Feb 14 13:02 global
drwx------ 2 postgres postgres 4096 May 10 2022 pg_commit_ts
drwx------ 2 postgres postgres 4096 May 10 2022 pg_dynshmem
-rw------- 1 postgres postgres 4513 May 10 2022 pg_hba.conf
-rw------- 1 postgres postgres 1636 May 10 2022 pg_ident.conf
drwx------ 4 postgres postgres 4096 Feb 14 15:49 pg_logical
drwx------ 4 postgres postgres 4096 Feb 9 15:47 pg_multixact
drwx------ 2 postgres postgres 4096 Feb 9 15:47 pg_notify
drwx------ 2 postgres postgres 4096 May 10 2022 pg_replslot
drwx------ 2 postgres postgres 4096 May 10 2022 pg_serial
drwx------ 2 postgres postgres 4096 May 10 2022 pg_snapshots
drwx------ 2 postgres postgres 4096 Feb 9 15:47 pg_stat
drwx------ 2 postgres postgres 4096 Feb 14 16:34 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Feb 9 15:47 pg_subtrans
drwx------ 2 postgres postgres 4096 May 10 2022 pg_tblspc
drwx------ 2 postgres postgres 4096 May 10 2022 pg_twophase
-rw------- 1 postgres postgres 3 May 10 2022 PG_VERSION
drwx------ 3 postgres postgres 4096 Feb 9 15:47 pg_wal
drwx------ 2 postgres postgres 4096 Feb 9 15:47 pg_xact
-rw------- 1 postgres postgres 88 May 10 2022 postgresql.auto.conf
-rw------- 1 postgres postgres 23918 May 10 2022 postgresql.conf
-rw------- 1 postgres postgres 206 Feb 9 15:47 postmaster.opts
-rw------- 1 postgres postgres 92 Feb 9 15:47 postmaster.pid

And here is the error output from pmm-managed.log; I can't find any useful info in it. The current log file is around 46 MB.

cat /srv/logs/pmm-managed.log | grep error
WARN[2023-02-13T09:34:38.593+00:00] RPC /platform.Platform/UserStatus done in 2.731593ms with gRPC error: rpc error: code = Unauthenticated desc = Failed to get access token. Please sign in using your Percona Account. request=a2a065f7-ab81-11ed-adc8-0242ac110002
WARN[2023-02-13T09:35:10.553+00:00] RPC /server.Server/CheckUpdates done in 30.001145991s with gRPC error: rpc error: code = Unavailable desc = failed to check for updates request=a3cbd087-ab81-11ed-adc8-0242ac110002
WARN[2023-02-13T13:43:28.574+00:00] RPC /platform.Platform/UserStatus done in 3.471542ms with gRPC error: rpc error: code = Unauthenticated desc = Failed to get access token. Please sign in using your Percona Account. request=6596ae0b-aba4-11ed-adc8-0242ac110002
WARN[2023-02-13T13:43:56.169+00:00] RPC /server.Server/CheckUpdates done in 25.659366631s with gRPC error: rpc error: code = Unavailable desc = failed to check for updates request=66be8582-aba4-11ed-adc8-0242ac110002
WARN[2023-02-14T08:58:16.710+00:00] RPC /platform.Platform/UserStatus done in 3.836486ms with gRPC error: rpc error: code = Unauthenticated desc = Failed to get access token. Please sign in using your Percona Account. request=b8892494-ac45-11ed-adc8-0242ac110002
WARN[2023-02-14T08:58:49.783+00:00] RPC /server.Server/CheckUpdates done in 30.004190392s with gRPC error: rpc error: code = Unavailable desc = failed to check for updates request=ba5def0b-ac45-11ed-adc8-0242ac110002
WARN[2023-02-14T08:59:00.688+00:00] RPC /platform.Platform/UserStatus done in 3.31798ms with gRPC error: rpc error: code = Unauthenticated desc = Failed to get access token. Please sign in using your Percona Account. request=d2bfb9fb-ac45-11ed-adc8-0242ac110002
WARN[2023-02-14T08:59:32.802+00:00] RPC /server.Server/CheckUpdates done in 30.003524038s with gRPC error: rpc error: code = Unavailable desc = failed to check for updates request=d40238d2-ac45-11ed-adc8-0242ac110002
ERRO[2023-02-14T09:00:09.881+00:00] RPC /server.Server/CheckUpdates done in 30.014879351s with unexpected error: signal: killed
WARN[2023-02-14T09:13:56.746+00:00] RPC /platform.Platform/UserStatus done in 3.447462ms with gRPC error: rpc error: code = Unauthenticated desc = Failed to get access token. Please sign in using your Percona Account. request=e8d7428f-ac47-11ed-adc8-0242ac110002
WARN[2023-02-14T09:14:27.906+00:00] RPC /server.Server/CheckUpdates done in 30.00105979s with gRPC error: rpc error: code = Unavailable desc = failed to check for updates request=e9889200-ac47-11ed-adc8-0242ac110002
WARN[2023-02-14T09:14:55.522+00:00] RPC /platform.Platform/UserStatus done in 3.131078ms with gRPC error: rpc error: code = Unauthenticated desc = Failed to get access token. Please sign in using your Percona Account. request=0bdfd2d6-ac48-11ed-adc8-0242ac110002
WARN[2023-02-14T09:15:27.686+00:00] RPC /server.Server/CheckUpdates done in 30.004451807s with gRPC error: rpc error: code = Unavailable desc = failed to check for updates request=0d29d5d9-ac48-11ed-adc8-0242ac110002
ERRO[2023-02-14T12:01:26.736+00:00] failed to receive message: rpc error: code = Canceled desc = context canceled agent_id=/agent_id/e2d9ab1b-d58e-4992-ab42-72f5ae693b60 request=252967c1-ac5f-11ed-adc8-0242ac110002
ERRO[2023-02-14T12:30:21.634+00:00] failed to receive message: rpc error: code = Canceled desc = context canceled agent_id=/agent_id/5dab6b2b-d2e8-49c4-920c-c462b3784b7e request=54fcd75b-ac63-11ed-adc8-0242ac110002
ERRO[2023-02-14T13:52:46.789+00:00] failed to receive message: rpc error: code = Canceled desc = context canceled agent_id=/agent_id/3bdd7171-64bc-4c0e-9935-39bcc16bac84 request=89c724e9-ac67-11ed-adc8-0242ac110002
WARN[2023-02-14T15:47:26.523+00:00] Failed to execute check mysql_version of type MYSQL_SHOW on target /agent_id/e2d9ab1b-d58e-4992-ab42-72f5ae693b60: rpc error: code = FailedPrecondition desc = pmm-agent with ID “/agent_id/e2d9ab1b-d58e-4992-ab42-72f5ae693b60” is not currently connected
WARN[2023-02-14T15:47:29.725+00:00] Failed to execute check mysql_version of type MYSQL_SHOW on target /agent_id/3bdd7171-64bc-4c0e-9935-39bcc16bac84: rpc error: code = FailedPrecondition desc = pmm-agent with ID “/agent_id/3bdd7171-64bc-4c0e-9935-39bcc16bac84” is not currently connected
WARN[2023-02-14T15:47:34.031+00:00] Failed to execute check mysql_version of type MYSQL_SHOW on target /agent_id/5dab6b2b-d2e8-49c4-920c-c462b3784b7e: rpc error: code = FailedPrecondition desc = pmm-agent with ID “/agent_id/5dab6b2b-d2e8-49c4-920c-c462b3784b7e” is not currently connected
WARN[2023-02-14T15:59:47.524+00:00] RPC /platform.Platform/UserStatus done in 3.279449ms with gRPC error: rpc error: code = Unauthenticated desc = Failed to get access token. Please sign in using your Percona Account. request=9b092fe0-ac80-11ed-adc8-0242ac110002
WARN[2023-02-14T15:59:52.254+00:00] RPC /server.Server/StartUpdate done in 187.2879ms with gRPC error: rpc error: code = FailedPrecondition desc = Update is already running. request=9dbec95e-ac80-11ed-adc8-0242ac110002

Update: I ran the upgrade from the UI and it was stuck in an infinite loop, so I executed docker restart pmm-server. For now the container is unhealthy and shows HTTP error 500.

~# docker exec -it pmm-server supervisorctl status
alertmanager RUNNING pid 27, uptime 0:03:37
clickhouse RUNNING pid 15, uptime 0:03:37
dbaas-controller STOPPED Not started
grafana BACKOFF Exited too quickly (process log may have details)
nginx RUNNING pid 24, uptime 0:03:37
pmm-agent RUNNING pid 39, uptime 0:03:37
pmm-managed RUNNING pid 2985, uptime 0:02:03
pmm-update-perform STOPPED Not started
pmm-update-perform-init EXITED Feb 14 04:46 PM
postgresql FATAL Exited too quickly (process log may have details)
prometheus STOPPED Not started
qan-api2 RUNNING pid 170, uptime 0:03:33
victoriametrics RUNNING pid 25, uptime 0:03:37
vmalert RUNNING pid 26, uptime 0:03:37

The Postgres log looks interesting. It's looking for /srv/postgres, but now I have 2 folders, postgres11 and postgres14 :upside_down_face:

cat /srv/logs/postgresql.log
2023-02-09 15:47:20.613 UTC [29] LOG: listening on IPv4 address “127.0.0.1”, port 5432
2023-02-09 15:47:20.613 UTC [29] LOG: could not bind IPv6 address “::1”: Cannot assign requested address
2023-02-09 15:47:20.613 UTC [29] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2023-02-09 15:47:20.620 UTC [29] LOG: listening on Unix socket “/var/run/postgresql/.s.PGSQL.5432”
2023-02-09 15:47:20.628 UTC [29] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
2023-02-09 15:47:20.821 UTC [82] LOG: database system was shut down at 2022-05-10 17:18:56 UTC
2023-02-09 15:47:20.842 UTC [29] LOG: database system is ready to accept connections
2023-02-09 15:47:21.045 UTC [97] FATAL: role “pmm-managed” does not exist
2023-02-09 15:47:21.413 UTC [98] ERROR: syntax error at or near “$1” at character 34
2023-02-09 15:47:21.413 UTC [98] STATEMENT: GRANT ALL PRIVILEGES ON DATABASE $1 TO $2
2023-02-09 15:47:22.471 UTC [160] ERROR: relation “schema_migrations” does not exist at character 16
2023-02-09 15:47:22.471 UTC [160] STATEMENT: SELECT id FROM schema_migrations ORDER BY id DESC LIMIT 1
2023-02-14 16:44:32.711 UTC [14] LOG: listening on IPv4 address “127.0.0.1”, port 5432
2023-02-14 16:44:32.711 UTC [14] LOG: could not bind IPv6 address “::1”: Cannot assign requested address
2023-02-14 16:44:32.711 UTC [14] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2023-02-14 16:44:32.718 UTC [14] LOG: listening on Unix socket “/var/run/postgresql/.s.PGSQL.5432”
2023-02-14 16:44:32.724 UTC [14] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
2023-02-14 16:44:32.885 UTC [51] LOG: database system was interrupted; last known up at 2023-02-14 15:49:40 UTC
2023-02-14 16:44:33.215 UTC [51] LOG: database system was not properly shut down; automatic recovery in progress
2023-02-14 16:44:33.223 UTC [51] LOG: redo starts at 0/1A46038
2023-02-14 16:44:33.224 UTC [51] LOG: invalid record length at 0/1A46118: wanted 24, got 0
2023-02-14 16:44:33.232 UTC [51] LOG: redo done at 0/1A460E0
2023-02-14 16:44:33.256 UTC [68] FATAL: the database system is starting up
2023-02-14 16:44:33.272 UTC [14] LOG: database system is ready to accept connections
2023-02-14 16:45:50.891 UTC [14] LOG: received fast shutdown request
2023-02-14 16:45:50.897 UTC [14] LOG: aborting any active transactions
2023-02-14 16:45:50.902 UTC [14] LOG: background worker “logical replication launcher” (PID 76) exited with exit code 1
2023-02-14 16:45:50.903 UTC [71] LOG: shutting down
2023-02-14 16:45:50.926 UTC [14] LOG: database system is shut down
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.
postgres: could not access directory “/srv/postgres”: No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.

ll /srv/postgres/
ls: cannot access /srv/postgres/: No such file or directory

ll /srv/postgres11/
total 4
drwxr-xr-x 19 root root 4096 Feb 14 16:46 postgres

ll /srv/postgres14/
total 132
drwx------ 6 postgres postgres 4096 Feb 14 16:45 base
-rw------- 1 postgres postgres 30 Feb 14 16:45 current_logfiles
drwx------ 2 postgres postgres 4096 Feb 14 16:45 global
drwx------ 2 postgres postgres 4096 Feb 14 16:45 log
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_commit_ts
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_dynshmem
-rw------- 1 postgres postgres 4789 Feb 14 16:45 pg_hba.conf
-rw------- 1 postgres postgres 1636 Feb 14 16:45 pg_ident.conf
drwx------ 4 postgres postgres 4096 Feb 14 16:46 pg_logical
drwx------ 4 postgres postgres 4096 Feb 14 16:45 pg_multixact
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_notify
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_replslot
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_serial
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_snapshots
drwx------ 2 postgres postgres 4096 Feb 14 16:46 pg_stat
drwx------ 2 postgres postgres 4096 Feb 14 16:46 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_subtrans
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_tblspc
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_twophase
-rw------- 1 postgres postgres 3 Feb 14 16:45 PG_VERSION
drwx------ 3 postgres postgres 4096 Feb 14 16:45 pg_wal
drwx------ 2 postgres postgres 4096 Feb 14 16:45 pg_xact
-rw------- 1 postgres postgres 88 Feb 14 16:45 postgresql.auto.conf
-rw------- 1 postgres postgres 28710 Feb 14 16:45 postgresql.conf
-rw------- 1 postgres postgres 50 Feb 14 16:45 postmaster.opts

Somewhere between your original version and the current one we upgraded PostgreSQL, so you appear to have now been migrated to the latest version.

You appear to be stuck in some interim state of the upgrade, because PMM is trying to start Postgres 11 (which was in /srv/postgres) instead of Postgres 14 (which is in /srv/postgres14; at the same time, Postgres 11's folder was renamed to postgres11).

I have had this happen to me once or twice before during an upgrade (almost always when I do a live demo :man_facepalming: ), and I have to restart my container so that the Ansible scripts run through a second time (the first thing PMM does is look at the data volume to see if any data fixups are needed).
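So it is worth trying the same here, something like (just a sketch; the log path and service names are the ones you already shared):

docker restart pmm-server
# give the init/Ansible pass a couple of minutes, then see what came back up
docker exec -it pmm-server supervisorctl status
docker exec -it pmm-server tail -n 50 /srv/logs/postgresql.log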

That sounds reasonable, but what should I do to upgrade the version? Try to upgrade step by step, 2.28 -> 2.29 -> … -> 2.34? Or maybe there is some hotfix available?

I mean, you don't appear to have done anything wrong. We test many upgrade paths with each release (as I understand it, we test upgrades from each of the last 5 minor releases, and then every 5 releases back from there), so it's possible 2.28.0 to 2.34.0 wasn't explicitly tested, but 2.24.0 to 2.34.0 would have been, and it passed or we wouldn't have released.

One thing I would highly recommend, though: make a copy of your /srv directory as a backup:

docker exec -it pmm-server bash
supervisorctl stop all        # quiesce all PMM services so the files under /srv are consistent
cd /
tar -cvf srv.tar /srv         # archive the whole data directory into /srv.tar inside the container
supervisorctl start all       # bring the services back before leaving the shell
exit
docker cp pmm-server:/srv.tar /place/with/enough/space

That will at least give you something, so that if you need to resort to a full restore it's easier to do.
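If you ever do need that full restore, it's essentially the reverse (just a sketch, assuming the same container name and that you copy the tarball back from wherever you saved it):

docker cp /place/with/enough/space/srv.tar pmm-server:/srv.tar
docker exec -it pmm-server bash
supervisorctl stop all
cd /
tar -xvf srv.tar              # the archive was created with the /srv prefix, so this unpacks back into /srv
supervisorctl start all
exit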

I don't think a multi-hop upgrade path is an option at this point: if your data schemas have been fixed up by 2.34.0, then it's too late to connect a 2.29.0 or 2.30.0 container to the data volume (/srv); it won't start. We need to figure out why Postgres is in a mangled state.

How many clients did you have connected to PMM, so I can think about restoration options? It could be a shorter path to actually get a "clean" PMM 2.34.0, shut down certain services, and drop the two key data tables into place; the same approach can be done with the ClickHouse and VictoriaMetrics data.

After some tests and investigation, I found what is blocking the online upgrade from the UI.

I am running pmm-server with a command like:

docker run -d -p 443:443 --volumes-from pmm-data --name pmm-server -v $ssl_path:/srv/nginx -v /etc/pmm/grafana/ldap.toml:/etc/grafana/ldap.toml --restart always -e GF_AUTH_LDAP_CONFIG_FILE=/etc/grafana/ldap.toml -e GF_AUTH_LDAP_ENABLED=true percona/pmm-server:$pmm_version

And when $pmm_version is a newer version (2.29 or 2.34), the new Docker container doesn't work.

So here is what I did:

  1. Following the Docker - Percona Monitoring and Management guide, I upgraded to 2.29.0 without the additional volumes or env variables.
  2. The 2.29.0 pmm-server started successfully.
  3. Using the Alt key, I revealed the upgrade button in the UI and clicked it. The upgrade was successful.
  4. I removed the upgraded pmm-server and ran the same latest version with the volumes and env variables (roughly the commands sketched below).
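In commands, the whole sequence was roughly this (a sketch of what I ran; the pmm-data/pmm-server names and my ldap/ssl paths come from my own setup above):

# steps 1-2: plain 2.29.0 against the existing data, no extra volumes or env variables
docker stop pmm-server && docker rm pmm-server
docker run -d -p 443:443 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:2.29.0
# step 3: Alt/Option-click the upgrade button in the UI and wait for the upgrade to finish
# step 4: re-create the container with the customization again, now on the upgraded data
docker stop pmm-server && docker rm pmm-server
docker run -d -p 443:443 --volumes-from pmm-data --name pmm-server -v $ssl_path:/srv/nginx -v /etc/pmm/grafana/ldap.toml:/etc/grafana/ldap.toml --restart always -e GF_AUTH_LDAP_CONFIG_FILE=/etc/grafana/ldap.toml -e GF_AUTH_LDAP_ENABLED=true percona/pmm-server:2.34.0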

So it looks like a fresh pmm-server version does not play well with the customization. We are using LDAP, and it's very useful for us to restrict access to the PMM UI to LDAP users only. It's also easy to automate the volume with the SSL certs. Any idea what's wrong?

I’m looking over your docker run and I can’t see anything that PMM wouldn’t like at first glance.

Here’s mine:

docker run -v /opt/srv:/srv -d --restart always --publish 8443:443 \
  --name pmm-server -e PMM_DEBUG=1 -e GF_SMTP_ENABLED=true \
  -e GF_SMTP_HOST=smtp.gmail.com:587 -e GF_SMTP_USER=<email> \
  -e GF_SMTP_PASSWORD=<password> -e GF_SMTP_SKIP_VERIFY=false \
  -e GF_SMTP_FROM_ADDRESS=<email> -e GF_SMTP_FROM_NAME=PMM \
  -e GF_AUTH_LDAP_ENABLED=true -e GF_AUTH_LDAP_CONFIG_FILE=/srv/grafana/ldap.toml percona/pmm-server:2.32.0

So volume mapping works, the GF_ variables work (I'm still getting emails and can log in with LDAP… Active Directory), and because my /srv directory is mapped to my main drive I have my SSL certs hard-linked…

At this point… what isn't working? Is the container still unhealthy, or are you getting the UI but can't log in?

LDAP errors would show in /srv/logs/grafana.log
Nginx issues in /srv/logs/nginx.log
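e.g. a quick way to pull both (same container name as above):

docker exec -it pmm-server tail -n 100 /srv/logs/grafana.log
docker exec -it pmm-server tail -n 100 /srv/logs/nginx.log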

Yeah, your run is much more… customized ))) I am also worried about why PMM can't tell that it's not up to date.

For now, the sandbox and test envs are working fine on the latest version 2.34.0. But it only works if I upgrade the server using the steps I described earlier, and only manually.