Staged Upgrade for PMM2

I’m trying to upgrade an old instance of PMM2 from version 2.16.0 to 2.41.1. When I followed the steps and clicked the “Upgrade to 2.41.1” button on the main dashboard in Grafana, the upgrade failed partway through and appears to have gone into a retry loop. The documentation says that for versions prior to 2.32.0 you should upgrade to 2.32.0 first and then to the latest version. However, it does not explain how to do this. The Grafana upgrade button does not let you pick a version, and neither does the API call for starting the upgrade described on the “Resolve issues - Percona Monitoring and Management” page. Can anyone provide some guidance on this? The relevant part of the upgrade log is below.

TASK [sqlite-to-postgres : Check if initial data were created] *****************
fatal: [localhost]: FAILED! => {"msg": "The conditional check 'psql_result.rowcount == 1' failed. The error was: error while evaluating conditional (psql_result.rowcount == 1): 'dict object' has no attribute 'rowcount'"}
...ignoring

TASK [sqlite-to-postgres : Wait for grafana database initialization] ***********
Pausing for 10 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [localhost]

TASK [sqlite-to-postgres : Stop grafana before upgrade] ************************
changed: [localhost]

TASK [sqlite-to-postgres : Remove default admin user] **************************
ok: [localhost]

TASK [sqlite-to-postgres : Run grafana migrator] *******************************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": ["grafana-db-migrator", "--change-char-to-text", "/srv/grafana/grafana.db", "postgres://grafana:grafana@localhost:5432/grafana?sslmode=disable"], "delta": "0:00:20.623634", "end": "2024-02-21 18:51:12.111319", "msg": "non-zero return code", "rc": 1, "start": "2024-02-21 18:50:51.487685", "stderr": "time=\"2024-02-21T18:50:51Z\" level=info msg=\"📁 SQLlite file: /srv/grafana/grafana.db\"\ntime=\"2024-02-21T18:50:51Z\" level=info msg=\"📁 Dump directory: /tmp\"\ntime=\"2024-02-21T18:50:51Z\" level=info msg=\"✅ sqlite3 command exists\"\ntime=\"2024-02-21T18:50:51Z\" level=info msg=\"✅ sqlite3 database dumped to /tmp/grafana.sql\"\ntime=\"2024-02-21T18:50:53Z\" level=info msg=\"✅ CREATE statements removed from dump file\"\ntime=\"2024-02-21T18:51:00Z\" level=info msg=\"✅ sqlite3 dump sanitized\"\ntime=\"2024-02-21T18:51:04Z\" level=info msg=\"✅ migration_log statements removed\"\ntime=\"2024-02-21T18:51:04Z\" level=info msg=\"✅ char keyword transformed\"\ntime=\"2024-02-21T18:51:04Z\" level=info msg=\"✅ hex-encoded data values wrapped for insertion\"\ntime=\"2024-02-21T18:51:12Z\" level=warning msg=\"duplicate key: pq: duplicate key value violates unique constraint \\\"dashboard_acl_pkey\\\"\"\ntime=\"2024-02-21T18:51:12Z\" level=warning msg=\"duplicate key: pq: duplicate key value violates unique constraint \\\"dashboard_acl_pkey\\\"\"\ntime=\"2024-02-21T18:51:12Z\" level=fatal msg=\"❌ pq: relation \\\"alert_configuration\\\" does not exist INSERT INTO \\\"alert_configuration\\\" VALUES(1,'{\\n\\t\\\"alertmanager_config\\\": {\\n\\t\\t\\\"route\\\": {\\n\\t\\t\\t\\\"receiver\\\": \\\"grafana-default-email\\\",\\n\\t\\t\\t\\\"group_by\\\": [\\\"grafana_folder\\\", \\\"alertname\\\"]\\n\\t\\t},\\n\\t\\t\\\"receivers\\\": [{\\n\\t\\t\\t\\\"name\\\": \\\"grafana-default-email\\\",\\n\\t\\t\\t\\\"grafana_managed_receiver_configs\\\": [{\\n\\t\\t\\t\\t\\\"uid\\\": \\\"\\\",\\n\\t\\t\\t\\t\\\"name\\\": \\\"email 
receiver\\\",\\n\\t\\t\\t\\t\\\"type\\\": \\\"email\\\",\\n\\t\\t\\t\\t\\\"isDefault\\\": true,\\n\\t\\t\\t\\t\\\"settings\\\": {\\n\\t\\t\\t\\t\\t\\\"addresses\\\": \\\"<example@email.com>\\\"\\n\\t\\t\\t\\t}\\n\\t\\t\\t}]\\n\\t\\t}]\\n\\t}\\n}\\n','v1',1708540508,1,1,'e0528a75784033ae7b15c40851d89484') - failed to import dump file to Postgres.\"", "stderr_lines": ["time=\"2024-02-21T18:50:51Z\" level=info msg=\"📁 SQLlite file: /srv/grafana/grafana.db\"", "time=\"2024-02-21T18:50:51Z\" level=info msg=\"📁 Dump directory: /tmp\"", "time=\"2024-02-21T18:50:51Z\" level=info msg=\"✅ sqlite3 command exists\"", "time=\"2024-02-21T18:50:51Z\" level=info msg=\"✅ sqlite3 database dumped to /tmp/grafana.sql\"", "time=\"2024-02-21T18:50:53Z\" level=info msg=\"✅ CREATE statements removed from dump file\"", "time=\"2024-02-21T18:51:00Z\" level=info msg=\"✅ sqlite3 dump sanitized\"", "time=\"2024-02-21T18:51:04Z\" level=info msg=\"✅ migration_log statements removed\"", "time=\"2024-02-21T18:51:04Z\" level=info msg=\"✅ char keyword transformed\"", "time=\"2024-02-21T18:51:04Z\" level=info msg=\"✅ hex-encoded data values wrapped for insertion\"", "time=\"2024-02-21T18:51:12Z\" level=warning msg=\"duplicate key: pq: duplicate key value violates unique constraint \\\"dashboard_acl_pkey\\\"\"", "time=\"2024-02-21T18:51:12Z\" level=warning msg=\"duplicate key: pq: duplicate key value violates unique constraint \\\"dashboard_acl_pkey\\\"\"", "time=\"2024-02-21T18:51:12Z\" level=fatal msg=\"❌ pq: relation \\\"alert_configuration\\\" does not exist INSERT INTO \\\"alert_configuration\\\" VALUES(1,'{\\n\\t\\\"alertmanager_config\\\": {\\n\\t\\t\\\"route\\\": {\\n\\t\\t\\t\\\"receiver\\\": \\\"grafana-default-email\\\",\\n\\t\\t\\t\\\"group_by\\\": [\\\"grafana_folder\\\", \\\"alertname\\\"]\\n\\t\\t},\\n\\t\\t\\\"receivers\\\": [{\\n\\t\\t\\t\\\"name\\\": \\\"grafana-default-email\\\",\\n\\t\\t\\t\\\"grafana_managed_receiver_configs\\\": [{\\n\\t\\t\\t\\t\\\"uid\\\": 
\\\"\\\",\\n\\t\\t\\t\\t\\\"name\\\": \\\"email receiver\\\",\\n\\t\\t\\t\\t\\\"type\\\": \\\"email\\\",\\n\\t\\t\\t\\t\\\"isDefault\\\": true,\\n\\t\\t\\t\\t\\\"settings\\\": {\\n\\t\\t\\t\\t\\t\\\"addresses\\\": \\\"<example@email.com>\\\"\\n\\t\\t\\t\\t}\\n\\t\\t\\t}]\\n\\t\\t}]\\n\\t}\\n}\\n','v1',1708540508,1,1,'e0528a75784033ae7b15c40851d89484') - failed to import dump file to Postgres.\""], "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
localhost                  : ok=110  changed=38   unreachable=0    failed=1    skipped=62   rescued=0    ignored=1
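The staged-upgrade rule quoted in the question (anything older than 2.32.0 goes to 2.32.0 first, then to the target) can be written down as a small check. This is only a sketch: the function names `version_lt` and `upgrade_path` are made up for illustration, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch: work out which versions to step through for a PMM2 upgrade.
# The 2.32.0 threshold comes from the thread; the function names are
# hypothetical and not part of PMM.

THRESHOLD="2.32.0"

# version_lt A B: succeeds when version A sorts strictly before version B.
# Relies on `sort -V` (version sort).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# upgrade_path CURRENT TARGET: print the versions to install, one per line.
upgrade_path() {
  local current="$1" target="$2"
  # Insert the intermediate stop only when it lies between the two versions.
  if version_lt "$current" "$THRESHOLD" && version_lt "$THRESHOLD" "$target"; then
    echo "$THRESHOLD"
  fi
  echo "$target"
}
```

For the case in this thread, `upgrade_path 2.16.0 2.41.1` prints 2.32.0 followed by 2.41.1, while a server already on 2.32.0 or later gets only the target version.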

Hello @Jimmy_Chen,
You can do this manually by removing the Docker container, downloading the latest container image, and then starting a new container from the new image. Your data is safe because it is stored in a separate Docker volume. Our documentation covers these steps.
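For anyone on Docker who needs the staged path, the manual steps above might look roughly like the sketch below. Several things here are assumptions, not facts from this thread: the container name `pmm-server`, a data container called `pmm-data` attached with `--volumes-from`, the image name `percona/pmm-server:<tag>`, and port 443. The helpers `run` and `replace_pmm_container` are hypothetical. The sketch also renames the old container instead of removing it, so there is something to fall back to. Check all of this against your own setup and the official instructions before running it.

```shell
#!/usr/bin/env bash
# Sketch of replacing a PMM Server container with a specific image tag.
# Assumed (not from the thread): container "pmm-server", data container
# "pmm-data", image "percona/pmm-server:<tag>", HTTPS on port 443.

DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# replace_pmm_container TAG: swap the current container for one started
# from the given image tag, reusing the existing data volumes.
replace_pmm_container() {
  local tag="$1"
  run docker pull "percona/pmm-server:${tag}"
  run docker stop pmm-server
  # Keep the old container under a new name instead of deleting it.
  run docker rename pmm-server "pmm-server-pre-${tag}"
  run docker run --detach --restart always \
    --publish 443:443 \
    --volumes-from pmm-data \
    --name pmm-server \
    "percona/pmm-server:${tag}"
}
```

A staged upgrade would then be `replace_pmm_container 2.32.0`, waiting until the server is up and its migrations have finished, followed by `replace_pmm_container 2.41.1`. With the default `DRY_RUN=1` the function only prints the commands; set `DRY_RUN=0` to execute them.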

We’re not using Docker. What are the upgrade steps for a non-container deployment?

We only support Docker, Podman, OVA, and AWS AMI. How did you install PMM server?

If you’re using an OVF or AMI, the process is more manual and amounts to a “lift and shift”. I wrote a utility that streamlines it by boiling it down to a backup-and-restore process (the utility was really written to take hot backups of PMM via cron).

I think the good news is that the upgrade process shouldn’t harm your data, so you effectively need to copy all the files and folders under /srv/ to the new server and set the folder permissions correctly.
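As a rough illustration of moving a directory tree like /srv while keeping file modes, here is a minimal sketch using tar. It is not the utility mentioned above: `pack_tree` and `unpack_tree` are hypothetical helper names, and the sketch assumes nothing is writing to the source directory while it is packed (so it is a cold copy, not a hot backup). Ownership is only restored when extracting as root, and the `--numeric-owner` flag assumes GNU tar or a compatible implementation.

```shell
#!/usr/bin/env bash
# Sketch: pack a directory tree and unpack it elsewhere, preserving
# permissions. Helper names are hypothetical; this is a cold copy.

# pack_tree SRC_DIR ARCHIVE: archive the contents of SRC_DIR.
# -p keeps permission bits; --numeric-owner stores UIDs/GIDs as numbers
# so they do not depend on user names existing on the target host.
pack_tree() {
  tar --numeric-owner -czpf "$2" -C "$1" .
}

# unpack_tree ARCHIVE DEST_DIR: extract into DEST_DIR, creating it if needed.
unpack_tree() {
  mkdir -p "$2" && tar --numeric-owner -xzpf "$1" -C "$2"
}
```

On a real server this would be something like `pack_tree /srv /tmp/srv.tar.gz` on the old host and `unpack_tree /tmp/srv.tar.gz /srv` on the new one, run as root, followed by a check that ownership and permissions match what the new PMM version expects.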

Let us know how you run PMM and we can help a bit from there!

I tried your script, but it doesn’t seem to complete the backup. It gets to the part that says:

Starting configuration and supporting files backup

However, it never finishes this step, so the backup is never packed into a tarball.

Looking through the script, the issue appears to be this line:

run_root "cp -af /srv/alerting \"${backup_dir}\"/folders/"

On the version I’m running, 2.16, there is no alerting folder. The script seems to die here because it cannot copy the folder.

Huh! You can likely comment out that line and try again. I can’t recall when alerting was added, but I only tested as far back as 2.25 or so (make sure to also comment out the restore and permissions change if you plan to use it to restore too). If I can find free time, I’ll try to make it handle failures a little more gracefully!
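One way the copy step could tolerate a missing folder is to test for it first. The sketch below is not the script’s actual code: `copy_if_present` is a hypothetical helper, and it calls `cp` directly instead of going through the script’s own `run_root` wrapper, whose definition is not shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch: copy a folder into the backup only when it exists, so older
# PMM versions without /srv/alerting do not abort the backup.
# copy_if_present is a hypothetical helper, not part of the real script.

# copy_if_present SRC DEST_DIR: copy SRC into DEST_DIR (which must exist)
# when SRC is present; otherwise print a note and return success.
copy_if_present() {
  local src="$1" dest="$2"
  if [ -e "$src" ]; then
    cp -af "$src" "$dest"/
  else
    echo "skipping ${src}: not present on this version" >&2
  fi
}
```

The failing line would then become something like `copy_if_present /srv/alerting "${backup_dir}/folders"`, with the matching restore and permission steps guarded the same way.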

I assume you mean commenting out the alerting section for the restore/permission change if I were restoring to the same version? I’m trying to use it to migrate and restore to the current version of PMM, so I don’t think that will be an issue, since the folder exists there now.

I have used the script to migrate from an older version to a newer version, but I haven’t tested all migration paths. You have to pass both -r and -u; otherwise it will warn about upgrade danger and make you start over, since the script can tell which version the backup was taken from and which version you’re trying to restore to.

But if you just add an empty alerting directory, that will also get you moving forward!

It looks like the migration worked; however, the existing MySQL services didn’t quite get migrated correctly. It’s not a huge deal, since I was planning to redo them anyway as they’re running the 2.16 agents.