PMM2 server 2.21 Docker container unhealthy

The PMM2 UI is unavailable and returns a 500 Internal Server Error after running docker pull percona/pmm-server:2.

Docker container logs:

2021-09-01 20:33:28,154 INFO Included extra file “/etc/supervisord.d/alertmanager.ini” during parsing
2021-09-01 20:33:28,154 INFO Included extra file “/etc/supervisord.d/dbaas-controller.ini” during parsing
2021-09-01 20:33:28,154 INFO Included extra file “/etc/supervisord.d/pmm.ini” during parsing
2021-09-01 20:33:28,154 INFO Included extra file “/etc/supervisord.d/prometheus.ini” during parsing
2021-09-01 20:33:28,154 INFO Included extra file “/etc/supervisord.d/qan-api2.ini” during parsing
2021-09-01 20:33:28,154 INFO Included extra file “/etc/supervisord.d/victoriametrics.ini” during parsing
2021-09-01 20:33:28,154 INFO Included extra file “/etc/supervisord.d/vmalert.ini” during parsing
2021-09-01 20:33:28,154 INFO Set uid to user 0 succeeded
2021-09-01 20:33:28,163 INFO RPC interface ‘supervisor’ initialized
2021-09-01 20:33:28,163 INFO supervisord started with pid 1
2021-09-01 20:33:29,166 INFO spawned: ‘postgresql’ with pid 14
2021-09-01 20:33:29,167 INFO spawned: ‘clickhouse’ with pid 15
2021-09-01 20:33:29,169 INFO spawned: ‘grafana’ with pid 16
2021-09-01 20:33:29,170 INFO spawned: ‘nginx’ with pid 17
2021-09-01 20:33:29,171 INFO spawned: ‘cron’ with pid 18
2021-09-01 20:33:29,173 INFO spawned: ‘victoriametrics’ with pid 19
2021-09-01 20:33:29,175 INFO spawned: ‘vmalert’ with pid 20
2021-09-01 20:33:29,176 INFO spawned: ‘alertmanager’ with pid 24
2021-09-01 20:33:29,177 INFO spawned: ‘dashboard-upgrade’ with pid 25
2021-09-01 20:33:29,179 INFO spawned: ‘qan-api2’ with pid 27
2021-09-01 20:33:29,182 INFO spawned: ‘pmm-managed’ with pid 32
2021-09-01 20:33:29,184 INFO spawned: ‘pmm-agent’ with pid 43
2021-09-01 20:33:29,185 INFO success: dashboard-upgrade entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2021-09-01 20:33:29,230 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:33:29,242 INFO exited: qan-api2 (exit status 1; not expected)
2021-09-01 20:33:29,291 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:33:29,330 INFO exited: dashboard-upgrade (exit status 0; expected)
2021-09-01 20:33:30,241 INFO spawned: ‘postgresql’ with pid 140
2021-09-01 20:33:30,242 INFO success: clickhouse entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-01 20:33:30,242 INFO success: grafana entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-01 20:33:30,242 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-01 20:33:30,242 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-01 20:33:30,242 INFO success: victoriametrics entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-01 20:33:30,242 INFO success: vmalert entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-01 20:33:30,242 INFO success: alertmanager entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-01 20:33:30,243 INFO spawned: ‘qan-api2’ with pid 141
2021-09-01 20:33:30,244 INFO success: pmm-agent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-01 20:33:30,287 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:33:30,294 INFO spawned: ‘pmm-managed’ with pid 150
2021-09-01 20:33:30,342 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:33:31,722 INFO success: qan-api2 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-09-01 20:33:32,843 INFO spawned: ‘postgresql’ with pid 167
2021-09-01 20:33:32,845 INFO spawned: ‘pmm-managed’ with pid 168
2021-09-01 20:33:32,886 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:33:32,888 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:33:36,013 INFO spawned: ‘postgresql’ with pid 183
2021-09-01 20:33:36,015 INFO spawned: ‘pmm-managed’ with pid 184
2021-09-01 20:33:36,055 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:33:36,060 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:33:40,610 INFO spawned: ‘postgresql’ with pid 205
2021-09-01 20:33:40,612 INFO spawned: ‘pmm-managed’ with pid 206
2021-09-01 20:33:40,655 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:33:40,664 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:33:46,121 INFO spawned: ‘postgresql’ with pid 221
2021-09-01 20:33:46,123 INFO spawned: ‘pmm-managed’ with pid 222
2021-09-01 20:33:46,168 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:33:46,180 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:33:52,394 INFO spawned: ‘postgresql’ with pid 261
2021-09-01 20:33:52,396 INFO spawned: ‘pmm-managed’ with pid 262
2021-09-01 20:33:52,436 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:33:52,443 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:33:59,485 INFO spawned: ‘postgresql’ with pid 291
2021-09-01 20:33:59,487 INFO spawned: ‘pmm-managed’ with pid 292
2021-09-01 20:33:59,533 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:33:59,537 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:34:07,676 INFO spawned: ‘postgresql’ with pid 314
2021-09-01 20:34:07,677 INFO spawned: ‘pmm-managed’ with pid 315
2021-09-01 20:34:07,782 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:34:07,783 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:34:16,826 INFO spawned: ‘postgresql’ with pid 342
2021-09-01 20:34:16,828 INFO spawned: ‘pmm-managed’ with pid 343
2021-09-01 20:34:16,867 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:34:16,873 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:34:27,019 INFO spawned: ‘postgresql’ with pid 372
2021-09-01 20:34:27,023 INFO spawned: ‘pmm-managed’ with pid 373
2021-09-01 20:34:27,066 INFO exited: postgresql (exit status 1; not expected)
2021-09-01 20:34:27,067 INFO gave up: postgresql entered FATAL state, too many start retries too quickly
2021-09-01 20:34:27,068 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:34:38,437 INFO spawned: ‘pmm-managed’ with pid 407
2021-09-01 20:34:38,490 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:34:50,883 INFO spawned: ‘pmm-managed’ with pid 441
2021-09-01 20:34:50,924 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:35:04,286 INFO spawned: ‘pmm-managed’ with pid 475
2021-09-01 20:35:04,356 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:35:19,359 INFO spawned: ‘pmm-managed’ with pid 514
2021-09-01 20:35:19,410 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:35:34,601 INFO spawned: ‘pmm-managed’ with pid 554
2021-09-01 20:35:34,652 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:35:50,866 INFO spawned: ‘pmm-managed’ with pid 594
2021-09-01 20:35:50,912 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:36:08,392 INFO spawned: ‘pmm-managed’ with pid 635
2021-09-01 20:36:08,448 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:36:26,740 INFO spawned: ‘pmm-managed’ with pid 681
2021-09-01 20:36:26,787 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:36:45,958 INFO spawned: ‘pmm-managed’ with pid 726
2021-09-01 20:36:46,002 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:37:06,084 INFO spawned: ‘pmm-managed’ with pid 779
2021-09-01 20:37:06,139 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:37:27,401 INFO spawned: ‘pmm-managed’ with pid 825
2021-09-01 20:37:27,446 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:37:49,528 INFO spawned: ‘pmm-managed’ with pid 878
2021-09-01 20:37:49,576 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:38:12,777 INFO spawned: ‘pmm-managed’ with pid 939
2021-09-01 20:38:12,825 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:38:36,971 INFO spawned: ‘pmm-managed’ with pid 997
2021-09-01 20:38:37,027 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:39:02,143 INFO spawned: ‘pmm-managed’ with pid 1056
2021-09-01 20:39:02,200 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:39:28,419 INFO spawned: ‘pmm-managed’ with pid 1113
2021-09-01 20:39:28,487 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:39:55,688 INFO spawned: ‘pmm-managed’ with pid 1177
2021-09-01 20:39:55,734 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:40:24,151 INFO spawned: ‘pmm-managed’ with pid 1243
2021-09-01 20:40:24,201 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:40:53,492 INFO spawned: ‘pmm-managed’ with pid 1308
2021-09-01 20:40:53,536 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:41:23,559 INFO spawned: ‘pmm-managed’ with pid 1374
2021-09-01 20:41:23,602 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:41:54,841 INFO spawned: ‘pmm-managed’ with pid 1517
2021-09-01 20:41:54,892 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:42:27,161 INFO spawned: ‘pmm-managed’ with pid 2083
2021-09-01 20:42:27,212 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:43:00,294 INFO spawned: ‘pmm-managed’ with pid 2158
2021-09-01 20:43:00,340 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:43:34,615 INFO spawned: ‘pmm-managed’ with pid 2232
2021-09-01 20:43:34,660 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:44:10,238 INFO spawned: ‘pmm-managed’ with pid 2310
2021-09-01 20:44:10,296 INFO exited: pmm-managed (exit status 1; not expected)
2021-09-01 20:44:46,457 INFO spawned: ‘pmm-managed’ with pid 2393
2021-09-01 20:44:46,505 INFO exited: pmm-managed (exit status 1; not expected)
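A log this long is easier to read as a per-program summary. The following is a minimal sketch, not a Percona tool: `summarize_supervisor_log` is a hypothetical helper name, and it assumes the supervisord line layout shown above (date, time, level, then the message). It counts "not expected" exits per program and marks any program that supervisord gave up on.

```shell
#!/bin/sh
# Hypothetical helper: summarise supervisord output read from stdin.
# Prints "<program> <unexpected exit count>[ FATAL]", sorted by program name.
summarize_supervisor_log() {
  awk '
    # e.g. "... INFO exited: postgresql (exit status 1; not expected)"
    $3 == "INFO" && $4 == "exited:" && /not expected/ { exits[$5]++ }
    # e.g. "... INFO gave up: postgresql entered FATAL state, ..."
    $3 == "INFO" && $4 == "gave" && $5 == "up:" { fatal[$6] = 1 }
    END {
      for (name in exits) {
        printf "%s %d%s\n", name, exits[name], (name in fatal) ? " FATAL" : ""
      }
    }
  ' | sort
}
```

Assuming the container is named pmm-server, as in the commands later in this thread, it could be fed with `docker logs pmm-server 2>&1 | summarize_supervisor_log`. In the log above, postgresql is the only program that reaches the FATAL state, while pmm-managed keeps being restarted.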

Was this a fresh install or an upgrade of an existing version? There’s an issue we’re sorting out that has to do with custom dashboards that have custom tags.

Upgrade of pmm-server from 2.18.0

@nurlan, can you provide troubleshooting and remediation steps so we can confirm this is the same issue we are resolving and get this user back up and running?

I can also confirm that updating through the update button on the home dashboard fails with “502 Bad Gateway”. Occasionally the home dashboard can be reached, but it shows no Percona content, only the default Grafana content.

Last entries in the log window:

PLAY RECAP *********************************************************************
localhost                  : ok=50   changed=24   unreachable=0    failed=0    skipped=10   rescued=0    ignored=0   

time="2021-09-02T10:36:53Z" level=info msg="Waiting for Grafana dashboards update to finish..."

After that it waits indefinitely, and the log window fills with blank lines every few seconds.

The upgrade is from version 2.20.

I followed the upgrade steps here for the docker setup: Docker - Percona Monitoring and Management

I was able to get the UI working again by reverting to the previous pmm-server image. I’m not sure whether it had any effect, but before getting pmm-server working I also ran the restore permissions steps from this article against the pmm-data container: Docker - Percona Monitoring and Management

Restore Permissions
docker run --rm --volumes-from pmm-data -it percona/pmm-server:2 chown -R root:root /srv && \
docker run --rm --volumes-from pmm-data -it percona/pmm-server:2 chown -R pmm:pmm /srv/alertmanager && \
docker run --rm --volumes-from pmm-data -it percona/pmm-server:2 chown -R root:pmm /srv/clickhouse && \
docker run --rm --volumes-from pmm-data -it percona/pmm-server:2 chown -R grafana:grafana /srv/grafana && \
docker run --rm --volumes-from pmm-data -it percona/pmm-server:2 chown -R pmm:pmm /srv/logs && \
docker run --rm --volumes-from pmm-data -it percona/pmm-server:2 chown -R postgres:postgres /srv/postgres && \
docker run --rm --volumes-from pmm-data -it percona/pmm-server:2 chown -R pmm:pmm /srv/prometheus && \
docker run --rm --volumes-from pmm-data -it percona/pmm-server:2 chown -R pmm:pmm /srv/victoriametrics && \
docker run --rm --volumes-from pmm-data -it percona/pmm-server:2 chown -R postgres:postgres /srv/logs/postgresql.log
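Whether ownership was actually wrong before running these commands can be checked without changing anything. The sketch below is an illustration under stated assumptions rather than an official procedure: `check_srv_owners` and `EXPECTED` are hypothetical names, the expected owners are simply read off the chown commands above, GNU `stat` is assumed to be present in the image, and `-it` is dropped because the output is captured rather than shown on a terminal. It only inspects the top-level entry of each path, not every file beneath it.

```shell
#!/bin/sh
# Owners that the chown commands above leave on each path ("user:group path").
EXPECTED='root:root /srv
pmm:pmm /srv/alertmanager
root:pmm /srv/clickhouse
grafana:grafana /srv/grafana
pmm:pmm /srv/logs
postgres:postgres /srv/postgres
pmm:pmm /srv/prometheus
pmm:pmm /srv/victoriametrics
postgres:postgres /srv/logs/postgresql.log'

# Hypothetical helper: compare actual owners in the pmm-data volumes with EXPECTED.
check_srv_owners() {
  paths=$(printf '%s\n' "$EXPECTED" | cut -d' ' -f2)
  # $paths is left unquoted on purpose so that each path becomes its own argument.
  actual=$(docker run --rm --volumes-from pmm-data percona/pmm-server:2 \
    stat -c '%U:%G %n' $paths)
  if [ "$actual" = "$EXPECTED" ]; then
    echo "ownership matches"
    return 0
  fi
  echo "unexpected ownership:"
  # Print only the lines that do not exactly match any expected line.
  printf '%s\n' "$actual" | grep -vxF "$EXPECTED"
  return 1
}
```

Running `check_srv_owners` before and after the restore step would show whether the permissions were a factor in the postgresql failures, which the post above leaves open.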

Hi @rhoffman, @Semir_Hadzic,

could you try this, please?

curl -LJOs https://raw.githubusercontent.com/percona/pmm-server/c2e92bc3aec123affda5f1992c96c95ac74f4a2d/import-dashboards.py
docker cp import-dashboards.py pmm-server:/usr/share/percona-dashboards/
docker exec -it pmm-server chmod a+x /usr/share/percona-dashboards/import-dashboards.py
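After replacing the script, it can take a while before the container reports as healthy again. A minimal sketch for polling that state follows. `wait_until_healthy` is a hypothetical helper name, and it assumes the image defines a Docker health check (which the "unhealthy" status in this topic suggests), so that `.State.Health.Status` is available from `docker inspect`.

```shell
#!/bin/sh
# Hypothetical helper: poll a container's health status until it is "healthy".
# Usage: wait_until_healthy <container> [attempts] [delay_seconds]
wait_until_healthy() {
  container=$1
  attempts=${2:-30}
  delay=${3:-5}
  status=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Prints starting, healthy or unhealthy when a health check is defined.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$container still reports: ${status:-unknown}"
  return 1
}
```

For example, `wait_until_healthy pmm-server 30 5` would poll for roughly two and a half minutes before giving up. If it does give up, the container logs are the next place to look.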

Hi @nurlan,

I had the same error after upgrading from 2.20 → 2.21. This should fix it, thanks!

I tried the 2.21 pmm-server image again and ran the commands @nurlan mentioned. pmm-server is now healthy, although on the non-QAN dashboards I cannot see metrics unless the time range is set to 30 days. The Query Analytics dashboard works on the 2.21 pmm-server container with shorter time ranges, which currently does not work on the 2.18 pmm-server container that I reverted to.

Hi nurlan.

It seems this fixes the issue and lets the upgrade finish.

However, the Percona JIRA issue [PMM-8421] Listen-port ignored/removed for external services after server update to PMM 2.19 and higher still remains (it is not connected to this issue).

Thanks for the fix. I hope it will be included in the next release.
