AWS Marketplace PMM upgrade from 2.25.0 to 2.26.0

After switching to the default Alertmanager (AM) base config, does pmm-managed keep restarting with the same error as before?

ERRO[2022-02-11T14:09:43.585+00:00] Failed to update configuration, will retry: exit status 1
Checking ‘/tmp/pmm-managed-config-alertmanager-2940248744’ FAILED: yaml: unmarshal errors:
line 51: field max_alerts not found in type config.plain

amtool: error: failed to validate 1 file(s)
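
To see quickly which key amtool is rejecting, the error text can be parsed for the line number and field name. This is only a rough sketch: `unknown_fields` is a made-up helper name, and the regex assumes the error always has the `line N: field X not found in type T` shape shown above. Note that the line number refers to the generated temp file, not necessarily to the same line in your base config.

```python
import re
from typing import List, Tuple

# Matches YAML unmarshal errors such as:
#   line 51: field max_alerts not found in type config.plain
UNKNOWN_FIELD = re.compile(r"line (\d+): field (\S+) not found in type (\S+)")


def unknown_fields(amtool_output: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, field, go_type) for every rejected field (hypothetical helper)."""
    return [(int(n), field, typ) for n, field, typ in UNKNOWN_FIELD.findall(amtool_output)]


# Sample copied from the error above.
sample = """yaml: unmarshal errors:
line 51: field max_alerts not found in type config.plain
"""

for line_no, field, typ in unknown_fields(sample):
    print(f"line {line_no}: unknown field '{field}' (type {typ})")
```

Searching the base config for the reported field name (here `max_alerts`) should then show which entry the bundled amtool does not understand.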

Yes, pmm-managed continues to restart, and it happens every few seconds. I have never experienced any issues with upgrades of the AWS Marketplace PMM before. Upgrades happen monthly and the setup is similar across all PMM instances. A few of the instances (2 out of 9) are failing with similar issues.

I do see a memory issue in pmm-managed. pmm-managed.log contains the panic below.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11afc0e]

Here is the overall log. The Alertmanager messages can be ignored, as I have my own Alertmanager installed. This is the case for all PMM servers, and I see the same thing in healthy environments.

    /usr/local/go/src/runtime/asm_amd64.s:1581.  component=setup

INFO[2022-02-17T12:22:25.615+00:00] prometheus configuration not changed, doing nothing. component=supervisord
INFO[2022-02-17T12:22:25.615+00:00] victoriametrics configuration not changed, doing nothing. component=supervisord
INFO[2022-02-17T12:22:25.615+00:00] vmalert configuration not changed, doing nothing. component=supervisord
INFO[2022-02-17T12:22:25.615+00:00] alertmanager configuration not changed, doing nothing. component=supervisord
INFO[2022-02-17T12:22:25.615+00:00] qan-api2 configuration not changed, doing nothing. component=supervisord
INFO[2022-02-17T12:22:25.615+00:00] grafana configuration not changed, doing nothing. component=supervisord
INFO[2022-02-17T12:22:25.615+00:00] dbaas-controller configuration not changed, doing nothing. component=supervisord
INFO[2022-02-17T12:22:25.615+00:00] Checking VictoriaMetrics… component=setup
INFO[2022-02-17T12:22:25.616+00:00] Checking VMAlert… component=setup
INFO[2022-02-17T12:22:25.617+00:00] Checking Alertmanager… component=setup
INFO[2022-02-17T12:22:25.617+00:00] Setup completed. component=setup
INFO[2022-02-17T12:22:25.958+00:00] Starting services… component=main
INFO[2022-02-17T12:22:25.958+00:00] Starting… component=victoriametrics
INFO[2022-02-17T12:22:25.958+00:00] Starting server on http://127.0.0.1:7771/ … component=gRPC
INFO[2022-02-17T12:22:25.958+00:00] Starting… component=alertmanager
INFO[2022-02-17T12:22:25.958+00:00] Starting… component=checks
INFO[2022-02-17T12:22:25.958+00:00] Starting… component=vmalert
INFO[2022-02-17T12:22:25.958+00:00] Using default SaaS host “check.percona.com”.
INFO[2022-02-17T12:22:25.958+00:00] Using SaaS host “check.percona.com”.
INFO[2022-02-17T12:22:25.958+00:00] Environment variable “PERCONA_PLATFORM_API_TIMEOUT” is not set, using “30s” as a default timeout for platform API. component=auth
INFO[2022-02-17T12:22:25.958+00:00] Starting server on http://127.0.0.1:7772/ … component=JSON
INFO[2022-02-17T12:22:25.958+00:00] Starting server on http://127.0.0.1:7773/debug
Registered handlers:
http://127.0.0.1:7773/debug/metrics
http://127.0.0.1:7773/debug/vars
http://127.0.0.1:7773/debug/requests
http://127.0.0.1:7773/debug/events
http://127.0.0.1:7773/debug/pprof component=debug
INFO[2022-02-17T12:22:26.748+00:00] Starting Stream /agent.Agent/Connect … agent_id=pmm-server request=4497a2aa-8fec-11ec-90b2-1260e5508cc3
INFO[2022-02-17T12:22:26.750+00:00] Connected pmm-agent: &{ID:pmm-server Version:2.26.0 MetricsPort:0}. agent_id=pmm-server request=4497a2aa-8fec-11ec-90b2-1260e5508cc3
INFO[2022-02-17T12:22:26.750+00:00] Starting runStateChangeHandler … agent_id=pmm-server request=4497a2aa-8fec-11ec-90b2-1260e5508cc3
INFO[2022-02-17T12:22:26.961+00:00] Configuration reloaded. component=vmalert
INFO[2022-02-17T12:22:26.991+00:00] Configuration reloaded. component=alertmanager
INFO[2022-02-17T12:22:27.138+00:00] Done. component=victoriametrics
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11afc0e]

goroutine 917 [running]:
github.com/percona/pmm-managed/services/victoriametrics.AddScrapeConfigs(0xc00026c840, 0xc00026c840, 0x14ff92f, 0x8, 0x187b300, 0x0)
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/victoriametrics/prometheus.go:90 +0x68e
github.com/percona/pmm-managed/services/victoriametrics.(*Service).populateConfig.func1(0xc000300078)
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/victoriametrics/victoriametrics.go:337 +0x625
gopkg.in/reform%2ev1.(*DB).InTransactionContext(0xc001054b01, {0x18630c8, 0xc00003e038}, 0x1863138, 0xc000cb5d90)
/home/builder/go/pkg/mod/gopkg.in/reform.v1@v1.5.1/db.go:93 +0xa2
gopkg.in/reform%2ev1.(*DB).InTransaction(…)
/home/builder/go/pkg/mod/gopkg.in/reform.v1@v1.5.1/db.go:74
github.com/percona/pmm-managed/services/victoriametrics.(*Service).populateConfig(0xc000be70b0, 0xc07bad6179333954)
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/victoriametrics/victoriametrics.go:322 +0x56
github.com/percona/pmm-managed/services/victoriametrics.(*Service).marshalConfig(0xc000612140, 0x0)
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/victoriametrics/victoriametrics.go:215 +0x25
github.com/percona/pmm-managed/services/victoriametrics.(*Service).updateConfiguration(0xc000612140, {0x1863100, 0xc001052360})
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/victoriametrics/victoriametrics.go:157 +0xb5
github.com/percona/pmm-managed/services/victoriametrics.(*Service).Run(0xc000612140, {0x1863138, 0xc0005a8c30})
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/victoriametrics/victoriametrics.go:130 +0x325
main.main.func6()
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/main.go:838 +0x65
created by main.main
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/main.go:836 +0x3fe5
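
Since only 2 of the 9 instances fail, it may help to confirm that each failing instance crashes in the same place. Below is a minimal sketch that pulls the panic message and the top stack frame out of a pmm-managed.log. `panic_summary` is a hypothetical helper, and it assumes the standard Go panic layout seen in the trace above (a `panic:` line followed by `goroutine N [running]:` and then the frames).

```python
import re
from typing import Optional, Tuple


def panic_summary(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (panic message, top frame function) for the first Go panic, or None."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("panic: "):
            continue
        message = line[len("panic: "):].strip()
        # The line right after "goroutine N [running]:" names the function that panicked.
        for j in range(i + 1, len(lines) - 1):
            if re.match(r"goroutine \d+ \[running\]:", lines[j]):
                frame = lines[j + 1].strip()
                # Drop the argument list, keep only the package path and function name.
                return message, frame.rsplit("(", 1)[0]
        return message, ""
    return None


# Sample trimmed from the trace above.
sample = """panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11afc0e]

goroutine 917 [running]:
github.com/percona/pmm-managed/services/victoriametrics.AddScrapeConfigs(0xc00026c840, 0xc00026c840, 0x14ff92f, 0x8, 0x187b300, 0x0)
"""

print(panic_summary(sample))
```

Running this against the log of each failing instance and comparing the results shows whether both hit `victoriametrics.AddScrapeConfigs` or fail somewhere else.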

AWS Marketplace PMM updates seem to be done by Ansible. What is the best way to see whether the process has completed? The pmm-update log never reaches the end. It always stops at this line: time=“2022-02-17T16:32:25Z” level=info msg=“Waiting for Grafana dashboards update to finish…”

pmm-update-perform.log
Starting “yum update --assumeyes pmm-update” …
Loaded plugins: changelog, fastestmirror
Loading mirror speeds from cached hostfile

time=“2022-02-17T16:31:49Z” level=info msg=“pmm-update:\nbefore update = {Version:2.26.0 FullVersion:2.26.0-64.2201121502.f13eec0.el7 BuildTime:2022-01-12 15:02:19 +0000 UTC Repo:pmm2-server}\n after update = {Version:2.26.0 FullVersion:2.26.0-64.2201121502.f13eec0.el7 BuildTime:2022-01-12 15:02:19 +0000 UTC Repo:pmm2-server}”
time=“2022-02-17T16:31:49Z” level=info msg=“Version did not change.”
Starting “ansible-playbook --flush-cache /usr/share/pmm-update/ansible/playbook/tasks/update.yml” …
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match ‘all’

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [detect /srv/pmm-distribution] ********************************************
ok: [localhost]

TASK [detect containers] *******************************************************
ok: [localhost]

TASK [force container] *********************************************************
skipping: [localhost]

TASK [Remove percona-dashboard without architecture] ***************************
ok: [localhost]

TASK [Update percona-dashboards package] ***************************************
ok: [localhost]

TASK [Upgrade grafana database (New schema)] ***********************************
changed: [localhost]

TASK [Create provisioning directory] *******************************************
ok: [localhost] => (item=datasources)
ok: [localhost] => (item=plugins)
ok: [localhost] => (item=dashboards)

TASK [Copy grafana provisioning files] *****************************************
ok: [localhost] => (item=datasources)
ok: [localhost] => (item=plugins)
ok: [localhost] => (item=dashboards)

TASK [Supervisord start | Start supervisord service for AMI/OVF] ********
ok: [localhost]

TASK [Check that supervisor socket exists] *************************************
ok: [localhost]

TASK [Supervisord start | Start supervisord for docker] ****************
skipping: [localhost]

TASK [Run initialization playbook] *********************************************

TASK [initialization : Get current version] ************************************
ok: [localhost]

TASK [initialization : Get image version] **************************************
ok: [localhost]

TASK [initialization : Set current version if VERSION fail doen’t exist] *******
skipping: [localhost]

TASK [initialization : Setting current and image version] **********************
ok: [localhost]

TASK [initialization : Setting current and image version] **********************
ok: [localhost]

TASK [initialization : Print current version] **********************************
ok: [localhost] => {
“msg”: “Current version: 2.26.0 Image Version: 2.25.0”
}

PLAY RECAP *********************************************************************
localhost : ok=15 changed=1 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0

time=“2022-02-17T16:32:25Z” level=info msg=“Waiting for Grafana dashboards update to finish…”
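
For comparing this log across instances, here is a small sketch that reads the Ansible recap counters and the last line of a pmm-update-perform.log. `play_recap` and `last_message` are hypothetical helper names, and the sample uses plain ASCII quotes on the assumption that the real log file does too. It only shows where the log stopped; it cannot tell whether the update itself is actually complete.

```python
import re
from typing import Dict, Optional

# Matches the Ansible recap line, e.g. "localhost : ok=15 changed=1 ... failed=0 ..."
RECAP = re.compile(r"^localhost\s*:\s*(.*)$", re.MULTILINE)


def play_recap(log_text: str) -> Optional[Dict[str, int]]:
    """Parse the last 'localhost : ok=.. failed=..' line into a dict of counters."""
    matches = RECAP.findall(log_text)
    if not matches:
        return None
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", matches[-1])}


def last_message(log_text: str) -> str:
    """Return the last non-empty line of the log."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return lines[-1] if lines else ""


# Sample trimmed from the log above.
sample = """PLAY RECAP *********************************************************************
localhost : ok=15 changed=1 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0

time="2022-02-17T16:32:25Z" level=info msg="Waiting for Grafana dashboards update to finish…"
"""

recap = play_recap(sample)
stalled = "Waiting for Grafana dashboards update to finish" in last_message(sample)
print(f"failed tasks: {recap['failed']}, stopped at Grafana wait: {stalled}")
```

In the log above the playbook itself reports `failed=0`, so the Ansible part finished and the log stops at the Grafana dashboards wait that follows it.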
