AWS Marketplace PMM upgrade from 2.25.0 to 2.26.0

I upgraded multiple environments today from the AWS Marketplace for PMM with 2 failures. Never had an issue for over 2 years. The upgrade was from 2.25.0 to 2.26.0. Grafana is not loading and pmm-managed keeps restarting. Here is the end of logs for the upgrade for PMM. It would not let me enter all of them

ok: [localhost]

TASK [Upgrade grafana database (New schema)] ***********************************
changed: [localhost]

TASK [Create provisioning directory] *******************************************
ok: [localhost] => (item=datasources)
ok: [localhost] => (item=plugins)
ok: [localhost] => (item=dashboards)

TASK [Copy grafana provisioning files] *****************************************
ok: [localhost] => (item=datasources)
ok: [localhost] => (item=plugins)
ok: [localhost] => (item=dashboards)

TASK [Supervisord start | Start supervisord service for AMI/OVF] ********
ok: [localhost]

TASK [Check that supervisor socket exists] *************************************
ok: [localhost]

TASK [Supervisord start | Start supervisord for docker] ****************
skipping: [localhost]

TASK [Run initialization playbook] *********************************************

TASK [initialization : Get current version] ************************************
ok: [localhost]

TASK [initialization : Get image version] **************************************
ok: [localhost]

TASK [initialization : Set current version if VERSION fail doen’t exist] *******
skipping: [localhost]

TASK [initialization : Setting current and image version] **********************
ok: [localhost]

TASK [initialization : Setting current and image version] **********************
ok: [localhost]

TASK [initialization : Print current version] **********************************
ok: [localhost] => {
“msg”: “Current version: 2.26.0 Image Version: 2.26.0”
}

PLAY RECAP *********************************************************************
localhost : ok=15 changed=1 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0

time=“2022-02-10T19:07:04Z” level=info msg=“Waiting for Grafana dashboards update to finish…”

1 Like

Do you have CLI access to your instance? in /srv/logs does anything jump out if you tail -f pmm-managed.log that might shed light? If no CLI access you can get the complete logs by going to the PMM settings page (gear icon → PMM Settings) and you’ll see the “download server diagnostics” which will give all the logs.

1 Like

Yes, I do have access by CLI. PMM site keeps going down so can’t download from UI. Due to being a new user, I have limitation amount of logs but the important ones are listed below. well, I hope so…

INFO[2022-02-10T21:31:55.149+00:00] Downloading templates from check.percona.com … component=management/ia/templates
INFO[2022-02-10T21:31:55.319+00:00] Updating settings… component=setup
INFO[2022-02-10T21:31:55.321+00:00] Updating supervisord configuration… component=setup
WARN[2022-02-10T21:31:55.321+00:00] Failed to get Percona SSO Details: sql: no rows in result set
failed to get Percona SSO Details
github.com/percona/pmm-managed/models.GetPerconaSSODetails
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/models/percona_sso_model_helpers.go:45
main.setup
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/main.go:471
main.main
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/main.go:766
runtime.main
/usr/local/go/src/runtime/proc.go:255
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1581. component=setup

1 Like

Hmmm…the warning about SSO details is interesting but I can’t see that blocking the dashboard upgrade. Did you previously connect your Percona Account to PMM?

If you run egrep ERROR pmm-managed.log does anything more show up?

What are the most recently touched files in /srv/logs (ls -alrt /srv/logs/) and do any of them have errors? maybe grafana.log?

1 Like

This is what the pmm-agent.log is showing. Grafana is not loading and failing also. I will send some grafana logs on the next post.

INFO[2022-02-11T13:29:48.469+00:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=main
INFO[2022-02-11T13:29:48.470+00:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=main
INFO[2022-02-11T13:29:48.470+00:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=main
INFO[2022-02-11T13:29:48.470+00:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=main
INFO[2022-02-11T13:29:48.470+00:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=main
INFO[2022-02-11T13:29:48.470+00:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=main
INFO[2022-02-11T13:29:48.470+00:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=main
INFO[2022-02-11T13:29:48.470+00:00] Using /usr/local/percona/pmm2/exporters/vmagent component=main
INFO[2022-02-11T13:29:48.470+00:00] Starting local API server on http://127.0.0.1:7777/ … component=local-server/JSON
INFO[2022-02-11T13:29:48.470+00:00] Starting… component=client
INFO[2022-02-11T13:29:48.470+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T13:29:48.471+00:00] Started. component=local-server/JSON
INFO[2022-02-11T13:29:48.472+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T13:29:48.472+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T13:29:48.473+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T13:29:49.370+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T13:29:49.373+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T13:29:49.373+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T13:29:49.373+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T13:29:50.924+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T13:29:50.927+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T13:29:50.927+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T13:29:50.927+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T13:29:53.124+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T13:29:53.127+00:00] Connected to 127.0.0.1:443.

1 Like

Here are the logs for Grafana.

t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-check-panel-home
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-qan-app-panel
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=camptocamp-prometheus-alertmanager-datasource
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=grafana-polystat-panel
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=grafana-worldmap-panel
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-app
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=jdbranham-diagram-panel
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=natel-discrete-panel
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-pt-summary-datasource
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-pt-summary-panel
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-update
t=2022-02-10T11:38:03+0000 lvl=info msg=“Registering plugin” logger=plugins id=yesoreyeram-boomtable-panel
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483244Z”,“queryType”:“exponential_heatmap_bucket_data”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483304Z”,“queryType”:“linear_heatmap_bucket_data”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483316Z”,“queryType”:“random_walk”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483329Z”,“queryType”:“predictable_pulse”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483338Z”,“queryType”:“predictable_csv_wave”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483348Z”,“queryType”:“random_walk_table”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483361Z”,“queryType”:“slow_query”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483369Z”,“queryType”:“no_data_points”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483385Z”,“queryType”:“datapoints_outside_range”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483407Z”,“queryType”:“manual_entry”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483431Z”,“queryType”:“csv_metric_values”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483449Z”,“queryType”:“streaming_client”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483530Z”,“queryType”:“live”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483548Z”,“queryType”:“grafana_api”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483559Z”,“queryType”:“arrow”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483588Z”,“queryType”:“annotations”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483612Z”,“queryType”:“table_static”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483671Z”,“queryType”:“random_walk_with_error”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483671Z”,“queryType”:“server_error_500”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483688Z”,“queryType”:“logs”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:38:03.483708Z”,“queryType”:“node_graph”}
{“@level”:“debug”,“@message”:“datasource: registering query type fallback handler”,“@timestamp”:“2022-02-10T11:38:03.483754Z”}
t=2022-02-10T11:38:03+0000 lvl=info msg=“Updating app from configuration " logger=provisioning.plugins type=pmm-app enabled=true
t=2022-02-10T11:38:03+0000 lvl=info msg=“HTTP Server Listen” logger=http.server address=[::]:3000 protocol=http subUrl=/graph socket=
t=2022-02-10T11:38:04+0000 lvl=info msg=“Starting ClickHouse datasource backend…” logger=plugins.backend pluginId=vertamedia-clickhouse-datasource
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Data proxy error” logger=data-proxy-log userId=2 orgId=1 uname=userid path=/api/datasources/proxy/7/api/v1/series remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m” error=“http: proxy error: dial tcp 127.0.0.1:9090: connect: connection refused”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Request Completed” logger=context userId=2 orgId=1 uname=userid method=POST path=/api/datasources/proxy/7/api/v1/series status=502 remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” time_ms=1 size=0 referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Data proxy error” logger=data-proxy-log userId=2 orgId=1 uname=userid path=/api/datasources/proxy/7/api/v1/series remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m” error=“http: proxy error: dial tcp 127.0.0.1:9090: connect: connection refused”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Request Completed” logger=context userId=2 orgId=1 uname=userid method=POST path=/api/datasources/proxy/7/api/v1/series status=502 remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” time_ms=1 size=0 referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Data proxy error” logger=data-proxy-log userId=2 orgId=1 uname=userid path=/api/datasources/proxy/7/api/v1/series remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m” error=“http: proxy error: dial tcp 127.0.0.1:9090: connect: connection refused”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Request Completed” logger=context userId=2 orgId=1 uname=userid method=POST path=/api/datasources/proxy/7/api/v1/series status=502 remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” time_ms=2 size=0 referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Data proxy error” logger=data-proxy-log userId=2 orgId=1 uname=userid path=/api/datasources/proxy/7/api/v1/series remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m” error=“http: proxy error: dial tcp 127.0.0.1:9090: connect: connection refused”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Request Completed” logger=context userId=2 orgId=1 uname=userid method=POST path=/api/datasources/proxy/7/api/v1/series status=502 remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” time_ms=1 size=0 referer=“server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Data proxy error” logger=data-proxy-log userId=2 orgId=1 uname=userid path=/api/datasources/proxy/7/api/v1/series remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m” error=“http: proxy error: dial tcp 127.0.0.1:9090: connect: connection refused”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Request Completed” logger=context userId=2 orgId=1 uname=userid method=POST path=/api/datasources/proxy/7/api/v1/series status=502 remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” time_ms=0 size=0 referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Data proxy error” logger=data-proxy-log userId=2 orgId=1 uname=userid path=/api/datasources/proxy/7/api/v1/series remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m” error=“http: proxy error: dial tcp 127.0.0.1:9090: connect: connection refused”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Request Completed” logger=context userId=2 orgId=1 uname=userid method=POST path=/api/datasources/proxy/7/api/v1/series status=502 remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” time_ms=1 size=0 referer=“server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Data proxy error” logger=data-proxy-log userId=2 orgId=1 uname=userid path=/api/datasources/proxy/7/api/v1/series remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m” error=“http: proxy error: dial tcp 127.0.0.1:9090: connect: connection refused”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Request Completed” logger=context userId=2 orgId=1 uname=userid method=POST path=/api/datasources/proxy/7/api/v1/series status=502 remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” time_ms=1 size=0 referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Data proxy error” logger=data-proxy-log userId=2 orgId=1 uname=userid path=/api/datasources/proxy/7/api/v1/series remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” referer=“https://server/graph/d/pmm-home/home-dashboard?orgId=1&refresh=1m” error=“http: proxy error: dial tcp 127.0.0.1:9090: connect: connection refused”
t=2022-02-10T11:38:23+0000 lvl=eror msg=“Request Completed” logger=context userId=2 orgId=1 uname=userid method=POST path=/api/datasources/proxy/7/api/v1/series status=502 remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” time_ms=1 size=0 referer=“server/graph/d/pmm-
t=2022-02-10T11:55:56+0000 lvl=info msg=“Shutdown started” logger=server reason=“System signal: terminated”
t=2022-02-10T11:59:43+0000 lvl=info msg=“Starting Grafana” logger=server version=7.5.11 commit=v7.5.11-2.26.0 branch=master compiled=2022-02-02T11:33:38+0000
t=2022-02-10T11:59:43+0000 lvl=info msg=“Config loaded from” logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2022-02-10T11:59:43+0000 lvl=info msg=“Config loaded from” logger=settings file=/etc/grafana/grafana.ini
t=2022-02-10T11:59:43+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.paths.data=/srv/grafana”
t=2022-02-10T11:59:43+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.paths.logs=/srv/logs”
t=2022-02-10T11:59:43+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.paths.plugins=/srv/grafana/plugins”
t=2022-02-10T11:59:43+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.server.root_url=https://%(domain)s/graph”
t=2022-02-10T11:59:43+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.log.mode=console”
t=2022-02-10T11:59:43+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.log.console.format=console”
t=2022-02-10T11:59:43+0000 lvl=info msg=“Path Home” logger=settings path=/usr/share/grafana
t=2022-02-10T11:59:43+0000 lvl=info msg=“Path Data” logger=settings path=/srv/grafana
t=2022-02-10T11:59:43+0000 lvl=info msg=“Path Logs” logger=settings path=/srv/logs
t=2022-02-10T11:59:43+0000 lvl=info msg=“Path Plugins” logger=settings path=/srv/grafana/plugins
t=2022-02-10T11:59:43+0000 lvl=info msg=“Path Provisioning” logger=settings path=/usr/share/grafana/conf/provisioning
t=2022-02-10T11:59:43+0000 lvl=info msg=“App mode production” logger=settings
t=2022-02-10T11:59:43+0000 lvl=info msg=“Connecting to DB” logger=sqlstore dbtype=sqlite3
t=2022-02-10T11:59:43+0000 lvl=warn msg=“SQLite database file has broader permissions than it should” logger=sqlstore path=/srv/grafana/grafana.db mode=-rw-r–r-- expected=-rw-r-----
t=2022-02-10T11:59:43+0000 lvl=info msg=“Starting DB migrations” logger=migrator
t=2022-02-10T11:59:43+0000 lvl=info msg=“migrations completed” logger=migrator performed=0 skipped=279 duration=2.174145ms
t=2022-02-10T11:59:43+0000 lvl=info msg=“Starting plugin search” logger=plugins
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-qan-app-panel
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=yesoreyeram-boomtable-panel
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=digiapulssi-breadcrumb-panel
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=grafana-piechart-panel
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=natel-discrete-panel
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-app
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=petrslavotinek-carpetplot-panel
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-pt-summary-datasource
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-update
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-check-panel-home
t=2022-02-10T11:59:44+0000 lvl=warn msg=“Running an unsigned backend plugin” logger=plugins pluginID=vertamedia-clickhouse-datasource pluginDir=/srv/grafana/plugins/vertamedia-clickhouse-datasource/dist
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=vertamedia-clickhouse-datasource
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=camptocamp-prometheus-alertmanager-datasource
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=grafana-polystat-panel
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=grafana-worldmap-panel
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=jdbranham-diagram-panel
t=2022-02-10T11:59:44+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-pt-summary-panel
{”@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962002Z”,“queryType”:“exponential_heatmap_bucket_data”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962077Z”,“queryType”:“linear_heatmap_bucket_data”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962098Z”,“queryType”:“random_walk”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962121Z”,“queryType”:“predictable_pulse”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962131Z”,“queryType”:“predictable_csv_wave”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962148Z”,“queryType”:“random_walk_table”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962161Z”,“queryType”:“slow_query”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962177Z”,“queryType”:“no_data_points”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962197Z”,“queryType”:“datapoints_outside_range”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962214Z”,“queryType”:“manual_entry”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962229Z”,“queryType”:“csv_metric_values”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962244Z”,“queryType”:“streaming_client”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962294Z”,“queryType”:“live”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962309Z”,“queryType”:“grafana_api”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962329Z”,“queryType”:“arrow”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962341Z”,“queryType”:“annotations”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962356Z”,“queryType”:“table_static”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962370Z”,“queryType”:“random_walk_with_error”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962391Z”,“queryType”:“server_error_500”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962408Z”,“queryType”:“logs”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-10T11:59:44.962422Z”,“queryType”:“node_graph”}
{“@level”:“debug”,“@message”:“datasource: registering query type fallback handler”,“@timestamp”:“2022-02-10T11:59:44.962440Z”}
t=2022-02-10T11:59:45+0000 lvl=info msg=“Updating app from configuration " logger=provisioning.plugins type=pmm-app enabled=true
t=2022-02-10T11:59:46+0000 lvl=info msg=“HTTP Server Listen” logger=http.server address=[::]:3000 protocol=http subUrl=/graph socket=
t=2022-02-10T11:59:47+0000 lvl=info msg=“Starting ClickHouse datasource backend…” logger=plugins.backend pluginId=vertamedia-clickhouse-datasource
t=2022-02-10T18:40:10+0000 lvl=eror msg=“Failed to look up user based on cookie” logger=context error=“user token not found”
t=2022-02-10T18:40:10+0000 lvl=info msg=“Request Completed” logger=context userId=0 orgId=0 uname= method=GET path=/ status=302 remote_addr=“xx.xxx.xx.xxx, xx.xxx.xx.xxx” time_ms=0 size=35 referer=
t=2022-02-10T18:40:10+0000 lvl=eror msg=“Failed to look up user based on cookie” logger=context error=“user token not found”
t=2022-02-10T18:40:33+0000 lvl=eror msg=“Failed to look up user based on cookie” logger=context error=“user token not found”
t=2022-02-10T18:40:33+0000 lvl=info msg=“LDAP enabled, reading config file” logger=ldap file=/etc/grafana/ldap.toml
t=2022-02-10T18:40:33+0000 lvl=info msg=“Successful Login” logger=http.server User=
t=2022-02-11T12:02:49+0000 lvl=info msg=“Shutdown started” logger=server reason=“System signal: terminated”
t=2022-02-11T12:06:36+0000 lvl=info msg=“Starting Grafana” logger=server version=7.5.11 commit=v7.5.11-2.26.0 branch=master compiled=2022-02-02T11:33:38+0000
t=2022-02-11T12:06:36+0000 lvl=info msg=“Config loaded from” logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2022-02-11T12:06:36+0000 lvl=info msg=“Config loaded from” logger=settings file=/etc/grafana/grafana.ini
t=2022-02-11T12:06:36+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.paths.data=/srv/grafana”
t=2022-02-11T12:06:36+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.paths.logs=/srv/logs”
t=2022-02-11T12:06:36+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.paths.plugins=/srv/grafana/plugins”
t=2022-02-11T12:06:36+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.server.root_url=https://%(domain)s/graph”
t=2022-02-11T12:06:36+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.log.mode=console”
t=2022-02-11T12:06:36+0000 lvl=info msg=“Config overridden from command line” logger=settings arg=“default.log.console.format=console”
t=2022-02-11T12:06:36+0000 lvl=info msg=“Path Home” logger=settings path=/usr/share/grafana
t=2022-02-11T12:06:36+0000 lvl=info msg=“Path Data” logger=settings path=/srv/grafana
t=2022-02-11T12:06:36+0000 lvl=info msg=“Path Logs” logger=settings path=/srv/logs
t=2022-02-11T12:06:36+0000 lvl=info msg=“Path Plugins” logger=settings path=/srv/grafana/plugins
t=2022-02-11T12:06:36+0000 lvl=info msg=“Path Provisioning” logger=settings path=/usr/share/grafana/conf/provisioning
t=2022-02-11T12:06:36+0000 lvl=info msg=“App mode production” logger=settings
t=2022-02-11T12:06:36+0000 lvl=info msg=“Connecting to DB” logger=sqlstore dbtype=sqlite3
t=2022-02-11T12:06:36+0000 lvl=warn msg=“SQLite database file has broader permissions than it should” logger=sqlstore path=/srv/grafana/grafana.db mode=-rw-r–r-- expected=-rw-r-----
t=2022-02-11T12:06:36+0000 lvl=info msg=“Starting DB migrations” logger=migrator
t=2022-02-11T12:06:36+0000 lvl=info msg=“migrations completed” logger=migrator performed=0 skipped=279 duration=3.325436ms
t=2022-02-11T12:06:36+0000 lvl=info msg=“Starting plugin search” logger=plugins
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=grafana-piechart-panel
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=natel-discrete-panel
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-update
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=digiapulssi-breadcrumb-panel
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=grafana-polystat-panel
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=yesoreyeram-boomtable-panel
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-qan-app-panel
t=2022-02-11T12:06:38+0000 lvl=warn msg=“Running an unsigned backend plugin” logger=plugins pluginID=vertamedia-clickhouse-datasource pluginDir=/srv/grafana/plugins/vertamedia-clickhouse-datasource/dist
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=vertamedia-clickhouse-datasource
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=camptocamp-prometheus-alertmanager-datasource
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=grafana-worldmap-panel
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=jdbranham-diagram-panel
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-app
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-pt-summary-panel
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=petrslavotinek-carpetplot-panel
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-check-panel-home
t=2022-02-11T12:06:38+0000 lvl=info msg=“Registering plugin” logger=plugins id=pmm-pt-summary-datasource
{”@level":“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725510Z”,“queryType”:“exponential_heatmap_bucket_data”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725619Z”,“queryType”:“linear_heatmap_bucket_data”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725662Z”,“queryType”:“random_walk”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725689Z”,“queryType”:“predictable_pulse”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725717Z”,“queryType”:“predictable_csv_wave”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725741Z”,“queryType”:“random_walk_table”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725771Z”,“queryType”:“slow_query”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725799Z”,“queryType”:“no_data_points”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725827Z”,“queryType”:“datapoints_outside_range”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725862Z”,“queryType”:“manual_entry”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725886Z”,“queryType”:“csv_metric_values”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725924Z”,“queryType”:“streaming_client”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.725951Z”,“queryType”:“live”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.726007Z”,“queryType”:“grafana_api”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.726035Z”,“queryType”:“arrow”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.726067Z”,“queryType”:“annotations”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.726087Z”,“queryType”:“table_static”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.726110Z”,“queryType”:“random_walk_with_error”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.726136Z”,“queryType”:“server_error_500”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.726166Z”,“queryType”:“logs”}
{“@level”:“debug”,“@message”:“datasource: registering query type handler”,“@timestamp”:“2022-02-11T12:06:38.726193Z”,“queryType”:“node_graph”}
{“@level”:“debug”,“@message”:“datasource: registering query type fallback handler”,“@timestamp”:“2022-02-11T12:06:38.726259Z”}
t=2022-02-11T12:06:39+0000 lvl=info msg="Updating app from configuration " logger=provisioning.plugins type=pmm-app enabled=true
t=2022-02-11T12:06:39+0000 lvl=info msg=“HTTP Server Listen” logger=http.server address=[::]:3000 protocol=http subUrl=/graph socket=
t=2022-02-11T12:06:39+0000 lvl=info msg=“Starting ClickHouse datasource backend…” logger=plugins.backend pluginId=vertamedia-clickhouse-datasource

1 Like

I believe this is the issue from pmm-managed. It can’t find the alert manager temp file.

INFO[2022-02-11T14:09:42.548+00:00] Using default SaaS host “check.percona.com”.
INFO[2022-02-11T14:09:42.549+00:00] Using SaaS host “check.percona.com”.
INFO[2022-02-11T14:09:42.549+00:00] Environment variable “PERCONA_PLATFORM_API_TIMEOUT” is not set, using “30s” as a default timeout for platform API. component=auth
INFO[2022-02-11T14:09:42.548+00:00] Starting server on http://127.0.0.1:7772/ … component=JSON
INFO[2022-02-11T14:09:42.548+00:00] Starting… component=checks
INFO[2022-02-11T14:09:42.548+00:00] Starting… component=victoriametrics
INFO[2022-02-11T14:09:43.549+00:00] Configuration reloaded. component=vmalert
ERRO[2022-02-11T14:09:43.585+00:00] Checking ‘/tmp/pmm-managed-config-alertmanager-2940248744’ FAILED: yaml: unmarshal errors:
line 51: field max_alerts not found in type config.plain

amtool: error: failed to validate 1 file(s)
component=alertmanager
ERRO[2022-02-11T14:09:43.585+00:00] Failed to update configuration, will retry: exit status 1
Checking ‘/tmp/pmm-managed-config-alertmanager-2940248744’ FAILED: yaml: unmarshal errors:
line 51: field max_alerts not found in type config.plain

amtool: error: failed to validate 1 file(s)

github.com/percona/pmm-managed/services/alertmanager.(*Service).validateConfig
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/alertmanager/alertmanager.go:292
github.com/percona/pmm-managed/services/alertmanager.(*Service).configAndReload
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/alertmanager/alertmanager.go:325
github.com/percona/pmm-managed/services/alertmanager.(*Service).updateConfiguration
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/alertmanager/alertmanager.go:205
github.com/percona/pmm-managed/services/alertmanager.(*Service).Run
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/services/alertmanager/alertmanager.go:173
main.main.func7
/home/builder/rpm/BUILD/pmm-managed-6914083707b5478605e7bc816de8fc68b5f511f6/src/github.com/percona/pmm-managed/main.go:843
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1581. component=alertmanager
INFO[2022-02-11T14:09:43.883+00:00] Done. component=victoriametrics
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11afc0e]

1 Like

Hello @odemark1 . Did you use Integrated Alerting (this feature is in technical preview)? Or did you apply some changes to alertmanager base config (located in /srv/alertmanager/alertmanager.base.yml)?

Default base config looks like this:

# You can edit this file; changes will be preserved.

route:
    receiver: empty
    routes: []

receivers:
    - name: empty
1 Like

I apply custom changes to Alert Manager setup in the /srv/alertmanager/alertmanager.base.yml setup.

1 Like

Checking the /var/log/messages, it seems to point to pmm-agent not to start. I will provide the logs for pmm-agent in next posting.

Aug 30 11:47:42 ip-xx-xxx-xx-xx supervisord: 2021-08-30 11:47:42,033 INFO spawned: ‘clickhouse’ with pid 5924
Aug 30 11:47:42 ip-xx-xxx-xx-xx systemd: pmm-agent.service holdoff time over, scheduling restart.
Aug 30 11:47:42 ip-xx-xxx-xx-xx systemd: Stopped pmm-agent.
Aug 30 11:47:42 ip-xx-xxx-xx-xx systemd: Started pmm-agent.
Aug 30 11:47:42 ip-xx-xxx-xx-xx ansible-systemd: Invoked with no_block=False force=None name=pmm-agent daemon_reexec=False enabled=False daemon_reload=False state=stopped masked=None scope=None user=None
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. #033[36mcomponent#033[0m=main
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Using /usr/local/percona/pmm2/exporters/node_exporter #033[36mcomponent#033[0m=main
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter #033[36mcomponent#033[0m=main
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter #033[36mcomponent#033[0m=main
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter #033[36mcomponent#033[0m=main
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter #033[36mcomponent#033[0m=main
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Using /usr/local/percona/pmm2/exporters/rds_exporter #033[36mcomponent#033[0m=main
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Using /usr/local/percona/pmm2/exporters/azure_exporter #033[36mcomponent#033[0m=main
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Using /usr/local/percona/pmm2/exporters/vmagent #033[36mcomponent#033[0m=main
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Starting local API server on http://127.0.0.1:7777/#033[36mcomponent#033[0m=local-server/JSON
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Starting… #033[36mcomponent#033[0m=client
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.553+00:00] Connecting to https://127.0.0.1:443/#033[36mcomponent#033[0m=client
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[36mINFO#033[0m[2021-08-30T11:47:42.556+00:00] Started. #033[36mcomponent#033[0m=local-server/JSON
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: #033[31mPANI#033[0m[2021-08-30T11:47:42.556+00:00] listen tcp 127.0.0.1:7777: bind: address already in use #033[31mcomponent#033[0m=local-server/JSON
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: panic: (*logrus.Entry) 0xc0001fe2a0
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: goroutine 25 [running]:
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: github.com/sirupsen/logrus.Entry.log(0xc0000f6cb0, 0xc0006c1380, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, …)
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: /tmp/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:239 +0x2ea
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: github.com/sirupsen/logrus.(*Entry).Log(0xc0001c09a0, 0x0, 0xc0004b9f90, 0x1, 0x1)
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: /tmp/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:268 +0xf0
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: github.com/sirupsen/logrus.(*Entry).Panic(0xc0001c09a0, 0xc0004b9f90, 0x1, 0x1)
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: /tmp/go/pkg/mod/github.com/sirupsen/logrus@v1.4.2/entry.go:306 +0x55
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: github.com/percona/pmm-agent/agentlocal.(*Server).runJSONServer.func2(0xc0001c09a0, 0xc0001bc0e0)
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: /tmp/go/src/github.com/percona/pmm-agent/agentlocal/agent_local.go:297 +0xf2
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: created by github.com/percona/pmm-agent/agentlocal.(*Server).runJSONServer
Aug 30 11:47:42 ip-xx-xxx-xx-xx pmm-agent: /tmp/go/src/github.com/percona/pmm-agent/agentlocal/agent_local.go:294 +0xcf3
Aug 30 11:47:42 ip-xx-xxx-xx-xx systemd: pmm-agent.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 30 11:47:42 ip-xx-xxx-xx-xx systemd: Unit pmm-agent.service entered failed state.
Aug 30 11:47:42 ip-xx-xxx-xx-xx systemd: pmm-agent.service failed.
Aug 30 11:47:42 ip-xx-xxx-xx-xx supervisord: 2021-08-30 11:47:42,782 INFO exited: alertmanager (exit status 1; not expected)
Aug 30 11:47:42 ip-xx-xxx-xx-xx supervisord: 2021-08-30 11:47:42,794 INFO spawned: ‘alertmanager’ with pid 6010
Aug 30 11:47:43 ip-xx-xxx-xx-xx supervisord: 2021-08-30 11:47:43,035 INFO success: clickhouse entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 30 11:47:43 ip-xx-xxx-xx-xx supervisord: 2021-08-30 11:47:43,793 INFO success: alertmanager entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Aug 30 11:47:43 ip-xx-xxx-xx-xx systemd: Stopped pmm-agent.
Aug 30 11:47:44 ip-xx-xxx-xx-xx ansible-yum: Invoked with lock_timeout=30 update_cache=False disable_excludes=None exclude= allow_downgrade=False disable_gpg_check=False conf_file=None use_backend=auto state=absent disablerepo= releasever=None skip_broken=False autoremove=False download_dir=None enable_plugin= installroot=/ install_weak_deps=True name=[‘percona-qan-app’, ‘mariadb-libs’, ‘logrotate’, ‘screen’] download_only=False bugfix=False list=None install_repoquery=True update_only=False disable_plugin= enablerepo= security=False validate_certs=True
Aug 30 11:47:45 ip-xx-xxx-xx-xx supervisord: 2021-08-30 11:47:45,276 INFO exited: alertmanager (exit status 1; not expected)
Aug 30 11:47:45 ip-xx-xxx-xx-xx supervisord: 2021-08-30 11:47:45,280 INFO spawned: ‘alertmanager’ with pid 6212
Aug 30 11:47:46 ip-xx-xxx-xx-xx supervisord: 2021-08-30 11:47:46,282 INFO success: alertmanager entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

1 Like

I have checked pmm-managed log and it is failing. This is not happening on nodes that are working for PMM and were upgraded similarly with the same setup.

INFO[2022-02-11T17:01:12.940+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:12.943+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:12.943+00:00] Establishing two-way communication channel … component=client
INFO[2022-02-11T17:01:12.949+00:00] Two-way communication channel established in 5.642023ms. Estimated clock drift: 2.15166ms. component=client
INFO[2022-02-11T17:01:13.711+00:00] Connection closed. component=client
INFO[2022-02-11T17:01:13.711+00:00] Done. component=supervisor
INFO[2022-02-11T17:01:13.711+00:00] Done. component=actions-runner
INFO[2022-02-11T17:01:13.711+00:00] Done. component=client
INFO[2022-02-11T17:01:13.711+00:00] Stopped. component=local-server/JSON
INFO[2022-02-11T17:01:13.713+00:00] Done. component=local-server
INFO[2022-02-11T17:01:13.713+00:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. component=main
INFO[2022-02-11T17:01:13.713+00:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=main
INFO[2022-02-11T17:01:13.713+00:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=main
INFO[2022-02-11T17:01:13.714+00:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=main
INFO[2022-02-11T17:01:13.714+00:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=main
INFO[2022-02-11T17:01:13.714+00:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=main
INFO[2022-02-11T17:01:13.714+00:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=main
INFO[2022-02-11T17:01:13.714+00:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=main
INFO[2022-02-11T17:01:13.714+00:00] Using /usr/local/percona/pmm2/exporters/vmagent component=main
INFO[2022-02-11T17:01:13.714+00:00] Starting local API server on http://127.0.0.1:7777/ … component=local-server/JSON
INFO[2022-02-11T17:01:13.714+00:00] Starting… component=client
INFO[2022-02-11T17:01:13.714+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:13.714+00:00] Started. component=local-server/JSON
INFO[2022-02-11T17:01:13.717+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:13.717+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:01:13.717+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T17:01:14.845+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:14.848+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:14.848+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:01:14.849+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T17:01:16.088+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:16.091+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:16.091+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:01:16.091+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T17:01:18.181+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:18.183+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:18.183+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:01:18.184+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T17:01:22.122+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:22.125+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:22.125+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:01:22.126+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T17:01:26.513+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:26.515+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:26.515+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:01:26.516+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T17:01:34.596+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:34.599+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:34.599+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:01:34.599+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T17:01:45.984+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:45.986+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:45.986+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:01:45.987+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T17:01:57.913+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:01:57.917+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:01:57.917+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:01:57.917+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-11T17:02:09.359+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-11T17:02:09.361+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-11T17:02:09.361+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-11T17:02:09.362+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client

1 Like

I was hoping to get some feedback on how to address the issue with failure of PMM Upgrade from 2.25.0 to 2.26.0. This is using the AWS Marketplace PMM image. Upgrade did not complete fully even if restarting by running supervisorctl restart all. Grafana keeps crashing and pmm-managed keeps restarting. pmm-agent also is crashing based on the logs. Many of the logs are listed above.

PMM Grafana loads at first but then it goes to screen showing errors below. It ask to reload Grafana due to missing application file

Unable to find application file

Grafana has likely been updated. Please try reloading the page.

Loading chunk 2 failed.
(error: https://server/graph/public/build/default~DashboardPage~NgAlertingPage~sandbox.a28fb9668cf7589fa831.js)
ChunkLoadError: Loading chunk 2 failed.
(error: https://server/graph/public/build/default~DashboardPage~NgAlertingPage~sandbox.a28fb9668cf7589fa831.js)
at Function.i.e (https://server/graph/public/build/runtime.a28fb9668cf7589fa831.js:1:2826)
at r (https://server/graph/public/build/app.a28fb9668cf7589fa831.js:2:3262351)
at Object.invoke [as oldInvoke] (https://server/graph/public/build/angular~app.a28fb9668cf7589fa831.js:2:26513)
at https://server/graph/public/build/vendors~app.a28fb9668cf7589fa831.js:2:2335327
at Object.x [as forEach] (https://server/graph/public/build/angular~app.a28fb9668cf7589fa831.js:2:1796)
at w (https://server/graph/public/build/vendors~app.a28fb9668cf7589fa831.js:2:2335272)
at https://server/graph/public/build/angular~app.a28fb9668cf7589fa831.js:2:95662
at https://server/graph/public/build/angular~app.a28fb9668cf7589fa831.js:2:95800
at c.$digest (https://server/graph/public/build/angular~app.a28fb9668cf7589fa831.js:2:100969)
at c.$apply (https://server/graph/public/build/angular~app.a28fb9668cf7589fa831.js:2:103148)

After clicking on Reload on PMM Grafana, it goes into “500 Internal Server Error”. This is due to pmm-managed continue to crash and upgrade does not finished. PMM has been used since May 2020 with no issues with upgrades. Upgrades have been performed on regular monthly basis. I listed many of the error logs above for Grafana, pmm-agent, and pmm-managed.

1 Like

Unfortunately nothing is jumping out besides the possibility of under the hood customizations of alert manager possibly clashing with upgrade routines or pmm-agent failing with port already in use but I can’t see how either of them would get you to a failed grafana state (remote monitoring of nodes breaking or alert manager breaking, sure).

If you are wanting to get it back up and running fast you might try spinning up an new ami, stopping services on both new and old systems and scp /srv/* from old to the new instance (there are permissions changes needed too I think which you can find in the backup/restore section of the online docs). If that got you up you’d just need to reIP (or update dns) so your agents could find the server.

We do also offer paid support plans where you just need open a ticket and the support team will work on this until it’s fixed. The forums are community driven and our team does monitor them to help when they can but that’s rarely fast enough for mission critical.

1 Like

Actually. What is the output of supervisorctl status? I see failure to connect to Victoriametrics on port 9090 which could put grafana in a very bad mood!

Info might be in /srv/logs/victoriametrics.log that would help.

1 Like

Here is the current status of supervisorrctl status. I will check port 9090.

supervisorctl status all
alertmanager RUNNING pid 64009, uptime 0:07:06
clickhouse RUNNING pid 64002, uptime 0:07:06
cron RUNNING pid 64005, uptime 0:07:06
dbaas-controller RUNNING pid 64006, uptime 0:07:06
grafana RUNNING pid 64003, uptime 0:07:06
nginx RUNNING pid 64004, uptime 0:07:06
pmm-agent RUNNING pid 64012, uptime 0:07:06
pmm-managed STARTING
pmm-update-perform EXITED Feb 12 01:26 PM
pmm-update-perform-init EXITED Feb 12 01:25 PM
postgresql RUNNING pid 64001, uptime 0:07:06
prometheus EXITED Feb 12 01:25 PM
qan-api2 RUNNING pid 64091, uptime 0:07:04
victoriametrics RUNNING pid 64007, uptime 0:07:06
vmalert RUNNING pid 83502, uptime 0:05:38

pmm-managed keeps restarting. My concern there is some kind of bug for the upgrade from 2.25.0 to 2.26.00. 9 Servers were upgraded for PMM and 2 were failures. All of them are setup similar setup and all AWS Marketplace images.

1 Like

I’m seeing lots of errors for pmm-agent and nginx. nginx has lots of errors and maybe this is causing connections issues for Grafana. I don’t see these issues with healthy PMM servers.

127.0.0.1 - - [12/Feb/2022:13:40:06 +0000] “POST /agent.Agent/Connect HTTP/2.0” 200 23 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:06 [error] 64030#64030: *568 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:06 [error] 64030#64030: *568 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:06 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:07 [error] 64030#64030: *570 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:07 [error] 64030#64030: *570 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:07 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:09 [error] 64030#64030: *572 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:09 [error] 64030#64030: *572 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:09 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:11 [error] 64030#64030: *574 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:11 [error] 64030#64030: *574 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:11 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:15 [error] 64030#64030: *576 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:15 [error] 64030#64030: *576 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:15 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
10.117.13.62 - - [12/Feb/2022:13:40:17 +0000] “GET / HTTP/1.1” 400 248 “-” “ELB-HealthChecker/2.0” “-”
2022/02/12 13:40:19 [error] 64030#64030: *579 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:19 [error] 64030#64030: *579 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:19 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:26 [error] 64030#64030: *581 upstream prematurely closed connection while reading upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, upstream: “grpc://127.0.0.1:7771”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:26 +0000] “POST /agent.Agent/Connect HTTP/2.0” 200 23 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:26 [error] 64030#64030: *584 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:26 [error] 64030#64030: *584 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:26 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:27 [error] 64030#64030: *586 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:27 [error] 64030#64030: *586 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:27 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:28 [error] 64030#64030: *588 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:28 [error] 64030#64030: *588 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:28 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
2022/02/12 13:40:31 [error] 64029#64029: *590 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:31 [error] 64029#64029: *590 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:31 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”
10.117.9.127 - - [12/Feb/2022:13:40:33 +0000] “GET / HTTP/1.1” 400 248 “-” “ELB-HealthChecker/2.0” “-”
2022/02/12 13:40:35 [error] 64030#64030: *593 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, subrequest: “/auth_request”, upstream: “http://127.0.0.1:7772/auth_request”, host: “127.0.0.1”
2022/02/12 13:40:35 [error] 64030#64030: *593 auth request unexpected status: 502 while sending to client, client: 127.0.0.1, server: _, request: “POST /agent.Agent/Connect HTTP/2.0”, host: “127.0.0.1”
127.0.0.1 - - [12/Feb/2022:13:40:35 +0000] “POST /agent.Agent/Connect HTTP/2.0” 500 170 “-” “pmm-agent/2.26.0 grpc-go/1.32.0” “-”

1 Like

pmm-agent then is having two-way communication issues which is point back to port 443. It seems like connection problems might be with nginx and pmm-agent.

INFO[2022-02-12T13:44:21.555+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-12T13:44:21.558+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-12T13:44:21.558+00:00] Establishing two-way communication channel … component=client
INFO[2022-02-12T13:44:21.564+00:00] Two-way communication channel established in 6.411314ms. Estimated clock drift: 2.53919ms. component=client
INFO[2022-02-12T13:44:22.101+00:00] Connection closed. component=client
INFO[2022-02-12T13:44:22.101+00:00] Done. component=actions-runner
INFO[2022-02-12T13:44:22.101+00:00] Done. component=supervisor
INFO[2022-02-12T13:44:22.101+00:00] Stopped. component=local-server/JSON
INFO[2022-02-12T13:44:22.101+00:00] Done. component=client
INFO[2022-02-12T13:44:22.103+00:00] Done. component=local-server
INFO[2022-02-12T13:44:22.103+00:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. component=main
INFO[2022-02-12T13:44:22.104+00:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=main
INFO[2022-02-12T13:44:22.104+00:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=main
INFO[2022-02-12T13:44:22.104+00:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=main
INFO[2022-02-12T13:44:22.104+00:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=main
INFO[2022-02-12T13:44:22.104+00:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=main
INFO[2022-02-12T13:44:22.104+00:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=main
INFO[2022-02-12T13:44:22.104+00:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=main
INFO[2022-02-12T13:44:22.104+00:00] Using /usr/local/percona/pmm2/exporters/vmagent component=main
INFO[2022-02-12T13:44:22.104+00:00] Starting… component=client
INFO[2022-02-12T13:44:22.104+00:00] Starting local API server on http://127.0.0.1:7777/ … component=local-server/JSON
INFO[2022-02-12T13:44:22.104+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-12T13:44:22.105+00:00] Started. component=local-server/JSON
INFO[2022-02-12T13:44:22.107+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-12T13:44:22.107+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-12T13:44:22.107+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-12T13:44:23.326+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-12T13:44:23.329+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-12T13:44:23.329+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-12T13:44:23.329+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-12T13:44:24.627+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-12T13:44:24.630+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-12T13:44:24.630+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-12T13:44:24.631+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-12T13:44:27.129+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-12T13:44:27.132+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-12T13:44:27.132+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-12T13:44:27.133+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-12T13:44:30.943+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-12T13:44:30.947+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-12T13:44:30.948+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-12T13:44:30.949+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client
INFO[2022-02-12T13:44:35.886+00:00] Connecting to https://127.0.0.1:443/ … component=client
INFO[2022-02-12T13:44:35.889+00:00] Connected to 127.0.0.1:443. component=client
INFO[2022-02-12T13:44:35.889+00:00] Establishing two-way communication channel … component=client
ERRO[2022-02-12T13:44:35.889+00:00] Failed to establish two-way communication channel: Internal Server Error: HTTP status code 500; transport: received the unexpected content-type “text/html”. component=client

1 Like

For now it looks like a problem with alertmanager configuration. Because of that pmm server restarts over and over again. Please validate your alertmanager base config with this command: amtool check-config /srv/alertmanager/alertmanager.base.yml. If the base config is valid then try to comment/remove your custom changes.

1 Like

Hello - I changed the default of /srv/alertmanager/alertmanager.base.yml to following and removed custom settings.

route:
receiver: empty
routes:

receivers:
- name: empty

Here is the statement ran for amtool check-config /srv/alertmanager/alertmanager.base.yml.
Checking ‘/srv/alertmanager/alertmanager.base.yml’ SUCCESS
Found:

  • global config
  • route
  • 0 inhibit rules
  • 1 receivers
  • 0 templates

I then rebooted the server with similar issues with PMM continue restart over and over for pmm-managed which is impacting the application to restart for User interface.

1 Like

Here are the 3 logs from /srv/logs/victoriametrics.log. I’m seeing lots of scraping issues.

2022-02-15T10:58:27.680Z error /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/promscrape/scrapework.go:258 error when scraping “http://127.0.0.1:42151/metrics?collect[]=custom_query.mr&collect[]=engine_innodb_status&collect[]=info_schema.innodb_cmp&collect[]=info_schema.innodb_cmpmem&collect[]=info_schema.processlist&collect[]=info_schema.query_response_time&collect[]=perf_schema.eventswaits&collect[]=perf_schema.file_events&collect[]=perf_schema.tablelocks&collect[]=slave_status” from job “mysqld_exporter_agent_id_a679778e-2e38-41c5-bed8-3e3f4593adee_mr-10s” with labels {agent_id=“/agent_id/a679778e-2e38-41c5-bed8-3e3f4593adee”,agent_type=“mysqld_exporter”,az=“us-east-1b”,cluster=“dine-admin-tool-mariadb-latest.cguqqmi4ku27.us-east-1.rds.amazonaws.com”,environment=“latest”,instance=“/agent_id/a679778e-2e38-41c5-bed8-3e3f4593adee”,job=“mysqld_exporter_agent_id_a679778e-2e38-41c5-bed8-3e3f4593adee_mr-10s”,key1=“cgs”,key2=“wdprrevmgmtdev”,key3=“yes”,key4=“1”,node_id=“/node_id/50ed4534-0f1d-40ac-bab0-1406c0ac44d5”,node_model=“dinerdsmariadbserver.account.us-east-1.rds.amazonaws.com”,node_name=“dine-admin”,node_type=“remote_rds”,region=“us-east-1”,replication_set=“rdsmariadb”,service_id=“/service_id/5e21aa69-8de4-4211-a33c-f5ea6ff81122”,service_name=“dine-admin-latest”,service_type=“mysql”}: cannot read data: cannot scrape “http://127.0.0.1:42151/metrics?collect[]=custom_query.mr&collect[]=engine_innodb_status&collect[]=info_schema.innodb_cmp&collect[]=info_schema.innodb_cmpmem&collect[]=info_schema.processlist&collect[]=info_schema.query_response_time&collect[]=perf_schema.eventswaits&collect[]=perf_schema.file_events&collect[]=perf_schema.tablelocks&collect[]=slave_status”: Get “http://127.0.0.1:42151/metrics?collect[]=custom_query.mr&collect[]=engine_innodb_status&collect[]=info_schema.innodb_cmp&collect[]=info_schema.innodb_cmpmem&collect[]=info_schema.processlist&collect[]=info_schema.query_response_time&collect[]=perf_schema.eventswaits&collect[]=perf_schema.file_events&collect[]=perf_schema.tablelocks&collect[]=slave_status”: dial tcp4 127.0.0.1:42151: connect: connection refused; try -enableTCP6 command-line flag if you scrape ipv6 addresses
2022-02-15T10:58:27.681Z error /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/promscrape/scrapework.go:258 error when scraping “http://127.0.0.1:42098/metrics?collect[]=custom_query.hr&collect[]=global_status&collect[]=info_schema.innodb_metrics&collect[]=standard.go&collect[]=standard.process” from job “mysqld_exporter_agent_id_65ba1e6d-a57e-4121-9521-9b448dd6ec02_hr-5s” with labels {agent_id=“/agent_id/65ba1e6d-a57e-4121-9521-9b448dd6ec02”,agent_type=“mysqld_exporter”,az=“[us-east-1a]”,cluster=“rdsmariadbserver.us-east-1.rds.amazonaws.com”,environment=“latest”,instance=“/agent_id/65ba1e6d-a57e-4121-9521-9b448dd6ec02”,job=“mysqld_exporter_agent_id_65ba1e6d-a57e-4121-9521-9b448dd6ec02_hr-5s”,key1=“dse”,key2=“dclappsdev”,key3=“yes”,key4=“1”,node_id=“/node_id/d4d0b27a-7eed-43be-9ee5-4f7bea79787d”,node_model=“rdsmariadbserver.account.us-east-1.rds.amazonaws.com”,node_name=“mariadb-latest”,node_type=“remote_rds”,region=“us-east-1”,replication_set=“rdsma”,service_id=“/service_id/c170eeea-1aed-4a0f-9d03-5f8da4d821d4”,service_name=“docdb-mariadb-latest”,service_type=“mysql”}: cannot read data: cannot scrape “http://127.0.0.1:42098/metrics?collect[]=custom_query.hr&collect[]=global_status&collect[]=info_schema.innodb_metrics&collect[]=standard.go&collect[]=standard.process”: Get “http://127.0.0.1:42098/metrics?collect[]=custom_query.hr&collect[]=global_status&collect[]=info_schema.innodb_metrics&collect[]=standard.go&collect[]=standard.process”: dial tcp4 127.0.0.1:42098: connect: connection refused; try -enableTCP6 command-line flag if you scrape ipv6 addresses
2022-02-15T10:58:27.702Z error /home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/promscrape/scrapework.go:258 error when scraping “http://127.0.0.1:42223/metrics?collect[]=custom_query.hr&collect[]=global_status&collect[]=info_schema.innodb_metrics&collect[]=standard.go&collect[]=standard.process” from job “mysqld_exporter_agent_id_a01322b8-e0c0-4658-8f32-65be69ef8a15_hr-5s” with labels {agent_id=“/agent_id/a01322b8-e0c0-4658-8f32-65be69ef8a15”,agent_type=“mysqld_exporter”,az=“”,cluster=“auroramysqlserver.celzpl4l1tus.us-east-1.rds.amazonaws.com”,environment=“latest”,instance=“/agent_id/a01322b8-e0c0-4658-8f32-65be69ef8a15”,job=“mysqld_exporter_agent_id_a01322b8-e0c0-4658-8f32-65be69ef8a15_hr-5s”,key1=“peeti”,key2=“wdprattractionsdev”,key3=“yes”,key4=“1”,node_id=“/node_id/97005e56-7e7a-433a-bfb9-d9e322357428”,node_model=“swgsgrafana-aurora-mysql-latest-reader-1.celzpl4l1tus.us-east-1.rds.amazonaws.com”,node_name=“auroramysql”,node_type=“remote_rds”,region=“us-east-1”,replication_set=“auroramysql”,service_id=“/service_id/d1b3634e-d85f-47fa-9c3f-68254c73cc29”,service_name=“swgsgrafana-aurora-mysql-latest-reader-1”,service_type=“mysql”}: cannot read data: cannot scrape “http://127.0.0.1:42223/metrics?collect[]=custom_query.hr&collect[]=global_status&collect[]=info_schema.innodb_metrics&collect[]=standard.go&collect[]=standard.process”: Get “http://127.0.0.1:42223/metrics?collect[]=custom_query.hr&collect[]=global_status&collect[]=info_schema.innodb_metrics&collect[]=standard.go&collect[]=standard.process”: dial tcp4 127.0.0.1:42223: connect: connection refused; try -enableTCP6 command-line flag if you scrape ipv6 addresses

1 Like