Grafana API keys go invalid intermittently

Grigoriy_Frolov · December 4, 2023, 2:56pm

Description:

Dear Percona community,
I see the strange issues in PMM-server 2.39. Sometimes I see tons of messages like this:

logger=context t=2023-12-04T14:34:40.147436769Z level=error msg="invalid API key" error="invalid API key" traceID=
logger=context userId=0 orgId=0 uname= t=2023-12-04T14:34:40.148321642Z level=info msg="Request Completed" method=GET path=/api/auth/key status=401 remote_addr=127.0.0.1 time_ms=1 duration=1.154743ms size=67 referer=

When it happens, Grafana is out of service as it can’t handle the frequent recurring requests from the agents. top shows that Grafana consumes 100 to 200 %CPU. To mitigate the issues I had to edit the configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml and change .server.password to plain PMM admin password instead of base64-encoded API KEY. That had to be done for each of about 150 agents I have.

This kind of incident repeated several times with new agents I registered. I couldn’t find what caused the API keys to go invalid, but I would like to prevent those situations from happening in the future.
So could you please help me answer the following questions?

What can cause Grafana API tokens to go invalid?
Which metric expressions could be used for alerting on the cases when pmm-agent has an invalid API key and is unable to connect to the server?
Should I completely give up using API keys and get back to using the plain admin password in pmm-agent configuration?

Steps to Reproduce:

Deploy pmm-server into Kubernetes with /srv directory mounted to persistent storage.
Register multiple agents with /usr/sbin/pmm-admin config --server-url "https://${PMM_USERNAME}:${PMM_PASSWORD}@${PMM_HOST}:${PMM_PORT}" --force command.
Unclear what happens next.

Version:

pmm-server 2.39
pmm-agent 2.39

nurlan · December 4, 2023, 3:19pm

Hello @Grigoriy_Frolov, please update to the latest PMM. This problem was caused because of SQLite used in PMM, we fixed it recently.

Which metric expressions could be used for alerting on the cases when pmm-agent has an invalid API key and is unable to connect to the server?
We have alert template for pmm-agent being down, so you can use it.
Should I completely give up using API keys and get back to using the plain admin password in pmm-agent configuration?
No, please don’t give up and use the latest PMM

Grigoriy_Frolov · December 5, 2023, 7:14am

Thanks Nurlan, I will update my PMM deployment to 2.40.1 today and watch.

Topic		Replies	Views
Grafana and PMM API PMM 2.x grafana	6	1035	October 26, 2023
AWS Marketplace PMM upgrade from 2.25.0 to 2.26.0 Percona Monitoring and Management (PMM)	23	2873	February 17, 2022
After upgrade to 2.39.0, the grafana takes too much CPU PMM 2.x	6	1367	September 27, 2023
Unable to recover pmm failed while upgrading 2.41.2 Percona Monitoring and Management (PMM) pmm	13	383	June 26, 2024
Grafana has failed to load Percona Monitoring and Management (PMM)	20	2737	October 15, 2023

Grafana API keys go invalid intermittently

Description:

Steps to Reproduce:

Version:

Related topics