PMM2 Alerting not working since upgrade to 2.31

I had an alert set up on MySQL Down, which sent me an email whenever a system went down. This worked fine until fairly recently. I upgraded to 2.31 a week or so ago, so I can only assume the two are related.

Today a system went down and I didn’t get an email.

I am unable to create new alerts - “Failed to save rule: Internal server error.; Internal server error.” on the webpage and:

logger=context traceID=00000000000000000000000000000000 userId=5 orgId=1 uname=psumner t=2022-11-02T13:29:27.62533991Z level=error msg="Error from access control system" error="could not resolve datasources:id:1: data source not found" accessErrorID=ACE2718563073

logger=context traceID=00000000000000000000000000000000 userId=5 orgId=1 uname=psumner t=2022-11-02T13:29:27.625439734Z level=info msg="Request Completed" method=GET path=/api/datasources/1 status=403 remote_addr=127.0.0.1 time_ms=2 duration=2.428558ms size=166 referer= traceID=00000000000000000000000000000000

This is from grafana.log.
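For anyone sifting through grafana.log for the same symptom, here is a minimal sketch of pulling the fields out of one of these key=value lines. It assumes the lines are plain logfmt-style text with double-quoted values, as in the two lines above; `parse_logfmt` and `SAMPLE` are names made up for this example, and `shlex` will reject a line that contains an unbalanced quote or apostrophe.

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Split one key=value log line into a dict, honouring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore any bare word that has no '='
            fields[key] = value
    return fields


# First log line from the post above, split across string literals for width.
SAMPLE = (
    'logger=context traceID=00000000000000000000000000000000 userId=5 orgId=1 '
    'uname=psumner t=2022-11-02T13:29:27.62533991Z level=error '
    'msg="Error from access control system" '
    'error="could not resolve datasources:id:1: data source not found" '
    'accessErrorID=ACE2718563073'
)

if __name__ == "__main__":
    entry = parse_logfmt(SAMPLE)
    if entry.get("level") == "error":
        print(entry["msg"], "->", entry["error"])
```

Filtering on `level` and printing `error` makes the "datasources:id:1" reference stand out, which is the detail that matters later in this thread.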

Any clues where to start looking? I’ve tried various custom and default templates, tried creating alerts in an existing folder, and also created a new folder for the alerts. Same issue every time.

I’ve tried disabling and re-enabling Alerting in the settings menu and the problem persists.

[Edit]

Since writing this I’ve found the Python script to migrate alerts. I’ve run it and…

[root@rsnpmm01 ~]# python3 alertrulemigration.py -uadmin -padmin
Request existing IA rules.
Request existing IA rules done.
There are no rules to migrate, exiting.
Hi, where can I find this Python file? Is it on the server? Please tell me the location of the .py file or where I can get it. Thank you.

Thank you for your answer, but when I run this command on the server:
python ia_migration.py -u admin -p admin

I get the error below:

python3: can't open file 'ia_migration.py': [Errno 2] No such file or directory

@k3r1m you may need to run it with the full path, like python3 /path/to/ia_migration.py -u admin -p admin

@psumner I’m posting this internally to see if the team that works on alerting can offer some help.

Hello psumner, it might be helpful to know: which version did you upgrade from when you moved to 2.31.0?

That’s the issue: I don’t know the path to the migration script. Where can I find it?

ahh, you have to download it here:

and then run the command from the folder you downloaded the file to.

Oh, I thought I’d included that. The previous version was 2.30.

Is there anything I can do to completely reset the alerting system without defaulting everything else? It’s a real nuisance this not working.

I’ve since upgraded to 2.35 and still get the same error on the webpage, but I’m now seeing this error in pmm-managed.log:

INFO[2023-03-08T08:47:11.916+00:00] Starting RPC /alerting.v1.Alerting/CreateRule … request=d160a25b-bd8d-11ed-ac03-000c29e5ef7c
ERRO[2023-03-08T08:47:11.926+00:00] RPC /alerting.v1.Alerting/CreateRule done in 10.299066ms with unexpected error: status: 404, body: {"message":"Data source not found"} request=d160a25b-bd8d-11ed-ac03-000c29e5ef7c

Right, I’ve done some more digging around and it looks like my Metrics data source has somehow ended up with an ID other than 1, which everything seems to assume it will be:

[
  {
    "id": 15,
    "uid": "ge_Z6m-Vk",
    "orgId": 1,
    "name": "Alertmanager",
    "type": "alertmanager",
    "typeName": "Alertmanager",
    "typeLogoUrl": "public/app/plugins/datasource/alertmanager/img/logo.svg",
    "access": "proxy",
    "url": "",
    "user": "",
    "database": "",
    "basicAuth": false,
    "isDefault": false,
    "jsonData": {},
    "readOnly": false
  },
  {
    "id": 12,
    "uid": "PT9pF_Lnz",
    "orgId": 1,
    "name": "ClickHouse",
    "type": "vertamedia-clickhouse-datasource",
    "typeName": "Altinity plugin for ClickHouse",
    "typeLogoUrl": "public/plugins/vertamedia-clickhouse-datasource/img/altinity_logo.svg",
    "access": "proxy",
    "url": "http://127.0.0.1:8123",
    "user": "",
    "database": "",
    "basicAuth": false,
    "isDefault": false,
    "jsonData": {
      "keepCookies": null
    },
    "readOnly": true
  },
  {
    "id": 10,
    "uid": "0TrpFlYnk",
    "orgId": 1,
    "name": "Metrics",
    "type": "prometheus",
    "typeName": "Prometheus",
    "typeLogoUrl": "public/app/plugins/datasource/prometheus/img/prometheus_logo.svg",
    "access": "proxy",
    "url": "http://127.0.0.1:8430/prometheus/",
    "user": "",
    "database": "",
    "basicAuth": false,
    "isDefault": true,
    "jsonData": {
      "httpMethod": "POST",
      "keepCookies": null,
      "timeInterval": "1s"
    },
    "readOnly": true
  },
  {
    "id": 11,
    "uid": "ao9tFlY7z",
    "orgId": 1,
    "name": "PostgreSQL",
    "type": "postgres",
    "typeName": "PostgreSQL",
    "typeLogoUrl": "public/app/plugins/datasource/postgres/img/postgresql_logo.svg",
    "access": "proxy",
    "url": "localhost:5432",
    "user": "postgres",
    "database": "pmm-managed",
    "basicAuth": false,
    "isDefault": false,
    "jsonData": {
      "postgresVersion": "1100",
      "sslmode": "disable"
    },
    "readOnly": true
  },
  {
    "id": 14,
    "uid": "uo9tK_Ynz",
    "orgId": 1,
    "name": "Prometheus AlertManager",
    "type": "camptocamp-prometheus-alertmanager-datasource",
    "typeName": "Prometheus AlertManager",
    "typeLogoUrl": "public/plugins/camptocamp-prometheus-alertmanager-datasource/img/json-logo.svg",
    "access": "proxy",
    "url": "http://localhost:9093/alertmanager/",
    "user": "",
    "database": "",
    "basicAuth": false,
    "isDefault": false,
    "jsonData": {
      "keepCookies": null
    },
    "readOnly": true
  },
  {
    "id": 13,
    "uid": "8orpKlYnk",
    "orgId": 1,
    "name": "PTSummary",
    "type": "pmm-pt-summary-datasource",
    "typeName": "PTSummary",
    "typeLogoUrl": "public/img/icn-datasource.svg",
    "access": "proxy",
    "url": "",
    "user": "",
    "database": "",
    "basicAuth": false,
    "isDefault": false,
    "jsonData": {},
    "readOnly": true
  }
]
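The listing above appears to be what Grafana's GET /api/datasources endpoint returns. As a rough sketch of how to check for the missing ID 1 programmatically, and how a script could look up the Prometheus data source by type instead of assuming an ID, something like the following might work. `base_url` and `auth_header` are placeholders you would fill in yourself (on PMM, Grafana is usually reachable under /graph, but treat that as an assumption), and `SAMPLE` is a trimmed copy of two entries from the listing.

```python
import json
import urllib.request


def fetch_datasources(base_url: str, auth_header: str) -> list:
    """GET /api/datasources from Grafana and return the parsed JSON list."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/datasources",
        headers={"Authorization": auth_header, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_default_prometheus(datasources: list) -> dict:
    """Return the default Prometheus data source, whatever its numeric id is."""
    for ds in datasources:
        if ds.get("type") == "prometheus" and ds.get("isDefault"):
            return ds
    raise LookupError("no default Prometheus data source found")


def has_id(datasources: list, wanted_id: int) -> bool:
    """True if any data source in the list uses the given numeric id."""
    return any(ds.get("id") == wanted_id for ds in datasources)


# Two entries trimmed from the listing in this thread (only the fields used here).
SAMPLE = [
    {"id": 15, "uid": "ge_Z6m-Vk", "name": "Alertmanager",
     "type": "alertmanager", "isDefault": False},
    {"id": 10, "uid": "0TrpFlYnk", "name": "Metrics",
     "type": "prometheus", "isDefault": True},
]

if __name__ == "__main__":
    metrics = find_default_prometheus(SAMPLE)
    print("Metrics data source:", metrics["id"], metrics["uid"])
    print("Is there a data source with id 1?", has_id(SAMPLE, 1))
```

Looking the data source up by type and `isDefault` avoids depending on the numeric ID at all, which is the assumption that seems to break here.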

I’ve mangled the Python script that copies rules, hard-coding datasource 10 as the source, and blow me, that almost worked.

Request existing IA rules.
Request existing IA rules done.
Found rules: 2.
Create alert group for migrated rules.
Create alert group for migrated rules done.
Get datasource UID.
Get datasource UID done.
Convert IA rules.
Convert IA rules done.
Send request to create migrated alerts.
Send request to create migrated alerts done.
Server response:
{"message":"rule group updated successfully"}

This is great, I’ve got my two previous alerts showing up - but I still can’t create new ones. Presumably datasource 1 is hardcoded somewhere.

How do I fix this?

One final update from me - I’ve worked out that I can create Grafana managed alerts so I have been copying details from the PMM alert templates to the Grafana Managed Alert pages.

I can at least get things going again even if it’s not ideal.

Final final final update - as it’s now working.

# sqlite3 /srv/grafana/grafana.db
[…]
sqlite> select * from data_source;

10|1|1|prometheus|Metrics|proxy|http://127.0.0.1:8430/prometheus/||||0|||1|{"httpMethod":"POST","keepCookies":null,"timeInterval":"1s"}|2022-03-09 15:01:14|2023-03-16 14:23:43|0|{}|1|0TrpFlYnk

sqlite> .schema data_source

[…]

sqlite> update data_source set id=1 where id=10;

I can now create PMM-Managed alerts using the UI. Everything else seems to have carried on working.
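For anyone repeating this, here is a hedged sketch of the same UPDATE done from Python with a couple of guard rails: it refuses to run if the target ID is already taken and rolls back unless exactly one row changes. `renumber_data_source` is a name invented for this example, and the demo uses a throwaway in-memory table with a simplified schema rather than the real `data_source` table. I have not checked whether other Grafana tables reference the numeric ID, so working on a copy of /srv/grafana/grafana.db first seems prudent.

```python
import sqlite3


def renumber_data_source(conn: sqlite3.Connection, old_id: int, new_id: int) -> None:
    """Change data_source.id from old_id to new_id, refusing unsafe cases."""
    with conn:  # commits on success, rolls back if an exception is raised
        taken = conn.execute(
            "SELECT 1 FROM data_source WHERE id = ?", (new_id,)
        ).fetchone()
        if taken:
            raise ValueError(f"id {new_id} is already in use")
        cursor = conn.execute(
            "UPDATE data_source SET id = ? WHERE id = ?", (new_id, old_id)
        )
        if cursor.rowcount != 1:
            raise ValueError(f"expected to update 1 row, updated {cursor.rowcount}")


if __name__ == "__main__":
    # Demo on an in-memory table; the two-column schema is an assumption
    # for illustration, not Grafana's actual data_source schema.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE data_source (id INTEGER PRIMARY KEY, name TEXT)")
    demo.execute("INSERT INTO data_source VALUES (10, 'Metrics')")
    renumber_data_source(demo, old_id=10, new_id=1)
    print(demo.execute("SELECT id, name FROM data_source").fetchall())
```

To try it against a real file you would open it with `sqlite3.connect(...)` on a copy of the database and call `renumber_data_source(conn, 10, 1)`.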