Upgraded from 2.12 to 2.13, now alerting via MS Teams or SMTP does not work

dipeshacharya · January 8, 2021, 6:09pm

Hi, I have been using the Docker version of PMM 2. I recently upgraded from version 2.12 to 2.13, and now our alerting via email or Microsoft Teams no longer works, although it used to work before. I have a couple of questions, if you can help me.

I just tested by bringing one of the monitored MySQL instances down. I see the alert being generated in the Alertmanager UI, but there is no email or MS Teams alert. Below is my configuration, which used to work in versions before 2.13:

[root@479c571a4535 alertmanager]# pwd
/srv/alertmanager
[root@479c571a4535 alertmanager]# cat alertmanager.base.yml
---
# You can edit this file; changes will be preserved.
global:
  smtp_smarthost: '<outlookaddreress'
  smtp_from: 'alertmanager@test.com'
route:
  receiver: 'prometheus-msteams'
  group_by: ['alertname','cluster']
  group_wait: 15s
  group_interval: 10m
  repeat_interval: 1h

  routes:
#  - match_re:
#      alertname: ^(Instance Down|MysqlDown)$
#    receiver: 'prometheus-msteams'
  - match_re:
      alertname: ^(Instance Down|MysqlDown)$
    receiver: 'opensourcedb-email'

receivers:
- name: opensourcedb-email
  email_configs:
    - to: 'emailaddress'
      send_resolved: true
- name: 'pagerduty'
  pagerduty_configs:
    - service_key: ad18da17b088d1d8
- name: 'prometheus-msteams'
  webhook_configs: # https://prometheus.io/docs/alerting/configuration/#webhook_config
    - send_resolved: true
      url: 'http://<ipaddresss/alertmanager' # the prometheus-msteams proxy
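
For reference, a minimal sketch of how the routing above could deliver the same alerts to both email and Teams on a standalone Alertmanager. As posted, alerts matching the `match_re` route go only to `opensourcedb-email`, and everything else falls through to the default `prometheus-msteams` receiver. The `continue: true` setting and the second route are assumptions added for illustration; they are not part of the original config.

```yaml
# Sketch only: routing fragment for a standalone Alertmanager.
# Receiver names are taken from the config above; `continue: true`
# and the second route are illustrative additions.
route:
  receiver: 'prometheus-msteams'        # default for anything not matched below
  group_by: ['alertname', 'cluster']
  routes:
    - match_re:
        alertname: ^(Instance Down|MysqlDown)$
      receiver: 'opensourcedb-email'
      continue: true                    # keep evaluating sibling routes after this match
    - match_re:
        alertname: ^(Instance Down|MysqlDown)$
      receiver: 'prometheus-msteams'
```

On a standalone Alertmanager, `amtool check-config alertmanager.yml` can validate a file like this before it is loaded.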

steve.hoffman · January 8, 2021, 9:54pm

I won’t have time to reply in depth right now, but perhaps I can get you started on the right path.

It appears you found the embedded alertmanager we were actively working on wiring into PMM and set up alerts inside the container. Up until now, we’ve only supported an alertmanager instance external to PMM.

In 2.13.0 we released a preview of “Integrated Alerting”, which actually allows users to leverage that internal alertmanager instance! It’s possible we’ve changed something in the development process that broke your config, BUT it’s also possible it’s because of how we released the preview. Take a look at the PMM Settings page under “Advanced Settings” and you’ll see that “Integrated Alerting” is off by default… you might just be able to turn it on and voilà!

It’s also possible you will have to turn it on and reconfigure through the UI, splitting your config between notification channels and alert rules. I hope this helps some, but I have to run for a few hours; I will check back in later to see if you made it anywhere, and I’ll tinker a bit myself.

dipeshacharya · January 9, 2021, 12:00am

Hi Steve. I just enabled Integrated Alerting, but instead of sending email to the address I gave for the notification channel, it is using the MS Teams hooks and sending the alert to the Microsoft Teams channel where we used to get alerts in the previous setup with version 2.12.

So, just to put this into context:
I have 29 alert rules configured, which are kept in the pmm-server Docker container at /srv/prometheus/rules/prom-rules.yml. When I check the rules with promtool, they are good.

I have configured the hooks to send email or PagerDuty notifications inside the pmm-server Docker container, in /srv/alertmanager/alertmanager.base.yml. However, when I was testing, I did not get any alerts by email. So I went ahead and added the hooks to /etc/alertmanager.yml, which Percona says not to edit. I did this because Alertmanager was not picking up what I had for hooks. With all of this in place I can see alerts coming in on the Alertmanager UI, but nothing is sent to my Teams channel, which used to work before.

A couple of questions:
Can I copy my rules and add them to the integrated Alertmanager, or can I only use what is predefined?

Am I missing something in the configuration? That obviously seems to be the case, since I am not receiving anything in the Teams channel, nor any of the other external exporter alerts.

I am thinking of going back to 2.12, as my alerting rules work there and I get notified by email. I have been using PMM since version 1.16.

steve.hoffman · January 11, 2021, 2:32pm

The way you’re currently using Alertmanager won’t work in 2.13.0 and probably won’t be supported for the next few releases, as it’s not really the vanilla Alertmanager anymore but more of an appliance version of it. You either need to use your own Alertmanager, external to PMM (there is an easy-to-get-up-and-running Docker container for it), and point PMM to it (PMM Settings → Alertmanager Integration) along with your complete config file, or follow the new paradigm for the built-in Alertmanager:

We start with Communication Channels (currently Slack, SMTP, and PagerDuty are supported, with more options to come), which you configure after enabling the new Integrated Alerting (PMM Settings → Advanced Settings → toggle Integrated Alerting, and optionally set “Public Address” so that links in notifications resolve publicly, if you so desire).

Next you can create notification templates, although we’ve provided many by default. These are used to define the generic circumstance under which an alert would be triggered (CPU over a threshold, MySQL down, not enough Mongo instances running, etc.).

From there, the Alerts page uses those templates to create actual monitors with more robust filtering (for example, CPU over 90% when label = production is critical, so send to Slack and PagerDuty), but you can reuse the templates to create a warning for the same condition that just sends an email to a group.
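
For the first option (an Alertmanager external to PMM), a minimal sketch using Docker Compose and the upstream `prom/alertmanager` image. The service name, host port, and local file path are assumptions for illustration; the mounted file would be your complete existing Alertmanager config.

```yaml
# Sketch only: a standalone Alertmanager running next to PMM.
# Service name, host port and ./alertmanager.yml are illustrative.
services:
  alertmanager:
    image: prom/alertmanager
    ports:
      - "9093:9093"                     # Alertmanager's default listen port
    volumes:
      # The upstream image reads its config from this path by default.
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    restart: unless-stopped
```

PMM would then be pointed at `http://<alertmanager-host>:9093` under PMM Settings → Alertmanager Integration.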

It IS possible to break your existing config into the individual pieces, but I do not think the Teams connection will work yet (I’m going to continue tinkering though). It may be simple enough to add another option.

You are probably better off sticking with 2.12.0 for the time being since you already have it working with the vanilla bundle of Alertmanager and we’ll continue with enhancements to the preview to get more functionality. Best thing you can do right now is submit an improvement request at https://jira.percona.com as we’re collecting feedback on this preview to see where the best places are we can invest our resources to make it a usable product.

And I’d be remiss if I didn’t say Thank You!!! for being such a loyal user! We’ll keep working to ensure you can stay one!

dipeshacharya · January 11, 2021, 7:41pm

Hi Steve,
I was able to set up alerts and receive them through Alertmanager (not the integrated Alertmanager). I have also enabled the integrated Alertmanager. A few quick questions:

For the integrated Alertmanager, is it enabled by default to send alerts, or do I have to add one?
What is the Filter tab used for? I want to exclude some of the labels that come with the alert when it is sent. How do we do that?
Also, I was not able to find PostgreSQL replication monitoring. Can we monitor PostgreSQL replication?
Could you also show us how to purge older data? Sometimes we have issues with the data filesystem filling up.

steve.hoffman · January 13, 2021, 1:18pm

Integrated Alertmanager is off by default as it’s still a feature preview (early development to get feedback on our direction before we go too far and don’t meet the users’ needs), but that could change in a future release (from the home dashboard: PMM → PMM Settings → Advanced Settings and you’ll see it there).

If you’re talking about the filter column for Integrated Alerting, that’s exactly what we intended it to be used for: label=production or label!=development, or you can get a little more complex with regex too: environment!~"dev|test|qa", etc.

I checked with the team, and at present there is no monitoring for replication on Postgres, but there is a story in our backlog that should get at least a working prototype going.

And regarding data purging, it depends on what you mean. We have a retention setting, which is 30 days by default. This is something you can set in the same tab where you enable Integrated Alerting (from the home dashboard: PMM → PMM Settings → Advanced Settings tab). I do not think there is a way to selectively delete data, but I’m curious at what size you’re seeing issues, as we can certainly investigate bugs if you want to file a report at jira.percona.com against the PMM project.
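
As a rough sketch of setting retention at container start rather than in the UI: this assumes the `DATA_RETENTION` environment variable described in the PMM Server Docker documentation and a Docker Compose deployment. The service name, volume name, and port mapping are illustrative, so check the documentation for your PMM version before relying on it.

```yaml
# Sketch only: PMM Server with an explicit retention period.
# DATA_RETENTION is assumed from the PMM Server Docker docs;
# service and volume names are illustrative.
services:
  pmm-server:
    image: percona/pmm-server:2
    ports:
      - "443:443"
    environment:
      - DATA_RETENTION=720h             # 720 hours = 30 days, the default mentioned above
    volumes:
      - pmm-data:/srv                   # PMM keeps its data under /srv
    restart: always

volumes:
  pmm-data:
```

A shorter duration reduces how much history is kept, which is the only purge mechanism mentioned above.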

dipeshacharya · January 15, 2021, 8:23pm

Thanks, Steve.

I was seeing around 300 GB of space used.
I have:
20 PostgreSQL services being monitored
3 ProxySQL
4 MariaDB
and 18 nodes of Cassandra

I was asking more about whether the filter can be used to get only a few labels instead of getting everything, as shown below:

Prometheus Alert (Firing)
Memory available for target is at 5.164414211940508%

  description
    Memory available for target  is at 5.164414211940508%
  
    agent_id
    /agent_id/0cde9a23-3e16-45e5-b25a-583fd
  
    agent_type
    node_exporter
  
    alertgroup
    OS LEVEL METRICS
  
    alertname
    MemoryAvailable
  
    instance
    /agent_id/0cde9a23-3e16-45e5-b25a-583fd80d5c45
  
    job
    node_exporter_agent_id_0cde9a23-3e16-45e5-b25a-583fd80d5c45_hr-5s
  
    machine_id
    /machine_id/5124ce34091d46c282d260807e25e351
  
    node_id
    /node_id/67b51f57-94fe-4941-8e32-426ddbbded83
  
    node_name
    ddb.dc.local
  
    node_type
    generic
  
    severity
    critical
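
In a plain Prometheus rules file (as opposed to the Integrated Alerting filter column), one way to end up with fewer labels on an alert is to aggregate in the rule expression, so that only the labels listed in `by (...)` survive, plus `alertname` and any labels the rule adds. A sketch, assuming the standard node_exporter memory metrics and a `node_name` label as in the output above. The group name, threshold, and `for` duration are illustrative, and this is not the rule that produced the alert shown.

```yaml
# Sketch only: an alert rule that keeps just node_name (plus the
# alertname and severity labels added by the rule itself).
groups:
  - name: os-level-metrics
    rules:
      - alert: MemoryAvailable
        # Aggregating with `by (node_name)` drops agent_id, job,
        # machine_id and every other series label from the result.
        expr: |
          min by (node_name) (
            node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100
          ) < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          description: "Memory available for {{ $labels.node_name }} is at {{ $value }}%"
```

What a receiver such as the Teams proxy actually prints is also governed by its own message template, so trimming there is another option.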

I will create a Jira ticket.
