Prometheus Alertmanager failed to start

Hi, I’m trying to integrate Prometheus Alertmanager with PMM 2.14, following this document someone shared on this forum.

However, Alertmanager is failing to start. I don’t see any error message, but the status shows that it failed:

[root@xxxxx]# systemctl status alertmanager
● alertmanager.service - Alert Manager
Loaded: loaded (/usr/lib/systemd/system/alertmanager.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Fri 2021-02-05 23:03:13 UTC; 12s ago
Process: 17571 ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/data/alertmanager (code=exited, status=1/FAILURE)
Main PID: 17571 (code=exited, status=1/FAILURE)

Feb 05 23:03:13 xxx.systemd[1]: Unit alertmanager.service entered failed state.
Feb 05 23:03:13 xxx. systemd[1]: alertmanager.service failed.
Feb 05 23:03:13 xxx. systemd[1]: alertmanager.service holdoff time over, scheduling restart.
Feb 05 23:03:13 xxx.systemd[1]: Stopped Alert Manager.
Feb 05 23:03:13 xxx systemd[1]: start request repeated too quickly for alertmanager.service
Feb 05 23:03:13 xxx systemd[1]: Failed to start Alert Manager.
Feb 05 23:03:13 xxx systemd[1]: Unit alertmanager.service entered failed state.
Feb 05 23:03:13 xxx systemd[1]: alertmanager.service failed.

alertmanager.yml file

[root@xxx]# cat /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Any suggestions? Any help is appreciated!

Hi Deepthip,

You should verify your systemd unit file /etc/systemd/system/alertmanager.service

Here is an example of a unit file on my installation:

The following command can be used to validate it:
systemd-analyze verify /etc/systemd/system/alertmanager.service
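
If the unit file checks out, the next places to look are the journal and the Alertmanager config itself, since "Result: start-limit" only reports the restart loop and not the first failure. A minimal sketch, assuming the unit is named alertmanager and the config lives at the path shown in the ExecStart line of the status output; the diagnose_alertmanager function name is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect the usual clues for a unit stuck in "start-limit".
diagnose_alertmanager() {
  local config="${1:-/etc/alertmanager/alertmanager.yml}"

  # The status output only shows the restart loop; the original error is in the journal.
  if command -v journalctl >/dev/null 2>&1; then
    journalctl -u alertmanager -n 50 --no-pager || true
  fi

  # amtool ships with Alertmanager and validates the configuration file.
  if command -v amtool >/dev/null 2>&1; then
    amtool check-config "$config" || echo "config check failed: $config"
  else
    echo "amtool not found; skipping config check"
  fi
}

diagnose_alertmanager "$@"

# After fixing the cause, clear the start-limit state and try again:
#   systemctl reset-failed alertmanager && systemctl start alertmanager
```

Running the ExecStart command by hand in a terminal is another way to see the startup error directly.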

Also, please note that PMM 2.14.0 already has “Integrated Alerting” functionality.
It is disabled by default.

Thanks, adivinho. The issue was that, because I had enabled Integrated Alerting, the port was already in use. I was able to start Alertmanager once I switched it to a different port.
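
For anyone hitting the same conflict, one way to confirm it is to probe Alertmanager's default ports before starting the service. A rough sketch, assuming bash and a listener reachable on localhost; port_in_use is a hypothetical helper, and 9095/9096 are arbitrary example ports rather than values from this thread:

```shell
#!/usr/bin/env bash
# Return success if something accepts TCP connections on 127.0.0.1:<port>.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# 9093 (web) and 9094 (cluster) are Alertmanager's default ports.
for port in 9093 9094; do
  if port_in_use "$port"; then
    echo "port $port is already in use"
  else
    echo "port $port looks free"
  fi
done

# If a default port is taken, the ExecStart line in the unit file can point
# at other ports (example values only):
#   --web.listen-address=:9095 --cluster.listen-address=:9096
```

After editing the unit file, systemctl daemon-reload is needed before restarting the service.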

As you mentioned, I started off using Integrated Alerting, but I heard from support that it’s not GA: it is still a technical preview and not ready for production servers. Besides that, I’m struggling to make all the alerts work with Integrated Alerting. For example, I get alerted when MySQL is down, but not for OS-level conditions such as CPU or memory. I’m using the built-in templates for now.

Have you been using it? If so, how did it go for you? Please let me know what I’m missing that keeps me from getting OS-level alerts.

There is an issue with alert rules that use the “less than” symbol.
You can check whether your alert rules are affected with the following command:
docker exec -it <pmm-server-docker-name> /usr/bin/grep -r lt /etc/ia/rules/

e.g.

So the built-in template “Node out of memory” isn’t working because of this issue, but the other node templates should work.

Understood. One more thing I forgot to mention above: I’m not using Docker. I’m using the AWS AMI from the Marketplace, and I’m monitoring Aurora MySQL clusters, so PMM has to get the OS-level information from the cloud metrics. I checked, and I believe Enhanced Monitoring is turned on for the RDS instances. Is there anything else I could be missing? Is Integrated Alerting even supposed to work for AWS RDS instances monitored from the Marketplace AMI? So far I don’t have any custom alerts; I’m still working with the built-in ones.

You can check the collected RDS metrics in Explore with a query like this:
count by (__name__) ({__name__=~"rdsosmetrics_.*", node_name="rds-mysql57"})

We have got an RDS instance that is monitored by our PMM demo instance. Here is a link to the site.

Got it! Is it normal for monitored RDS hosts to show “no data returned” on some graphs? Please note that this node is being monitored and returned results for your query above.

The reason I ask is that I still can’t get the CPU alerts, no matter how low I set the threshold. So I’m wondering whether it has anything to do with the “CPU saturation and max core usage” graph.

You can use the attached alert template, which is based on the RDS metric node_cpu_average.
It can be added on the “Alert Rule Templates” tab and then used to create an alert rule on the “Alert Rules” tab.

rds_cpu.txt (691 Bytes)

Awesome! This helped! I was finally able to get the CPU alerts. Thanks a lot, adivinho!

On the same note, could you advise on the following?

  1. Am I correct to assume that some of the built-in alert templates may not work for RDS MySQL environments?

  2. Can you please share any other alert templates for RDS, such as out of memory, MySQL replication, etc.?

  3. Can you point me to any sources explaining the RDS metrics that might be useful in building the alerts?

  4. Finally, how can I make these alerts appear in a more human-readable format?

Thanks again!

Most metric names are the same for the RDS exporter as for the node and mysqld exporters.
CPU usage is an exception, though. I will create a ticket to make the built-in template “Node high CPU load” compatible with RDS metrics.
The other templates use compatible expressions.

We have to wait for the next release, when the issue should be solved. It’s pretty hard to write an alert without using the “less than” symbol.

You can find metric names and descriptions in the following test data files for rds_exporter.

It will be changed in future releases.

Understood! Thanks, adivinho.
