Hi folks,
I grabbed the Docker image percona/pmm-server, v2.4.0, and played with it for a while today. I set up SMTP inside the Docker container, configured a MySQL server to report to the PMM Server, etc. So far, so good.
Then I have been stopping and starting the MySQL service and timing when it fires the 1st alert, when it sends the recovery e-mail, and so on.
I am playing with “Evaluate every 1m For 0m”. It notices the 1st MySQL service-down situation in 2 minutes, more or less. After a few seconds I start the MySQL service, let it run for a few minutes, and then stop it again. It takes over 5 minutes to send the 2nd alert letting me know the server is down.
The query I am using:
avg by (service_name) (mysql_global_status_uptime{service_name=~"service-mysql"}), and I set up like a month.
What I am trying to achieve: if the condition is “no data or on error”, send me the alert.
Also, I am not seeing a 2nd alert, “like a reminder”, that the service is down.
Any idea how to make it more resilient?
The goal would be: if the server/service goes down, an e-mail should be sent almost immediately, “within 1 minute”, and perhaps after 2 minutes a reminder should be sent. After the MySQL service recovers, it should send an e-mail about it, AND if after a few seconds the service goes down “again”, it resets and sends a new service-down e-mail.
Cheers,
Here is a screenshot of when it noticed that I stopped and started the MySQL service. The recovery alert took 1-2 minutes, but it took up to 5-6 minutes to notice that the service went down. After I received the recovery alert, I waited 30-60 seconds before stopping the MySQL service again, and it took way too long to notice that the service went down and to send the alert.
Anybody know about this?
I'm doing the same thing to back up my MongoDB. I'm using a cron job to stop MongoDB, take a snapshot, and start MongoDB. I noticed that after the service starts, pmm-agent could not get the state. I found a workaround: restarting the pmm-agent service after starting MongoDB. You can try it this way.
Ghan said: I'm doing the same thing to back up my MongoDB. I'm using a cron job to stop MongoDB, take a snapshot, and start MongoDB. I noticed that after the service starts, pmm-agent could not get the state. I found a workaround: restarting the pmm-agent service after starting MongoDB. You can try it this way.
I am not sure we are on the same page. The goal is for the PMM alerts to be consistent. E.g., a minute after one of the services (MySQL, MongoDB) stops sending data, PMM should send the alert e-mail. I have stopped and started the services just to test the behavior of PMM, and it is not consistent.
I think this is managed by the Notification Policy ‘Group Interval’ setting, which defaults to 5m.
“The waiting time to send a batch of new alerts for that group after the first notification was sent. Default 5 minutes.”
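To illustrate how a 5-minute group interval could delay a 2nd alert, here is a rough Python sketch of a toy model. It assumes the group is flushed on a fixed tick every group interval after the first notification, and it ignores evaluation delay, group wait, and repeat interval. The function name notification_times is made up for this example and is not part of PMM or Grafana.

```python
import math

GROUP_INTERVAL_S = 5 * 60  # the 5-minute default quoted above


def notification_times(event_times_s, group_interval_s=GROUP_INTERVAL_S):
    """Return, for each state change (in seconds), when it would be notified.

    Toy model (an assumption, not the real implementation): the first event
    is notified immediately, and every later event waits for the next tick,
    where ticks occur every group_interval_s after the first notification.
    """
    if not event_times_s:
        return []
    events = sorted(event_times_s)
    first = events[0]
    result = []
    for t in events:
        # Number of whole intervals to wait until the first tick at or after t.
        ticks = math.ceil((t - first) / group_interval_s)
        result.append(first + ticks * group_interval_s)
    return result


# First alert at t=0 s, then two more state changes at 90 s and 100 s.
times = notification_times([0, 90, 100])
print(times)  # [0, 300, 300]
```

In this model a state change 90 seconds after the first notification is held until the 300-second mark, which is in the same range as the 5-6 minute delay described earlier in the thread. If that is the cause, lowering the group interval should shorten the delay.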