Hi All,
Last Friday, 2/10, one of my MySQL instances went down, but the integrated alert didn't show any error message. By running a select statement in the Explore window, I saw that the MySQL service had been down for a few minutes. Please see the attached screenshot. The cause was that the node hosting the VM crashed, and the VM was moved to another node.
My question is: how can I investigate why the alert failed to show in PMM? I'm not sure which component caused this problem. The agent? Alertmanager? Or the email system?
Thanks,
Dillon
Hi, the alert won't fire because the mysql_up metric is not 0; it is simply not present for some time. You might want to experiment with the absent_over_time function for alerting.
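To illustrate the difference, here is a minimal PromQL sketch. The service name `my-mysql-service` and the 5m window are hypothetical placeholders, not PMM defaults. A comparison such as `== 0` only filters samples that exist, so when the series is missing the result is empty and nothing fires; `absent_over_time` instead returns 1 when no sample matched during the window.

```promql
# 1) Matches only while mysql_up is still being reported and equals 0
#    (MySQL down, pmm-agent still up). If the series disappears, this
#    returns an empty result, so an alert built on it cannot fire.
sum by (service_name, node_name) (mysql_up) == 0

# 2) Returns {service_name="my-mysql-service"} 1 when that service had no
#    mysql_up samples in the last 5 minutes ("my-mysql-service" is hypothetical).
absent_over_time(mysql_up{service_name="my-mysql-service"}[5m])
```

The two expressions are meant to be tried separately in Explore; the equality matcher in the second one matters, because its labels are what the result is labeled with.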
Hi Ivan,
Thank you for the suggestion.
Thanks,
Dillon
Hi Ivan,
I need to confirm something again. absent_over_time works: the Explore page shows the correct result. However, the alert rule isn't firing.
The expression in my alert rule is: "sum by (service_name,node_name) (mysql_up) ==0 or absent_over_time(sum by (service_name,node_name) (mysql_up))==1"
Could you please give me some advice again? How should I investigate?
Thanks,
Dillon
Hi Dillon, what about something like this?
sum by (service_name,node_name) (mysql_up) ==0 or sum by (service_name,node_name) (absent_over_time(mysql_up))==1
Hello Ivan,
Thank you for your help again.
Your query returns metric data in Explore. However, the same problem remains: the alert rule isn't firing. I attached two screenshots: one of the alert rule and the other of the Explore page. I hope I'm just missing something.
In the picture above, I marked the node with the suffix "115" that I expect to show up.
Thanks,
Dillon
Hi Dillon, the built-in alert only catches the case where MySQL is down but pmm-agent is still up and reporting metrics to the server. I expected absent_over_time to catch the case where the server is down, but it looks like it won't work. This issue has already been raised with the dev team, so I suggest you vote on it: [PMM-9544] Alerting doesn't seem to be able to monitor for "Host Down" - Percona JIRA
Regards
Ivan
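A likely reason the combined expressions stay empty for a single crashed host: under standard PromQL semantics, `absent_over_time(mysql_up)` returns a result only when no `mysql_up` series at all has samples in the window, and that result carries no `service_name` or `node_name` labels. As long as other MySQL services keep reporting, it returns nothing. A pattern sometimes used instead compares recent history with the current state. The sketch below is an assumption, untested against PMM; the 1h window is a hypothetical choice.

```promql
# Hypothetical "mysql_up series disappeared" expression, untested in PMM.
# Left side: every service/node pair that reported mysql_up in the last hour
#   (count is always >= 1, so a returned series has a non-zero value).
# Right side: pairs that still have a current mysql_up sample.
# "unless" keeps only left-side pairs with no matching label set on the right.
count by (service_name, node_name) (count_over_time(mysql_up[1h]))
  unless
count by (service_name, node_name) (mysql_up)
```

One trade-off of this design: once a series has been gone for longer than the 1h window, it drops out of the left side too, so the alert resolves by itself even if the host is still down.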