I have created my own alert template for the replication oplog window:
---
templates:
  - name: mongodb_replication_oplog_window_less_than_2_hours
    version: 1
    summary: Replication oplog window less than 2 hours
    expr: |-
      avg by (service_name,cluster) (mongodb_mongod_replset_oplog_head_timestamp - mongodb_mongod_replset_oplog_tail_timestamp) / 3600
      < [[ .threshold ]]
    params:
      - name: threshold
        summary: Time in seconds
        unit: "s"
        type: float
        range: [0, 10000]
        value: 7200
    for: 5m
    severity: critical
    labels:
      exporter_type: mongodb_exporter
      channel: oncall
    annotations:
      summary: "Production | MongoDB | Instance: {{ $labels.service_name }} | Low Replication oplog Window"
      message: "Replication oplog window less than 2 hours"
      duration: "5 mins"
      value: "{{ $value }}"
      tag: replicaSet
The issue is that when a given MongoDB service has been in the alerting state for some time and another service enters the alerting state during that period, the alert is sent again for the first service as well, which causes repeated alerts. How can I resolve this?