components/alerting: Create an alert for unreachable cassandra nodes
Related to swh/infra/sysadm-environment#5026 (closed)
This template will deploy a PrometheusRule
in staging and production to trigger an alert if a cassandra cluster node is unreachable.
ᐅ helm template -f values.yaml -f values/archive-production-rke2.yaml cassandra-alerting . | grep -A 12 groups
groups:
- name: critical-cassandra-service.rules
rules:
- alert: Cassandra_Service_Degraded_In_Production
annotations:
description: "The {{ $labels.instance }} node is unreachable for more than 5 minutes. This node seems down."
summary: "The {{ $labels.service }} is degraded. Please check the {{ $labels.instance }} status."
expr: up{service="cassandra-servers-svc"} == 0
for: 5m
labels:
severity: critical
namespace: cattle-monitoring-system
ᐅ helm template -f values.yaml -f values/archive-production-rke2.yaml cassandra-alerting . | \
grep -A 12 groups | promtool check rules
Checking standard input
SUCCESS: 1 rules found