components/alerting: Create an alert for unreachable cassandra nodes (!207) · Merge requests · Platform / Infrastructure / CI CD / Helm charts for swh packages

Guillaume Samson requested to merge cassandra_svc_alerting into production Nov 07, 2023

This template will deploy a PrometheusRule in staging and production to trigger an alert if a cassandra cluster node is unreachable.

ᐅ helm template -f values.yaml -f values/archive-production-rke2.yaml cassandra-alerting . | grep -A 12 groups 
  groups:
  - name: critical-cassandra-service.rules
    rules:
    - alert: Cassandra_Service_Degraded_In_Production
      annotations:
        description: "The {{ $labels.instance }} node is unreachable for more than 5 minutes. This node seems down."
        summary: "The {{ $labels.service }} is degraded. Please check the {{ $labels.instance }} status."
      expr: up{service="cassandra-servers-svc"} == 0
      for: 5m
      labels:
        severity: critical
        namespace: cattle-monitoring-system

ᐅ helm template -f values.yaml -f values/archive-production-rke2.yaml cassandra-alerting . | \
grep -A 12 groups | promtool check rules
Checking standard input
  SUCCESS: 1 rules found

components/alerting: Create an alert for unreachable cassandra nodes

Merge request reports