components/alerting: Create an alert for unreachable cassandra nodes

Merged: Guillaume Samson requested to merge cassandra_svc_alerting into production
1 unresolved thread

Related to swh/infra/sysadm-environment#5026 (closed)

This template deploys a PrometheusRule in staging and production that fires a critical alert when a Cassandra cluster node has been unreachable (up == 0) for more than 5 minutes.
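Because Helm itself uses {{ }} templating, a chart that emits Prometheus' {{ $labels.* }} placeholders has to escape them, otherwise Helm tries to evaluate $labels and fails. A minimal sketch of such a template, assuming a hypothetical file name and hypothetical cassandraAlerting.* value keys (the chart's real layout may differ), relies on Go's backtick raw-string idiom:

```yaml
{{- if .Values.cassandraAlerting.enabled }}
# Hypothetical templates/cassandra-alerting.yaml: the file name and the
# .Values.cassandraAlerting.* keys are illustrative, not the chart's real ones.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cassandra-alerting
  namespace: {{ .Release.Namespace }}
spec:
  groups:
  - name: critical-cassandra-service.rules
    rules:
    - alert: Cassandra_Service_Degraded_In_{{ .Values.cassandraAlerting.environment }}
      annotations:
        # Backtick raw-string actions print their content literally, so Helm
        # leaves Prometheus' own $labels placeholders for Prometheus to expand.
        description: "The {{`{{ $labels.instance }}`}} node is unreachable for more than 5 minutes. This node seems down."
        summary: "The {{`{{ $labels.service }}`}} is degraded. Please check the {{`{{ $labels.instance }}`}} status."
      expr: up{service="cassandra-servers-svc"} == 0
      for: 5m
      labels:
        severity: critical
        namespace: cattle-monitoring-system
{{- end }}
```

With this sketch, the production values file would set the hypothetical key cassandraAlerting.environment to Production to obtain the alert name Cassandra_Service_Degraded_In_Production shown below.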

```console
ᐅ helm template -f values.yaml -f values/archive-production-rke2.yaml cassandra-alerting . | grep -A 12 groups
  groups:
  - name: critical-cassandra-service.rules
    rules:
    - alert: Cassandra_Service_Degraded_In_Production
      annotations:
        description: "The {{ $labels.instance }} node is unreachable for more than 5 minutes. This node seems down."
        summary: "The {{ $labels.service }} is degraded. Please check the {{ $labels.instance }} status."
      expr: up{service="cassandra-servers-svc"} == 0
      for: 5m
      labels:
        severity: critical
        namespace: cattle-monitoring-system
ᐅ helm template -f values.yaml -f values/archive-production-rke2.yaml cassandra-alerting . | \
grep -A 12 groups | promtool check rules
Checking standard input
  SUCCESS: 1 rules found
```
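promtool check rules only validates the rule syntax. As a minimal sketch, the alert's timing could also be exercised with promtool test rules, assuming the rendered groups: block above has been saved to a hypothetical cassandra-rules.yaml and using an illustrative instance label cassandra01:7070:

```yaml
# Hypothetical cassandra-rules-test.yaml, run with:
#   promtool test rules cassandra-rules-test.yaml
# Assumes the rendered `groups:` block was saved as cassandra-rules.yaml.
rule_files:
  - cassandra-rules.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Illustrative node whose up metric stays at 0 for 10 minutes.
      - series: 'up{service="cassandra-servers-svc", instance="cassandra01:7070"}'
        values: '0x10'
    alert_rule_test:
      # At 4m the alert is only pending (for: 5m not yet elapsed): nothing fires.
      - eval_time: 4m
        alertname: Cassandra_Service_Degraded_In_Production
        exp_alerts: []
      # At 6m the for: 5m delay has elapsed and the alert is firing.
      - eval_time: 6m
        alertname: Cassandra_Service_Degraded_In_Production
        exp_alerts:
          - exp_labels:
              severity: critical
              namespace: cattle-monitoring-system
              service: cassandra-servers-svc
              instance: cassandra01:7070
            exp_annotations:
              description: "The cassandra01:7070 node is unreachable for more than 5 minutes. This node seems down."
              summary: "The cassandra-servers-svc is degraded. Please check the cassandra01:7070 status."
```

The 4m check documents that a node which recovers within the 5-minute window never pages anyone; only the 6m evaluation is expected to report a firing alert.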

Activity

  • Vincent Sellier approved this merge request

  • Vincent Sellier
  • added 1 commit

    • 1ab6c384 - components/alerting: Create an alert for unreachable cassandra nodes

  • Antoine R. Dumont approved this merge request

  • added 1 commit

    • a871c9d3 - components/alerting: Create an alert for unrepaired table size

  • Guillaume Samson resolved all threads

  • Guillaume Samson added 4 commits

    • a871c9d3...58cf5d91 - 2 commits from branch production
    • dcd233f9 - components/alerting: Create an alert for unreachable cassandra nodes
    • bb2cb443 - components/alerting: Create an alert for unrepaired table size