cluster-components: Add alert on pending Save Code Now requests
See this dashboard [1], which shows the peak we had this Saturday.
[1] https://grafana.softwareheritage.org/goto/b7cggP-Iz?orgId=1
helm diff
[cluster-components] Comparing changes between branches production and mr/add-alert-on-savecodenow...
Switched to branch 'production'
Your branch is up to date with 'origin/production'.
[cluster-components] Generate config in production branch for cluster-components/values/admin-rke2.yaml...
[cluster-components] Generate config in production branch for cluster-components/values/archive-production-rke2.yaml...
[cluster-components] Generate config in production branch for cluster-components/values/archive-staging-rke2.yaml...
[cluster-components] Generate config in production branch for cluster-components/values/gitlab-production.yaml...
[cluster-components] Generate config in production branch for cluster-components/values/gitlab-staging.yaml...
[cluster-components] Generate config in production branch for cluster-components/values/minikube.yaml...
[cluster-components] Generate config in production branch for cluster-components/values/rancher.yaml...
[cluster-components] Generate config in production branch for cluster-components/values/test-staging-rke2.yaml...
Switched to branch 'mr/add-alert-on-savecodenow'
[cluster-components] Generate config in mr/add-alert-on-savecodenow branch for cluster-components/values/admin-rke2.yaml...
[cluster-components] Generate config in mr/add-alert-on-savecodenow branch for cluster-components/values/archive-production-rke2.yaml...
[cluster-components] Generate config in mr/add-alert-on-savecodenow branch for cluster-components/values/archive-staging-rke2.yaml...
[cluster-components] Generate config in mr/add-alert-on-savecodenow branch for cluster-components/values/gitlab-production.yaml...
[cluster-components] Generate config in mr/add-alert-on-savecodenow branch for cluster-components/values/gitlab-staging.yaml...
[cluster-components] Generate config in mr/add-alert-on-savecodenow branch for cluster-components/values/minikube.yaml...
[cluster-components] Generate config in mr/add-alert-on-savecodenow branch for cluster-components/values/rancher.yaml...
[cluster-components] Generate config in mr/add-alert-on-savecodenow branch for cluster-components/values/test-staging-rke2.yaml...
------------- diff for cluster-components/values/admin-rke2.yaml -------------
dyff between /tmp/swh-chart.cluster-components.fnCaeg3n/admin-rke2.yaml.before, 29 documents
         and /tmp/swh-chart.cluster-components.fnCaeg3n/admin-rke2.yaml.after, 29 documents
     returned no differences
------------- diff for cluster-components/values/archive-production-rke2.yaml -------------
dyff between /tmp/swh-chart.cluster-components.fnCaeg3n/archive-production-rke2.yaml.before, 15 documents
         and /tmp/swh-chart.cluster-components.fnCaeg3n/archive-production-rke2.yaml.after, 15 documents
     returned one difference
spec.groups.swh-production.rules.rules (monitoring.coreos.com/v1/PrometheusRule/cattle-monitoring-system/swh-production.rules)
+ one list entry added:
- alert: SaveCodeNow_Is_Stale_In_Production
│ annotations:
│ │ description: "The save code now request status may be lagging for more than 5 minutes."
│ │ summary: "Please check the svix server on cluster {{ $labels.cluster_name }}."
│ expr: "avg_over_time(swh_web_submitted_save_requests{environment="production",status="pending"}[1h]) > 10"
│ for: 5m
│ labels:
│ │ severity: warning
│ │ namespace: cattle-monitoring-system
------------- diff for cluster-components/values/archive-staging-rke2.yaml -------------
dyff between /tmp/swh-chart.cluster-components.fnCaeg3n/archive-staging-rke2.yaml.before, 15 documents
         and /tmp/swh-chart.cluster-components.fnCaeg3n/archive-staging-rke2.yaml.after, 15 documents
     returned one difference
spec.groups.swh-staging.rules.rules (monitoring.coreos.com/v1/PrometheusRule/cattle-monitoring-system/swh-staging.rules)
+ one list entry added:
- alert: SaveCodeNow_Is_Stale_In_Staging
│ annotations:
│ │ description: "The save code now request status may be lagging for more than 5 minutes."
│ │ summary: "Please check the svix server on cluster {{ $labels.cluster_name }}."
│ expr: "avg_over_time(swh_web_submitted_save_requests{environment="staging",status="pending"}[1h]) > 5"
│ for: 5m
│ labels:
│ │ severity: warning
│ │ namespace: cattle-monitoring-system
------------- diff for cluster-components/values/gitlab-production.yaml -------------
dyff between /tmp/swh-chart.cluster-components.fnCaeg3n/gitlab-production.yaml.before
         and /tmp/swh-chart.cluster-components.fnCaeg3n/gitlab-production.yaml.after
     returned no differences
------------- diff for cluster-components/values/gitlab-staging.yaml -------------
dyff between /tmp/swh-chart.cluster-components.fnCaeg3n/gitlab-staging.yaml.before
         and /tmp/swh-chart.cluster-components.fnCaeg3n/gitlab-staging.yaml.after
     returned no differences
------------- diff for cluster-components/values/minikube.yaml -------------
dyff between /tmp/swh-chart.cluster-components.fnCaeg3n/minikube.yaml.before
         and /tmp/swh-chart.cluster-components.fnCaeg3n/minikube.yaml.after
     returned no differences
------------- diff for cluster-components/values/rancher.yaml -------------
dyff between /tmp/swh-chart.cluster-components.fnCaeg3n/rancher.yaml.before
         and /tmp/swh-chart.cluster-components.fnCaeg3n/rancher.yaml.after
     returned no differences
------------- diff for cluster-components/values/test-staging-rke2.yaml -------------
dyff between /tmp/swh-chart.cluster-components.fnCaeg3n/test-staging-rke2.yaml.before, 13 documents
         and /tmp/swh-chart.cluster-components.fnCaeg3n/test-staging-rke2.yaml.after, 13 documents
     returned no differences
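To make the semantics of the new `SaveCodeNow_Is_Stale_In_*` rules concrete, here is a rough Python model of `avg_over_time(...[1h]) > threshold` combined with `for: 5m`. It assumes one sample of the pending-requests gauge per minute and one rule evaluation per sample, and it ignores Prometheus staleness handling and exact range-boundary details. `AlertSim` and `first_firing` are hypothetical helpers for illustration only, not part of Prometheus or swh-web.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class AlertSim:
    """Rough model of `avg_over_time(gauge[1h]) > threshold` with `for: 5m`."""

    threshold: float
    window_s: int = 3600  # the [1h] range selector
    for_s: int = 300      # the `for: 5m` clause
    samples: Deque[Tuple[int, float]] = field(default_factory=deque)
    active_since: Optional[int] = None

    def evaluate(self, t: int, value: float) -> str:
        # Record the newest sample, then keep only samples in (t - window, t].
        self.samples.append((t, value))
        while self.samples[0][0] <= t - self.window_s:
            self.samples.popleft()
        avg = sum(v for _, v in self.samples) / len(self.samples)
        if avg <= self.threshold:
            # Condition false: the alert resets to inactive.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            # Condition just became true: the alert goes pending.
            self.active_since = t
        # Fire only once the condition has held for the whole `for` duration.
        return "firing" if t - self.active_since >= self.for_s else "pending"


def first_firing(values: List[float], threshold: float, step_s: int = 60) -> Optional[int]:
    """Return the first evaluation time (seconds) at which the alert fires, or None."""
    sim = AlertSim(threshold=threshold)
    for i, value in enumerate(values):
        if sim.evaluate(i * step_s, value) == "firing":
            return i * step_s
    return None


# One hour with no pending requests, then a sudden jump to 30 pending requests.
burst = [0.0] * 60 + [30.0] * 60
print(first_firing(burst, threshold=10))  # production threshold: 5100
print(first_firing(burst, threshold=5))   # staging threshold: 4500
```

Under these assumptions, the 1h average delays the alert well beyond the 5-minute `for`: after the jump, the production rule fires about 25 minutes later (20 minutes for the hourly average to exceed 10, plus 5 minutes pending), and the staging rule about 15 minutes later.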
Edited by Antoine R. Dumont