swh/production: Activate autoscaling on the public r/o objstorage
Let's try autoscaling on this "not critical" service. Two replicas may not be enough when several long requests are in progress in parallel.
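
For reviewers, here is a rough sketch of how the value returned by the Prometheus query in the diff below would translate into a replica count. It assumes KEDA's default `AverageValue` behaviour, where the HPA targets roughly `ceil(metric / threshold)` replicas, and it ignores HPA tolerance and stabilization windows. The function names and the workers-per-pod figure are hypothetical and only illustrate the arithmetic; `threshold`, `minReplicaCount` and `maxReplicaCount` come from the ScaledObject.

```python
import math


def query_value(busy_seconds_per_second, workers_per_pod, factor=1.0):
    """Mimic the PromQL expression: round(busy / workers * factor).

    busy_seconds_per_second stands for
    sum(rate(gunicorn_request_duration_sum[2m])), i.e. the estimated
    number of busy workers. workers_per_pod is an assumed value here.
    """
    raw = busy_seconds_per_second / workers_per_pod * factor
    # PromQL round() resolves ties upwards, unlike Python's round().
    return math.floor(raw + 0.5)


def desired_replicas(metric_value, threshold=1.0, min_replicas=2, max_replicas=10):
    """Approximate the replica count targeted for a given metric value."""
    wanted = math.ceil(metric_value / threshold)
    # Clamp to minReplicaCount / maxReplicaCount from the ScaledObject.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Hypothetical load levels, assuming 4 gunicorn workers per pod.
    for busy in (1.0, 12.0, 100.0):
        value = query_value(busy, workers_per_pod=4)
        print(busy, value, desired_replicas(value))
```

With these assumptions, a low load stays at the 2-replica floor, and the deployment only grows once the estimated number of busy workers exceeds what the current pods can serve.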
helm diff
[swh] Comparing changes between branches production and objstorage-autoscaling (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra-next-version...
[swh] Generate config in objstorage-autoscaling branch for environment staging...
[swh] Generate config in objstorage-autoscaling branch for environment staging...
[swh] Generate config in objstorage-autoscaling branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace swh-cassandra-next-version...
[swh] Generate config in objstorage-autoscaling branch for environment production...
[swh] Generate config in objstorage-autoscaling branch for environment production...
[swh] Generate config in objstorage-autoscaling branch for environment production...
------------- diff for environment staging namespace swh -------------
No differences
------------- diff for environment staging namespace swh-cassandra -------------
No differences
------------- diff for environment staging namespace swh-cassandra-next-version -------------
No differences
------------- diff for environment production namespace swh -------------
No differences
------------- diff for environment production namespace swh-cassandra -------------
--- /tmp/swh-chart.swh.TOLvCdTG/production-swh-cassandra.before 2025-01-21 16:31:02.892365944 +0100
+++ /tmp/swh-chart.swh.TOLvCdTG/production-swh-cassandra.after 2025-01-21 16:31:03.460337208 +0100
@@ -23613,21 +23613,20 @@
# Source: swh/templates/objstorage/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: swh-cassandra
name: objstorage-read-only
labels:
app: objstorage-read-only
spec:
revisionHistoryLimit: 2
- replicas: 2
selector:
matchLabels:
app: objstorage-read-only
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
@@ -34280,20 +34279,57 @@
triggers:
- type: kafka
metadata:
bootstrapServers: kafka1.internal.softwareheritage.org:9094,kafka2.internal.softwareheritage.org:9094,kafka3.internal.softwareheritage.org:9094,kafka4.internal.softwareheritage.org:9094
consumerGroup: swh-archive-prod-s3-content-replayer
lagThreshold: "1000"
offsetResetPolicy: earliest
authenticationRef:
name: keda-objstorage-replayer-s3-authentication
---
+# Source: swh/templates/objstorage/autoscaling.yaml
+apiVersion: keda.sh/v1alpha1
+kind: ScaledObject
+metadata:
+ name: objstorage-read-only-gunicorn-scaled-object
+ namespace: swh-cassandra
+ labels:
+ app: objstorage-read-only
+spec:
+ cooldownPeriod: 300
+ pollingInterval: 30
+ minReplicaCount: 2
+ maxReplicaCount: 10
+ scaleTargetRef:
+ apiVersion: apps/v1
+ kind: Deployment
+ name: objstorage-read-only
+ triggers:
+ - type: prometheus
+ metadata:
+ serverAddress: http://prometheus-operated.cattle-monitoring-system:9090
+ metricName: gunicorn_requests
+ threshold: "1"
+ # There is no environment when using the cluster's prometheus instance
+ #
+ # 1s of request time during 1s => 1 worker busy
+ # 10s of request time during 1s => 10 workers busy
+ # It's the closest we can get without the busy workers count
+ # the number of busy workers is divided by number of workers per pods and multiplied
+ # by a factor to have time to scale when the number of requests increase.
+ query: |
+ round(sum by (namespace, deployment) (rate(gunicorn_request_duration_sum{namespace="swh-cassandra", deployment="objstorage-read-only"}[2m]))
+ /
+ on(namespace,deployment) gunicorn_workers{namespace="swh-cassandra", deployment="objstorage-read-only"}
+ *
+ 1)
+---
# Source: swh/templates/storage/autoscaling.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: storage-cassandra-azure-readonly-gunicorn-scaled-object
namespace: swh-cassandra
labels:
app: storage-cassandra-azure-readonly
spec:
cooldownPeriod: 300