
swh/production: Activate autoscaling on the public r/o objstorage

Vincent Sellier requested to merge objstorage-autoscaling into production

Let's try autoscaling on this non-critical service first. Two replicas might not be enough if a couple of long requests are in progress in parallel.

helm diff
[swh] Comparing changes between branches production and objstorage-autoscaling (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra-next-version...
[swh] Generate config in objstorage-autoscaling branch for environment staging...
[swh] Generate config in objstorage-autoscaling branch for environment staging...
[swh] Generate config in objstorage-autoscaling branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace swh-cassandra-next-version...
[swh] Generate config in objstorage-autoscaling branch for environment production...
[swh] Generate config in objstorage-autoscaling branch for environment production...
[swh] Generate config in objstorage-autoscaling branch for environment production...


------------- diff for environment staging namespace swh -------------

No differences


------------- diff for environment staging namespace swh-cassandra -------------

No differences


------------- diff for environment staging namespace swh-cassandra-next-version -------------

No differences


------------- diff for environment production namespace swh -------------

No differences


------------- diff for environment production namespace swh-cassandra -------------

--- /tmp/swh-chart.swh.TOLvCdTG/production-swh-cassandra.before	2025-01-21 16:31:02.892365944 +0100
+++ /tmp/swh-chart.swh.TOLvCdTG/production-swh-cassandra.after	2025-01-21 16:31:03.460337208 +0100
@@ -23613,21 +23613,20 @@
 # Source: swh/templates/objstorage/deployment.yaml
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   namespace: swh-cassandra
   name: objstorage-read-only
   labels:
     app: objstorage-read-only
 spec:
   revisionHistoryLimit: 2
-  replicas: 2
   selector:
     matchLabels:
       app: objstorage-read-only
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
@@ -34280,20 +34279,57 @@
   triggers:
   - type: kafka
     metadata:
       bootstrapServers: kafka1.internal.softwareheritage.org:9094,kafka2.internal.softwareheritage.org:9094,kafka3.internal.softwareheritage.org:9094,kafka4.internal.softwareheritage.org:9094
       consumerGroup: swh-archive-prod-s3-content-replayer
       lagThreshold: "1000"
       offsetResetPolicy: earliest
     authenticationRef:
       name: keda-objstorage-replayer-s3-authentication
 ---
+# Source: swh/templates/objstorage/autoscaling.yaml
+apiVersion: keda.sh/v1alpha1
+kind: ScaledObject
+metadata:
+  name: objstorage-read-only-gunicorn-scaled-object
+  namespace: swh-cassandra
+  labels:
+    app: objstorage-read-only
+spec:
+  cooldownPeriod: 300
+  pollingInterval: 30
+  minReplicaCount: 2
+  maxReplicaCount: 10
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: objstorage-read-only
+  triggers:
+  - type: prometheus
+    metadata:
+      serverAddress: http://prometheus-operated.cattle-monitoring-system:9090
+      metricName: gunicorn_requests
+      threshold: "1"
+      # There is no environment label when using the cluster's prometheus instance
+      #
+      # 1s of request time during 1s => 1 worker busy
+      # 10s of request time during 1s => 10 workers busy
+      # It's the closest we can get without a busy-workers count.
+      # The estimated number of busy workers is divided by the number of workers
+      # per pod and multiplied by a factor to leave time to scale when the number
+      # of requests increases.
+      query: |
+        round(sum  by (namespace, deployment) (rate(gunicorn_request_duration_sum{namespace="swh-cassandra", deployment="objstorage-read-only"}[2m]))
+        /
+        on(namespace,deployment) gunicorn_workers{namespace="swh-cassandra", deployment="objstorage-read-only"}
+        *
+        1)
+---
 # Source: swh/templates/storage/autoscaling.yaml
 apiVersion: keda.sh/v1alpha1
 kind: ScaledObject
 metadata:
   name: storage-cassandra-azure-readonly-gunicorn-scaled-object
   namespace: swh-cassandra
   labels:
     app: storage-cassandra-azure-readonly
 spec:
   cooldownPeriod: 300

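For reference, the scaling arithmetic the query comment describes can be sketched as follows. This is a minimal illustration, not part of the chart: the sample values (12s of request time per second, 4 gunicorn workers per pod) are hypothetical, and it assumes KEDA's Prometheus scaler behavior of `desiredReplicas = ceil(metricValue / threshold)`, clamped to `minReplicaCount`/`maxReplicaCount`.

```python
import math

# Hypothetical sample values, not taken from the MR:
request_seconds_per_second = 12.0  # sum(rate(gunicorn_request_duration_sum[2m]))
workers_per_pod = 4                # gunicorn_workers
threshold = 1.0                    # trigger threshold in the ScaledObject

# The PromQL query estimates busy pods: total request time per second,
# divided by workers per pod, times a headroom factor (1 here).
metric_value = round(request_seconds_per_second / workers_per_pod * 1)

# KEDA's Prometheus scaler then derives the desired replica count,
# clamped to the min/max replica counts from the ScaledObject spec.
desired = math.ceil(metric_value / threshold)
desired = min(max(desired, 2), 10)
print(desired)  # -> 3
```

With these numbers, 12 busy worker-seconds per second across pods of 4 workers yields a metric value of 3, so KEDA would target 3 replicas, one more than the previous fixed `replicas: 2`.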