
swh/staging: limit the max number of connections from svix

Vincent Sellier requested to merge svix-rate-limiting into production

Ensure svix will not consume all the workers if something goes wrong in the infra.

It could also cause a cascading effect: if the webapp has some downtime, svix will try to send the backlog as fast as it can, preventing the webapp from recovering.

The (not so) guesstimate is ~10 connections per webhook journal client instance, or 1 connection per ~15 origin_visit_status/s.

This is true only for the scn endpoint; it will need to be adjusted if another endpoint is deployed or the current behavior is modified.

The estimation was done by trying to discover the limit with a big backlog of messages and 5 webhook journal clients.
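
For reference, here is a minimal sketch of how the limit ends up on the Ingress. The name, namespace and value are taken from the production diff below; the rest of the Ingress is omitted and the fragment is illustrative only. Annotation values must be strings, so the number has to be quoted, and ingress-nginx applies this limit per client IP address.

```yaml
# Illustrative fragment only: not the full rendered Ingress.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-archive-ingress-webhooks
  namespace: swh
  annotations:
    # Kubernetes annotation values are strings, hence the quotes.
    # Maximum number of concurrent connections allowed from a single IP.
    nginx.ingress.kubernetes.io/limit-connections: "10"
```

The staging ingresses get the same annotation with a value of "3".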

Related to swh/infra/sysadm-environment#5339 (closed)

Another MR will follow to add the possibility to configure some alerting when a response time threshold is reached or a given percentage of requests are in error.

Helm diff

Using the good old diff because dyff hides the important '"' on the annotation value

[swh] Comparing changes between branches production and svix-rate-limiting (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra-next-version...
[swh] Generate config in svix-rate-limiting branch for environment staging...
[swh] Generate config in svix-rate-limiting branch for environment staging...
[swh] Generate config in svix-rate-limiting branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace swh-cassandra-next-version...
[swh] Generate config in svix-rate-limiting branch for environment production...
[swh] Generate config in svix-rate-limiting branch for environment production...
[swh] Generate config in svix-rate-limiting branch for environment production...


------------- diff for environment staging namespace swh -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/staging-swh.before, 113 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/staging-swh.after, 113 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/



------------- diff for environment staging namespace swh-cassandra -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/staging-swh-cassandra.before, 400 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/staging-swh-cassandra.after, 401 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned three differences
        |___/

(file level)
    ---
    # Source: swh/templates/webhooks/autoscaling.yaml
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: webhooks-origin-visit-status-scaledobject
      namespace: swh-cassandra
    spec:
      scaleTargetRef:
        name: webhooks-origin-visit-status
      pollingInterval: 120
      minReplicaCount: 1
      maxReplicaCount: 2
      idleReplicaCount: 0
      triggers:
      - type: kafka
        metadata:
          bootstrapServers: journal2.internal.staging.swh.network
          consumerGroup: swh-archive-stg-webhooks
          lagThreshold: 1000
          offsetResetPolicy: earliest

spec  (apps/v1/Deployment/swh-cassandra/webhooks-origin-visit-status)
  - one map entry removed:
    replicas: 1

metadata.annotations  (networking.k8s.io/v1/Ingress/swh-cassandra/web-cassandra-ingress-webhooks)
  + one map entry added:
    nginx.ingress.kubernetes.io/limit-connections: 3



------------- diff for environment staging namespace swh-cassandra-next-version -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/staging-swh-cassandra-next-version.before, 168 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/staging-swh-cassandra-next-version.after, 168 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned one difference
        |___/

metadata.annotations  (networking.k8s.io/v1/Ingress/swh-cassandra-next-version/web-cassandra-ingress-webhooks)
  + one map entry added:
    nginx.ingress.kubernetes.io/limit-connections: 3



------------- diff for environment production namespace swh -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/production-swh.before, 426 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/production-swh.after, 427 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned three differences
        |___/

(file level)
    ---
    # Source: swh/templates/webhooks/autoscaling.yaml
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: webhooks-origin-visit-status-scaledobject
      namespace: swh
    spec:
      scaleTargetRef:
        name: webhooks-origin-visit-status
      pollingInterval: 120
      minReplicaCount: 1
      maxReplicaCount: 5
      idleReplicaCount: 0
      triggers:
      - type: kafka
        metadata:
          bootstrapServers: "kafka1.internal.softwareheritage.org,kafka2.internal.softwareheritage.org,kafka3.internal.softwareheritage.org,kafka4.internal.softwareheritage.org"
          consumerGroup: swh-archive-prod-webhooks
          lagThreshold: 5000
          offsetResetPolicy: earliest
    
  

spec  (apps/v1/Deployment/swh/webhooks-origin-visit-status)
  - one map entry removed:
    replicas: 1

metadata.annotations  (networking.k8s.io/v1/Ingress/swh/web-archive-ingress-webhooks)
  + one map entry added:
    nginx.ingress.kubernetes.io/limit-connections: 10



------------- diff for environment production namespace swh-cassandra -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.Xr4mMIqD/production-swh-cassandra.before, 96 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.Xr4mMIqD/production-swh-cassandra.after, 96 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/

Edited by Vincent Sellier
