production/web: Align archive instance with previous instance moma
Compared to moma, the current deployment is undersized: 2 replicas x 2 workers x 2 threads, so only 8 requests can be handled concurrently. As no gunicorn configuration is provided, the defaults come from the values set in the Docker image's Dockerfile.
On moma, the webapp was configured with 32 workers (and a timeout of 3600s). So we bump the number of replicas to 4, and the number of workers to 4 as well. That makes 4 replicas x 4 workers x 2 threads = 32 concurrently handled requests, which matches what moma used to do.
Since we doubled the number of workers, the memory request is doubled as well (current usage is near 95% of the request). This also aligns the request timeout to 3600s.
Another commit adapts the web template so it uses the same defaults as the Dockerfile declares. This makes the current setup explicit through environment variables (with no impact on the deployed configuration).
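As a sketch, the change amounts to something like the following values excerpt (the key names are illustrative, inferred from the WORKERS/THREADS/TIMEOUT environment variables in the diff below; the actual chart layout may differ):

```yaml
# Hypothetical production values excerpt -- key names are assumptions.
web:
  replicas: 4          # was 2
  gunicorn:
    workers: 4         # was 2 (Dockerfile default)
    threads: 2         # unchanged
    timeout: 3600      # aligned with moma
  resources:
    requests:
      memory: 6144Mi   # doubled along with the workers
      cpu: 350m
```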
make swh-helm-diff
[swh] Comparing changes between branches production and align-web-configuration-with-moma (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra-next-version...
[swh] Generate config in align-web-configuration-with-moma branch for environment staging...
[swh] Generate config in align-web-configuration-with-moma branch for environment staging...
[swh] Generate config in align-web-configuration-with-moma branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace swh-cassandra-next-version...
[swh] Generate config in align-web-configuration-with-moma branch for environment production...
[swh] Generate config in align-web-configuration-with-moma branch for environment production...
[swh] Generate config in align-web-configuration-with-moma branch for environment production...
------------- diff for environment staging namespace swh -------------
--- /tmp/swh-chart.swh.eDzE0c3T/staging-swh.before 2024-01-16 18:07:44.478771192 +0100
+++ /tmp/swh-chart.swh.eDzE0c3T/staging-swh.after 2024-01-16 18:07:45.110766624 +0100
@@ -24621,20 +24621,26 @@
value: webapp-postgresql.internal.staging.swh.network
initialDelaySeconds: 3
periodSeconds: 10
timeoutSeconds: 30
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
+ - name: WORKERS
+ value: "2"
+ - name: THREADS
+ value: "2"
+ - name: TIMEOUT
+ value: "3600"
- name: STATSD_HOST
value: prometheus-statsd-exporter
- name: STATSD_PORT
value: "9125"
- name: LOG_LEVEL
value: "INFO"
- name: SWH_CONFIG_FILENAME
value: /etc/swh/config.yml
- name: SWH_SENTRY_ENVIRONMENT
value: staging
------------- diff for environment staging namespace swh-cassandra -------------
--- /tmp/swh-chart.swh.eDzE0c3T/staging-swh-cassandra.before 2024-01-16 18:07:44.682769718 +0100
+++ /tmp/swh-chart.swh.eDzE0c3T/staging-swh-cassandra.after 2024-01-16 18:07:45.314765150 +0100
@@ -23105,20 +23105,26 @@
value: webapp.staging.swh.network
initialDelaySeconds: 3
periodSeconds: 10
timeoutSeconds: 30
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
+ - name: WORKERS
+ value: "2"
+ - name: THREADS
+ value: "2"
+ - name: TIMEOUT
+ value: "3600"
- name: STATSD_HOST
value: prometheus-statsd-exporter
- name: STATSD_PORT
value: "9125"
- name: LOG_LEVEL
value: "INFO"
- name: SWH_CONFIG_FILENAME
value: /etc/swh/config.yml
- name: SWH_SENTRY_ENVIRONMENT
value: staging
------------- diff for environment staging namespace swh-cassandra-next-version -------------
--- /tmp/swh-chart.swh.eDzE0c3T/staging-swh-cassandra-next-version.before 2024-01-16 18:07:44.878768301 +0100
+++ /tmp/swh-chart.swh.eDzE0c3T/staging-swh-cassandra-next-version.after 2024-01-16 18:07:45.506763762 +0100
@@ -21282,20 +21282,26 @@
value: webapp-cassandra-next-version.internal.staging.swh.network
initialDelaySeconds: 3
periodSeconds: 10
timeoutSeconds: 30
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
+ - name: WORKERS
+ value: "2"
+ - name: THREADS
+ value: "2"
+ - name: TIMEOUT
+ value: "3600"
- name: STATSD_HOST
value: prometheus-statsd-exporter
- name: STATSD_PORT
value: "9125"
- name: LOG_LEVEL
value: "INFO"
- name: SWH_CONFIG_FILENAME
value: /etc/swh/config.yml
- name: SWH_SENTRY_ENVIRONMENT
value: staging
------------- diff for environment production namespace swh -------------
--- /tmp/swh-chart.swh.eDzE0c3T/production-swh.before 2024-01-16 18:07:45.778761796 +0100
+++ /tmp/swh-chart.swh.eDzE0c3T/production-swh.after 2024-01-16 18:07:46.210758674 +0100
@@ -29902,20 +29902,26 @@
value: webapp1.internal.softwareheritage.org
initialDelaySeconds: 3
periodSeconds: 10
timeoutSeconds: 30
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
+ - name: WORKERS
+ value: "2"
+ - name: THREADS
+ value: "2"
+ - name: TIMEOUT
+ value: "3600"
- name: STATSD_HOST
value: prometheus-statsd-exporter
- name: STATSD_PORT
value: "9125"
- name: LOG_LEVEL
value: "INFO"
- name: SWH_CONFIG_FILENAME
value: /etc/swh/config.yml
- name: SWH_SENTRY_ENVIRONMENT
value: production
@@ -29985,21 +29991,21 @@
# Source: swh/templates/web/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: swh
name: web-archive
labels:
app: web-archive
spec:
revisionHistoryLimit: 2
- replicas: 2
+ replicas: 4
selector:
matchLabels:
app: web-archive
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
@@ -30115,21 +30121,21 @@
args:
- -c
- cp -r $PWD/.local/share/swh/web/static/ /usr/share/swh/web/static/
volumeMounts:
- name: static
mountPath: /usr/share/swh/web/static
containers:
- name: web-archive
resources:
requests:
- memory: 3072Mi
+ memory: 6144Mi
cpu: 350m
image: container-registry.softwareheritage.org/swh/infra/swh-apps/web:20240111.1
imagePullPolicy: IfNotPresent
ports:
- containerPort: 5004
name: webapp
readinessProbe:
httpGet:
path: /
port: webapp
@@ -30149,20 +30155,26 @@
value: archive.softwareheritage.org
initialDelaySeconds: 3
periodSeconds: 10
timeoutSeconds: 30
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
+ - name: WORKERS
+ value: "4"
+ - name: THREADS
+ value: "2"
+ - name: TIMEOUT
+ value: "3600"
- name: STATSD_HOST
value: prometheus-statsd-exporter
- name: STATSD_PORT
value: "9125"
- name: LOG_LEVEL
value: "INFO"
- name: SWH_CONFIG_FILENAME
value: /etc/swh/config.yml
- name: SWH_SENTRY_ENVIRONMENT
value: production
------------- diff for environment production namespace swh-cassandra -------------
--- /tmp/swh-chart.swh.eDzE0c3T/production-swh-cassandra.before 2024-01-16 18:07:45.938760640 +0100
+++ /tmp/swh-chart.swh.eDzE0c3T/production-swh-cassandra.after 2024-01-16 18:07:46.366757547 +0100
@@ -14907,20 +14907,26 @@
value: webapp-cassandra.internal.softwareheritage.org
initialDelaySeconds: 3
periodSeconds: 10
timeoutSeconds: 30
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
+ - name: WORKERS
+ value: "2"
+ - name: THREADS
+ value: "2"
+ - name: TIMEOUT
+ value: "3600"
- name: STATSD_HOST
value: prometheus-statsd-exporter
- name: STATSD_PORT
value: "9125"
- name: LOG_LEVEL
value: "INFO"
- name: SWH_CONFIG_FILENAME
value: /etc/swh/config.yml
- name: SWH_SENTRY_ENVIRONMENT
value: production
Merge request reports
Activity
FWIW, 32 workers * 5 threads * 2 replicas is 320 worker threads, which is 10 times what it was on moma (32 workers * 1 thread * 1 replica). The current deployment (5 workers * 5 threads * 2 replicas) is already larger than moma was.
We should probably add some statsd instrumentation to our gunicorn instances to check if they're really being overwhelmed, and ideally turn that into a keda prometheus autoscaler.
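A KEDA-based setup along those lines could look like the following sketch. Everything here is an assumption: the Prometheus query supposes a gunicorn busy-workers metric exported via the statsd instrumentation mentioned above, and the server address and thresholds are placeholders.

```yaml
# Hypothetical KEDA ScaledObject for the web-archive deployment.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-archive
  namespace: swh
spec:
  scaleTargetRef:
    name: web-archive
  minReplicaCount: 2
  maxReplicaCount: 8
  triggers:
    - type: prometheus
      metadata:
        # Placeholder address; assumed gunicorn metric name.
        serverAddress: http://prometheus.example:9090
        query: sum(gunicorn_busy_workers{app="web-archive"})
        threshold: "12"
```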
Edited by Nicolas Dandrimont
- Resolved by Antoine R. Dumont
After looking a bit more, the current deployment looks like 2 worker processes and 2 threads per worker process over 2 replicas (so, 2*2*2 = 8 worker threads overall), which is indeed smaller than moma was (and doesn't seem to match what you said is supposed to be the default).
It does make sense to bump that (but going all the way to 320 is a bit much!)
Edited by Nicolas Dandrimont
- Resolved by Antoine R. Dumont
So either:
- 8 workers
- 1 thread
- 2 replicas
or:
- 4 workers
- 4 threads
- 2 replicas
I don't know what's most sensible here; any ideas?
- Resolved by Antoine R. Dumont
To unconfuse the default value situation, changing the template from
value: {{ $web_config.gunicorn.threads | default 5 | quote }}
to
value: {{ dig "gunicorn" "threads" 5 $web_config | quote }}
(and dropping the
if $web_config.gunicorn
block) might make sense.
Edited by Nicolas Dandrimont
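For reference, the point of `dig` here: with `default`, a nil `$web_config.gunicorn` map makes the `.threads` lookup error out, which is why the template needs the `if` guard; Sprig's `dig` walks the nested keys and falls back to the given default on any missing level. A minimal sketch of what the simplified template could look like (the defaults for workers and timeout are assumptions, mirroring the Dockerfile):

```yaml
env:
  # dig returns 5 if either "gunicorn" or "threads" is missing,
  # so no `if $web_config.gunicorn` guard is needed.
  - name: THREADS
    value: {{ dig "gunicorn" "threads" 5 $web_config | quote }}
  - name: WORKERS
    value: {{ dig "gunicorn" "workers" 2 $web_config | quote }}
  - name: TIMEOUT
    value: {{ dig "gunicorn" "timeout" 3600 $web_config | quote }}
```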
added 1 commit
- 89f6ab94 - production/web: Align archive instance with previous instance moma
- Resolved by Antoine R. Dumont
added 4 commits
- ab92184a...de774880 - 2 commits from branch staging
- 396a5fbb - production/web: Align archive instance with previous instance moma
- bae2c6d2 - template/web: Simplify the gunicorn setup