production/storage-rpc: Use tcp liveness probe
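This should avoid a cascading effect. When the workers are too busy to answer the probe, the HTTP liveness probe fails, which ends up restarting the pod and, in effect, killing the ongoing requests. A TCP probe only checks that the rpc port still accepts connections, so it should keep succeeding while the workers are busy.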
If this works well, we should probably do the same for the remaining rpc services (in another diff). Another commit in this MR does the same for the webapp template.
This is currently being tested, without issue so far, on both the storage rpc and the webapp instances.
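To illustrate the difference between the two probe types, here is a minimal Python sketch. It is not how the kubelet is implemented. The helper names (`tcp_probe`, `http_probe`, `busy_server`) are hypothetical, and the "busy" service is simulated by a listening socket that never calls `accept()`, standing in for workers that are all occupied.

```python
import http.client
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Succeed as soon as a TCP connection can be opened (tcpSocket-like check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(host: str, port: int, path: str = "/", timeout: float = 1.0) -> bool:
    """Succeed only if an HTTP GET gets a 2xx/3xx answer in time (httpGet-like check)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections and timeouts while waiting for a response.
        return False
    finally:
        conn.close()


def busy_server() -> socket.socket:
    """Listening socket that is never accepted from: simulates saturated workers."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(8)
    return srv


if __name__ == "__main__":
    srv = busy_server()
    port = srv.getsockname()[1]
    print("tcp probe ok: ", tcp_probe("127.0.0.1", port))
    print("http probe ok:", http_probe("127.0.0.1", port, timeout=0.5))
    srv.close()
```

In this simulation the kernel completes the TCP handshake from the listen backlog, so the TCP check passes, while the HTTP check times out because nothing ever reads the request. Whether the real service behaves the same way under load depends on its server and backlog settings.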
make swh-helm-diff
[swh] Comparing changes between branches production and stabilize-storage-liveness-probe (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra-next-version...
[swh] Generate config in stabilize-storage-liveness-probe branch for environment staging...
[swh] Generate config in stabilize-storage-liveness-probe branch for environment staging...
[swh] Generate config in stabilize-storage-liveness-probe branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace swh-cassandra-next-version...
[swh] Generate config in stabilize-storage-liveness-probe branch for environment production...
[swh] Generate config in stabilize-storage-liveness-probe branch for environment production...
[swh] Generate config in stabilize-storage-liveness-probe branch for environment production...
------------- diff for environment staging namespace swh -------------
--- /tmp/swh-chart.swh.3D6eXKBU/staging-swh.before 2024-01-25 13:22:22.935902853 +0100
+++ /tmp/swh-chart.swh.3D6eXKBU/staging-swh.after 2024-01-25 13:22:23.959902479 +0100
@@ -23918,22 +23918,21 @@
- containerPort: 5002
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
periodSeconds: 5
livenessProbe:
- httpGet:
- path: /
+ tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
- name: THREADS
@@ -24057,22 +24056,21 @@
- containerPort: 5002
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
periodSeconds: 5
livenessProbe:
- httpGet:
- path: /
+ tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
- name: THREADS
------------- diff for environment staging namespace swh-cassandra -------------
--- /tmp/swh-chart.swh.3D6eXKBU/staging-swh-cassandra.before 2024-01-25 13:22:23.375902692 +0100
+++ /tmp/swh-chart.swh.3D6eXKBU/staging-swh-cassandra.after 2024-01-25 13:22:24.223902383 +0100
@@ -22398,22 +22398,21 @@
- containerPort: 5002
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
periodSeconds: 5
livenessProbe:
- httpGet:
- path: /
+ tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
- name: THREADS
------------- diff for environment staging namespace swh-cassandra-next-version -------------
--- /tmp/swh-chart.swh.3D6eXKBU/staging-swh-cassandra-next-version.before 2024-01-25 13:22:23.639902596 +0100
+++ /tmp/swh-chart.swh.3D6eXKBU/staging-swh-cassandra-next-version.after 2024-01-25 13:22:24.439902304 +0100
@@ -20894,22 +20894,21 @@
- containerPort: 5002
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
periodSeconds: 5
livenessProbe:
- httpGet:
- path: /
+ tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
- name: THREADS
------------- diff for environment production namespace swh -------------
--- /tmp/swh-chart.swh.3D6eXKBU/production-swh.before 2024-01-25 13:22:24.835902160 +0100
+++ /tmp/swh-chart.swh.3D6eXKBU/production-swh.after 2024-01-25 13:22:25.411901950 +0100
@@ -32070,22 +32070,21 @@
- containerPort: 5002
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
periodSeconds: 5
livenessProbe:
- httpGet:
- path: /
+ tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
- name: THREADS
@@ -32435,22 +32434,21 @@
- containerPort: 5002
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
periodSeconds: 5
livenessProbe:
- httpGet:
- path: /
+ tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
- name: THREADS
------------- diff for environment production namespace swh-cassandra -------------
--- /tmp/swh-chart.swh.3D6eXKBU/production-swh-cassandra.before 2024-01-25 13:22:25.019902093 +0100
+++ /tmp/swh-chart.swh.3D6eXKBU/production-swh-cassandra.after 2024-01-25 13:22:25.643901865 +0100
@@ -14245,22 +14245,21 @@
- containerPort: 5002
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
periodSeconds: 5
livenessProbe:
- httpGet:
- path: /
+ tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
- name: STATSD_HOST
@@ -14602,22 +14601,21 @@
- containerPort: 5002
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
periodSeconds: 5
livenessProbe:
- httpGet:
- path: /
+ tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
- name: STATSD_HOST
Edited by Antoine R. Dumont