- Jan 26, 2024
-
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5226
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5226
-
Jenkins for Software Heritage authored
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5215
-
Nicolas Dandrimont authored
As the intrinsic parallelism has increased, decrease the extrinsic parallelism. ackLate allows us to disable "stopWhenNoActivity", and to let autoscaling do its work.
-
and remove unnecessary indirections
-
Vincent Sellier authored
-
Vincent Sellier authored
-
It seems to fail the same way the current storage rpc does.
-
That should avoid a cascading effect. When workers are too busy to handle the HTTP liveness probe, the probe fails and the pod gets restarted, which in effect kills the ongoing requests. https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-tcp-liveness-probe Refs. swh/infra/sysadm-environment#5215
-
Vincent Sellier authored
This is required to allow Prometheus to scrape the metrics directly from the pod. Related to swh/infra/sysadm-environment#5227
-
Vincent Sellier authored
Limit the scraping to 1 pod, as the metrics are the same for all pods. Related to swh/infra/sysadm-environment#5227
-
- Jan 25, 2024
-
-
Nicolas Dandrimont authored
CPU isn't a great proxy for what we actually need (busy gunicorn workers), but that's a start...
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
The space chopping was concatenating the yaml files together...
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
-
Nicolas Dandrimont authored
client_max_size is actually set at the top level of the swh config file, not within the objstorage section.
-
Nicolas Dandrimont authored
The staging objstorage isn't especially fast
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
-
Vincent Sellier authored
The network looks very slow these days. The consumers time out and Kafka rebalances all the consumers each time it happens. Related to swh/infra/sysadm-environment#5187
-
Antoine R. Dumont authored
Use 32 workers with 1 thread. Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
We don't know yet where it will settle, and the current 4 replicas cannot keep up with half our workers. Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
The last runs have finished so we can stop and decommission them. Refs. swh/infra/sysadm-environment#5223
-
Antoine R. Dumont authored
We have migrated only around half of the writers, and the requests for both CPU and memory are already reached. Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
The new one is running on saam. Test it on one loader first to check incrementally that everything is fine. Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
This will soon be migrated. Refs. swh/infra/sysadm-environment#5215
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5215
-