- Aug 28, 2024
-
-
Vincent Sellier authored
It must be initialized manually for the moment. Related to swh/infra/sysadm-environment#5391
-
Vincent Sellier authored
Local clusters most likely don't have a Prometheus operator deployed.
-
- Aug 27, 2024
-
-
Vincent Sellier authored
The refresh_history endpoint takes more than 10s to retrieve the data from Prometheus and compute the values. The cronjob regularly failed, probably until enough data was in the Thanos cache. Related to swh/infra/sysadm-environment#5387
-
Vincent Sellier authored
Related to swh/infra/sysadm-environment#5386
-
- Aug 26, 2024
-
-
Vincent Sellier authored
  - Use the retry command line
  - Split the RPC checks from the refresh endpoint call
  - Fix a couple of minor side issues

Related to swh/infra/sysadm-environment#5387
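A minimal sketch of the idea behind this change, assuming the job first checks that the RPC services answer and only then calls the refresh endpoint, each step being retried on its own. The names `retry`, `run_job`, `check_rpc` and `refresh` are hypothetical and are not taken from the repository; the actual change relies on a retry command line tool rather than on Python code.

```python
import time


def retry(action, attempts=3, delay=0.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` calls have failed.

    Hypothetical helper: returns the result of the first successful call
    and re-raises the last error when every attempt failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    raise last_error


def run_job(check_rpc, refresh, attempts=3, delay=0.0):
    """Run the RPC checks and the refresh call as two separately retried steps."""
    retry(check_rpc, attempts=attempts, delay=delay)
    return retry(refresh, attempts=attempts, delay=delay)
```

Keeping the two steps separate means a failing RPC check is not confused with a slow or failing refresh call, and each one gets its own retry budget.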
-
Vincent Sellier authored
-
Vincent Sellier authored
-
- Aug 22, 2024
-
-
Vincent Sellier authored
Same reason as for production.
-
Vincent Sellier authored
The number of requests sent to the ingress is known and limited, so the number of webapp instances can be finely managed. This allows removing all the autoscaling configuration.
-
Nicolas Dandrimont authored
Ref. swh/infra/sysadm-environment#5369
- Aug 21, 2024
-
-
Vincent Sellier authored
Related to swh/infra/sysadm-environment#5386
- Aug 20, 2024
-
-
Vincent Sellier authored
Use only 5XX errors to compute the error ratio. Using the 4XX statuses can trigger false-positive alerts for some incomplete object storages, or for the webapp when quotas are exceeded. As only 5XX errors are now counted, the threshold can be reduced. Over the last month, it seems the 10% error rate was reached when real incidents happened.
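As an illustration only, the ratio described here can be sketched in Python as follows. The function names and the sample counts are hypothetical, and the 10% value is used merely because it is the rate mentioned in this message; the commit does not state the new threshold, and the real alert is presumably expressed as a Prometheus rule rather than as Python code.

```python
def error_ratio(status_counts):
    """Return the share of responses whose status is a server error (5XX).

    `status_counts` maps an HTTP status code to a number of responses.
    Client errors (4XX) count toward the total but not toward the errors.
    """
    total = sum(status_counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for status, n in status_counts.items() if 500 <= status <= 599)
    return errors / total


def should_alert(status_counts, threshold=0.10):
    """Hypothetical check: alert when the 5XX ratio reaches the threshold."""
    return error_ratio(status_counts) >= threshold


# Made-up sample: many 4XX responses (e.g. quota exceeded), few 5XX.
sample = {200: 80, 404: 10, 429: 5, 503: 5}
print(error_ratio(sample), should_alert(sample))
```

With these made-up counts, 4XX and 5XX together would account for 20% of the responses, while the 5XX-only ratio stays at 5%, so quota-related 4XX responses alone no longer trigger the alert.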
-
- Aug 19, 2024
-
-
Vincent Sellier authored
Use a dedicated ingress so that a regexp can be used. It adds more flexibility to the static resource management. Related to swh/infra/sysadm-environment#5383
-
Vincent Sellier authored
-
Vincent Sellier authored
-
- Aug 16, 2024
-
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5379
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5379
-
- Aug 15, 2024
-
-
Vincent Sellier authored
Even if CPU is not a good metric, reducing this ratio allows starting more replicas of the read-only storage. It seems the webapp slowness is somehow due to the slowness of the read-only storage. More details are in the linked issue: swh/infra/sysadm-environment#5373
-
- Aug 14, 2024
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5327
-
Antoine R. Dumont authored
-