- Mar 24, 2025
Antoine R. Dumont authored
Like we do for the save-code-now. Refs. swh/infra/sysadm-environment#5608
- Mar 21, 2025
Antoine R. Dumont authored
Preventing some load-deposit to finish properly. We limit the case where pods gets warm shutdown because there is almost no messages left in the queue. As they have 3600s to finish when they receive that signal, but some deposit ingestion takes longer than this period to finish [1] As we activated ackLate to stop losing messages, that means, the message is still in the queue and will get consumed again by another pod. Each deposit loading takes apparently the same amount of time for each ingestion of the same deposit. If the new pod is eventually warmed shutdown again, that could loop a while before being actually fully ingested (and the message acknowledged). [1] some takes up more than 6000s, e.g. this one took 9142.2s ``` {"asctime": "2025-03-20 20:56:35,996", "threadName": "MainThread", "pathname": "/opt/swh/venv/lib/python3.11/site-packages/celery/app/trace.py", "lineno": 128, "funcName": "info", "task_name": null, "task_id": null, "name": "celery.app.trace", "levelname": "INFO", "message": "Task swh.loader.package.deposit.tasks.LoadDeposit[ddb169b3-ac71-4599-87b9-098393817903] succeeded in 9142.2018524101s: {'status': 'eventful', 'snapshot_id': 'ff59853917f5a9b8770c8a2998753c6baabab885'}", "data": {"id": "ddb169b3-ac71-4599-87b9-098393817903", "name": "swh.loader.package.deposit.tasks.LoadDeposit", "return_value": "{'status': 'eventful', 'snapshot_id': 'ff59853917f5a9b8770c8a2998753c6baabab885'}", "runtime": 9142.2018524101, "args": "()", "kwargs": "{'url': 'https://doi.org/10.5281/zenodo.598174', 'deposit_id': 43423}"}} ``` Refs. swh/infra/sysadm-environment#5512
- Mar 20, 2025
Antoine R. Dumont authored
When a message is consumed but its processing does not finish for some reason (e.g. deployment, crash, ...), it should be put back in the queue so another pod can consume it through to completion. Refs. swh/infra/sysadm-environment#5512#note_200870
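For reference, "ackLate" corresponds to Celery's late-acknowledgement behaviour; a minimal sketch of such a worker configuration (the app name and the exact set of options used in the swh deployment are assumptions) could look like this:

```
from celery import Celery

app = Celery("swh-loader")

app.conf.update(
    # Acknowledge the message only after the task has finished, so a pod
    # that crashes or is shut down mid-task leaves the message in the
    # queue for another pod to pick up.
    task_acks_late=True,
    # Also requeue the message if the worker process is killed while the
    # task is running (e.g. the pod is terminated).
    task_reject_on_worker_lost=True,
    # Avoid prefetching messages that a terminating pod will never process.
    worker_prefetch_multiplier=1,
)
```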
- Mar 19, 2025
Guillaume Samson authored
Related to swh/infra/sysadm-environment#5606
Guillaume Samson authored
Related to swh/infra/sysadm-environment#5606
Nicolas Dandrimont authored
- Mar 18, 2025
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
Antoine R. Dumont authored
The runner now only schedules a subset of the registered task types (e.g. lister, {check|load}-deposit, cooker); the remaining task types are no longer its concern. Refs. swh/infra/sysadm-environment#5512
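To illustrate the idea (the names below are hypothetical, not the actual swh-scheduler configuration), restricting a runner to its subset of task types amounts to filtering what it will schedule:

```
# Hypothetical illustration only: an allow-list of task-type prefixes this
# runner schedules; anything else is left to other runners.
ALLOWED_PREFIXES = ("lister", "check-deposit", "load-deposit", "cooker")


def task_types_for_runner(registered_task_types: list[str]) -> list[str]:
    """Keep only the registered task types this runner is responsible for."""
    return [
        task_type
        for task_type in registered_task_types
        if task_type.startswith(ALLOWED_PREFIXES)
    ]
```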
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
Guillaume Samson authored
- Mar 17, 2025
Antoine R. Dumont authored
To check whether that makes a difference in the queries sent to the backend. Refs. swh/infra/sysadm-environment#5512
Guillaume Samson authored
- Mar 14, 2025
Nicolas Dandrimont authored
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
Nicolas Dandrimont authored
Nicolas Dandrimont authored
Nicolas Dandrimont authored
Nicolas Dandrimont authored
- Mar 13, 2025
Antoine R. Dumont authored
There are occasional usage peaks during which we exceed the number of requests our current replicas can handle. This is to reduce 502 errors. Refs. swh/infra/sysadm-environment#5512
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
- Mar 12, 2025
Antoine R. Dumont authored
Antoine R. Dumont authored
It was misindented by 2 characters.
Antoine R. Dumont authored
Antoine R. Dumont authored
Antoine R. Dumont authored
Otherwise, we schedule tasks and fill queues that are not consumed, which then escalates into spurious infra alerts. Those loaders were stopped to avoid running too many loaders in the staging infra.
Antoine R. Dumont authored
Antoine R. Dumont authored
It's mostly triggered the same way; only some configuration manipulation needs to be slightly adapted to allow deploying the extra service.