- Mar 24, 2025
Antoine R. Dumont authored
Like we do for the save-code-now. Refs. swh/infra/sysadm-environment#5608
- Mar 21, 2025
Antoine R. Dumont authored
Preventing some load-deposit to finish properly. We limit the case where pods gets warm shutdown because there is almost no messages left in the queue. As they have 3600s to finish when they receive that signal, but some deposit ingestion takes longer than this period to finish [1] As we activated ackLate to stop losing messages, that means, the message is still in the queue and will get consumed again by another pod. Each deposit loading takes apparently the same amount of time for each ingestion of the same deposit. If the new pod is eventually warmed shutdown again, that could loop a while before being actually fully ingested (and the message acknowledged). [1] some takes up more than 6000s, e.g. this one took 9142.2s ``` {"asctime": "2025-03-20 20:56:35,996", "threadName": "MainThread", "pathname": "/opt/swh/venv/lib/python3.11/site-packages/celery/app/trace.py", "lineno": 128, "funcName": "info", "task_name": null, "task_id": null, "name": "celery.app.trace", "levelname": "INFO", "message": "Task swh.loader.package.deposit.tasks.LoadDeposit[ddb169b3-ac71-4599-87b9-098393817903] succeeded in 9142.2018524101s: {'status': 'eventful', 'snapshot_id': 'ff59853917f5a9b8770c8a2998753c6baabab885'}", "data": {"id": "ddb169b3-ac71-4599-87b9-098393817903", "name": "swh.loader.package.deposit.tasks.LoadDeposit", "return_value": "{'status': 'eventful', 'snapshot_id': 'ff59853917f5a9b8770c8a2998753c6baabab885'}", "runtime": 9142.2018524101, "args": "()", "kwargs": "{'url': 'https://doi.org/10.5281/zenodo.598174', 'deposit_id': 43423}"}} ``` Refs. swh/infra/sysadm-environment#5512
- Mar 20, 2025
Antoine R. Dumont authored
When a message is consumed but its processing does not finish for some reason (e.g. deployment, crash, ...), it should be put back in the queue so another pod can consume it through to completion. Refs. swh/infra/sysadm-environment#5512#note_200870
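For reference, "ackLate" corresponds to Celery's late-acknowledgement behaviour; a minimal sketch of such a worker configuration (the app name and the exact set of options used in the swh deployment are assumptions) could look like this:

```
from celery import Celery

app = Celery("swh-loader")

app.conf.update(
    # Acknowledge the message only after the task has finished, so a pod
    # that crashes or is shut down mid-task leaves the message in the
    # queue for another pod to pick up.
    task_acks_late=True,
    # Also requeue the message if the worker process is killed while the
    # task is running (e.g. the pod is terminated).
    task_reject_on_worker_lost=True,
    # Avoid prefetching messages that a terminating pod will never process.
    worker_prefetch_multiplier=1,
)
```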
- Mar 19, 2025
Guillaume Samson authored
Related to swh/infra/sysadm-environment#5606
Guillaume Samson authored
Related to swh/infra/sysadm-environment#5606
Nicolas Dandrimont authored
- Mar 18, 2025
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
Antoine R. Dumont authored
The runner now only schedules a subset of the registered task types (e.g. lister, {check|load}-deposit, cooker); the remaining task types are no longer its concern. Refs. swh/infra/sysadm-environment#5512
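To illustrate the idea (the names below are hypothetical, not the actual swh-scheduler configuration), restricting a runner to its subset of task types amounts to filtering what it will schedule:

```
# Hypothetical illustration only: an allow-list of task-type prefixes this
# runner schedules; anything else is left to other runners.
ALLOWED_PREFIXES = ("lister", "check-deposit", "load-deposit", "cooker")


def task_types_for_runner(registered_task_types: list[str]) -> list[str]:
    """Keep only the registered task types this runner is responsible for."""
    return [
        task_type
        for task_type in registered_task_types
        if task_type.startswith(ALLOWED_PREFIXES)
    ]
```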
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
Guillaume Samson authored
- Mar 17, 2025
Antoine R. Dumont authored
To check whether that makes a difference in the queries sent to the backend. Refs. swh/infra/sysadm-environment#5512
Guillaume Samson authored
- Mar 14, 2025
Nicolas Dandrimont authored
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
Nicolas Dandrimont authored
Nicolas Dandrimont authored
Nicolas Dandrimont authored
Nicolas Dandrimont authored
- Mar 13, 2025
Antoine R. Dumont authored
There are occasional usage peaks during which we exceed the number of requests our current replicas can handle. This is to reduce 502 errors. Refs. swh/infra/sysadm-environment#5512
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5512
- Mar 12, 2025
Antoine R. Dumont authored
Antoine R. Dumont authored
It was misindented by 2 characters.
Antoine R. Dumont authored
Antoine R. Dumont authored
Antoine R. Dumont authored
Otherwise, we schedule tasks and fill queues that are not consumed, which then escalates into spurious infra alerts. Those loaders were stopped to avoid running too many loaders in the staging infra.
Antoine R. Dumont authored
Antoine R. Dumont authored
It's mostly triggered the same way; only some configuration manipulation needs to be slightly adapted to allow deploying the extra service.