Add a new CLI endpoint to schedule recurrent visits in Celery

For each known visit type, we run a loop which:

  • monitors the size of the relevant Celery queue
  • schedules more visits of the relevant type once the number of available slots goes over a given threshold (currently set to 5% of the max queue size).
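
As a rough illustration of the loop described above, here is a minimal sketch of a single iteration. The names `slots_to_fill`, `run_once`, `get_queue_length` and `schedule_visits` are hypothetical and do not come from the actual change; only the 5% threshold is taken from the description.

```python
from typing import Callable


def slots_to_fill(
    queue_length: int, max_queue_size: int, threshold_ratio: float = 0.05
) -> int:
    """Return how many visits to schedule, or 0 while below the threshold."""
    # Free slots are whatever is left before the queue reaches its max size.
    available = max(max_queue_size - queue_length, 0)
    # Only schedule once the free slots exceed the threshold
    # (5% of the max queue size by default).
    if available > max_queue_size * threshold_ratio:
        return available
    return 0


def run_once(
    get_queue_length: Callable[[], int],
    schedule_visits: Callable[[int], None],
    max_queue_size: int,
    threshold_ratio: float = 0.05,
) -> int:
    """One loop iteration: check the queue size, then schedule if needed."""
    to_schedule = slots_to_fill(get_queue_length(), max_queue_size, threshold_ratio)
    if to_schedule:
        schedule_visits(to_schedule)
    return to_schedule
```

In this sketch the two callables stand in for the RabbitMQ queue inspection and the scheduler backend call, so the threshold logic can be exercised without either service.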

The scheduling of visits combines multiple scheduling policies, for now using static ratios set in the POLICY_RATIOS dict. We emit a warning if the ratio of origins fetched for each policy is skewed with respect to the original request (allowing, for now, manual adjustment of the ratios).
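
To make the ratio split and the skew check concrete, here is a small sketch. The policy names, the ratio values and the `tolerance` parameter are placeholders chosen for illustration; the real contents of POLICY_RATIOS and the exact skew criterion are not given in this description.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)

# Hypothetical ratios: the real POLICY_RATIOS values are not shown here.
POLICY_RATIOS: Dict[str, float] = {
    "policy_a": 0.5,
    "policy_b": 0.25,
    "policy_c": 0.25,
}


def split_by_policy(num_origins: int, ratios: Dict[str, float]) -> Dict[str, int]:
    """Split a number of origins between policies according to static ratios."""
    total = sum(ratios.values())
    return {
        policy: round(num_origins * ratio / total)
        for policy, ratio in ratios.items()
    }


def skewed_policies(
    requested: Dict[str, int], fetched: Dict[str, int], tolerance: float = 0.1
) -> List[str]:
    """Return the policies whose share of fetched origins drifts from the request."""
    total_requested = sum(requested.values())
    total_fetched = sum(fetched.values())
    if not total_requested or not total_fetched:
        return []
    skewed = []
    for policy, count in requested.items():
        wanted = count / total_requested
        got = fetched.get(policy, 0) / total_fetched
        if abs(got - wanted) > tolerance:
            logger.warning(
                "Policy %s skewed: wanted %.2f, got %.2f", policy, wanted, got
            )
            skewed.append(policy)
    return skewed
```

A policy that returns fewer origins than requested shifts the shares of all the others, which is why the check compares fractions of the total rather than absolute counts.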

The CLI endpoint spawns one thread per visit type; each thread handles its own connections to RabbitMQ and the scheduler backend. For now, we handle exceptions in the visit scheduling threads by (stupidly) respawning the relevant thread directly. We should probably improve this to give up after a specific number of tries.
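
The suggested improvement (giving up after a fixed number of tries) could look roughly like the sketch below. This is not what the change currently does: `run_with_respawn` and `max_tries` are hypothetical, and the sketch supervises a single worker, whereas the actual endpoint runs one thread per visit type.

```python
import logging
import threading
from typing import Callable

logger = logging.getLogger(__name__)


def run_with_respawn(target: Callable[[], None], name: str, max_tries: int = 3) -> int:
    """Run target in a thread, respawning it after a crash, up to max_tries times.

    Returns the number of tries that were used.
    """
    tries = 0
    while tries < max_tries:
        tries += 1
        crashed = threading.Event()

        def wrapper(crashed: threading.Event = crashed) -> None:
            try:
                target()
            except Exception:
                # Record the failure so the supervisor can decide to respawn.
                logger.exception("Thread %s crashed (try %d)", name, tries)
                crashed.set()

        thread = threading.Thread(target=wrapper, name=name)
        thread.start()
        thread.join()
        if not crashed.is_set():
            # The worker returned normally: no respawn needed.
            break
    return tries
```

Capping the retries means a persistent failure (for example an unreachable backend) ends the worker instead of producing an endless respawn loop.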

Co-authored-by: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>

Related to swh/infra/sysadm-environment#3667 (closed)

Test Plan

  • docker test run by @olasd
  • D6543: docker container update and scenario run (swh/meta$1201)
  • tox

Migrated from D6520 (view on Phabricator)
