Nicolas Dandrimont authored
For each known visit type, we run a loop which:
 - monitors the size of the relevant celery queue
 - schedules more visits of the relevant type once the number of
 available slots goes over a given threshold (currently set to 5% of the
 max queue size).
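
The threshold check described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the function name `visits_to_schedule` and its parameters are hypothetical, and the current queue length is assumed to have been read from the celery queue elsewhere.

```python
def visits_to_schedule(
    max_queue_size: int, queue_size: int, min_free_ratio: float = 0.05
) -> int:
    """Return how many visits to schedule for one visit type.

    Hypothetical helper: nothing is scheduled until the number of free
    slots goes over `min_free_ratio` (5% here) of the max queue size.
    """
    # An overfull queue has no free slots.
    free_slots = max(max_queue_size - queue_size, 0)
    threshold = max_queue_size * min_free_ratio
    if free_slots <= threshold:
        return 0
    return free_slots
```

Waiting for the threshold before refilling batches the scheduling work instead of sending a few visits at a time whenever a single slot frees up.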

The scheduling of visits combines multiple scheduling policies, for now
using static ratios set in the `POLICY_RATIOS` dict. We emit a warning
if the ratio of origins fetched for each policy is skewed with respect
to the original request (allowing, for now, manual adjustment of the
ratios).
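
As an illustration of the ratio split and the skew warning, here is a minimal sketch. Only the `POLICY_RATIOS` name comes from this commit; the policy names, the ratio values, the helper names and the `tolerance` parameter are all assumptions made for the example.

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)

# Hypothetical policy names and ratios, standing in for POLICY_RATIOS.
EXAMPLE_POLICY_RATIOS: Dict[str, float] = {
    "policy_a": 0.5,
    "policy_b": 0.25,
    "policy_c": 0.25,
}


def split_by_ratio(total: int, ratios: Dict[str, float]) -> Dict[str, int]:
    """Split `total` visits between policies according to static ratios."""
    counts = {policy: int(total * ratio) for policy, ratio in ratios.items()}
    # Rounding down can leave a remainder; give it to the largest policy.
    remainder = total - sum(counts.values())
    largest = max(ratios, key=lambda policy: ratios[policy])
    counts[largest] += remainder
    return counts


def skewed_policies(
    requested: Dict[str, int], fetched: Dict[str, int], tolerance: float = 0.1
) -> List[str]:
    """Warn about policies whose fetched share drifts from the requested one."""
    total_requested = sum(requested.values())
    total_fetched = sum(fetched.values())
    if not total_requested or not total_fetched:
        return []
    skewed = []
    for policy, wanted in requested.items():
        wanted_share = wanted / total_requested
        actual_share = fetched.get(policy, 0) / total_fetched
        if abs(wanted_share - actual_share) > tolerance:
            logger.warning(
                "Policy %s is skewed: requested %.2f, fetched %.2f",
                policy, wanted_share, actual_share,
            )
            skewed.append(policy)
    return skewed
```

For example, `split_by_ratio(10, EXAMPLE_POLICY_RATIOS)` gives 6, 2 and 2 visits to the three policies, and `skewed_policies` returns an empty list when the fetched counts match that split.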

The CLI endpoint spawns one thread for each visit type, which all handle
connections to RabbitMQ and the scheduler backend separately. For now,
we handle exceptions in the visit scheduling threads by (stupidly)
respawning the relevant thread directly. We should probably improve this
to give up after a specific number of tries.
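
One possible shape for the suggested improvement is sketched below. It is not what this commit does: instead of respawning a thread, it reruns the loop inside the same thread and gives up after `max_tries` failures. The names `run_with_retries`, `spawn_visit_threads` and `make_loop` are hypothetical.

```python
import logging
import threading
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def run_with_retries(target: Callable[[], None], name: str, max_tries: int = 3) -> bool:
    """Run `target`, rerunning it on exceptions; False means we gave up."""
    for attempt in range(1, max_tries + 1):
        try:
            target()
            return True
        except Exception:
            logger.exception(
                "Visit scheduling loop %s failed (attempt %d/%d)",
                name, attempt, max_tries,
            )
    logger.error("Giving up on %s after %d tries", name, max_tries)
    return False


def spawn_visit_threads(
    visit_types: Iterable[str],
    make_loop: Callable[[str], Callable[[], None]],
    max_tries: int = 3,
) -> List[threading.Thread]:
    """Start one thread per visit type, each with its own retry budget."""
    threads = []
    for visit_type in visit_types:
        thread = threading.Thread(
            target=run_with_retries,
            args=(make_loop(visit_type), visit_type, max_tries),
            name=f"visits-{visit_type}",
            daemon=True,
        )
        thread.start()
        threads.append(thread)
    return threads
```

Here `make_loop` is assumed to build the per-visit-type loop, including its own RabbitMQ and scheduler backend connections, so that a rerun starts from fresh connections.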

Co-authored-by: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
50d7fd7f

swh-scheduler

Job scheduler for the Software Heritage project.

Task manager for asynchronous/delayed tasks, used for both recurrent (e.g., listing a forge, loading new stuff from a Git repository) and one-off activities (e.g., loading a specific version of a source package).