Direct scheduling of origin visits in Celery
This stack of changes builds up to a CLI endpoint allowing us to schedule origin visits directly in Celery, bypassing the legacy scheduler entirely.
This has zero test coverage, save for the old tests still passing (which is already something...). It is being used on the actual production database to schedule real tasks for git, npm and pypi.
Included changes:
- Drop duplicate docstring from backend
- Make the origin visit scheduling cooldown configurable
(Cosmetic changes)
- Add a (longer) specific cooldown for failed origin visits
- Add a specific cooldown for notfound origins
Both of these changes prevent repeated visits to failing origins. This is necessary because we use a consistent ordering based on the upstream information: without a cooldown, we would keep trying to load the same failing origins and never reach origins further down the list. Listers should eventually disable these origins.
- Add a table sampling option to grab_next_visits
Even common operations are expensive when run across all git origins. Table sampling lets us schedule at least some jobs in a reasonably short time.
- Add a (very basic) scheduling policy for origins with no known last update
This is especially useful for pypi, as well as for some git hosting services whose APIs do not provide this information. We will need smarter heuristics to avoid repeated uneventful visits to these origins.
- Split off the helper for available slots in a celery queue
The send-to-celery subcommand needs this helper as well, so it is split off from the runner module.
- Add a swh scheduler origin send-to-celery subcommand
Yes, finally!
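The per-status cooldowns from the list above can be sketched as a filter over origins kept in a consistent order. This is a minimal illustration, not the scheduler's actual code: `OriginVisitState`, `is_eligible`, `next_visits`, the status strings and the default durations are all hypothetical, and the real selection happens in SQL inside grab_next_visits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from itertools import islice
from typing import Dict, Iterable, List, Optional

# Hypothetical defaults: real configuration keys and durations may differ.
DEFAULT_COOLDOWNS: Dict[str, timedelta] = {
    "successful": timedelta(days=7),
    "failed": timedelta(days=14),     # longer cooldown for failed visits
    "not_found": timedelta(days=31),  # even longer for origins that vanished
}


@dataclass
class OriginVisitState:
    url: str
    last_visit: Optional[datetime]    # None if the origin was never visited
    last_visit_status: Optional[str]  # key into the cooldown mapping


def is_eligible(
    state: OriginVisitState,
    now: datetime,
    cooldowns: Dict[str, timedelta] = DEFAULT_COOLDOWNS,
) -> bool:
    """True if the cooldown matching the last visit status has elapsed."""
    if state.last_visit is None:
        return True
    # Unknown or missing statuses fall back to the regular cooldown
    cooldown = cooldowns.get(state.last_visit_status, cooldowns["successful"])
    return state.last_visit + cooldown <= now


def next_visits(
    states: Iterable[OriginVisitState],
    now: datetime,
    count: int,
    cooldowns: Dict[str, timedelta] = DEFAULT_COOLDOWNS,
) -> List[str]:
    """Pick the first `count` eligible origins, preserving the input order."""
    eligible = (s for s in states if is_eligible(s, now, cooldowns))
    return [s.url for s in islice(eligible, count)]
```

With a single cooldown, a failed or not-found origin at the head of the ordering would become eligible again as often as a healthy one and keep taking slots on every run; the longer per-status cooldowns let later origins through in the meantime.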
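The table sampling option mentioned above maps naturally onto PostgreSQL's `TABLESAMPLE` clause. The sketch below only builds a parametrized query string (in the pyformat style psycopg2 accepts); the function name, the `listed_origins` table and its columns are assumptions for illustration, not the backend's real query.

```python
from typing import Any, Dict, Optional, Tuple


def build_grab_next_visits_query(
    visit_type: str, count: int, tablesample: Optional[float] = None
) -> Tuple[str, Dict[str, Any]]:
    """Build a parametrized query, optionally sampling the table first.

    Table and column names are hypothetical.
    """
    params: Dict[str, Any] = {"visit_type": visit_type, "count": count}
    sample_clause = ""
    if tablesample is not None:
        if not 0 < tablesample <= 100:
            raise ValueError("tablesample must be a percentage in (0, 100]")
        # SYSTEM samples whole disk pages, so it only reads about
        # `tablesample` percent of the table; BERNOULLI samples individual
        # rows but has to scan the whole table.
        sample_clause = "TABLESAMPLE SYSTEM (%(tablesample)s) "
        params["tablesample"] = tablesample
    query = (
        "SELECT url FROM listed_origins "
        + sample_clause
        + "WHERE visit_type = %(visit_type)s "
        "ORDER BY last_update DESC "
        "LIMIT %(count)s"
    )
    return query, params
```

Sampling is applied before `WHERE` and `ORDER BY`, so the ordering only holds within the sampled pages. That trade-off is what makes it possible to schedule some jobs quickly on a very large table.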
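One possible "very basic" policy for origins with no known last update is a round-robin: never-visited origins first, then the least recently visited ones. This is a hypothetical sketch (`ListedOrigin` and `origins_without_last_update` are illustrative names), not necessarily what the scheduler implements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class ListedOrigin:
    url: str
    last_update: Optional[datetime]  # as reported by the lister, if at all
    last_visit: Optional[datetime]   # when we last visited it, if ever


def origins_without_last_update(origins: List[ListedOrigin], count: int) -> List[str]:
    """Round-robin over origins whose upstream never reports a last update."""
    candidates = [o for o in origins if o.last_update is None]
    # False sorts before True: never-visited origins come first, then the
    # least recently visited ones.
    candidates.sort(key=lambda o: (o.last_visit is not None, o.last_visit or EPOCH))
    return [o.url for o in candidates[:count]]
```

Since nothing tells this policy whether an origin actually changed, it revisits every origin at the same rate, which is why smarter heuristics will be needed to avoid uneventful visits.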
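The available-slots helper and the send-to-celery subcommand described above fit together roughly as follows. This is a sketch under stated assumptions: `available_slots` and `send_to_celery` are hypothetical names, the current queue length is passed in rather than queried from the broker, and `send_task` is injected so it can be Celery's `app.send_task` or a test double. The task name, queue name and `url` keyword argument are placeholders.

```python
from typing import Callable, Iterable, List


def available_slots(queue_length: int, max_queue_length: int) -> int:
    """How many tasks can still be queued without exceeding the maximum."""
    return max(0, max_queue_length - queue_length)


def send_to_celery(
    queue_length: int,
    max_queue_length: int,
    grab_next_visits: Callable[[int], Iterable[str]],
    send_task: Callable[..., object],
    task_name: str,
    queue: str,
) -> List[str]:
    """Fill the free slots of a Celery queue with the next origins to visit."""
    slots = available_slots(queue_length, max_queue_length)
    if slots == 0:
        return []  # queue already full: do not even ask the backend
    sent = []
    for url in grab_next_visits(slots):
        send_task(task_name, kwargs={"url": url}, queue=queue)
        sent.append(url)
    return sent
```

Asking the backend for exactly as many visits as there are free slots keeps the queue bounded without bookkeeping in the legacy scheduler's task tables.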
Test Plan
This obviously needs at least /some/ test coverage.
Migrated from D5809 on Phabricator