Implement task runner to schedule oneshot tasks with high priority for listed origins
This is a sub-task for implementing an MVP for the Bulk On-demand Archival feature.
The new save-bulk lister for the Bulk On-demand Archival feature will record ListedOrigin objects in the scheduler database from a list of origin URLs and their visit types submitted by a user (swh-lister#4709 (closed)).
Those origins will be visited recurrently by loaders; nevertheless, the first visits must be scheduled with high priority to give users quick feedback about the loading statuses of the origins they submitted.
A dedicated task runner for loading origins listed by the save-bulk lister should be implemented to do so:
- the runner should grab ListedOrigin objects by filtering on lister ids corresponding to the save-bulk lister type
- if origins were not visited since the first listing date, celery tasks must be created in dedicated RabbitMQ queues (different from the ones used for recurrent visits and save code now)
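The two rules above can be sketched in plain Python. This is only an illustration under stated assumptions: `ListedOriginStub`, its `last_visit` field and `origins_needing_first_visit` are hypothetical names introduced here, not the swh-scheduler API, and the lister ids are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional, Set


@dataclass(frozen=True)
class ListedOriginStub:
    """Hypothetical, simplified stand-in for a listed origin row."""

    lister_id: str
    url: str
    visit_type: str
    first_seen: datetime  # date of the first listing
    last_visit: Optional[datetime] = None  # None means never visited


def origins_needing_first_visit(
    origins: Iterable[ListedOriginStub], save_bulk_lister_ids: Set[str]
) -> Iterator[ListedOriginStub]:
    """Yield origins of save-bulk listers not visited since their first listing."""
    for origin in origins:
        # Rule 1: keep only origins recorded by a save-bulk lister instance.
        if origin.lister_id not in save_bulk_lister_ids:
            continue
        # Rule 2: keep only origins with no visit since the first listing date.
        if origin.last_visit is None or origin.last_visit < origin.first_seen:
            yield origin


# Illustrative data only.
listed = datetime(2024, 6, 1, tzinfo=timezone.utc)
visited = datetime(2024, 6, 2, tzinfo=timezone.utc)
origins = [
    ListedOriginStub("bulk-1", "https://example.org/a.git", "git", listed),
    ListedOriginStub("bulk-1", "https://example.org/b.git", "git", listed, visited),
    ListedOriginStub("other", "https://example.org/c.git", "git", listed),
]
pending = list(origins_needing_first_visit(origins, {"bulk-1"}))
```

Only the first origin ends up in `pending`: the second was already visited after its listing, and the third belongs to another lister. Sending the selected origins to the dedicated queues is left out of this sketch.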
Related to swh/meta#5091 (closed).
Activity
- Antoine Lambert added activity::Implementation priority:High roadmap 2024 labels
- Antoine Lambert assigned to @anlambert
- Antoine Lambert marked this issue as related to swh/meta#5091 (closed)
- Antoine Lambert mentioned in issue swh/meta#5091 (closed)
- Author Maintainer
After some analysis of the scheduler code, there is already a CLI command, swh scheduler origin send-origins-from-scheduler-to-celery, that can be used to schedule origins registered by the save-bulk lister and send them to dedicated RabbitMQ queues.
For instance, assuming the queue names for the save bulk origins are the celery task names prefixed by save_bulk:, the following bash script schedules the origins for loading (tested in docker). The absolute cooldown parameter specifies the minimum amount of time between two consecutive visits of the same origin.

```bash
# Visit types handled by the save-bulk lister
visit_types=(bzr cvs hg git svn tarball-directory)

# Celery task name of the loader for each visit type
declare -A task_name=(
  [bzr]=swh.loader.bzr.tasks.LoadBazaar
  [cvs]=swh.loader.cvs.tasks.LoadCvsRepository
  [hg]=swh.loader.mercurial.tasks.LoadMercurial
  [git]=swh.loader.git.tasks.UpdateGitRepository
  [svn]=swh.loader.svn.tasks.DumpMountAndLoadSvnRepository
  [tarball-directory]=swh.loader.core.tasks.LoadTarballDirectory
)

# Send the origins of each visit type to its dedicated save_bulk: queue
for visit_type in "${visit_types[@]}"; do
  swh scheduler origin send-origins-from-scheduler-to-celery "$visit_type" \
    --lister-name save-bulk \
    --queue "save_bulk:${task_name[$visit_type]}" \
    --absolute-cooldown "7 days"
done
```
Such a script could be run from cron on a regular basis to schedule origins submitted through the Web API endpoint dedicated to bulk origin saves.
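For a cron wrapper written in Python rather than bash, the same command lines could be built as below. This is a dry-run sketch: it only assembles the arguments quoted in the comment above and never calls the CLI. `build_command` and its defaults are names chosen here for illustration.

```python
import shlex
from typing import Dict, List

# Celery task name of the loader for each visit type (taken from the script above).
TASK_NAMES: Dict[str, str] = {
    "bzr": "swh.loader.bzr.tasks.LoadBazaar",
    "cvs": "swh.loader.cvs.tasks.LoadCvsRepository",
    "hg": "swh.loader.mercurial.tasks.LoadMercurial",
    "git": "swh.loader.git.tasks.UpdateGitRepository",
    "svn": "swh.loader.svn.tasks.DumpMountAndLoadSvnRepository",
    "tarball-directory": "swh.loader.core.tasks.LoadTarballDirectory",
}


def build_command(
    visit_type: str, cooldown: str = "7 days", queue_prefix: str = "save_bulk"
) -> List[str]:
    """Return the argument list scheduling one visit type (nothing is executed)."""
    return [
        "swh", "scheduler", "origin", "send-origins-from-scheduler-to-celery",
        visit_type,
        "--lister-name", "save-bulk",
        # Dedicated queue: the celery task name prefixed by "save_bulk:".
        "--queue", f"{queue_prefix}:{TASK_NAMES[visit_type]}",
        "--absolute-cooldown", cooldown,
    ]


# Print one shell-quoted command line per visit type.
for visit_type in TASK_NAMES:
    print(shlex.join(build_command(visit_type)))
```

Returning an argument list instead of a string keeps the quoting of values such as "7 days" explicit, and the list could be handed to `subprocess.run` if actual execution is wanted.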
- Antoine Lambert changed the description
- Author Maintainer
Specification of the new scheduler runner, from https://hedgedoc.softwareheritage.org/R4T0WGsoSDSgxia40anJMg?view:
Proposal
- Migrate the schedule-first-visits logic into the scheduler backend
  - Not enough, because it can't be called by the webapp directly (the lister has to run first)
Final proposal:
- New columns on the lister
  - A new listing_completed_at column indicating the last complete listing
  - A new first_loading_queue_prefix column indicating the prefix of the queue for the first loading
  - A new first_visit_scheduled_at column to identify whether the first visit was done or not
- A new component checking this status and scheduling the origins, equivalent to schedule-first-visits
  - select lister where first_visit_scheduled_at = null and first_loading_queue_prefix != null
Deployment:
- new runner deployment
Next step:
- adapt the add-forge-now process to fit into this scheme
- check how to limit the listed origins in staging
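The lister selection in the final proposal can be tried against an in-memory SQLite database. This is a sketch under assumptions: the `listers` table layout, the sample rows and `pending_listers` are hypothetical, and SQLite stands in for the real scheduler database. In SQL the null checks have to be written with IS NULL / IS NOT NULL, since a comparison with `= null` never matches.

```python
import sqlite3
from typing import List


def pending_listers(conn: sqlite3.Connection) -> List[str]:
    """Names of listers whose first visits still have to be scheduled."""
    rows = conn.execute(
        """
        SELECT name FROM listers
        WHERE first_visit_scheduled_at IS NULL
          AND first_loading_queue_prefix IS NOT NULL
        ORDER BY name
        """
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical table holding only the three columns from the proposal.
conn.execute(
    """
    CREATE TABLE listers (
        name TEXT PRIMARY KEY,
        listing_completed_at TEXT,
        first_loading_queue_prefix TEXT,
        first_visit_scheduled_at TEXT
    )
    """
)
# Illustrative rows only.
conn.executemany(
    "INSERT INTO listers VALUES (?, ?, ?, ?)",
    [
        # Dedicated queue prefix set, first visits not scheduled yet: selected.
        ("save-bulk", "2024-06-01T00:00:00+00:00", "save_bulk", None),
        # No dedicated queue prefix: left to the recurrent scheduling.
        ("regular-lister", "2024-06-01T00:00:00+00:00", None, None),
        # First visits already scheduled: skipped.
        ("already-done", "2024-06-01T00:00:00+00:00", "save_bulk",
         "2024-06-02T00:00:00+00:00"),
    ],
)
result = pending_listers(conn)
```

With these rows, `result` contains only "save-bulk". Once a runner has sent the first visits, it would presumably set first_visit_scheduled_at so that the lister is not selected again.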
- Antoine Lambert mentioned in merge request !390 (merged)
- Antoine Lambert mentioned in commit anlambert/swh-lister@d6be2cab
- Antoine Lambert mentioned in commit anlambert/swh-lister@60530d3a
- Antoine Lambert mentioned in commit anlambert/swh-lister@b9468cb9
- Antoine Lambert mentioned in commit anlambert/swh-lister@1b2dfce3
- Antoine Lambert mentioned in commit anlambert/swh-lister@268fe5d7
- Antoine Lambert mentioned in merge request swh-lister!536 (closed)
- Antoine Lambert mentioned in commit anlambert/swh-lister@69e1b870
- Antoine Lambert mentioned in commit anlambert/swh-lister@5000d42e
- Antoine Lambert mentioned in commit 6b266002
- Antoine Lambert mentioned in commit ccee462b
- Antoine Lambert mentioned in commit swh-lister@99f64ddb
- Antoine Lambert mentioned in commit swh-lister@7609ebf7
- Antoine Lambert mentioned in commit swh-lister@0e1093e3
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@f61181c0
- Antoine R. Dumont mentioned in merge request !392 (merged)
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@0b37912c
- Maintainer
Deployment
- new runner deployment
See !392 (merged)
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@3f894778
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@c1e60187
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@a3f037c2
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@20366c69
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@1027e25d
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@e6916bfa
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@16beb63c
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@4712b665
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@bed3529a
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@72f72130
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@483713e4
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@a52c68db
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@699e1cc8
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@847b1570
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@cb0205b6
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@d51dc048
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@b77c8374
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@17d7b464
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@5763d16b
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@0c943f18
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@6afef458
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@f072c39a
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@aa5ba45a
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@bd7ba32b
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@7aa69718
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@1802081e
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@4300e055
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@4edb6697
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@434b64aa
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@64cc5ddf
- Antoine R. Dumont mentioned in commit docker@35bb6aad
- Antoine R. Dumont mentioned in merge request docker!30 (merged)
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@23dd849b
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@18e9ecc4
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@859bb348
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@ad956e4d
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@f28b3fa4
- Antoine R. Dumont mentioned in commit ardumont/swh-scheduler@5952077b
- Antoine Lambert closed