scheduler-recurrent: Adapt scheduling default policy so origins without last update get regularly scheduled
They are currently not listed.
(octo-diff would not work because I used the wrong branch to compare... ;)
[1]
$SWH_PUPPET_ENVIRONMENT_HOME/bin/octocatalog-diff --to staging-fix-schedule-recurrent-config scheduler
0
Found host scheduler0.internal.staging.swh.network
Cloning into '/tmp/swh-ocd.Tk5NYTYc/swh-site'...
done.
branch 'staging-fix-schedule-recurrent-config' set up to track 'origin/staging-fix-schedule-recurrent-config'.
Switched to a new branch 'staging-fix-schedule-recurrent-config'
WARN -> Environment "staging-fix-schedule-recurrent-config" contained non-word characters, correcting name to staging_fix_schedule_recurrent_config
Cloning into '/tmp/swh-ocd.Tk5NYTYc/environments/production/data/private'...
done.
Cloning into '/tmp/swh-ocd.Tk5NYTYc/environments/staging_fix_schedule_recurrent_config/data/private'...
done.
*** Running octocatalog-diff on host scheduler0.internal.staging.swh.network
I, [2023-07-04T16:23:49.725788 #1046096] INFO -- : Catalogs compiled for scheduler0.internal.staging.swh.network
I, [2023-07-04T16:23:50.111775 #1046096] INFO -- : Diffs computed for scheduler0.internal.staging.swh.network
diff origin/production/scheduler0.internal.staging.swh.network current/scheduler0.internal.staging.swh.network
*******************************************
File[/etc/softwareheritage/scheduler/listener-runner.yml] =>
  parameters =>
    content =>
      @@ -6,3 +6,14 @@
       celery:
         task_broker: amqp://guest:guest@127.0.0.1:5672/%2f
      +scheduling_policy:
      +  default:
      +  - policy: already_visited_order_by_lag
      +    weight: 40
      +  - policy: never_visited_oldest_update_first
      +    weight: 40
      +  - policy: origins_without_last_update
      +    weight: 20
      +  opam:
      +  - policy: origins_without_last_update
      +    weight: 100
_
*******************************************
*** End octocatalog-diff on scheduler0.internal.staging.swh.network
mentioned in issue swh/infra/sysadm-environment#4971 (closed)
I don't think you need all that complexity. The scheduling policies can just go into the main scheduler config file (they'll be ignored by the tools that don't need them):
commit 9025fc8a7b267c5f88d608523eca1e93dbee6231
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Jul 4 15:43:13 2023 +0200

    Update default scheduling policy

diff --git a/data/common/common.yaml b/data/common/common.yaml
index 5a2e55fd..9a8c69f4 100644
--- a/data/common/common.yaml
+++ b/data/common/common.yaml
@@ -2672,6 +2672,14 @@ swh::deploy::scheduler::config:
   <<: *swh_scheduler_local_config
   celery:
     task_broker: "%{alias('swh::deploy::scheduler::task_broker')}"
+  scheduling_policies:
+    default:
+      - policy: already_visited_order_by_lag
+        weight: 40
+      - policy: never_visited_oldest_update_first
+        weight: 40
+      - policy: origins_without_last_update
+        weight: 20
 swh::deploy::scheduler::packages:
   - python3-swh.lister
   - python3-swh.loader.bzr
seems to add the expected configs to saatchi and scheduler0.
> I don't think you need all that complexity. The scheduling policies can just go into the main scheduler config file (they'll be ignored by the tools that don't need them):
I was unsure whether it would be ignored; great to know. I'll adapt.
I don't want to touch the default config though.
(fwiw, I think that if the swh.scheduler default is inadequate, which it seems, it should be changed there as well)
Well, I guess I can adapt according to what you suggested.
In any case though, as opam origins don't have any last update at all [1], is it OK for opam's policy to consist only of origins_without_last_update with a weight of 100?
[1] I asked @anlambert and his attempts were not conclusive, as a last update cannot be inferred consistently.
In practice the last scheduling policy will try to fill all available slots, so it shouldn't make much difference. Of course if the other two policies are useless, no point having them (until we manage to infer a last update for these origins, and we end up noticing three years later that they're not getting scheduled anymore)
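To make the point above concrete, here is a minimal Python sketch of a weighted slot allocation. It is an illustrative model only, not the swh.scheduler implementation: the `allocate` helper, the `candidates` mapping and the rule "each policy gets a share proportional to its weight, the last one takes whatever is left" are assumptions derived from this comment. The policy names and weights are the ones from the diff.

```python
from typing import Dict, List

# Policy lists as they appear in the configuration diff of this merge request.
DEFAULT_POLICY = [
    {"policy": "already_visited_order_by_lag", "weight": 40},
    {"policy": "never_visited_oldest_update_first", "weight": 40},
    {"policy": "origins_without_last_update", "weight": 20},
]
OPAM_POLICY = [{"policy": "origins_without_last_update", "weight": 100}]


def allocate(
    policies: List[dict], slots: int, candidates: Dict[str, int]
) -> Dict[str, int]:
    """Hypothetical model: split `slots` between policies by weight.

    `candidates` maps a policy name to the number of origins that policy
    could return. The last policy is allowed to use every slot still free.
    """
    total_weight = sum(p["weight"] for p in policies)
    remaining = slots
    allocation: Dict[str, int] = {}
    for index, policy in enumerate(policies):
        is_last = index == len(policies) - 1
        quota = remaining if is_last else slots * policy["weight"] // total_weight
        picked = min(quota, remaining, candidates.get(policy["policy"], 0))
        allocation[policy["policy"]] = picked
        remaining -= picked
    return allocation


if __name__ == "__main__":
    # opam origins have no last update, so only the third policy finds any.
    opam_candidates = {"origins_without_last_update": 5000}
    print(allocate(DEFAULT_POLICY, 1000, opam_candidates))
    print(allocate(OPAM_POLICY, 1000, opam_candidates))
```

Under these assumptions both calls give all 1000 slots to origins_without_last_update, which is why an opam-specific policy should not make much difference as long as the default ends with that policy.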
added 1 commit
- e156fef1 - scheduler-recurrent: Adapt default scheduling policy & add specific opam policy
In your updated commit message, you wrote:
> The current default policy is not appropriate as too few origins without last update are scheduled.
I guess that's technically accurate, but the current default is to not schedule origins without last update at all, so that's a bit misleading: "too few" somewhat implies that we're lagging, but we're really not doing it at all.
Is this also an issue for non-opam listers? If it is, we should definitely be changing the swh.scheduler defaults. How should we be monitoring this so that it doesn't happen again?
> I guess that's technically accurate, but the current default is to not schedule origins without last update at all, so that's a bit misleading: "too few" somewhat implies that we're lagging, but we're really not doing it at all.
Right, I had forgotten we enforced it, hence my accurate but misleading statement. I'll probably split this into 2 commits then: one for the default policy (with a proper message) so that it schedules some origins without last update, and another for opam.
> Is this also an issue for non-opam listers?
I guess it is, as we do have a few other listers which do not provide any last update. After checking some: bower does not have any, conda is not guaranteed to have a last update, nor is cran, and I stopped there. I recall we try to enforce its use (during reviews or development), but sometimes the information is just not there.
> If it is, we should definitely be changing the swh.scheduler defaults.
Yes, we should change it, but maybe 20 (as per my last change) is a bit much.
> How should we be monitoring this so that it doesn't happen again?
That, I don't know.
Note that I also recall having mixed feelings about the scheduling-on-last-update policy, which tends to create a high number of visits. And I don't know how to reconcile that with this MR either...
Edited by Antoine R. Dumont
> Note that I also recall having mixed feelings about the scheduling-on-last-update policy, which tends to create a high number of visits. And I don't know how to reconcile that with this MR either...
How so? It should only generate either zero (no update to the last_update field) or one (last_update updated) visit for each origin for each run of the lister.
> How so? It should only generate either zero (no update to the last_update field) or one (last_update updated) visit for each origin for each run of the lister.
I may be misremembering, but a long time ago I saw origins getting scheduled in a loop, and those origins corresponded to the ones we know have a high number of visits. That said, it might simply have been the save-code-now integration checks, which triggered a lot for those, and over time I conflated the two.
Thanks for making me think back on that.
Tested in staging and it does the job:
Jul 04 15:25:30 scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam: 1000 visits scheduled in queue swh.loader.package.opam.tasks.LoadOpam
Jul 04 15:25:44 scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam: 1000 visits scheduled in queue swh.loader.package.opam.tasks.LoadOpam
Jul 04 15:28:54 scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam: 390 visits scheduled in queue swh.loader.package.opam.tasks.LoadOpam
Jul 04 15:28:58 scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam: 490 visits scheduled in queue swh.loader.package.opam.tasks.LoadOpam
Jul 04 15:29:03 scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam: 591 visits scheduled in queue swh.loader.package.opam.tasks.LoadOpam
Jul 04 15:29:09 scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam: 300 visits scheduled in queue swh.loader.package.opam.tasks.LoadOpam
Jul 04 15:30:21 scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam: 279 visits scheduled in queue swh.loader.package.opam.tasks.LoadOpam
Jul 04 15:30:24 scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam: 279 visits scheduled in queue swh.loader.package.opam.tasks.LoadOpam
Jul 04 15:30:32 scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam: 100 visits scheduled in queue swh.loader.package.opam.tasks.LoadOpam
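For a quick tally of the staging log above, a small Python sketch along these lines could sum the scheduled visits per visit type. The regular expression is inferred only from the nine log lines shown here, and `total_scheduled` is a hypothetical helper, not part of swh.scheduler.

```python
import re
from typing import Dict, Iterable

# Prefix and queue name shared by the nine staging log entries above.
PREFIX = "scheduler0 swh[2131415]: INFO:swh.scheduler.celery_backend.recurrent_visits:opam:"
QUEUE = "swh.loader.package.opam.tasks.LoadOpam"

# (timestamp, number of visits) pairs copied from the log above.
ENTRIES = [
    ("Jul 04 15:25:30", 1000), ("Jul 04 15:25:44", 1000), ("Jul 04 15:28:54", 390),
    ("Jul 04 15:28:58", 490), ("Jul 04 15:29:03", 591), ("Jul 04 15:29:09", 300),
    ("Jul 04 15:30:21", 279), ("Jul 04 15:30:24", 279), ("Jul 04 15:30:32", 100),
]
LOG_LINES = [
    f"{stamp} {PREFIX} {count} visits scheduled in queue {QUEUE}"
    for stamp, count in ENTRIES
]

# Pattern inferred from the lines above (an assumption about the log format).
PATTERN = re.compile(
    r"recurrent_visits:(?P<visit_type>\w+): (?P<count>\d+) visits scheduled in queue"
)


def total_scheduled(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: sum scheduled visits per visit type."""
    totals: Dict[str, int] = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            visit_type = match.group("visit_type")
            totals[visit_type] = totals.get(visit_type, 0) + int(match.group("count"))
    return totals


if __name__ == "__main__":
    print(total_scheduled(LOG_LINES))  # {'opam': 4429}
```

Over the roughly five minutes covered by the excerpt, that is 4429 opam visits scheduled.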