Migrate scheduler services to elastic infra
This should ease the deployment of new listers & loaders, and reduce the dependency on the Debian package.
It also starts the scaffolding of the scheduler in swh-charts.
Charts ready (development):
- Build swh-scheduler apps image
- swh/infra/ci-cd/swh-charts!132 (merged): swh-scheduler-schedule-recurrent
- swh/infra/ci-cd/swh-charts!134 (merged): swh-scheduler-listener
- swh/infra/ci-cd/swh-charts!134 (merged): swh-scheduler-runner
- swh/infra/ci-cd/swh-charts!134 (merged): swh-scheduler-runner-priority
- swh/infra/ci-cd/swh-charts!132 (merged): gunicorn-swh-service [ #4780 (closed) ]
- swh/infra/ci-cd/swh-charts!135 (merged): swh-scheduler-update-metrics.{timer,service}
- swh/infra/ci-cd/swh-charts!135 (merged): swh-scheduler-journal-client
Plan:
- swh-apps: Add swh-scheduler image
- swh-charts: Reference image ^
- Create template for scheduler service(s)
- Tests in minikube
- Tests in staging
- Label nodes [1]
- Checks -> fail
- Loop to fix issues
- All good [2]
- Temporarily stop the static services on scheduler0.staging, one by one, to check that the elastic staging ones keep up
- scheduler0.staging: puppet agent --disable; systemctl stop swh-scheduler-schedule-recurrent
- After more than 1 day of running, queues are still filled regularly [0] and the pod is fine [0']
- scheduler0.staging: systemctl stop swh-scheduler-runner
- Deploy scheduler-runner pod
- scheduler0.staging: systemctl stop swh-scheduler-runner-priority
- Deploy scheduler-runner-priority pod
- scheduler0.staging: systemctl stop swh-scheduler-listener
- Deploy scheduler-listener pod
- Deploy journal client
- Deploy update-metrics
- Deploy scheduler rpc in staging (requires an ingress)
- Migrate the elastic staging services relying on the static scheduler rpc to the elastic scheduler rpc
- Status check of the scheduler services running in staging
- Deploy scheduler services to production
- Deactivate the scheduler services on saatchi, except rpc
- Deactivate rpc on saatchi
- puppet: Clean up the scheduler nodes to drop all migrated scheduler services (except rpc); at the end, the static scheduler node should only run rabbitmq
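The staging switchover repeats the same pattern per service: stop the systemd unit on scheduler0.staging, deploy the matching pod, then check it. Below is a minimal sketch of two hypothetical helpers for the "stop" and "check" halves. It assumes root ssh access to `scheduler0.staging`, the `swh` namespace seen in the pod logs under [2], and Deployment names matching the pod name prefix (e.g. `scheduler-runner`, by analogy with `scheduler-schedule-recurrent`). The deployment itself still goes through swh-charts and is not shown.

```shell
# Sketch only. Assumptions: ssh as root to the static host, pods in namespace
# "swh", Deployment names like "scheduler-runner" (unit name minus "swh-").
STATIC_HOST="${STATIC_HOST:-scheduler0.staging}"
KUBE_CONTEXT="${KUBE_CONTEXT:-archive-staging-rke2}"
KUBE_NAMESPACE="${KUBE_NAMESPACE:-swh}"

# stop_static_unit UNIT: disable puppet so it does not restart UNIT,
# then stop UNIT on the static scheduler node.
stop_static_unit() {
  ssh "$STATIC_HOST" "puppet agent --disable && systemctl stop $1"
}

# wait_for_deployment NAME: block until the Deployment's pods are rolled
# out, failing after 5 minutes.
wait_for_deployment() {
  kubectl --context "$KUBE_CONTEXT" -n "$KUBE_NAMESPACE" \
    rollout status "deployment/$1" --timeout=5m
}
```

For instance: `stop_static_unit swh-scheduler-runner`, deploy the scheduler-runner pod through swh-charts, then `wait_for_deployment scheduler-runner`. Keeping the two halves separate matches the plan, where the chart deployment happens in between.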
[0] https://grafana.softwareheritage.org/goto/dRfcY_kIk?orgId=1
[0'] https://grafana.softwareheritage.org/goto/Qplk8_zIk?orgId=1
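As a command-line spot check alongside dashboard [0], one can list the loader queue depths directly on the rabbitmq host and rerun it over time to confirm that schedule-recurrent keeps feeding them. This is a sketch that assumes `rabbitmqctl` is run on the static scheduler node (which keeps rabbitmq) and that the queues live in the default vhost `/`. The helper name is hypothetical.

```shell
# Sketch. Assumptions: run on the rabbitmq host, queues in vhost "/".
RABBITMQ_VHOST="${RABBITMQ_VHOST:-/}"

# loader_queue_depths: print "<queue> <messages>" for swh.loader.* queues,
# sorted by queue name. "messages" counts ready + unacknowledged messages.
loader_queue_depths() {
  rabbitmqctl -q list_queues -p "$RABBITMQ_VHOST" name messages |
    awk '$1 ~ /^swh\.loader\./ { print $1, $2 }' |
    LC_ALL=C sort
}
```

The awk filter also discards any header line `rabbitmqctl` prints, since only lines whose first column starts with `swh.loader.` are kept.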
[1]
$ kubectl --context archive-staging-rke2 label --overwrite node rancher-node-staging-rke2-worker6 swh/scheduler=true
node/rancher-node-staging-rke2-worker6 labeled
$ kubectl --context archive-staging-rke2 label --overwrite node rancher-node-staging-rke2-worker1 swh/scheduler=true
node/rancher-node-staging-rke2-worker1 labeled
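The two labeling commands above generalize to a loop. A rollback and a check are also handy. The sketch below uses hypothetical helper names. It assumes the scheduler charts select nodes on `swh/scheduler=true` (presumably through a nodeSelector or affinity, not shown here).

```shell
# Sketch with hypothetical helper names; context taken from the commands above.
KUBE_CONTEXT="${KUBE_CONTEXT:-archive-staging-rke2}"

# Mark each given node as eligible to run scheduler pods.
label_scheduler_nodes() {
  local node
  for node in "$@"; do
    kubectl --context "$KUBE_CONTEXT" label --overwrite node "$node" swh/scheduler=true
  done
}

# Remove the label again (a trailing "-" deletes the key), e.g. to roll back.
unlabel_scheduler_nodes() {
  local node
  for node in "$@"; do
    kubectl --context "$KUBE_CONTEXT" label node "$node" swh/scheduler-
  done
}

# List the nodes currently carrying swh/scheduler=true.
list_scheduler_nodes() {
  kubectl --context "$KUBE_CONTEXT" get nodes -l swh/scheduler=true -o name
}
```

After `label_scheduler_nodes rancher-node-staging-rke2-worker6 rancher-node-staging-rke2-worker1`, `list_scheduler_nodes` should list both workers.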
[2]
scheduler-schedule-recurrent Running swh command --log-level INFO scheduler --config-file /etc/swh/config.yml schedule-recurrent
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type content with policy already_visited_order_by_lag: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type content with policy never_visited_oldest_update_first: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type content with policy origins_without_last_update: fetched 1.0, requested 0.2
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:content: 1 visits scheduled in queue swh.loader.core.tasks.LoadContent
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git-checkout with policy already_visited_order_by_lag: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git-checkout with policy never_visited_oldest_update_first: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git-checkout with policy origins_without_last_update: fetched 1.0, requested 0.2
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:git-checkout: 66 visits scheduled in queue swh.loader.git.tasks.LoadGitCheckout
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy already_visited_order_by_lag: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy never_visited_oldest_update_first: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy origins_without_last_update: fetched 1.0, requested 0.2
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:tarball-directory: 200 visits scheduled in queue swh.loader.core.tasks.LoadTarballDirectory
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy already_visited_order_by_lag: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy never_visited_oldest_update_first: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy origins_without_last_update: fetched 1.0, requested 0.2
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:tarball-directory: 77 visits scheduled in queue swh.loader.core.tasks.LoadTarballDirectory
Stream closed EOF for swh/scheduler-schedule-recurrent-5d8644f5d8-f7gbw (prepare-configuration)
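To read such logs at a glance, the "N visits scheduled in queue Q" lines can be totalled per queue. Below is a small sketch with a hypothetical helper name, relying only on the log format shown in the excerpt above.

```shell
# Sum the "N visits scheduled in queue Q" log lines per queue Q.
# Other lines (e.g. "Skewed fetch ...") are ignored.
summarize_scheduled_visits() {
  awk '
    match($0, /[0-9]+ visits scheduled in queue [^ ]+/) {
      n = split(substr($0, RSTART, RLENGTH), f, " ")
      total[f[n]] += f[1]   # f[1] is the count, f[n] the queue name
    }
    END { for (q in total) print q, total[q] }
  ' | LC_ALL=C sort
}
```

On the excerpt above this gives 1 for `swh.loader.core.tasks.LoadContent`, 277 (200 + 77) for `swh.loader.core.tasks.LoadTarballDirectory` and 66 for `swh.loader.git.tasks.LoadGitCheckout`. Against the live pod, assuming the Deployment is named after the pod prefix: `kubectl --context archive-staging-rke2 -n swh logs deployment/scheduler-schedule-recurrent --since=1h | summarize_scheduled_visits`.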