
Migrate scheduler services to elastic infra

This should ease the deployment of new listers and loaders, and reduce the dependency on the Debian package.

It will also start the scaffolding of the scheduler in swh-charts.

Charts ready (development):

Plan:

  • swh-apps: Add swh-scheduler image

  • swh-charts: Reference the swh-scheduler image

  • Create template for scheduler service(s)

  • Tests in minikube

  • Tests in staging

    • Label nodes [1]
    • Checks -> fail
    • Loop to fix issues
    • All good [2]
    • Temporarily stop the static services on scheduler0.staging, one by one, to check that the elastic staging deployment is able to keep up
      • scheduler0.staging: puppet agent --disable; systemctl stop swh-scheduler-schedule-recurrent
      • After more than a day of running, the queues are still being filled regularly [0] and the pod is healthy [0']
      • scheduler0.staging: systemctl stop swh-scheduler-runner
      • Deploy scheduler-runner pod
      • scheduler0.staging: systemctl stop swh-scheduler-runner-priority
      • Deploy scheduler-runner-priority pod
      • scheduler0.staging: systemctl stop swh-scheduler-listener
      • Deploy scheduler-listener
      • Deploy journal client
      • Deploy update-metrics
  • Deploy scheduler rpc in staging (requires an ingress)

  • Migrate elastic staging services relying on the static scheduler rpc to the elastic scheduler rpc

  • Check the status of the scheduler services running in staging

  • Deploy scheduler services to production

  • Deactivate all scheduler services on saatchi except the rpc

  • Deactivate rpc in saatchi

  • puppet: Clean up scheduler nodes to drop all migrated scheduler services (but rpc) -> at the end of this, the static scheduler node should only run rabbitmq
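
The staging cutover steps in the plan above can be sketched as a small script that only prints the order of operations. The host, the unit names and the puppet/systemctl commands are taken from the plan; `cutover_plan` is a hypothetical helper, and the elastic step is left as a plain reminder because the plan does not say how the pods are deployed.

```shell
#!/usr/bin/env bash
# Sketch: print the per-service cutover order used in staging.
# Nothing is executed here; the script only echoes the steps.
# cutover_plan is a hypothetical helper, not an existing tool.
set -euo pipefail

HOST="scheduler0.staging"
UNITS=(
  swh-scheduler-schedule-recurrent
  swh-scheduler-runner
  swh-scheduler-runner-priority
  swh-scheduler-listener
)

cutover_plan() {
  # Assumption: puppet is disabled first so that it does not
  # restart the units that are stopped by hand afterwards.
  echo "$HOST: puppet agent --disable"
  local unit
  for unit in "${UNITS[@]}"; do
    echo "$HOST: systemctl stop $unit"
    # The pod names in the plan match the unit names without "swh-".
    echo "elastic: make sure the ${unit#swh-} pod is deployed, then check queues and pod health"
  done
}

cutover_plan
```

Keeping one stop and one check per service mirrors the plan: each static service is stopped only once the previous one has been verified on the elastic side.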

[0] https://grafana.softwareheritage.org/goto/dRfcY_kIk?orgId=1

[0'] https://grafana.softwareheritage.org/goto/Qplk8_zIk?orgId=1

[1]

$ kubectl --context archive-staging-rke2 label --overwrite node rancher-node-staging-rke2-worker6 swh/scheduler=true
node/rancher-node-staging-rke2-worker6 labeled
$ kubectl --context archive-staging-rke2 label --overwrite node rancher-node-staging-rke2-worker1 swh/scheduler=true
node/rancher-node-staging-rke2-worker1 labeled
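
The two labelling commands above could be wrapped in a loop and followed by a check that lists the labelled nodes. This is only a sketch: the context, node names and label are copied from [1], while `DRY_RUN`, `run` and `label_scheduler_nodes` are hypothetical names introduced for illustration. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: label worker nodes for the scheduler, then list labelled nodes.
# DRY_RUN, run and label_scheduler_nodes are hypothetical helpers.
set -euo pipefail

CONTEXT="archive-staging-rke2"
LABEL="swh/scheduler=true"

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

label_scheduler_nodes() {
  local node
  for node in "$@"; do
    run kubectl --context "$CONTEXT" label --overwrite node "$node" "$LABEL"
  done
  # List the nodes carrying the label to confirm the result.
  run kubectl --context "$CONTEXT" get nodes -l "$LABEL"
}

label_scheduler_nodes \
  rancher-node-staging-rke2-worker6 \
  rancher-node-staging-rke2-worker1
```

Running it with `DRY_RUN=0` would execute the commands against the cluster instead of printing them.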

[2]

scheduler-schedule-recurrent Running swh command --log-level INFO scheduler --config-file /etc/swh/config.yml schedule-recurrent
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type content with policy already_visited_order_by_lag: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type content with policy never_visited_oldest_update_first: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type content with policy origins_without_last_update: fetched 1.0, requested 0.2
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:content: 1 visits scheduled in queue swh.loader.core.tasks.LoadContent
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git-checkout with policy already_visited_order_by_lag: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git-checkout with policy never_visited_oldest_update_first: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type git-checkout with policy origins_without_last_update: fetched 1.0, requested 0.2
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:git-checkout: 66 visits scheduled in queue swh.loader.git.tasks.LoadGitCheckout
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy already_visited_order_by_lag: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy never_visited_oldest_update_first: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy origins_without_last_update: fetched 1.0, requested 0.2
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:tarball-directory: 200 visits scheduled in queue swh.loader.core.tasks.LoadTarballDirectory
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy already_visited_order_by_lag: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy never_visited_oldest_update_first: fetched 0.0, requested 0.4
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:Skewed fetch for visit type tarball-directory with policy origins_without_last_update: fetched 1.0, requested 0.2
scheduler-schedule-recurrent INFO:swh.scheduler.celery_backend.recurrent_visits:tarball-directory: 77 visits scheduled in queue swh.loader.core.tasks.LoadTarballDirectory
Stream closed EOF for swh/scheduler-schedule-recurrent-5d8644f5d8-f7gbw (prepare-configuration)
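
To read a log like the one above more quickly, the "visits scheduled" lines can be totalled per queue. The sketch below assumes the line format shown in [2] and reads the log on standard input; `summarize_visits` is a hypothetical helper, not part of swh-scheduler.

```shell
#!/usr/bin/env bash
# Sketch: total the "visits scheduled" lines per queue from a pod log.
# summarize_visits is a hypothetical helper reading the log on stdin.
set -euo pipefail

summarize_visits() {
  # Matching lines end with:
  #   "<visit type>: <N> visits scheduled in queue <queue>"
  # so the queue is the last field and the count sits 5 fields before it.
  awk '/ visits scheduled in queue / {
         total[$NF] += $(NF - 5)
       }
       END {
         for (queue in total) print queue, total[queue]
       }' | sort
}

# Usage: pipe the pod log into the function, for example
#   kubectl logs <pod> | summarize_visits
```

For the excerpt in [2], this would report 1 visit for LoadContent, 66 for LoadGitCheckout and 277 for LoadTarballDirectory (200 + 77).
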
Edited by Antoine R. Dumont