Commit 48c68c1d authored by Antoine R. Dumont

celery_backend/runner: Switch write order to rabbitmq then postgresql

Messages are now sent to rabbitmq first, then written to postgresql.

In the nominal case where all writes succeed, this changes nothing compared to the
previous implementation (postgresql first, then rabbitmq).

Under degraded conditions, though, the new order should behave better:

1. If we cannot write to rabbitmq, then we do not write to postgresql either: the
function raises and stops.

2. If the write to rabbitmq succeeds, the messages will be consumed independently
of what follows. If we then fail to write to postgresql (for whatever reason), we
only lose the record that the tasks were already sent. This means the same tasks
will be rescheduled and run again; as those kinds of tasks are supposed to be
idempotent, that should not be a major issue for their upstream.

Also, those tasks are mostly listers now, and they have state management of their
own, so the re-runs should mostly be no-ops (if the ingestion from the previous run
went fine). Edge-case scenarios, like a site being down, will behave as before (see
the consolidated sketch below).

Refs. swh/infra/sysadm-environment#5512
parent 6a592e26
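For orientation, here is a consolidated sketch of the function as it reads after this
commit, reconstructed from the two hunks below. The control flow and the
`save_code_now` queue handling come straight from the diff; the exact signature and
the full contents of `kw` are assumptions, since the diff elides part of them:

```python
import logging

logger = logging.getLogger(__name__)


def write_to_backends(backend, app, backend_tasks, celery_tasks):
    # Sketch reconstructed from the diff; signature details are assumed.
    # Step 1: send everything to rabbitmq. Any failure raises here, before
    # postgresql is touched (case 1 in the commit message).
    for with_priority, backend_name, backend_id, args, kwargs in celery_tasks:
        kw = dict(task_id=backend_id, args=args, kwargs=kwargs)  # kw layout assumed
        if with_priority:
            kw["queue"] = f"save_code_now:{backend_name}"
        app.send_task(backend_name, **kw)
    logger.debug("Sent %s celery tasks", len(backend_tasks))

    # Step 2: only record the runs in postgresql once every send succeeded.
    # If this write fails, the tasks were already sent; they will simply be
    # rescheduled and, being idempotent, re-run harmlessly (case 2).
    backend.mass_schedule_task_runs(backend_tasks)
    logger.debug("Written %s celery tasks", len(backend_tasks))
```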
@@ -37,18 +37,27 @@ def write_to_backends(
"""Utility function to unify the writing to rabbitmq and the scheduler backends in a
consistent way (towards transaction-like).
In the current state of affairs, the messages are written in postgresql first and
then sent to rabbitmq.
Messages are first sent to rabbitmq then postgresql.
That could pose an issue in case we can write to postgresql and then fail to write
in the rabbitmq backend. That then leave records in a pending state
("next_run_scheduled") in the postgresql backend (history log).
In the nominal case where all writes are ok, that changes nothing vs the previous
implementation (postgresql first then rabbitmq).
Let's encapsulate in a common utility function first.
In degraded performance though, that's supposedly better.
1. If we cannot write to rabbitmq, then we won't write to postgresql either, that
function will raise and stop.
2. If we can write to rabbitmq first, then the messages will be consumed
independently from this. And then, if we cannot write to postgresql (for some
reason), then we just lose the information we sent the task already. This means the
same task will be rescheduled and we'll have a go at it again. As those kind of
tasks are supposed to be idempotent, that should not a major issue for their upstream.
Also, those tasks are mostly listers now and they have a state management of their
own, so that should definitely mostly noops (if the ingestion from the previous run
went fine). Edge cases scenario like down site will behave as before.
"""
backend.mass_schedule_task_runs(backend_tasks)
logger.debug("Written %s celery tasks", len(backend_tasks))
for with_priority, backend_name, backend_id, args, kwargs in celery_tasks:
kw = dict(
task_id=backend_id,
@@ -59,6 +68,8 @@
kw["queue"] = f"save_code_now:{backend_name}"
app.send_task(backend_name, **kw)
logger.debug("Sent %s celery tasks", len(backend_tasks))
+    backend.mass_schedule_task_runs(backend_tasks)
+    logger.debug("Written %s celery tasks", len(backend_tasks))
def run_ready_tasks(
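As a usage note, here is a minimal pytest sketch of the ordering guarantee described
in the commit message, written against the consolidated sketch above rather than the
real module (the mocks and the `backend_tasks` shape are assumptions for
illustration, not part of this commit):

```python
from unittest.mock import MagicMock

import pytest


def test_rabbitmq_failure_skips_postgresql_write():
    backend = MagicMock()
    app = MagicMock()
    # Simulate rabbitmq being unreachable: every send raises.
    app.send_task.side_effect = ConnectionError("rabbitmq down")

    celery_tasks = [(False, "some.lister.task", "task-id-1", (), {})]
    backend_tasks = [("task-id-1",)]  # shape assumed for illustration

    with pytest.raises(ConnectionError):
        write_to_backends(backend, app, backend_tasks, celery_tasks)

    # Case 1: if rabbitmq fails, nothing is written to postgresql.
    backend.mass_schedule_task_runs.assert_not_called()
```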