Add support for indexing directly from the journal client (!315) · swh-indexer

Before this commit, the journal client only created scheduler tasks, which then ran the indexers.

This commit adds support for a new flow: skipping the scheduler and running indexers directly. This new behavior is triggered by passing a new argument on the CLI: the name of the indexer to run (currently, only origin-intrinsic-metadata).
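
The two flows can be pictured as one message handler that either creates scheduler tasks or calls the indexer directly. What follows is a minimal sketch under assumptions, not the code from this merge request: `make_worker`, `FakeScheduler`, `FakeIndexer`, the task dictionary, and the message layout are hypothetical names invented for illustration. Only the indexer name `origin-intrinsic-metadata` comes from the description above.

```python
from typing import Callable, Dict, List, Optional, Tuple


class FakeScheduler:
    """Hypothetical stand-in for the scheduler; it only records tasks."""

    def __init__(self) -> None:
        self.tasks: List[dict] = []

    def create_tasks(self, tasks: List[dict]) -> None:
        self.tasks.extend(tasks)


class FakeIndexer:
    """Hypothetical stand-in for an indexer; it only records origin URLs."""

    def __init__(self) -> None:
        self.indexed: List[str] = []

    def process(self, origins: List[dict]) -> None:
        self.indexed.extend(origin["url"] for origin in origins)


# Assumed registry mapping the CLI argument to an indexer factory.
INDEXERS: Dict[str, Callable[[], FakeIndexer]] = {
    "origin-intrinsic-metadata": FakeIndexer,
}

Handler = Callable[[Dict[str, List[dict]]], None]


def make_worker(
    scheduler: FakeScheduler, indexer_name: Optional[str] = None
) -> Tuple[Handler, Optional[FakeIndexer]]:
    """Build a journal message handler.

    Without an indexer name, the handler keeps the old flow (scheduler tasks).
    With one, it skips the scheduler and runs the indexer in-process.
    """
    indexer = INDEXERS[indexer_name]() if indexer_name else None

    def handle(messages: Dict[str, List[dict]]) -> None:
        origins = messages.get("origin", [])
        if indexer is not None:
            # New flow: index directly from the journal client.
            indexer.process(origins)
        else:
            # Old flow: one scheduler task per origin (task shape is made up).
            scheduler.create_tasks(
                [{"type": "index-origin", "url": o["url"]} for o in origins]
            )

    return handle, indexer
```

Keeping both branches behind one handler means the new CLI argument is the only switch between the flows, so a deployment that omits it behaves as before.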

This has the following consequences:

- a crash in an indexer will now halt the whole journal client (which is arguably good)
- the journal client will probably need to be parallelized to keep up with the load
- we can remove an existence check for origins
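
The first consequence above follows from running the indexer inside the consuming loop: an uncaught exception stops consumption instead of failing a single task. A minimal sketch, assuming a hypothetical `consume` loop and a hypothetical `flaky_index` function (neither is part of swh-indexer):

```python
from typing import Callable, Iterable, List


def consume(
    batches: Iterable[List[str]], index: Callable[[List[str]], None]
) -> int:
    """Feed batches to index() in order and return how many completed.

    Exceptions from index() are deliberately not caught, so one crash
    stops the loop and later batches are never processed.
    """
    done = 0
    for batch in batches:
        index(batch)
        done += 1
    return done


processed: List[str] = []


def flaky_index(batch: List[str]) -> None:
    """Hypothetical indexer that crashes on a batch containing "bad"."""
    if "bad" in batch:
        raise RuntimeError("indexer crashed")
    processed.extend(batch)


try:
    consume([["a"], ["bad"], ["c"]], flaky_index)
except RuntimeError:
    # The batch after the crash was never reached.
    pass
```

With scheduler tasks, the failure of one task would not have prevented the others from running; here the client stops at the failing batch, which makes the problem visible immediately.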

In terms of deployment:

1. stop the old journal client
2. wait for all tasks to finish
3. stop and remove celery workers and queues
4. start the new journal client (it can reuse the group_id to avoid re-indexing, but I think it is a good opportunity to reindex because of all the temporary failures we had over time)

Part of #4273 (closed).

Depends on !314 (closed).

Migrated from D7899 (view on Phabricator)
