Add support for indexing directly from the journal client

Before this commit, the journal client only created scheduler tasks; Celery workers then picked up those tasks and ran the indexers.

This commit adds support for a new flow that skips the scheduler and runs indexers directly. It is enabled by passing a new argument to the CLI: the name of the indexer to run (currently, only origin-intrinsic-metadata is supported).
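The two flows can be illustrated with a minimal sketch of the dispatch decision. Everything here is hypothetical: `process_journal_objects`, `create_tasks`, and the `indexers` registry are illustrative names, not the actual swh-indexer API. The sketch also assumes that the client reacts to `origin_visit_status` messages whose status is `"full"`. When no indexer name is given, it falls back to the old scheduler path.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model: a journal message is a plain dict.
Message = Dict[str, object]


def process_journal_objects(
    messages: Dict[str, List[Message]],
    *,
    create_tasks: Callable[[List[str]], None],
    indexers: Dict[str, Callable[[List[str]], None]],
    indexer_name: Optional[str] = None,
) -> None:
    """Dispatch origin URLs either to the scheduler or straight to an indexer."""
    # Assumption: only completed ("full") visits are worth indexing.
    origins = [
        str(m["origin"])
        for m in messages.get("origin_visit_status", [])
        if m.get("status") == "full"
    ]
    if not origins:
        return
    if indexer_name is None:
        # Old flow: only create scheduler tasks; workers run the indexers later.
        create_tasks(origins)
    else:
        # New flow: skip the scheduler and run the named indexer in-process.
        indexers[indexer_name](origins)
```

With `indexer_name="origin-intrinsic-metadata"`, the origins go straight to the registered indexer callable. Without it, they are handed to `create_tasks`, as before this change.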

This has the following consequences:

  • a crash in an indexer will now hang the whole journal client (which is arguably good)
  • the journal client will probably need to be parallelized to keep up with the load
  • we can remove an existence check for origins
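The first two consequences interact: parallelizing the work must not hide indexer crashes. Below is a minimal sketch, assuming a hypothetical per-origin `index_one` callable and a hypothetical `commit` callable that acknowledges the batch (for example, by committing Kafka offsets). It indexes a batch with a thread pool and commits only if every origin succeeded. An indexer exception escapes the function before `commit` runs, so the client cannot move past the failing batch. This is one plausible reading of "hang the whole thing", not necessarily how the merge request implements it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def process_batch(
    origins: List[str],
    index_one: Callable[[str], None],
    commit: Callable[[], None],
    max_workers: int = 4,
) -> None:
    """Index origins in parallel, then commit; any indexer error propagates."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() consumes every result; the first exception raised by
        # index_one is re-raised here, after the pool has shut down.
        list(pool.map(index_one, origins))
    # Only reached when all origins were indexed without error.
    commit()
```

A thread pool suits indexers that mostly wait on I/O (storage or network calls). CPU-bound indexers would need processes instead.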

In terms of deployment:

  1. stop the old journal client
  2. wait for all tasks to finish
  3. stop and remove celery workers and queues
  4. start the new journal client (it can reuse the group_id to avoid reindexing, but I think this is a good opportunity to reindex, given all the temporary failures we have had over time)
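The group_id choice in step 4 follows from how Kafka consumer groups track progress. A group resumes from its last committed offset, and a group with no committed offset starts wherever `auto.offset.reset` says. The toy simulation below assumes that setting is `earliest`, so a new group replays the whole topic. `ToyTopic` is a hypothetical in-memory stand-in for a single partition, not a Kafka client.

```python
from typing import Dict, List


class ToyTopic:
    """In-memory stand-in for one Kafka partition and per-group committed offsets."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = messages
        self.committed: Dict[str, int] = {}

    def consume_all(self, group_id: str) -> List[str]:
        # A group with no committed offset starts from the earliest message,
        # mimicking auto.offset.reset=earliest (assumed here).
        start = self.committed.get(group_id, 0)
        batch = self.messages[start:]
        # Commit everything read so far for this group.
        self.committed[group_id] = len(self.messages)
        return batch
```

Reusing the old group only processes messages published since its last commit. A fresh group_id reprocesses everything, which is what reindexing after the past temporary failures amounts to.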

Part of #4273 (closed).

Depends on !314 (closed).


Migrated from D7899 on Phabricator.