Add support for indexing directly from the journal client
Before this commit, the journal client only created scheduler tasks, which then ran the indexers.
This commit adds support for a new flow: skipping the scheduler
and running the indexers directly.
This new behavior is triggered by adding a new argument on the CLI,
which is the name of the indexer to run (currently, only origin-intrinsic-metadata).
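As a rough illustration of the two flows, here is a minimal sketch. All names in it (INDEXERS, make_worker, index_origin_intrinsic_metadata, create_tasks) are hypothetical and are not the actual code from this commit; it only shows the idea of picking a worker based on whether an indexer name was passed on the CLI.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
Worker = Callable[[List[Message]], list]


def index_origin_intrinsic_metadata(origins: List[Message]) -> list:
    """Hypothetical stand-in for the real indexer; returns fake results."""
    return [{"origin": o["url"], "indexed": True} for o in origins]


# Hypothetical registry mapping the CLI argument to an indexer callable.
INDEXERS: Dict[str, Worker] = {
    "origin-intrinsic-metadata": index_origin_intrinsic_metadata,
}


def make_worker(indexer_name: Optional[str], create_tasks: Worker) -> Worker:
    """Choose what the journal client does with each batch of messages.

    Without an indexer name, keep the old flow (create scheduler tasks).
    With one, run the indexer in-process, skipping the scheduler.
    """
    if indexer_name is None:
        return create_tasks
    try:
        indexer = INDEXERS[indexer_name]
    except KeyError:
        raise ValueError(f"unknown indexer: {indexer_name!r}")

    def worker(messages: List[Message]) -> list:
        # No try/except here: an indexer failure propagates to the
        # journal client instead of being isolated in a scheduler task.
        return indexer(messages)

    return worker
```

With this shape, the only difference between the old and new flows is which callable the journal client receives as its worker.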
This has the following consequences:
- a crash in an indexer will now halt the whole journal client (which is arguably good)
- the journal client will probably need to be parallelized to keep up with the load
- we can remove an existence check for origins
In terms of deployment:
- stop the old journal client
- wait for all tasks to finish
- stop and remove celery workers and queues
- start the new journal client (it can reuse the group_id to avoid re-indexing, but I think it is a good opportunity to reindex because of all the temporary failures we had over time)
Part of #4273 (closed).
Depends on !314 (closed).
Migrated from D7899 (view on Phabricator)