Refactor backend startup scripts

We are starting to have several different scenarios that need different combinations of components. For example, the ingestion pipeline needs:

  • redis
  • kafka
  • winery
  • monitoring stack
  • [loaders]

while the deduplicator needs:

  • kafka
  • winery
  • monitoring stack
  • the deduplicator

and the content importer needs:

  • winery
  • monitoring stack
  • the content importer

Currently, the deduplicator is started from the swh loader backend start script, but it should not be started in the normal ingestion pipeline.

Instead of adding one startup script per scenario, perhaps we should add global configuration entries that define which components are started. That would allow a single startup script to handle almost all the backend combinations.

An easy first version is to introduce configuration entries, as mentioned above, of the form <SERVICE>_ENABLED=[true|false] and then conditionally start / stop each service in the backend start / stop script. Care is needed when Winery is not started, because the loader still expects an object storage to be configured (we can use the noop object storage for this).
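
To make the proposal concrete, here is a minimal Python sketch of the selection logic only; the actual start / stop scripts may well be shell and would look different. The component names, the `plan_startup` and `is_enabled` helpers, and the choice to treat an unset flag as disabled are all assumptions made for illustration, not part of the existing scripts.

```python
import os
from typing import Dict, List, Mapping, Union

# Hypothetical component names, taken from the lists in this issue.
# The real scripts may name or order them differently.
COMPONENTS = [
    "redis",
    "kafka",
    "winery",
    "monitoring",
    "loaders",
    "deduplicator",
    "content_importer",
]


def is_enabled(service: str, env: Mapping[str, str]) -> bool:
    """Read <SERVICE>_ENABLED from env; only 'true' or 'false' are accepted.

    Assumption: a missing entry means the service is disabled.
    """
    key = f"{service.upper()}_ENABLED"
    raw = env.get(key, "false").strip().lower()
    if raw not in ("true", "false"):
        raise ValueError(f"{key} must be 'true' or 'false', got {raw!r}")
    return raw == "true"


def plan_startup(env: Mapping[str, str]) -> Dict[str, Union[List[str], str]]:
    """Return the components to start and the object storage to configure."""
    enabled = [name for name in COMPONENTS if is_enabled(name, env)]
    # The loader still expects an object storage to be configured, so fall
    # back to the noop object storage whenever Winery is not started.
    objstorage = "winery" if "winery" in enabled else "noop"
    return {"start": enabled, "objstorage": objstorage}


if __name__ == "__main__":
    plan = plan_startup(os.environ)
    print("components to start:", ", ".join(plan["start"]) or "(none)")
    print("object storage:", plan["objstorage"])
```

With this shape, the deduplicator scenario would be expressed as KAFKA_ENABLED=true, WINERY_ENABLED=true, MONITORING_ENABLED=true and DEDUPLICATOR_ENABLED=true, and the stop script could walk the same list in reverse.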

Edited by Simeon Carstens