Self-healing for journal clients
To implement a form of self-healing, swh.objstorage.replayer supports the systemd watchdog: if the systemd libraries are available, replayer journal clients notify the service manager each time they finish processing a batch of messages. This lets the service manager restart a journal client automatically when it hangs for any reason (e.g. blocking on the S3 side, or a Kafka outage).
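As an illustration of the mechanism described above, here is a minimal sketch of a per-batch watchdog notification. It is not the replayer's actual code: instead of the systemd libraries it speaks the sd_notify datagram protocol directly, using only the standard library, and the names `sd_notify`, `process_batches` and `handle_batch` are hypothetical.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string to the service manager.

    Returns False when NOTIFY_SOCKET is unset, i.e. when the process
    is not supervised by systemd (or notifications are not enabled).
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # Abstract namespace socket: the leading "@" stands for a NUL byte.
        addr = b"\0" + path[1:].encode()
    else:
        addr = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


def process_batches(batches, handle_batch):
    """Handle each batch, pinging the watchdog after every one.

    `batches` and `handle_batch` are placeholders for the journal
    client's message source and its processing callback.
    """
    sd_notify("READY=1")
    count = 0
    for batch in batches:
        handle_batch(batch)
        # If this ping stops arriving, systemd considers the service hung.
        sd_notify("WATCHDOG=1")
        count += 1
    return count
```

For the restart to happen, the unit would also need a watchdog timeout (systemd's `WatchdogSec=` setting) and a `Restart=` policy that covers watchdog failures; the timeout should be comfortably longer than the slowest expected batch.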
We should consider generalizing self-healing to other journal clients (e.g. swh.search, swh.scheduler).
Those deployed on static infra could leverage the same kind of systemd-based watchdog support that swh.objstorage.replayer implements. I would expect KEDA to be able to restart stuck pods automatically when it detects a stalled metric, but this remains to be confirmed.