Most indexers are consuming journal topics slower than messages are produced
Grafana dashboards show that most indexer consumers are working, but they are lagging behind, and the gap is increasing.
- https://grafana.softwareheritage.org/goto/RJkpEXVVz?orgId=1 origin-intrinsic-metadata (on origin_visit_status topic)
- https://grafana.softwareheritage.org/goto/c_iYPX4Vz?orgId=1 content-fossology-license (on content topic)
- https://grafana.softwareheritage.org/goto/g1VPPX4Vz?orgId=1 content-mimetype (on content topic)
- however, the extrinsic-metadata indexer is fine: https://grafana.softwareheritage.org/goto/3YQsPX44z?orgId=1
The plot of lag derivative shows a progressive slowdown, so it's probably not due to a specific configuration change.
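To make the "progressive vs. sudden" reading of the lag derivative less dependent on eyeballing a plot, here is a minimal sketch of the same check in Python. Everything in it is an assumption for illustration: the function names, the `jump_ratio` threshold, and the sample format (a list of `(timestamp_seconds, lag_messages)` pairs exported from the dashboard) are hypothetical, not something the indexers or Grafana provide.

```python
def lag_rates(samples):
    """Growth rate of the lag (messages/second) between consecutive samples.

    `samples` is a time-sorted list of (timestamp_seconds, lag_messages).
    """
    return [
        (lag1 - lag0) / (t1 - t0)
        for (t0, lag0), (t1, lag1) in zip(samples, samples[1:])
    ]


def looks_progressive(rates, jump_ratio=0.5):
    """Heuristic: the slowdown is 'progressive' if no single step between
    consecutive rates accounts for more than `jump_ratio` of the total
    increase in the rate. `jump_ratio` is an arbitrary, hypothetical threshold.
    """
    if len(rates) < 2:
        return False
    total_increase = rates[-1] - rates[0]
    if total_increase <= 0:
        # The lag is not growing faster over time, so there is no slowdown.
        return False
    biggest_jump = max(b - a for a, b in zip(rates, rates[1:]))
    return biggest_jump / total_increase <= jump_ratio


# Made-up samples: lag growing smoothly vs. lag rate jumping at one point.
gradual = [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
sudden = [(0, 0), (1, 1), (2, 2), (3, 11), (4, 20)]
```

A single large jump in the rate would point at a specific event (deployment, configuration change), while a smooth increase is what the dashboards seem to show here.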
I do not know what is causing this, though. Three possible suspects:
- rdkafka frequently disconnecting from the brokers (or generally, having connection issues): https://sentry.softwareheritage.org/share/issue/76ed328b2ae6465face2ea4bb5f32187/
- slow storage and/or objstorage (which would make sense, as the extrinsic-metadata indexer is super-fast, and is also the only one not to use the storage and objstorage)
- simply not having enough workers
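For the last suspect, a rough back-of-the-envelope estimate of how many workers would be needed to keep up can be sketched as below. The function name and the numbers in the usage line are hypothetical, and the estimate assumes throughput scales linearly with the number of workers. That assumption would not hold if the storage or objstorage is the actual bottleneck, in which case adding workers would not help much.

```python
import math


def workers_needed(produce_rate, consume_rate, current_workers):
    """Estimate the workers needed for consumption to match production.

    Rates are in messages/second, measured over the same period.
    Assumes (optimistically) that throughput scales linearly with workers.
    """
    per_worker_rate = consume_rate / current_workers
    return math.ceil(produce_rate / per_worker_rate)


# Hypothetical figures: 1000 msg/s produced, 400 msg/s consumed by 4 workers.
estimate = workers_needed(1000, 400, 4)
```

If the estimate is far above what is reasonable to deploy, that would be a hint to look at the first two suspects before scaling up.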
Migrated from T4612 (view on Phabricator)