Indexers: batch content analyzer infrastructure
We want to be able to analyze, in batch, all the content blobs stored by Software Heritage.
Sample use cases are:
- compute mime type (service running)
- detect the license using ninka/fossology (service running)
- detect the programming language (service stopped)
To this end, we need scheduling tooling that lets us add and remove analyzers, (re)run analyses in batch, and incrementally stay up to date with new incoming content blobs.
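To make the three requirements concrete (add/remove analyzers, batch (re)runs, incremental catch-up), here is a minimal in-memory sketch. It is not the Software Heritage implementation: `IndexerScheduler`, its methods, and `guess_mimetype` are hypothetical names, the SHA-1 blob identifier is an assumption, and the MIME heuristic is a placeholder for a real detector.

```python
import hashlib
from typing import Callable, Dict, Iterable, Tuple

# An analyzer maps raw blob content to a result string (assumed interface).
Analyzer = Callable[[bytes], str]


class IndexerScheduler:
    """Toy scheduler: results are keyed by (analyzer name, blob id)."""

    def __init__(self) -> None:
        self.analyzers: Dict[str, Analyzer] = {}
        self.results: Dict[Tuple[str, str], str] = {}

    def add_analyzer(self, name: str, analyzer: Analyzer) -> None:
        self.analyzers[name] = analyzer

    def remove_analyzer(self, name: str) -> None:
        # Drop the analyzer and every result it produced.
        self.analyzers.pop(name, None)
        self.results = {k: v for k, v in self.results.items() if k[0] != name}

    def run(self, blobs: Iterable[bytes], rerun: bool = False) -> int:
        """Analyze blobs in batch and return the number of analyses performed.

        Pairs that already have a result are skipped unless rerun is True,
        which is what keeps the index incrementally up to date.
        """
        performed = 0
        for blob in blobs:
            blob_id = hashlib.sha1(blob).hexdigest()
            for name, analyzer in self.analyzers.items():
                key = (name, blob_id)
                if rerun or key not in self.results:
                    self.results[key] = analyzer(blob)
                    performed += 1
        return performed


def guess_mimetype(blob: bytes) -> str:
    # Placeholder heuristic only; a real indexer would call a proper detector.
    return "application/octet-stream" if b"\0" in blob else "text/plain"
```

Calling `run` a second time with a mix of old and new blobs only analyzes the new ones, while `rerun=True` forces a full recomputation, for example after an analyzer is upgraded. A real deployment would persist results and distribute the work across workers rather than keep everything in one process.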
Migrated from T359 (view on Phabricator)