Indexers: batch content analyzer infrastructure

We want to be able to analyze, in batch, all the content blobs stored by Software Heritage.

Sample use cases are:

  • compute the MIME type (service running)
  • detect the license using ninka/fossology (service running)
  • detect the programming language (service stopped)

To this end, we need scheduling tooling that lets us add and remove analyzers, (re)run analyses in batch, and incrementally stay up to date with new incoming content blobs.
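
As a rough illustration of those three requirements, here is a minimal in-memory sketch. It is not the actual Software Heritage indexer code: `IndexerRegistry`, `detect_kind`, and the blob ids are hypothetical names invented for this example, and a real deployment would rely on a persistent store and a task scheduler rather than Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# An analyzer maps raw blob content to a result string (hypothetical signature).
Analyzer = Callable[[bytes], str]


@dataclass
class IndexerRegistry:
    """Toy registry: tracks analyzers and which blobs each one has processed."""

    analyzers: Dict[str, Analyzer] = field(default_factory=dict)
    # (analyzer name, blob id) -> analysis result
    results: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add(self, name: str, analyzer: Analyzer) -> None:
        """Register an analyzer under a name."""
        self.analyzers[name] = analyzer

    def remove(self, name: str) -> None:
        """Unregister an analyzer and drop the results it produced."""
        self.analyzers.pop(name, None)
        for key in [k for k in self.results if k[0] == name]:
            del self.results[key]

    def run(self, blobs: Dict[str, bytes], rerun: bool = False) -> int:
        """Analyze blobs in batch and return the number of analyses performed.

        With rerun=False only (analyzer, blob) pairs without a stored result
        are processed, which is the incremental catch-up case.
        With rerun=True every pair is recomputed.
        """
        performed = 0
        for name, analyzer in self.analyzers.items():
            for blob_id, data in blobs.items():
                key = (name, blob_id)
                if key in self.results and not rerun:
                    continue
                self.results[key] = analyzer(data)
                performed += 1
        return performed


def detect_kind(data: bytes) -> str:
    """Stand-in analyzer: 'text' if the blob decodes as UTF-8, else 'binary'."""
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "binary"
```

Calling `run` repeatedly with a growing set of blobs only processes the blobs that have no stored result yet, while `run(blobs, rerun=True)` recomputes everything, for example after an analyzer has been upgraded.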


Migrated from T359 (view on Phabricator)
