Indexers: compute (and keep up to date) the filetype of all blobs
We want to have metadata in the DB that associates each blob with its intrinsic filetype.
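To make the intended association concrete, here is a minimal sketch of one row per blob, held in an in-memory stand-in for the DB. All names (BlobFiletype, FiletypeIndex, tool_version) are hypothetical and not an existing schema. The tool_version field is an assumption about how "keeping it up to date" might work: a row can be recomputed and replaced when the detector changes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BlobFiletype:
    sha1: str                # blob identifier (hex); hypothetical key choice
    mimetype: str            # e.g. "text/plain"
    encoding: Optional[str]  # e.g. "us-ascii"; None if not detected
    tool_version: str        # hypothetical: which detector produced the row


class FiletypeIndex:
    """Hypothetical in-memory stand-in for the DB table."""

    def __init__(self) -> None:
        self._rows: Dict[str, BlobFiletype] = {}

    def upsert(self, row: BlobFiletype) -> None:
        # Recomputing a blob replaces its stale row, keeping the index current.
        self._rows[row.sha1] = row

    def get(self, sha1: str) -> Optional[BlobFiletype]:
        return self._rows.get(sha1)
```

An upsert keyed on the blob identifier means re-running the indexer over all blobs is idempotent, and a newer tool version simply overwrites older results.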
As a first approximation, the filetype could be encoded as a MIME type and computed using file --mime-type.
Storing the detected encoding as well (as reported by file --mime-encoding) would also be nice and would help the webapp quite a bit.
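Both values can be obtained in a single call with file --brief --mime, which prints output such as "text/plain; charset=us-ascii". The sketch below assumes file(1) is installed and feeds the raw blob content on standard input ("-" as the filename). The helper names parse_file_mime and detect_filetype are hypothetical, not part of any existing indexer.

```python
import subprocess
from typing import Optional, Tuple


def parse_file_mime(output: str) -> Tuple[str, Optional[str]]:
    """Split `file --brief --mime` output, e.g. "text/plain; charset=us-ascii",
    into (mimetype, encoding). Encoding is None if no charset is reported."""
    parts = [p.strip() for p in output.strip().split(";")]
    mimetype = parts[0]
    encoding = None
    for param in parts[1:]:
        key, sep, value = param.partition("=")
        if sep and key.strip() == "charset":
            encoding = value.strip()
    return mimetype, encoding


def detect_filetype(content: bytes) -> Tuple[str, Optional[str]]:
    """Run file(1) on raw blob content passed via stdin (hypothetical helper)."""
    result = subprocess.run(
        ["file", "--brief", "--mime", "-"],
        input=content,
        capture_output=True,
        check=True,
    )
    return parse_file_mime(result.stdout.decode("utf-8", errors="replace"))
```

On a typical system, detect_filetype(b"hello\n") should report a text/plain type with an ASCII charset, although exact strings can vary between file versions. Spawning one process per blob is simple but slow at scale; binding libmagic directly would avoid that overhead.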
More advanced and structured information could be detected using other tools, some of which are summarized in the LWN.net article "File-format analysis tools for archivists".
Migrated from T439 (view on Phabricator)