Add helper functions to easily retrieve metadata associated with an origin URL
The discussion around swh-core!349 (merged) makes me believe that there's a gap in our tooling that swh.loader.metadata could fill in a more generic way: fetching (and parsing) the extrinsic metadata associated with a given origin URL, so that it can be used by other components.
To do so, we need to implement the following helpers:
- Map an origin URL to metadata fetchers:
  - either via a heuristic based on the URL (e.g. `github.com`, `gitlab.*`, etc.)
  - or via access to the scheduler database, seeded with information fetched from the lister
- For each metadata fetcher, do lightweight parsing of the fetched raw metadata (e.g. translate JSON to a Python dict)
- Optionally run the Codemeta / MeSoCoRe mapping on the fetched metadata, if such a mapping exists
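
To make the first two helpers more concrete, here is a rough sketch of the URL heuristic and the lightweight parsing step. Everything in it is hypothetical: the names `HOST_PATTERNS`, `fetcher_names_for_origin` and `parse_raw_metadata` are made up for illustration and are not existing swh.loader.metadata APIs, and the fetcher names are plain strings rather than real fetcher classes. It only covers the hostname heuristic, not the scheduler database lookup.

```python
import json
from fnmatch import fnmatchcase
from typing import Any, Dict, List
from urllib.parse import urlparse

# Hypothetical table: hostname glob pattern -> name of a metadata fetcher.
# Real code would probably point at fetcher classes instead of strings.
HOST_PATTERNS: Dict[str, str] = {
    "github.com": "github",
    "gitlab.*": "gitlab",
}


def fetcher_names_for_origin(origin_url: str) -> List[str]:
    """Return the names of the fetchers whose pattern matches the URL's hostname."""
    # urlparse lowercases the hostname; it is None for URLs without a netloc.
    hostname = urlparse(origin_url).hostname or ""
    return [
        name
        for pattern, name in HOST_PATTERNS.items()
        if fnmatchcase(hostname, pattern)
    ]


def parse_raw_metadata(raw: bytes) -> Any:
    """Lightweight parsing: decode raw JSON bytes into Python objects."""
    return json.loads(raw)
```

With this sketch, `fetcher_names_for_origin("https://gitlab.example.org/foo/bar")` would pick the `gitlab` entry purely from the hostname. A self-hosted forge whose hostname doesn't start with `gitlab.` would be missed, which is where the scheduler database option would help.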
Would this make sense? Should this set of helpers live in swh-loader-metadata?