Refactor swh-indexer to simplify non-trivial mapping operations
I am considering switching swh-indexer's mappings to use rdflib.Graph as the internal representation instead of Python objects based on JSON-LD.
I expect this to make it easier to work with namespaces and the various types. For example, the {"@id": s} we have everywhere becomes URIRef(s), whose value is checked for validity immediately instead of crashing later while compacting/expanding. It would also rely less on PyLD, so https://forge.softwareheritage.org/#4436 won't be an issue anymore.
There are some issues with this:
- The PubSpec mapping relies on having two authorship statements whose value is a list, which is not preserved by JSON-LD compaction when working from a fully expanded JSON-LD document. It has worked so far because we only ran the compaction algorithm on a somewhat-compacted form. See https://github.com/w3c/json-ld-api/issues/547
- Lists are built into JSON-LD, but in RDF they have to be represented as linked lists (chains of nodes in the graph), which are clunky to work with, so I need to design a better API than the naive solution here.
Migrated from T4450 (view on Phabricator)