Refactor metadata mappings using rdflib.Graph instead of JSON-LD internally (!399) · Merge requests · Platform / Development / swh-indexer

vlorentz requested to merge generated-differential-D8279-source into generated-differential-D8279-target Aug 22, 2022

Motivation:

It makes it easier to visualize what is actually happening when modifying the graph, by working explicitly on triples instead of on a JSON-LD document (a tree serialization of the graph).
It removes the need for the hacky merge_values() function (and possibly merge_documents() in a future commit).
It also catches malformed data exactly where it is added to the document (the call to rdflib.Graph.add()) instead of at the end of the mapping, when running compaction/expansion.

Downsides:

Tests are clunkier, because they relied on a deterministic order for unordered lists, which rdflib does not guarantee.
The code is longer.
There is an extra dependency (which we will need at some point anyway, if we want to import from RDF datasets).

Sorry for the big diff. The bulk of the changes is in base.py, codemeta.py, and utils.py; everything else adapts existing code without changing the underlying semantics.

Depends on !368 (closed), !371 (closed), and !372 (closed)

Migrated from D8279 (view on Phabricator)
