Add graph properties compressed from the ORC dataset

This commit adds support for graph properties to swh-graph, i.e., data attached to nodes or edges (commit timestamps, commit messages, content lengths, ...).

The class WriteNodeProperties is used to extract the node properties from the ORCGraphDataset and write them in separate files, in compressed format. The properties can then be read using the SwhGraphProperties class.
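To illustrate the general idea of storing one property per file, indexed by node id, here is a minimal Python sketch. It is not the code from this commit: `write_property` and `read_property` are hypothetical helpers, and the fixed-width 64-bit layout is only an assumption standing in for whatever compressed encoding WriteNodeProperties actually uses.

```python
import struct

# Assumption: one big-endian signed 64-bit value per node, stored in node-id order.
RECORD = struct.Struct(">q")


def write_property(path, values):
    """Write one property file; values[i] is the property of node id i."""
    with open(path, "wb") as out:
        for value in values:
            out.write(RECORD.pack(value))


def read_property(path, node_id):
    """Read the property of a single node by seeking to its fixed offset."""
    with open(path, "rb") as src:
        src.seek(node_id * RECORD.size)
        return RECORD.unpack(src.read(RECORD.size))[0]
```

With this layout, a lookup is a single seek to `node_id * 8`, so a reader never has to load a whole property file to answer a query about one node.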

The compression pipeline and the tests have all been updated to use the new dataset format.

Unfortunately, there were a lot of interlocking parts and refactors that I had to work on in parallel, so this commit is not as atomic as it could be.

The CI also won't pass until a new version of WebGraph is released.


Migrated from D7331 (view on Phabricator)
