Add support for compressing the graph with only some node types

vlorentz requested to merge vlorentz/swh-graph:subset-types into master

Typically, this will be used to compress with 'ori,snp,rel,rev' as object types, producing a much smaller graph that is still useful for some experiments.

Sorry, it's a big one, but it would be pretty hard to test without updating the whole pipeline to support it.

There are five kinds of changes:

  1. Making the Python/Luigi workflow pass --allow-types to Java
  2. Updating Java main functions to take a --allow-types argument and pass it to ORCGraphDataset
  3. Making ORCGraphDataset support AllowedNodeTypes, by simply returning null instead of a table when the table does not match the allowed types
  4. Updating the rest of the code to skip some blocks when a table is null
  5. Updating sanity checks in the Luigi workflow to support missing objects
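
To illustrate the idea behind changes 3 and 4, here is a minimal Python sketch of the pattern, not the actual code from this merge request. The helper names (`parse_allowed_types`, `get_table`, `process_tables`) and the table-to-type mapping are hypothetical, chosen only for illustration; the real logic lives in the Java ORCGraphDataset and its callers.

```python
from typing import FrozenSet, List, Optional

# Node type codes as used in this description ('ori,snp,rel,rev'), plus
# the two remaining ones. Treat the full set as an assumption.
NODE_TYPES = frozenset({"cnt", "dir", "ori", "rel", "rev", "snp"})

# Hypothetical mapping from a dataset table to the node type it provides.
TABLE_NODE_TYPE = {
    "origin": "ori",
    "snapshot": "snp",
    "release": "rel",
    "revision": "rev",
    "directory": "dir",
    "content": "cnt",
}


def parse_allowed_types(value: str) -> FrozenSet[str]:
    """Parse a comma-separated value such as 'ori,snp,rel,rev'."""
    types = frozenset(t.strip() for t in value.split(",") if t.strip())
    unknown = types - NODE_TYPES
    if unknown:
        raise ValueError(f"unknown node types: {sorted(unknown)}")
    return types


def get_table(name: str, allowed: FrozenSet[str]) -> Optional[str]:
    """Return a stand-in for the table, or None if its type is filtered out."""
    if TABLE_NODE_TYPE[name] not in allowed:
        return None
    return f"<table {name}>"  # placeholder for a real table object


def process_tables(allowed: FrozenSet[str]) -> List[str]:
    """Visit every table, skipping the block for tables that are None."""
    processed = []
    for name in TABLE_NODE_TYPE:
        table = get_table(name, allowed)
        if table is None:
            continue  # table excluded by the allowed types
        processed.append(name)
    return processed


if __name__ == "__main__":
    print(process_tables(parse_allowed_types("ori,snp,rel,rev")))
```

Returning a null value rather than raising keeps each caller in control of what "missing table" means, at the cost of requiring a null check at every use site, which is why change 4 touches the rest of the code.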

This is the code used to resolve #4765 (closed)

Depends on !238 (merged), !235 (merged), !237 (merged)

Edited by vlorentz