The test graph is too hard to change

mentioned in merge request swh-fuse!93 (merged)


I just remembered that I created a way to generate test graphs that are not compressed: https://docs.rs/swh-graph/latest/swh_graph/graph_builder/struct.GraphBuilder.html, which can be dumped to disk and loaded.

This would solve the problem, but I need to document this better, and make it usable from the gRPC server.

actually, it's already supported by the gRPC server, with --graph-format json
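To illustrate the idea above (a small, uncompressed graph built per test, dumped to disk, and loaded back), here is a minimal sketch in plain Python. It does not use swh-graph: `TinyGraphBuilder`, `load`, the node labels, and the JSON layout are all hypothetical, and the real `GraphBuilder` API and the format expected by `--graph-format json` may well differ.

```python
import json
import tempfile
from pathlib import Path


class TinyGraphBuilder:
    """Hypothetical stand-in for a per-test graph builder (NOT the swh-graph API)."""

    def __init__(self):
        self.nodes = []  # node labels; the list index is the node id
        self.arcs = []   # (source id, destination id) pairs

    def node(self, label):
        """Add a node and return its id."""
        self.nodes.append(label)
        return len(self.nodes) - 1

    def arc(self, src, dst):
        """Add a directed arc between two existing node ids."""
        self.arcs.append((src, dst))

    def dump(self, path):
        """Write the graph as JSON (made-up layout, for illustration only)."""
        Path(path).write_text(json.dumps({"nodes": self.nodes, "arcs": self.arcs}))


def load(path):
    """Read a dumped graph back as (labels, successors-by-node-id)."""
    data = json.loads(Path(path).read_text())
    successors = {node_id: [] for node_id in range(len(data["nodes"]))}
    for src, dst in data["arcs"]:
        successors[src].append(dst)
    return data["nodes"], successors


def build_and_reload():
    """Build only the three nodes one test needs, then round-trip them via disk."""
    builder = TinyGraphBuilder()
    rev = builder.node("rev:01")   # made-up labels, not real SWHIDs
    root = builder.node("dir:02")
    blob = builder.node("cnt:03")
    builder.arc(rev, root)
    builder.arc(root, blob)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "graph.json"
        builder.dump(path)
        return load(path)
```

With something of this shape, each test declares only the nodes and arcs it cares about, so a change to one test's graph would not ripple into unrelated tests.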

mentioned in merge request !726

Solution 1 has two important upsides:

it's simple
it would force swh-{graph,datasets,provenance,fuse} to support edge cases any time we add one... and it makes sense. Those applications are made for the complete graph, so they must be robust to those cases.

On repository bloat: it already weighs more than 200MB, so 500kB is negligible. But is 200MB worth moving to git-annex?

most of the 200MB is not reachable from the master branch but from debian/* branches, which we can probably clean up now.

> it would force swh-{graph,datasets,provenance,fuse} to support edge cases any time we add one... and it makes sense. Those applications are made for the complete graph, so they must be robust to those cases.

but it means every single test of swh-provenance must test all the edge cases, even those that are not relevant to swh-provenance. It would make the tests unreadable. And they would all need to be updated every time we make changes to the example dataset, which requires the person making the change to understand (and have in "live memory") what is being tested. I imagine every change to the test graph would take me hours, or a couple of days for someone not familiar with swh-provenance.

> most of the 200MB is not reachable from the master branch but from debian/* branches, which we can probably clean up now.

@olasd cleaned up debian/* branches and tags, the swh-graph repo now weighs only 7MB.

Great!

This gives another perspective on !726. Now I see what you did there, and I agree that option 2 is a reasonable choice right now. After all, it's not proven yet that it will increase the repository size to 100MB (even in the long term!).

What are the advantages of option 2 over #4844 (comment 197578) though?

as far as I understood they're very, very similar, aren't they?

Ah yes, indeed. The only difference is that the latter does not require checking in somewhat large binary files.

mentioned in issue swh-fuse#2919
