Draft: Update the example dataset to add a few dangling nodes in the graph
Add a few contents and directories that are not referenced, to help testing edge cases.
Jenkins job DGRPH/gitlab-builds #1750 failed.
See Console Output and Coverage Report for more details.
Resolved by David Douard
I'm afraid you can't just update the dataset like that. You need to regenerate the compressed graph (swh graph compress CLI), then update every single test to take the new nodes into account, update the statistics, and update the tests that rely on the MPH+permutation having a specific order.
Edited by vlorentz
So using generate_dataset --compress does produce the compressed dataset, but there are a few added/removed entries in the diffstat I am not sure about:
new file: swh/graph/example_dataset/compressed/example-bfs.roots.txt
deleted: swh/graph/example_dataset/compressed/example.node2type.map
deleted: swh/graph/example_dataset/compressed/example.nodes.csv.zst
deleted: swh/graph/example_dataset/compressed/example.property.content.is_skipped.bits
deleted: swh/graph/example_dataset/compressed/meta/export.json
- Is the meta/export.json necessary somewhere? How is it generated?
- Are the other deleted files an issue for the purpose of a usable example dataset?
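One way to double-check which files a regeneration adds or drops is to compare the file listings of the old and new compressed directories. This is a minimal sketch, assuming both directory trees are available locally; the function name and the directory arguments are hypothetical and not part of swh-graph.

```python
from pathlib import Path


def diff_file_sets(old_dir, new_dir):
    """Return (added, removed) relative file paths between two directory trees."""

    def listing(root):
        root = Path(root)
        # Collect every regular file, as a POSIX-style path relative to the root
        return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}

    old, new = listing(old_dir), listing(new_dir)
    # Files only in the new tree were added; files only in the old tree were removed
    return sorted(new - old), sorted(old - new)
```

Calling it on a checkout of the previous compressed/ directory and on the freshly generated one would give the same added/removed split as the diffstat, without the content changes.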
- meta/export.json is generated by the Luigi pipeline, which is what I used to compress the graph so far.
- example-bfs.roots.txt should be removed; I forgot to add it to the list of files to clean up at the end.
- node2type.map is not needed anymore, as I updated the Java code to support the new format (node2type.bin). Try re-running make java? See https://gitlab.softwareheritage.org/swh/devel/swh-graph/-/blob/master/java/src/main/java/org/softwareheritage/graph/maps/NodeTypesMap.java
- property.content.is_skipped.bits is required, I'm afraid. You can either revert ef28a3f2 or run swh graph reindex to create it from property.content.is_skipped.bin
- nodes.csv.zst is not generated anymore; I don't think anything uses it.
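Since the reply above names property.content.is_skipped.bits as required, a small pre-commit check can flag a regenerated dataset that lacks it. This is only a sketch under assumptions: the helper name is hypothetical, the graph basename "example" is taken from the file names in this thread, and the suffix list contains only the one file mentioned here as required, not the full set the tests need.

```python
from pathlib import Path

# Only the file named as required in this thread; a real check would list more.
REQUIRED_SUFFIXES = [".property.content.is_skipped.bits"]


def missing_required_files(compressed_dir, graph_name="example", suffixes=REQUIRED_SUFFIXES):
    """Return the names of required files that are absent from compressed_dir."""
    base = Path(compressed_dir)
    return [
        graph_name + suffix
        for suffix in suffixes
        if not (base / (graph_name + suffix)).is_file()
    ]
```

An empty list means every listed file is present; a non-empty one points at the files to recreate before committing the dataset.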
to compare stats values (eg. https://gitlab.softwareheritage.org/swh/devel/swh-graph/-/blob/master/swh/graph/tests/test_cli.py?ref_type=heads#L66 )
The .stats and .properties files are both generated by the compression pipeline (at different steps). If the compression pipeline had a bug writing nodes=0 in .stats, then .properties would contain nodes=0 too, and the test would not catch the bug.
Edited by vlorentz
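The argument above can be illustrated with a check that compares the pipeline output against values written by hand, instead of against another file produced by the same pipeline. This is a minimal sketch, assuming a plain key=value file format; the parser, the function names, and the expected counts are hypothetical placeholders, not the real statistics of the example dataset.

```python
def parse_key_values(text):
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


# Hypothetical values: in a real test these are written by hand from the known dataset.
EXPECTED = {"nodes": "5", "arcs": "7"}


def check_stats(stats_text, expected=EXPECTED):
    """Return the keys whose value differs from the hand-written expectation."""
    stats = parse_key_values(stats_text)
    return sorted(key for key, value in expected.items() if stats.get(key) != value)


# A buggy pipeline writing nodes=0 to both files passes a file-vs-file
# comparison (the two outputs agree), but fails the hand-written check.
buggy_stats = "nodes=0\narcs=7\n"
buggy_properties = "nodes=0\narcs=7\n"
assert parse_key_values(buggy_stats) == parse_key_values(buggy_properties)
assert check_stats(buggy_stats) == ["nodes"]
```

The design choice is that the expectation lives in the test, so it only changes when someone deliberately updates the dataset.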
added 1 commit
- 468f69ac - Update the example dataset to add a few dangling nodes in the graph
Jenkins job DGRPH/gitlab-builds #1765 failed.
See Console Output and Coverage Report for more details.
mentioned in issue #4844