swh-graph tags
https://gitlab.softwareheritage.org/swh/devel/swh-graph/-/tags

v3.3.1
* permute-and-symmetrize: Sort arc lists in parallel
* rust and java: Enforce max_edges *before* traversing edges
Author: vlorentz

v3.3.0
* Rewrite most dataset generation scripts from Java to Rust
* Make most dataset generation scripts produce sharded files instead of
a single .csv.zst that cannot be processed in parallel
* Improve ergonomics of Rust library
* Switch some early compression steps to Rust (BV, BFS,
{PERMUTE,TRANSPOSE,SIMPLIFY}_BFS)
* Replace Athena with datafusion
* Finish Rust rewrite of the gRPC server (but Java remains the default)
Author: Valentin Lorentz

debian/3.2.1-1_swh1
swh.graph Debian release 3.2.1-1~swh1
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

debian/upstream/3.2.1
Upstream version 3.2.1
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

v3.2.1
* Remove unnecessary copy of contentIsSkipped
Author: Valentin Lorentz

debian/3.2.0-1_swh1
swh.graph Debian release 3.2.0-1~swh1
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

debian/upstream/3.2.0
Upstream version 3.2.0
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

v3.2.0
- Migrate to copier, PEP 420 and pyproject.toml-based packaging.
- Provenance: add timestamps to directory_frontier and
content_in_revisions_without_frontiers.
Author: Valentin Lorentz

debian/3.1.0-1_swh1
swh.graph Debian release 3.1.0-1~swh1
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

debian/upstream/3.1.0
Upstream version 3.1.0
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

v3.1.0
Misc:
* Prevent timestamps in node properties from being shifted by the timezone in which WriteNodeProperties runs.
* Add scripts to generate an index for swh-provenance
HTTP API:
* Raise an error when RemoteGraphClient URL is wrong
* Fix DeprecationWarnings in http_client
* http_rpc_server: Remove duplicate request
* Properly handle empty results in HTTP client
CLI:
* compress: Hide traceback on subprocess exception
* Print the end of the log
Rust:
* java utils: Add Mph2Cmph to convert MPH files from the Java version to the Rust version
* Initial skeleton to ship a Rust crate
* Implement the first Rust data structures
* codespell: Move excluded word list from toml to precommit conf
* Add example BFS implementation
* Add Node2Type, its tests, and a binary to convert the .node2swhid.bin file to the new .node2type.bin
Luigi:
* Avoid silencing exceptions thrown by worker threads
* Document how to run Luigi tasks
* luigi: Add missing output files to ExtractNodes.
* luigi, docs: MAPS step does not depend on LLP
* luigi: Increase MPH_LABELS memory allowance
* Tune MPH maximum memory to avoid OOMs
* Honor JAVA_HOME when locating the default Java binary
PopularContentPaths:
* Avoid concurrent updates to ProgressLogger
* Fix missing 'sha1' column for contents with no path
* Skip loading the forward graph
* Allow zstdcat more memory
* Print corrupt records
Internal:
* Prevent flake8 from finding issues in build/ directory
* Migrate to copier-based swh-py-template
* Move the jar under swh/graph so we can get rid of deprecated "data_files" in setup
* Fix documentation build
Author: Valentin Lorentz

debian/3.0.1-1_swh1
swh.graph Debian release 3.0.1-1~swh1
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

debian/upstream/3.0.1
Upstream version 3.0.1
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

v3.0.1
* origin_contributors: Fix deanonymization and tests
Author: Valentin Lorentz

v3.0.0
Breaking changes:
* Use CRLF in output CSV instead of LF
* FindEarliestRevision: switch from TSV to CSV and rename columns
* origin_contributors: Add final task checking integrity of the dataset
* origin_contributors: Change table format/layout to be more compact,
improve performance, add contribution years
Minor changes:
* Match swh-dataset's rename of 'object_type' to 'object_types'
* luigi: Make grpc API globally configurable, and remove the default value
* Bump requirements on protobuf
New derived datasets:
* Import the "blobs datasets" (license, citation) generation script as a
luigi workflow
* Add scripts to find the most popular name(s)/path of content nodes
* Add script to count the total number of paths to any node
* Add ListEarliestRevisions, which computes the earliest revision of all dir/cnt objects at once
New features:
* Export naive_graph_client and remote_graph_client fixtures in pytest plugin
* Add INITIAL_ORIGIN and FORKED_ORIGIN to example dataset
* Make example dataset available for other modules
* Add export_{started,ended}_at to Stats response
* Make Luigi tasks declare their RAM usage (and auto-tune when possible)
* Add support for compressing the graph with only some node types
* Add step stamp to each step's list of output files
* Add support for making the graph dataset name differ from the export name
* webgraph.py: Display log path in error messages
* Add a script to count paths leading to each node
* TopoSort: Default to DFS instead of BFS
* TopoSort: Add support for running forward
* luigi: Add an option to define the maximum RAM used by graph compression
Documentation:
* Move the doc for the example dataset to its own page
* Include the representation of the example dataset in the documentation
* Add some more style to the example dataset graph
* Remove the figure from the example dataset documentation
* docs/compression: Fix inaccuracies in the dependency graph
* DownloadGraphFromS3: Fix incorrect docstring
Bug fixes:
* getMessage: Fix crash on origins with no URL property
* luigi/misc_datasets: Fix _clean_s3_directory() when directory is empty
* Add a flyweight copy() to SwhGraphProperties to make it threadsafe
* FindEarliestRevision: Fix crash on revisions with no committer timestamp
* compressed_graph: Fix data race to .obl files in Transpose command
* Check in constructor instead of size64()
* NodeIdMap: Fix incorrect implementation of size64()
* TopoSort: Fix discard of the last node while looking for leaves
Performance improvements:
* FindEarliestRevision: Run traversals in parallel
* FindEarliestRevision, TopoSort: Use Apache Commons CSV
* luigi/compressed_graph: Tune -Xmx per task
* TopoSort: Various optimizations
Misc:
* assembly: Remove some transitive dependencies from the final uber jar
* luigi: Rewrite compression pipeline as small Luigi tasks
Author: Valentin Lorentz

debian/2.2.0-2_swh1_bpo10+1
swh.graph Debian release 2.2.0-2~swh1~bpo10+1
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

debian/2.2.0-2_swh1
swh.graph Debian release 2.2.0-2~swh1
Author: Nicolas Dandrimont <olasd@softwareheritage.org>

debian/2.2.0-1_swh1_bpo10+1
swh.graph Debian release 2.2.0-1~swh1~bpo10+1
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

debian/2.2.0-1_swh1
swh.graph Debian release 2.2.0-1~swh1
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>

debian/upstream/2.2.0
Upstream version 2.2.0
Author: Jenkins for Software Heritage <jenkins@jenkins-debian1.internal.softwareheritage.org>