Tags

Tags give the ability to mark specific points in history as being important
  • v6.2.0
    3216ab1b · Publish v6.2.0 ·
    v6.2.0
    
    Possibly breaking changes:
    
    * Make Stats::avg_locality optional (gRPC protocol)
    * Remove support for Fmph (library)
    
    New features (Rust library):
    
    * Relax bound on load_properties closure
    
    Fixes (gRPC server):
    
    * Make Stats::avg_locality optional
    * Generate graph.stats as part of graph compression
    
    New features (gRPC server):
    
    * Add StatsD metric traversal_returned_nodes_total
    
    Fixes (misc):
    
    * origin-contributors: Fix crash on origins with unknown URL
    * Update scancode
    * compressed_graph: Cleanup new temporary files before upload
    
    New features (misc):
    
    * Add swh-graph-convert to serialize small graphs to JSON
    * provenance: Make executables writing Parquet files reusable as a library
    * Add aggregated content dataset
    * Add 'generations' index as a compact and faster alternative to the topological order
    * Automate subdataset creation
    
    Improvements:
    
    * Update DeanonymizeOriginContributors to the file layout used by >= 2024-08 graphs
    
    Documentation:
    
    * Add examples to GraphBuilder documentation
    * Add missing descriptions of top-level modules
    * Document other services implemented by the gRPC server
  • v6.1.0
    14cf465a · Publish v6.1.0 ·
    v6.1.0
    
    New features (gRPC server):
    
    * Add Sentry integration
    * Add support for the gRPC Health Checking Protocol
    * Export StatsD metrics
    * Log time spent streaming/traversing
    
    Other new features:

    * blobs_dataset: Add a node-filter that uses a known list of SWHIDs
    
    CLI fixes:
    
    * `swh graph download`:
        * Fix download of export.json
        * Stream zstd files to zstdmt instead of writing a temporary file
    * `swh graph grpc-serve`: Allow graph path to be optional
    
    Other fixes:
    
    * grpc-server: Log requests resulting in gRPC errors
    
    Tweaks:
    
    * provenance: Increase row group size
    * DownloadBlobs: Default to S3 instead of archive.softwareheritage.org
    * Update Tonic
    
    Documentation:
    
    * Document that the swh-graph crate's executables are needed to reindex
  • v6.0.0
    94fbe8f8 · Publish v6.0.0 ·
    v6.0.0
    
    Breaking changes:
    
    * Move gRPC server to its own crate (swh-graph-grpc-server)
    * Move dataset-writer to its own crate (dataset-writer)
    * Switch from stderrlog to env_logger (`-vv` on executables is now the
      default, and log level is tuned with `RUST_LOG=debug`)
    * Switch MPH algorithms for new compressed graphs from GOV to PTHash
    * Move the FCL from java_compat/fcl.rs to front_coded_list/read.rs
    * Removed all Java code
    
    New features:
    
    * Add support for reading graphs with PTHash instead of GOV
    * Switch from stderrlog to env_logger
    * Add load_full() shorthand for loading a bidirectional labeled graph with all properties
    * Replace load_{uni,bi}directional with Swh{Uni,Bi}directionalGraph::new
    * Add SwhFullGraph trait "alias" to simplify types
    * Prioritize target/ over global installs when running Rust executables
    * stdlib: add find_head_revision()
    * Log gRPC requests
    
    Soundness fixes (Rust):
    
    * NodeBuilder: Remove empty data struct when another node type's data is requested
    * Replace Write::write() with Write::write_all()
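
To illustrate why swapping `Write::write()` for `Write::write_all()` counts as a soundness fix, here is a minimal sketch using only the standard library. `ShortWriter` and the two helper functions are hypothetical names made up for this example and are not part of swh-graph. The sketch relies on the documented behavior that `write()` may accept only part of the buffer, while `write_all()` loops until every byte is written.

```rust
use std::io::{self, Write};

/// Hypothetical writer that accepts at most `chunk` bytes per `write()` call,
/// mimicking a pipe or socket that performs short writes.
struct ShortWriter {
    data: Vec<u8>,
    chunk: usize,
}

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Accept only a prefix of the buffer and report how much was taken.
        let n = buf.len().min(self.chunk);
        self.data.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Bug pattern: a single `write()` whose returned byte count is ignored.
fn written_with_write(payload: &[u8]) -> usize {
    let mut w = ShortWriter { data: Vec::new(), chunk: 4 };
    let _ = w.write(payload).unwrap();
    w.data.len()
}

/// Fixed pattern: `write_all()` retries until the whole buffer is written.
fn written_with_write_all(payload: &[u8]) -> usize {
    let mut w = ShortWriter { data: Vec::new(), chunk: 4 };
    w.write_all(payload).unwrap();
    w.data.len()
}

fn main() {
    let payload = b"0123456789";
    println!("write():     {} of {} bytes", written_with_write(payload), payload.len());
    println!("write_all(): {} of {} bytes", written_with_write_all(payload), payload.len());
}
```

With this 4-byte writer, the single `write()` call silently drops the tail of the payload, which is the kind of truncated output the fix guards against.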
    
    Soundness fixes (Python):
    
    * Add missing "node." prefix to FieldMasks passed to FindPath{To,Between}
    
    Performance improvements:
    
    * Parallelize merging in par_sort_arcs
    * EDGE_LABELS: Remove unnecessary dependencies
    * PERMUTE_LLP: The base graph does not need to be loaded in memory
    * Replace GNU sort with custom sorters
    * Switch swh-graph-extract to jemalloc
    * Don't materialize vec of values while building MPHs
    * Update default gammas
    
    Documentation:
    
    * Update compression documentation
    * Improve error when .labeloffsets file is missing
    * Make quickstart and gRPC documentation easier to understand
  • v5.1.1
    f101d652 · Publish v5.1.1 ·
    v5.1.1
    
    * Add missing swhgraph.proto file to the crate
  • v5.1.0
    4102a5ea · Publish v5.1.0 ·
    v5.1.0
    
    No changes to the Python code in this release, only Rust and Java.
    
    Additions (Rust):
    
    * stdlib: add fs_ls_tree, implementing a recursive ls of a FS tree
    * Switch NODE_PROPERTIES to Rust implementation
    * Switch EXTRACT_PERSONS to Rust implementation
    * Add 'compare-graphs' tool
    
    Soundness fixes (Rust):
    
    * Check CSV inputs are not a single non-header line, when expecting a header
    * Switch default port of the Rust gRPC server to 50091
    
    Compilation fixes (Rust):
    
    * Lower MSRV to 1.79
    
    Documentation:
    
    * rust doc: Move Crash Course to its own page + add Tutorial
    * Update gRPC documentation to run the Rust implementation
    * Fix documentation of DirEntry::permission
    * Fix docstring to mention inputs are CSV
    * Add diagnostic hint for when properties are not loaded
    
    Internal changes:
    
    * stdlib: port find_latest_snp() to typed labeled successor iterator
    * Remove unused Java utils and SWH-specific Java compression code
    * Make DefaultUnderlyingGraph and SwhLabeling newtypes
    * Fix warnings
    * Use ar_row's own decimal -> timestamp decoding
    * pytest.ini: Ignore Rust's target/
  • v5.0.0
    73120521 · Publish v5.0.0 ·
    v5.0.0
    
    No changes to the Python code in this release, only Rust.
    
    This is the first release of swh-graph on crates.io.
    
    Breaking changes (Rust):
    
    * Rename "labelled" -> "labeled", for consistency with webgraph
    * Rename SWHType to NodeType
    * Moved find_root_dir to the "stdlib"
    * Make labeled_{predecessors,successors} return typed labels
    * Rust: rename all "::new_with" constructors to just "::with"
    * Rust: implement FromStr (rather than TryInto<&str>) for NodeType
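
As a rough illustration of what implementing `FromStr` instead of `TryInto<&str>` buys callers, the sketch below defines a stand-in `NodeType` enum. The variants, the three-letter codes (taken from the SWHID object-type abbreviations), and the `String` error type are assumptions made for this example; the crate's actual definition may differ.

```rust
use std::str::FromStr;

// Hypothetical stand-in for the crate's NodeType; not the real definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeType {
    Content,
    Directory,
    Origin,
    Release,
    Revision,
    Snapshot,
}

impl FromStr for NodeType {
    // Assumption: a plain String error, chosen only to keep the sketch short.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "cnt" => Ok(NodeType::Content),
            "dir" => Ok(NodeType::Directory),
            "ori" => Ok(NodeType::Origin),
            "rel" => Ok(NodeType::Release),
            "rev" => Ok(NodeType::Revision),
            "snp" => Ok(NodeType::Snapshot),
            other => Err(format!("unknown node type: {other:?}")),
        }
    }
}

fn main() {
    // Implementing FromStr makes the standard `str::parse` available.
    let t: NodeType = "rev".parse().unwrap();
    println!("{t:?}");

    // Unknown strings surface as an error instead of a panic.
    println!("{:?}", "xyz".parse::<NodeType>());
}
```

Because `str::parse` is generic over `FromStr`, the type also plugs into code that expects that trait, which a `TryInto<&str>` implementation would not provide.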
    
    Additions (Rust):
    
    * Added a "stdlib" of algorithms (find_latest_snp, path
      resolution functions, generic node visit operator)
    * GraphBuilder: Add support for Visit (and Branch) labels
    * Add support for multi-arcs in VecGraph and GraphBuilder
    * Add `.flatten_labels()` on labeled arc iterator
    * Rewrite EDGE_LABELS compression step in Rust
    * Subgraph: add builder based on NodeConstraint
    * Add tool to dump edges from a compressed graph
    
    Improvements (Rust):
    
    * Switch to released versions of webgraph and ar_row
    * Various performance improvements in properties compression
    * GraphBuilder: add BuiltGraph type alias for done() return type
    * Rewrite CountPaths to produce sharded Parquet files directly
    * Small improvement of the rust executable dir handling in check_config()
    
    Soundness fixes (Rust):
    
    * Fixed all bugs in NODE_PROPERTIES compression (or at least, have a strict
      subset of the Java implementation's bugs). In particular:
      * properties: Update arrays atomically and remove LongArrayBitVector
    * Remove redundant 'datasets' path component
    * Rust: use create_new() to create on-disk maps
    * Subgraph: fix has_arc(), which was transposing the passed arc
    
    Crash/compilation fixes (Rust):
    
    * Fix detection of missing node2type.bin/content.is_skipped.bits file
    * Fix loading node2type.bin larger than 2^31 * 8 bytes
    * Fix debian requirements
    * blobs_dataset: Make Datafusion write to a single file
    * root_directory.rs: do not require (unneeded) SwhBackwardGraph trait
    
    Documentation:
    
    * Subgraph: document that num_nodes/arcs() return non-filtered values
    * Add a 'minimal build for tests' section in rust/README.md
    * review *.rs file headers: add missing Copyright decl
  • v4.0.0
    v4.0.0
    
    Breaking:
    
    * Switch default gRPC server from Java to Rust
    * Remove sorted list of nodes from graph output
    
    Bug fixes:
    
    * Fix pyo3 extension build on recent Maturin versions
    * compute-directory-frontier: Use correct timestamp to decide if node is frontier
    
    Ergonomics:
    
    * Add CLI to regenerate graph files for the current version
    * Add CLI to download the graph (and decompress .zst files)
    
    Performance optimizations:
    
    * Start BFS from (sorted) origins instead of random nodes
    * http_rpc_server.VisitEdgesView: Do not fetch edge labels
    * provenance: Switch from glibc malloc to jemalloc
    * contents-in-directories: Do not traverse nodes not reachable from a frontier directory
    * Make dependency on swh-storage optional
    
    Rust rewrite:
    
    * Look for rust swh-graph-grpc-serve executable in user's PATH and 'rust_executable_dir' config
    * EXTRACT_NODES, MAPS, COMPOSE_ORDERS, TRANSPOSE: Switch to Rust implementation
    * Rewrite ListOriginContributors in Rust
    * Rewrite ListFilesByName in Rust
    * Rewrite MPHTranslate in Rust
    * Add support for computing node ids from SWHIDs
    * Update list of temporary files to clean after compression is done
    * java: Add support for reading is_skipped.bits and node2type.bin
      (produced by the Rust compression pipeline)
    
    Other improvements:
    
    * CountPaths: Add support for 2024-05-16 graph
    * provenance: Add support for '--node-filter all'
    * Document how to get the URL of an origin node
    * Document dependency on protoc
    * grpc-server: Add --masked-nodes option
    * pytest_plugin: Move server config to its own fixture
    * model: adapt to the renaming of model.TargetType to model.SnapshotTargetType
  • v3.4.0
    v3.4.0
    
    - Add MPHF to the Python extension
    - provenance:
      - Add support for Hive partitioning on sha1_git column
      - Remove remaining references to topological_order_dir
      - Replace .csv.zst output with .parquet
      - Compress paths with zstd
      - Order rows in final files by the column they will be queried on
    - Move find_frontiers_from_root_directory to frontier-directories-in-revisions
    - Make dependency on 'arrow' optional, even when 'dataset-writer' is
      used
    - docs: Fix reference to SWHIDs
    - webgraph: Set RUST_MIN_STACK to avoid stack overflows
    - naive client: add max_matching_nodes for neighbors method
    - Add timestamp and is_full_visit bit to ori->snp edges' label
    - Update webgraph
    - Fix task dependencies
    - Switch LLP compression step to use the Rust implementation
    - Fix sort_batch_size and input_batch_size values being swapped in
      'permute' and 'transpose' commands
    - transform: Log computed configuration
    - Rewrite PopularContentPaths in Rust
    - Misc. fixes and code improvements
    
  • v3.3.1
    v3.3.1
    
    * permute-and-symmetrize: Sort arc lists in parallel
    * rust and java: Enforce max_edges *before* traversing edges
  • v3.3.0
    v3.3.0
    
    * Rewrite most dataset generation scripts from Java to Rust
    * Make most dataset generation scripts produce sharded files instead of
      a single .csv.zst that cannot be processed in parallel
    * Improve ergonomics of Rust library
    * Switch some early compression steps to Rust (BV, BFS,
      {PERMUTE,TRANSPOSE,SIMPLIFY}_BFS)
    * Replace Athena with datafusion
    * Finish Rust rewrite of the gRPC server (but Java remains the default)
  • debian/3.2.1-1_swh1
    swh.graph Debian release 3.2.1-1~swh1
  • debian/upstream/3.2.1
    Upstream version 3.2.1
  • v3.2.1
    v3.2.1
    
    * Remove unnecessary copy of contentIsSkipped
  • debian/3.2.0-1_swh1
    swh.graph Debian release 3.2.0-1~swh1
  • debian/upstream/3.2.0
    Upstream version 3.2.0
  • v3.2.0
    v3.2.0
    
    - Migrate to copier, PEP420 and pyproject.toml based packaging.
    - Provenance: add timestamps to directory_frontier and
      content_in_revisions_without_frontiers.
    
  • debian/3.1.0-1_swh1
    swh.graph Debian release 3.1.0-1~swh1
  • debian/upstream/3.1.0
    Upstream version 3.1.0
  • v3.1.0
    159f5343 · Print the end of the log ·
    v3.1.0
    
    Misc:
    
    * Prevent timestamps in node properties from being shifted according to the timezone WriteNodeProperties is being run in.
    * Add scripts to generate an index for swh-provenance
    
    HTTP API:
    
    * Raise an error when RemoteGraphClient URL is wrong
    * Fix DeprecationWarnings in http_client
    * http_rpc_server: Remove duplicate request
    * Properly handle empty results in HTTP client
    
    CLI:
    
    * compress: Hide traceback on subprocess exception
    * Print the end of the log
    
    Rust:
    
    * java utils: add Mph2Cmph to convert from java to rust version of MPH files
    * initial skeleton to ship a rust crate
    * Implement the first Rust data structures
    * codespell: move excluded word list from toml to precommit conf
    * Add example BFS implementation
    * added Node2Type, its tests, and a binary to convert the .node2swhid.bin file to the new .node2type.bin
    
    Luigi:
    
    * Avoid silencing exceptions thrown by worker threads
    * Document how to run Luigi tasks
    * luigi: Add missing output files to ExtractNodes.
    * luigi, docs: MAPS step does not depend on LLP
    * luigi: Increase MPH_LABELS memory allowance
    * Tune MPH maximum memory to avoid OOMs
    * Honor JAVA_HOME when locating the default Java binary
    
    PopularContentPaths:
    
    * Avoid concurrent updates to ProgressLogger
    * Fix missing 'sha1' column for contents with no path
    * Skip loading the forward graph
    * Allow zstdcat more memory
    * Print corrupt records
    
    Internal:
    
    * Prevent flake8 from finding issues in build/ directory
    * Migrate to copier-based swh-py-template
    * Move the jar under swh/graph so we can get rid of deprecated "data_files" in setup
    * Fix documentation build
  • debian/3.0.1-1_swh1
    swh.graph Debian release 3.0.1-1~swh1