Tags

Tags give the ability to mark specific points in history as being important
  • v3.2.1
    v3.2.1
    
    * Remove unnecessary copy of contentIsSkipped
  • v3.2.0
    v3.2.0
    
    - Migrate to copier, PEP420 and pyproject.toml based packaging.
    - Provenance: add timestamps to directory_frontier and
      content_in_revisions_without_frontiers.
    
  • v3.1.0
    v3.1.0
    
    Misc:
    
    * Prevent timestamps in node properties from being shifted according to the timezone WriteNodeProperties is being run in.
    * Add scripts to generate an index for swh-provenance
    
    HTTP API:
    
    * Raise an error when RemoteGraphClient URL is wrong
    * Fix DeprecationWarnings in http_client
    * http_rpc_server: Remove duplicate request
    * Properly handle empty results in HTTP client
    
    CLI:
    
    * compress: Hide traceback on subprocess exception
    * Print the end of the log
    
    Rust:
    
    * java utils: add Mph2Cmph to convert MPH files from the Java version to the Rust version
    * initial skeleton to ship a rust crate
    * Implement the first Rust data structures
    * codespell: move excluded word list from toml to precommit conf
    * Add example BFS implementation
    * Add Node2Type, its tests, and a bin to convert the .node2swhid.bin file to the new .node2type.bin
    
    Luigi:
    
    * Avoid silencing exceptions thrown by worker threads
    * Document how to run Luigi tasks
    * luigi: Add missing output files to ExtractNodes.
    * luigi, docs: MAPS step does not depend on LLP
    * luigi: Increase MPH_LABELS memory allowance
    * Tune MPH maximum memory to avoid OOMs
    * Honor JAVA_HOME when locating the default Java binary
    
    PopularContentPaths:
    
    * Avoid concurrent updates to ProgressLogger
    * Fix missing 'sha1' column for contents with no path
    * Skip loading the forward graph
    * Allow zstdcat more memory
    * Print corrupt records
    
    Internal:
    
    * Prevent flake8 from finding issues in build/ directory
    * Migrate to copier-based swh-py-template
    * Move the jar under swh/graph so we can get rid of deprecated "data_files" in setup
    * Fix documentation build
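
The v3.1.0 note "Honor JAVA_HOME when locating the default Java binary" can be illustrated with a minimal sketch. The function name `default_java_binary` and the fallback to a bare `java` on the `PATH` are assumptions made for illustration; this is not the project's actual code.

```python
import os
from pathlib import Path


def default_java_binary(environ=None):
    """Return the Java binary to run, preferring JAVA_HOME when it is set.

    Hypothetical helper: the name and the fallback are illustrative only.
    """
    env = os.environ if environ is None else environ
    java_home = env.get("JAVA_HOME")
    if java_home:
        # JAVA_HOME points at the JDK/JRE root; the launcher lives in bin/.
        return str(Path(java_home) / "bin" / "java")
    # Assumed fallback: let the operating system resolve "java" via PATH.
    return "java"


with_home = default_java_binary({"JAVA_HOME": "/opt/jdk"})
without_home = default_java_binary({})
```

Passing the environment as a parameter keeps the lookup easy to test without modifying the real process environment.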
  • v3.0.1
    v3.0.1
    
    * origin_contributors: Fix deanonymization and tests
  • v3.0.0
    v3.0.0
    
    Breaking changes:
    
    * Use CRLF in output CSV instead of LF
    * FindEarliestRevision: switch from TSV to CSV and rename columns
    * origin_contributors: Add final task checking integrity of the dataset
    * origin_contributors: Change table format/layout to be more compact,
      improve performance, add contribution years
    
    Minor changes:
    
    * Match swh-dataset's rename of 'object_type' to 'object_types'
    * luigi: Make grpc API globally configurable, and remove the default value
    * Bump requirements on protobuf
    
    New derived datasets:
    
    * Import the "blobs datasets" (license, citation) generation script as a
      luigi workflow
    * Add scripts to find the most popular name(s)/path of content nodes
    * Add script to count the total number of paths to any node
    * Add ListEarliestRevisions, which computes the earliest revision of all dir/cnt objects at once
    
    New features:
    
    * Export naive_graph_client and remote_graph_client fixtures in pytest plugin
    * Add INITIAL_ORIGIN and FORKED_ORIGIN to example dataset
    * Make example dataset available for other modules
    * Add export_{started,ended}_at to Stats response
    * Make Luigi tasks declare their RAM usage (and auto-tune when possible)
    * Add support for compressing the graph with only some node types
    * Add step stamp to each step's list of output files
    * Add support for making the graph dataset name differ from the export name
    * webgraph.py: Display log path in error messages
    * Add a script to count paths leading to each node
    * TopoSort: Default to DFS instead of BFS
    * TopoSort: Add support for running forward
    * luigi: Add an option to define the maximum RAM used by graph compression
    
    Documentation:
    
    * Move the doc for the example dataset to its own page
    * Include the representation of the example dataset in the documentation
    * Add some more style to the example dataset graph
    * Remove the figure from the example dataset documentation
    * docs/compression: Fix inaccuracies in the dependency graph
    * DownloadGraphFromS3: Fix incorrect docstring
    
    Bug fixes:
    
    * getMessage: Fix crash on origins with no URL property
    * luigi/misc_datasets: Fix _clean_s3_directory() when directory is empty
    * Add a flyweight copy() to SwhGraphProperties to make it threadsafe
    * FindEarliestRevision: Fix crash on revisions with no committer timestamp
    * compressed_graph: Fix data race to .obl files in Transpose command
    * Check in constructor instead of size64()
    * NodeIdMap: Fix incorrect implementation of size64()
    * TopoSort: Fix discard of the last node while looking for leaves
    
    Performance improvements:
    
    * FindEarliestRevision: Run traversals in parallel
    * FindEarliestRevision, TopoSort: Use Apache Commons CSV
    * luigi/compressed_graph: Tune -Xmx per task
    * TopoSort: Various optimizations
    
    Misc:
    
    * assembly: Remove some transitive dependencies from the final uber jar
    * luigi: Rewrite compression pipeline as small Luigi tasks
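
For consumers affected by the v3.0.0 breaking change "Use CRLF in output CSV instead of LF", the following sketch shows how CRLF-terminated CSV round-trips through Python's standard `csv` module. The column names and values are hypothetical and are not the columns actually written by FindEarliestRevision or any other task.

```python
import csv
import io

# Write two rows with explicit CRLF line endings.
# newline="" disables newline translation so the terminator is kept as-is.
buf = io.StringIO(newline="")
writer = csv.writer(buf, lineterminator="\r\n")
writer.writerow(["id", "value"])  # hypothetical header
writer.writerow(["a", "1"])       # hypothetical record
raw = buf.getvalue()

# csv.reader accepts CRLF-terminated records without extra configuration.
rows = list(csv.reader(io.StringIO(raw, newline="")))
```

Tools that split lines on `\n` alone would leave a trailing `\r` on the last field, which is the kind of breakage this release note warns about.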
  • v2.3.0
    v2.3.0
    
    * Add tools/swh-graph-lookup/swh-graph-lookup.py
    * Add Luigi workflow to generate the compressed graph
    * Add scripts to generate a topological order and list origin
      contributors
    * Fix minor crashes
  • v2.2.0
    v2.2.0
    
    * FindEarliestRevision: Add earliest_ts and rev_occurrences columns
    * pre-commit, tox: Bump pre-commit, codespell, black and flake8
    * docs/grpc-api.rst: Update to match current code
    * docs/grpc-api.rst: Add Python examples
  • v2.1.2
    v2.1.2
    
    * Apply 'max_matching_nodes' restriction after 'return_types' filter
    * http_client: Add max_matching_nodes parameter to visit_nodes()
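
The ordering fixed in v2.1.2 (apply the 'max_matching_nodes' restriction after the 'return_types' filter) can be sketched as follows. The `(type, id)` tuple representation, the function name, and the use of `None` for "no limit" are all assumptions made for illustration; the real traversal code is not shown on this page.

```python
from itertools import islice


def limit_after_filter(nodes, return_types, max_matching_nodes=None):
    """Filter nodes by type first, then cap the number of results.

    Hypothetical helper: nodes are (type, id) tuples and None means no limit.
    """
    matching = (node for node in nodes if node[0] in return_types)
    if max_matching_nodes is None:
        return list(matching)
    # The cap counts only nodes that passed the type filter.
    return list(islice(matching, max_matching_nodes))


nodes = [("rev", "r1"), ("dir", "d1"), ("cnt", "c1"),
         ("cnt", "c2"), ("dir", "d2"), ("cnt", "c3")]

filtered_then_limited = limit_after_filter(nodes, {"cnt"}, 2)

# For contrast: capping first and filtering afterwards loses results.
limited_then_filtered = [n for n in nodes[:2] if n[0] in {"cnt"}]
```

With the cap applied first, the two leading non-content nodes use up the budget and no content node is returned, which is the behavior the fix avoids.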
  • v2.1.1
    Release swh.graph 2.1.1
    
    - Don't ignore the port specified on the swh graph grpc-serve command line
    - add debug logging in the grpc-serve initialization
    
  • v2.1.0
    v2.1.0
    
    * Add max_matching_nodes parameter to /leaves
    * Add field 'max_matching_nodes' to TraversalRequest
    * Exclude protobuf 4.21.*
  • v2.0.0
    Release swh.graph 2.0.0
    
     - Rename modules to make their purpose less confusing
     - Allow separate deployment of the swh.graph grpc server
    
  • v1.0.2
    v1.0.2
    
    * Return HTTP 503 on AioRpcError
    * Remove documentation of deleted endpoint
    * rpc_server: use shlex.quote() to print command
    * Minor test improvements
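
The v1.0.2 note "rpc_server: use shlex.quote() to print command" refers to a standard-library function. A minimal sketch of the idea follows, assuming the command is held as an argument list; the helper name and the example command are hypothetical.

```python
import shlex


def format_command(argv):
    """Render an argument list as one shell-safe string, e.g. for logging."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Hypothetical command line, for illustration only.
cmd = ["java", "-cp", "/path with spaces/app.jar", "Main"]
printable = format_command(cmd)
```

Quoting each argument means the printed line can be pasted into a shell and will split back into the same arguments.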
  • v1.0.1
    version 1.0.1
    
  • v1.0.0
    version 1.0.0
    
  • v0.6.1
    aaed82fc · Documentation overhaul
    version 0.6.1
    
  • v0.5.2
    v0.5.2
    
    * Increase retries for random walks from 5 to 10
    * naive_client: Add documentation and doctest to initialize it.
    * Add support for CoreSWHID/ExtendedSWHID when building naive.Graph
  • v0.6.0
    v0.6.0
    
    * Increase retries for random walks from 5 to 10
    * Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph
    * Use AllowedNodesTest to implement return type filtering
    * Add graph dataset reading classes (orc+edges)
    * Remove unused/buggy TopologicalSort
    * Add support for CoreSWHID/ExtendedSWHID when building naive.Graph
  • v0.5.1
    version 0.5.1
    
  • v0.5.0
    v0.5.0
    
    - Delay import to allow proper debian packaging of the client code
    - bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID
    - StreamingGraphView: Buffer lines before writing
    - cli: Fix rpc-serve to actually use the path given as argument
    - Fix typo in HTTP index
    - server: Define make_app_from_configfile so it can be run by gunicorn
    - LabelMapBuilder: mmap order file, use less RAM
    - ConnectedComponents: add --by-origins
    - Bump fastutil version
    - git2graph: bugfix: traverse all nodes even when edges are not traversed
    - tools/dir2graph: new tool to convert a local dir to nodes/edges files
    - FindEarliestRevision: make timing optional with a dedicated CLI flag
    
  • v0.4.1
    20717234 · Merge branch 'topology'
    version 0.4.1
    