  • v1.0.3
    * luigi.UploadExportToS3: Skip upload of already-uploaded files
    * luigi: Dynamically list directories instead of using object_types
    * luigi: Read meta/export.json instead of relying on stamp files
    * docs/index.rst: Add missing new line at end of file
    * docs/index.rst: Fix sphinx tag name
    * docs: Include module indices only when building standalone package doc
  • v1.0.2
    * exporters/orc: Fix crash on visit status with no type
    * luigi.CreateAthena: Fix validation of DB name
    * luigi.RunExportAll: Default to exporting all formats
  • v1.0.1
    * luigi: Rename classes to be globally unambiguous
    * luigi: Send progress reports to the scheduler
  • v1.0.0
    * Update for swh-objstorage >= 2.0.0
    * docs/athena: Fix value of --location-prefix
    * Fix link to the 2021-03-23 compressed dataset
    * cli: Sort object types to be processed in the right order
    * cli: Increase open file descriptor limit to support 256 open LevelDBs
    * athena: Fix create_table to work with restricted permissions
    * Add luigi tasks
  • v0.3.2
    version 0.3.2
  • v0.3.1
    version 0.3.1
  • v0.3.0
    Release swh.dataset v0.3.0
  • v0.2.2
    version 0.2.2
  • v0.2.1
    version 0.2.1
  • v0.2.0
    v0.2.0 / 2021-04-17
      * athena: pass database name as an attribute
      * docs: Update for new schema
      * Add two ORC tools (orc-merge, orc-print-contents)
      * journalprocessor: only reassign partitions when needed
      * journalprocessor: disable in-partition sharding for LevelDB tests
      * ORC: export missing revision_history table
      * athena: add documentation and licensing info
      * Add athena subcommand to create/query AWS Athena database
      * Move ORC table schema in
      * test_edges: fix mypy error while mocking a method
      * Fix duplicate reference target
      * Swap README.rst and docs/README.rst to match the new template
      * Include README.rst in the documentation
      * Add LevelDB backend for exporter node sets
      * ORC exporter: handle releases with empty authors/dates
      * Update exporters.edges to swh.model 1.0
      * ORC exporter: avoid fromtimestamp(), use datetime() from epoch instead
      * Refactor export paths in the base Exporter class
      * ORC exporter: Add unit tests
      * Add ORC exporter
      * Edge exporter: use common remove_pull_requests() function
      * journalprocessor: be resilient to exporter errors
      * Export CLI: add a way to exclude specific object types
      * Namespace exporters in exporters/ dir
      * journalprocessor: don't shadow the object function
      * journalprocessor: fix hashing of origin_visit_status objects
      * journalprocessor: remove comment about deserialize_message overload being a 'hack'
      * tests: fix test_export_origin
      * SQLite on-disk set: disable journalling and synchronous mode
      * journalprocessor: also partition sqlite files by first byte
      * Journal processor: fetch offsets in parallel
      * Exporter documentation fixes
      * Rewrite of the export pipeline using Exporters
      * Graph export: add labels to the export CSV format
      * graph exporter: schema upgrade for origin_visit_status
      * Replace vcversioner with setuptools-scm
      * Run isort after the CLI import changes
  • v0.1.0
  • v0.0.1
    version 0.0.1