Tags give the ability to mark specific points in history as being important.
-
v1.10.0
5f529dfc · v1.10.0

Documentation:
* Advertise 2025-05-18 and 2025-05-18-history-hosting datasets
* docs: cost of running the generate subdataset

Fullname export:
* Add documentation
* Add missing `formats` argument to `ExportPersonsTable`
* Exclude full names over 32 kB
* Make `local_sensitive_export_path` optional
* Override luigi.Tasks docstrings to avoid third-party refs
* Add missing dependency of ExportGraph on ExportPersonsTable

Misc:
* Stop using swh.objstorage.objstorage.ID_HASH_ALGO
* Add support for resuming from interrupted export
* Apply swh-py-template v0.3.4 with copier
* pyproject.toml: Bump minimal required Python version to 3.9
-
v1.9.0
f96210ce · v1.9.0

- Rename command from `swh dataset` to `swh export`
- Switch to swh-journal 2.0
- Switch to model objects in order to .anonymize() revisions and releases
- Export new ORC table containing author fullnames and fullnames hashes
- Advertise 2023-09-06-history-hosting and 2024-12-06-history-hosting
- Document the 2024-05-16-history-hosting dataset

Fixes:
- Fix hanging in exports with no persons
- Make sensitive export path consistent
-
v1.8.0
b5132844 · v1.8.0

* Rename from 'swh-dataset' to 'swh-export'
* Fix progress reporting
* doc: Fix description of 'type' column of 'directory_entry' table
* doc: Advertise 2024-12-06 graph export
* Add support for type-checking with luigi >= 3.6.0
-
v1.7.1
73de7917 · v1.7.1

Doc:
* Advertise 2024-08-23-popular-500-python teaser dataset
* Advertise 2024-08-23 graph export
* The 2019-01-28-popular-3k-python dataset did not have a compressed graph
* Fix typo

Fixes:
* generate_subdataset: Fix crash on excessively large release.message values

Internal:
* journalprocessor: Fix typo spotted after codespell upgrade
* journalprocessor: Fix black formatting
* Apply swh-py-template v0.2.3
-
v1.7.0
1c0b61f4 · v1.7.0

* orc: Add Bloom filters to some columns
* Small fix in test_journal_processor to make mypy happy
* Update to swh.model 6.13 introducing ModelObjectType
-
v1.6.0
a8b448a3 · v1.6.0

* Fetch all offsets before exporting any object
* journalprocessor: Add type annotations
* Simplify exporter initialization
* Exclude masked objects from export
* Advertise 2024-05-16 dataset
-
v1.5.1
3fbed004 · v1.5.1

- Publish edges export of the 2020-12-15 dataset
- add S3 link for 2020-12-15 compressed graph
-
v1.5.0
3f905e8e · v1.5.0

* python: Fix black formatting after bump to 23.1.0 in pre-commit
* Add latest blackify to git-blame-ignore-revs
* Add a “I will respect the ToS” button to the dataset index
* Fix typo in command name
* Advertise 2023-09-06-popular-1k teaser dataset
-
v1.3.3
8dd81df3 · v1.3.3

* luigi: Work around absence of metadata files in old exports
* luigi: Add support for downloading exports in parallel
* luigi: Delete all files in the root dir instead of the root dir itself
* docs: Add links to Terms of Use and 'How to use SWH data'
* Fix Sphinx role (:cls: -> :class:)
-
v1.2.0
a9692a7e · v1.2.0

* luigi: Actually check whether RunExportAll is complete.
* Advertize 2022-12-07 dataset
* luigi: Make AthenaDatabaseTarget check tables exist
-
v1.1.0
5a7bb58f · v1.1.0

* Rename 'object_type' to 'object_types' in export metadata
* luigi: Clarify the meaning of 'object_types'
* Invert symlink and content for the README file
* Bump mypy to 1.0.1 and isort to 5.11.5
* Fix tox and pytest config
-
v1.0.3
a01a82fc · v1.0.3

* luigi.UploadExportToS3: Skip upload of already-uploaded files
* luigi: Dynamically list directories instead of using object_types
* luigi: Read meta/export.json instead of relying on stamp files
* docs/index.rst: Add missing new line at end of file
* docs/index.rst: Fix sphinx tag name
* docs: Include module indices only when building standalone package doc