From 8093a85775e41e18db7615be5e0e6e8e53fe64c3 Mon Sep 17 00:00:00 2001
From: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue, 17 Oct 2023 12:13:41 +0200
Subject: [PATCH 1/2] docs/dataset: Add anchors

---
 docs/graph/dataset.rst | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/docs/graph/dataset.rst b/docs/graph/dataset.rst
index 820c937..db340a6 100644
--- a/docs/graph/dataset.rst
+++ b/docs/graph/dataset.rst
@@ -122,12 +122,16 @@ Summary of dataset versions
      - ✔
      - ✔
 
+
 Full graph datasets
 -------------------
 
 Because of their size, some of the latest datasets are only available for
 download from Amazon S3.
 
+
+.. _graph-dataset-2023-09-06:
+
 2023-09-06
 ~~~~~~~~~~
 
@@ -143,6 +147,9 @@ A full export of the graph dated from September 2023
 - **Total size**: 8.8 TiB
 - **S3**: ``s3://softwareheritage/graph/2023-09-06/compressed``
 
+
+.. _graph-dataset-2022-12-07:
+
 2022-12-07
 ~~~~~~~~~~
 
@@ -166,6 +173,9 @@ A full export of the graph dated from December 2022
 - **Total size**: 1 TiB
 - **S3**: ``s3://softwareheritage/graph/2022-12-07-history-hosting/compressed``
 
+
+.. _graph-dataset-2022-04-25:
+
 2022-04-25
 ~~~~~~~~~~
 
@@ -182,6 +192,8 @@ A full export of the graph dated from April 2022
 - **S3**: ``s3://softwareheritage/graph/2022-04-25/compressed``
 
 
+.. _graph-dataset-2021-03-23:
+
 2021-03-23
 ~~~~~~~~~~
 
@@ -199,6 +211,8 @@ A full export of the graph dated from March 2021.
 - **S3**: ``s3://softwareheritage/graph/2021-03-23/compressed``
 
 
+.. _graph-dataset-2020-12-15:
+
 2020-12-15
 ~~~~~~~~~~
 
@@ -211,6 +225,8 @@ compressed representation.
   <https://annex.softwareheritage.org/public/dataset/graph/2020-12-15/compressed/>`_
 
 
+.. _graph-dataset-2020-05-20:
+
 2020-05-20
 ~~~~~~~~~~
 
@@ -225,6 +241,8 @@ compressed representation.
   <https://annex.softwareheritage.org/public/dataset/graph/2020-05-20/compressed/>`_
 
 
+.. _graph-dataset-2019-01-28:
+
 2019-01-28
 ~~~~~~~~~~
 
@@ -253,6 +271,9 @@ Teaser datasets
 If the above datasets are too big, we also provide "teaser" datasets that can
 get you started and have a smaller size fingerprint.
 
+
+.. _graph-dataset-2021-03-23-popular-3k-python:
+
 2021-03-23-popular-3k-python
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -281,6 +302,8 @@ was the following:
 - **S3**: ``s3://softwareheritage/graph/2021-03-23-popular-3k-python/compressed/``
 
 
+.. _graph-dataset-2020-12-15-gitlab-all:
+
 2020-12-15-gitlab-all
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -292,6 +315,9 @@ Available in compressed graph format.
 - **URL**: `/graph/2020-12-15-gitlab-all/compressed/
   <https://annex.softwareheritage.org/public/dataset/graph/2020-12-15-gitlab-all/compressed/>`_
 
+
+.. _graph-dataset-2020-12-15-gitlab-100k:
+
 2020-12-15-gitlab-100k
 ~~~~~~~~~~~~~~~~~~~~~~
 
@@ -304,6 +330,8 @@ exported in December 2020. Available in compressed graph format.
   <https://annex.softwareheritage.org/public/dataset/graph/2020-12-15-gitlab-100k/compressed/>`_
 
 
+.. _graph-dataset-2019-01-28-popular-4k:
+
 2019-01-28-popular-4k
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -325,6 +353,8 @@ was the following:
   <https://annex.softwareheritage.org/public/dataset/graph/2019-01-28-popular-4k/parquet/>`_
 - **S3**: ``s3://softwareheritage/graph/2019-01-28-popular-4k/parquet/``
 
+.. _graph-dataset-2019-01-28-popular-3k-python:
+
 2019-01-28-popular-3k-python
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-- 
GitLab


From bcc2ff096710cb7d72ae993a9f6a204d48dcbe62 Mon Sep 17 00:00:00 2001
From: Valentin Lorentz <vlorentz@softwareheritage.org>
Date: Tue, 17 Oct 2023 12:14:56 +0200
Subject: [PATCH 2/2] Add erratum on timezones in the 2022-12-07 compressed
 graph

---
 docs/graph/dataset.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/graph/dataset.rst b/docs/graph/dataset.rst
index db340a6..45b9d6d 100644
--- a/docs/graph/dataset.rst
+++ b/docs/graph/dataset.rst
@@ -173,6 +173,10 @@ A full export of the graph dated from December 2022
 - **Total size**: 1 TiB
 - **S3**: ``s3://softwareheritage/graph/2022-12-07-history-hosting/compressed``
 
+- **Erratum**:
+
+  - `author and committer timestamps were shifted back 1 or 2 hours, based on the Europe/Paris timezone <https://gitlab.softwareheritage.org/swh/devel/swh-graph/-/issues/4788>`_
+
 
 .. _graph-dataset-2022-04-25:
 
-- 
GitLab