{Content|Directory}Loader: Adapt support for checksums
This moves the loaders away from the 'integrity' field, which is now handled on the lister side. Instead, the lister sends the loaders a dictionary of hex hashes.
This also improves the loaders so that they check those checksums when retrieving the artifact (content or tarball). Thanks to a bump in the swh.model version, they can now check sha512 checksums as well.
This echoes the ongoing work on the package loaders [1].
Related to T3781
Depends on !436 (closed)
- [1] !326 (closed)
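The checksum verification described above can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: the function name `verify_checksums` and the shape of the `checksums` argument (algorithm name mapped to hex digest) are assumptions based on the description, and only the standard `hashlib` module is used.

```python
import hashlib
from typing import Dict


def verify_checksums(data: bytes, checksums: Dict[str, str]) -> None:
    """Check raw artifact bytes against a dict of expected hex digests.

    Hypothetical helper: `checksums` maps a hashlib algorithm name
    (e.g. "sha256", "sha512") to the expected hex digest.
    Raises ValueError on the first mismatch.
    """
    for algo, expected in checksums.items():
        # hashlib.new() raises ValueError itself for unknown algorithm names
        actual = hashlib.new(algo, data).hexdigest()
        if actual != expected.lower():
            raise ValueError(
                f"{algo} mismatch: expected {expected}, computed {actual}"
            )


if __name__ == "__main__":
    artifact = b"dummy tarball bytes"
    expected = {
        "sha256": hashlib.sha256(artifact).hexdigest(),
        "sha512": hashlib.sha512(artifact).hexdigest(),
    }
    verify_checksums(artifact, expected)  # passes silently when all digests match
```

Checking every digest in the dictionary, rather than only the first, means a lister can send several algorithms and the artifact is rejected if any one of them disagrees.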
Migrated from D8587 (view on Phabricator)
Build is green
Patch application report for D8587 (id=31000)
Could not rebase; Attempt merge onto f774aba5...
Updating f774aba..1672fed
Fast-forward
 swh/loader/core/loader.py | 212 +++++++++++++++++----
 .../https_example.org/archives_dummy-hello.tar.gz | Bin 0 -> 221 bytes
 swh/loader/core/tests/test_loader.py | 130 ++++++++++++-
 3 files changed, 297 insertions(+), 45 deletions(-)
 create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
Changes applied before test
commit 1672fed607576399dcd1aec67a452606a6427fe6
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Sep 30 15:10:49 2022 +0200

    ContentLoader: Fix integrity check

    Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

commit 497f74f3225e4ccf11adce0d6a2bb50b2a471fab
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Sep 30 11:54:13 2022 +0200

    Add Directory Loader to allow tarball ingestion as Directory

    In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to ingest. This commit adds a loader to ingest those.

    The output of the ingestion is a snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained within the tarball).

    This expects to receive a mandatory 'integrity' field. It is used to check the tarball received out of the origin.

    This can also optionally receive a list of mirror urls in case the main origin url is no longer available. Those mirror urls are solely used as fallback to retrieve the tarball.

    Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/927/ for more details.
Build is green
Patch application report for D8587 (id=31015)
Could not rebase; Attempt merge onto f774aba5...
Updating f774aba..3083a3b
Fast-forward
 requirements-swh.txt | 2 +-
 swh/loader/core/loader.py | 221 +++++++++++++++++----
 .../https_example.org/archives_dummy-hello.tar.gz | Bin 0 -> 221 bytes
 swh/loader/core/tests/test_loader.py | 182 +++++++++++++++--
 4 files changed, 350 insertions(+), 55 deletions(-)
 create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
Changes applied before test
commit 3083a3b23fa118158a3e1bf54087492086603e63
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Sep 30 15:10:49 2022 +0200

    ContentLoader: Improve integrity check support

    This can deal with sha512 with the new swh.model version.

    Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

commit 497f74f3225e4ccf11adce0d6a2bb50b2a471fab
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Sep 30 11:54:13 2022 +0200

    Add Directory Loader to allow tarball ingestion as Directory

    In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to ingest. This commit adds a loader to ingest those.

    The output of the ingestion is a snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained within the tarball).

    This expects to receive a mandatory 'integrity' field. It is used to check the tarball received out of the origin.

    This can also optionally receive a list of mirror urls in case the main origin url is no longer available. Those mirror urls are solely used as fallback to retrieve the tarball.

    Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/932/ for more details.
Build has FAILED
Patch application report for D8587 (id=31027)
Could not rebase; Attempt merge onto f774aba5...
Updating f774aba..da7cc37
Fast-forward
 requirements-swh.txt | 2 +-
 swh/loader/core/loader.py | 234 +++++++++++++++++----
 .../https_example.org/archives_dummy-hello.tar.gz | Bin 0 -> 221 bytes
 swh/loader/core/tests/test_loader.py | 194 +++++++++++++++--
 4 files changed, 372 insertions(+), 58 deletions(-)
 create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
Changes applied before test
commit da7cc372fe5a9cdf139de7cf51a2c6cb6dc8b8ed
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Sep 30 15:10:49 2022 +0200

    {Content|Directory}Loader: Adapt support for checksums

    This moves away the loaders from the 'integrity' fields which is now dealt with lister side. Instead the lister is send a dictionary of hex hashes.

    This also improves the loader to check those checksums when retrieving the artifact (content or tarball). Thanks to a bump in the swh.model version, this is now able to deal with sha512 checksums checks as well.

    Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

commit 497f74f3225e4ccf11adce0d6a2bb50b2a471fab
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Sep 30 11:54:13 2022 +0200

    Add Directory Loader to allow tarball ingestion as Directory

    In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to ingest. This commit adds a loader to ingest those.

    The output of the ingestion is a snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained within the tarball).

    This expects to receive a mandatory 'integrity' field. It is used to check the tarball received out of the origin.

    This can also optionally receive a list of mirror urls in case the main origin url is no longer available. Those mirror urls are solely used as fallback to retrieve the tarball.

    Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/937/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/937/console
Build has FAILED
The build error is unrelated to this diff [1]; the tests pass locally with my older tox environment [2]. Maybe a new upstream version bump of importlib-metadata?
- [1]
  ...
  importlib-metadata==5.0.0
  ...
  11:37:49   File "/var/lib/jenkins/workspace/DLDBASE/tests-on-diff/.tox/py3/lib/python3.7/site-packages/kombu/utils/compat.py", line 82, in entrypoints
  11:37:49     for ep in importlib_metadata.entry_points().get(namespace, [])
  11:37:49 AttributeError: 'EntryPoints' object has no attribute 'get'
- [2]
  $ tox
  ...
  importlib-metadata==4.12.0
  ...
  ================================================================ 272 passed, 122 warnings in 62.62s (0:01:02) ================================================================
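For context on the traceback in [1]: the failing call uses the dict-style `.get()` on the result of `entry_points()`, which the `EntryPoints` object returned by importlib-metadata 5.0.0 does not provide. The sketch below shows one way a caller could look up a group in a version-tolerant way. It is only an illustration, not what kombu or this merge request does: it uses the standard-library `importlib.metadata` (Python 3.8+) rather than the `importlib_metadata` backport, and the helper name `iter_entry_points` is hypothetical.

```python
from importlib.metadata import entry_points


def iter_entry_points(group: str) -> list:
    """Return the entry points registered under `group` as a list.

    Hypothetical helper: prefers the selection API when available and
    falls back to the older dict-style interface otherwise.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer interface: filter by group with select()
        return list(eps.select(group=group))
    # Older interface: entry_points() returns a plain dict keyed by group
    return list(eps.get(group, []))


if __name__ == "__main__":
    # Lists whatever console scripts happen to be installed in this environment
    for ep in iter_entry_points("console_scripts"):
        print(ep.name, "->", ep.value)
```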
Build has FAILED
Patch application report for D8587 (id=31027)
Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/938/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/938/console
Maybe a new upstream version bump on importlib-metadata?
[1]
... importlib-metadata==5.0.0
It is definitely related to that release: I reproduced the failure in the docker dev environment. I am adding a workaround in requirements.txt (swh.{lister|loader}: importlib_metadata!=5.0.0). With it, those containers are able to start again ^.
mentioned in merge request !438 (closed)
Some references in the commit message have been migrated:
- T3781 is now swh/meta#3781 (closed)
Build is green
Patch application report for D8587 (id=31058)
Rebasing onto dbf7f3dc...
Current branch diff-target is up to date.
Changes applied before test
commit 39c33a66c27c030c7ae3cfba4a393c2ced468fbc
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Fri Sep 30 15:10:49 2022 +0200

    {Content|Directory}Loader: Adapt support for checksums

    This adapts the content/directory loader implementations to use directly a checksums dict which is now sent by the listers.

    This improves the loader to check those checksums when retrieving the artifact (content or tarball). Thanks to a bump in the swh.model version, this is now able to deal with sha512 checksums checks as well.

    This also aligns with the current package loaders which now are also checking the integrity of the tarballs they ingest.

    Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/943/ for more details.
mentioned in issue swh/meta#3781 (closed)