{Content|Directory}Loader: Adapt support for checksums

This moves the loaders away from the 'integrity' field, which is now dealt with on the lister side. Instead, the loaders are sent a dictionary of hex hashes by the lister.

This also improves the loaders so that they check those checksums when retrieving the artifact (content or tarball). Thanks to a bump in the swh.model version, they can now check sha512 checksums as well.
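To illustrate the idea of checking a downloaded artifact against a dictionary of hex hashes, here is a minimal sketch using only the standard library. It is not the loader's actual implementation (the diff builds on swh.model's hashing utilities); the function name `verify_checksums` and the sample payload are hypothetical.

```python
import hashlib
from typing import Dict


def verify_checksums(data: bytes, checksums: Dict[str, str]) -> None:
    """Check raw artifact bytes against a dict of {algorithm: hex digest}.

    Hypothetical helper for illustration only; raises ValueError on the
    first algorithm whose computed digest differs from the expected one.
    """
    for algo, expected in checksums.items():
        # hashlib.new accepts algorithm names such as "sha1", "sha256", "sha512"
        actual = hashlib.new(algo, data).hexdigest()
        if actual != expected.lower():
            raise ValueError(
                f"{algo} mismatch: expected {expected}, got {actual}"
            )


# Usage: the checksums dict stands in for what a lister would send.
payload = b"dummy tarball bytes"
expected = {
    "sha256": hashlib.sha256(payload).hexdigest(),
    "sha512": hashlib.sha512(payload).hexdigest(),
}
verify_checksums(payload, expected)  # passes silently when all digests match
```

Checking every provided algorithm, rather than only the first, means a single wrong entry in the dictionary is enough to reject the artifact.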

This echoes the ongoing work on the package loaders [1].

Related to T3781. Depends on !436 (closed).


Migrated from D8587 (view on Phabricator)

Activity

  • Build is green

    Patch application report for D8587 (id=31000)

    Could not rebase; Attempt merge onto f774aba5...

    Updating f774aba..1672fed
    Fast-forward
     swh/loader/core/loader.py                          | 212 +++++++++++++++++----
     .../https_example.org/archives_dummy-hello.tar.gz  | Bin 0 -> 221 bytes
     swh/loader/core/tests/test_loader.py               | 130 ++++++++++++-
     3 files changed, 297 insertions(+), 45 deletions(-)
     create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
    Changes applied before test
    commit 1672fed607576399dcd1aec67a452606a6427fe6
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 15:10:49 2022 +0200
    
        ContentLoader: Fix integrity check
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
    
    commit 497f74f3225e4ccf11adce0d6a2bb50b2a471fab
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 11:54:13 2022 +0200
    
        Add Directory Loader to allow tarball ingestion as Directory
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to
        ingest. This commit adds a loader to ingest those. The output of the ingestion is a
        snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained
        within the tarball).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the tarball
        received out of the origin.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the tarball.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/927/ for more details.

    • Build content out of MultiHash
    • Add support for sha512 hash
    • Fix input test checksum
  • Build is green

    Patch application report for D8587 (id=31015)

    Could not rebase; Attempt merge onto f774aba5...

    Updating f774aba..3083a3b
    Fast-forward
     requirements-swh.txt                               |   2 +-
     swh/loader/core/loader.py                          | 221 +++++++++++++++++----
     .../https_example.org/archives_dummy-hello.tar.gz  | Bin 0 -> 221 bytes
     swh/loader/core/tests/test_loader.py               | 182 +++++++++++++++--
     4 files changed, 350 insertions(+), 55 deletions(-)
     create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
    Changes applied before test
    commit 3083a3b23fa118158a3e1bf54087492086603e63
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 15:10:49 2022 +0200
    
        ContentLoader: Improve integrity check support
        
        This can deal with sha512 with the new swh.model version.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
    
    commit 497f74f3225e4ccf11adce0d6a2bb50b2a471fab
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 11:54:13 2022 +0200
    
        Add Directory Loader to allow tarball ingestion as Directory
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to
        ingest. This commit adds a loader to ingest those. The output of the ingestion is a
        snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained
        within the tarball).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the tarball
        received out of the origin.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the tarball.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/932/ for more details.

  • Adapt both the Content and Directory loaders to deal with a checksums dict instead of the 'integrity' field.

  • Build has FAILED

    Patch application report for D8587 (id=31027)

    Could not rebase; Attempt merge onto f774aba5...

    Updating f774aba..da7cc37
    Fast-forward
     requirements-swh.txt                               |   2 +-
     swh/loader/core/loader.py                          | 234 +++++++++++++++++----
     .../https_example.org/archives_dummy-hello.tar.gz  | Bin 0 -> 221 bytes
     swh/loader/core/tests/test_loader.py               | 194 +++++++++++++++--
     4 files changed, 372 insertions(+), 58 deletions(-)
     create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
    Changes applied before test
    commit da7cc372fe5a9cdf139de7cf51a2c6cb6dc8b8ed
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 15:10:49 2022 +0200
    
        {Content|Directory}Loader: Adapt support for checksums
        
        This moves away the loaders from the 'integrity' fields which is now dealt with lister
        side. Instead the lister is send a dictionary of hex hashes.
        
        This also improves the loader to check those checksums when retrieving the
        artifact (content or tarball). Thanks to a bump in the swh.model version, this is now
        able to deal with sha512 checksums checks as well.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
    
    commit 497f74f3225e4ccf11adce0d6a2bb50b2a471fab
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 11:54:13 2022 +0200
    
        Add Directory Loader to allow tarball ingestion as Directory
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to
        ingest. This commit adds a loader to ingest those. The output of the ingestion is a
        snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained
        within the tarball).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the tarball
        received out of the origin.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the tarball.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/937/
    See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/937/console

  • Build has FAILED

    The build error is unrelated to this diff [1]; the tests pass locally with my old tox-generated environment [2]. Maybe a new upstream version bump on importlib-metadata?

    • [1]
    ...
    importlib-metadata==5.0.0
    ...
    11:37:49    File "/var/lib/jenkins/workspace/DLDBASE/tests-on-diff/.tox/py3/lib/python3.7/site-packages/kombu/utils/compat.py", line 82, in entrypoints
    11:37:49      for ep in importlib_metadata.entry_points().get(namespace, [])
    11:37:49  AttributeError: 'EntryPoints' object has no attribute 'get'
    • [2]
    $ tox
    ...
    importlib-metadata==4.12.0
    ...
    ================================================================ 272 passed, 122 warnings in 62.62s (0:01:02) ================================================================
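The traceback in [1] comes from calling `.get()` on the object returned by `entry_points()`, which stopped behaving like a dict in importlib-metadata 5.0.0. As a rough sketch of a lookup that tolerates both behaviours, assuming the standard-library `importlib.metadata` (Python 3.8+) rather than the `importlib_metadata` backport seen in the log; the helper name `iter_entry_points` is hypothetical and this is not the fix kombu adopted:

```python
from importlib import metadata


def iter_entry_points(group):
    """Return the entry points registered under `group` as a list.

    Hypothetical compatibility helper: newer versions expose a
    `select()` method, while older ones return a plain dict keyed by group.
    """
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Selectable API (Python 3.10+ stdlib, importlib-metadata >= 3.6)
        return list(eps.select(group=group))
    # Legacy dict-like API, the one kombu relied on in the traceback above
    return list(eps.get(group, []))


# Usage: a group nobody registers yields an empty list either way.
print(iter_entry_points("example.nonexistent.group"))
```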
  • Build has FAILED

    Patch application report for D8587 (id=31027)

    Could not rebase; Attempt merge onto f774aba5...

    Updating f774aba..da7cc37
    Fast-forward
     requirements-swh.txt                               |   2 +-
     swh/loader/core/loader.py                          | 234 +++++++++++++++++----
     .../https_example.org/archives_dummy-hello.tar.gz  | Bin 0 -> 221 bytes
     swh/loader/core/tests/test_loader.py               | 194 +++++++++++++++--
     4 files changed, 372 insertions(+), 58 deletions(-)
     create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
    Changes applied before test
    commit da7cc372fe5a9cdf139de7cf51a2c6cb6dc8b8ed
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 15:10:49 2022 +0200
    
        {Content|Directory}Loader: Adapt support for checksums
        
        This moves away the loaders from the 'integrity' fields which is now dealt with lister
        side. Instead the lister is send a dictionary of hex hashes.
        
        This also improves the loader to check those checksums when retrieving the
        artifact (content or tarball). Thanks to a bump in the swh.model version, this is now
        able to deal with sha512 checksums checks as well.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
    
    commit 497f74f3225e4ccf11adce0d6a2bb50b2a471fab
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 11:54:13 2022 +0200
    
        Add Directory Loader to allow tarball ingestion as Directory
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to
        ingest. This commit adds a loader to ingest those. The output of the ingestion is a
        snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained
        within the tarball).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the tarball
        received out of the origin.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the tarball.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/938/
    See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/938/console

  • Maybe a new upstream version bump on importlib-metadata?

    [1]

    ...
    importlib-metadata==5.0.0

    Definitely something about that release: I reproduced it in the docker dev environment. I am adding a workaround in requirements.txt (swh.{lister|loader}: importlib_metadata!=5.0.0). With it, those containers are able to start.

  • vlorentz mentioned in merge request !438 (closed)

  • Merge request was accepted

  • vlorentz approved this merge request

  • Adapt according to latest review

  • Build is green

    Patch application report for D8587 (id=31058)

    Rebasing onto dbf7f3dc...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 39c33a66c27c030c7ae3cfba4a393c2ced468fbc
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 15:10:49 2022 +0200
    
        {Content|Directory}Loader: Adapt support for checksums
        
        This adapts the content/directory loader implementations to use directly a checksums
        dict which is now sent by the listers.
        
        This improves the loader to check those checksums when retrieving the artifact (content
        or tarball). Thanks to a bump in the swh.model version, this is now able to deal with
        sha512 checksums checks as well.
        
        This also aligns with the current package loaders which now are also checking the
        integrity of the tarballs they ingest.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/943/ for more details.

  • Merge request was merged
