
Add Directory Loader to allow tarball ingestion as Directory

In some marginal listing cases (Nix or Guix for now), we can receive a raw tarball to ingest. This commit adds a loader to ingest those. The output of the ingestion is a snapshot with a single HEAD branch targeting the ingested directory (contained within the tarball).

The loader expects a mandatory 'integrity' field, which is used to check the tarball retrieved from the origin.
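To illustrate the kind of check the 'integrity' field enables, here is a minimal sketch. It assumes the field is an SRI-style string of the form `<algo>-<base64 digest>`; the description does not state the exact format, and the names `make_integrity` and `check_integrity` are hypothetical helpers, not part of the loader.

```python
import base64
import hashlib

# Assumption: only these algorithms are accepted in this sketch.
SUPPORTED_ALGOS = {"sha256", "sha384", "sha512"}


def make_integrity(data: bytes, algo: str = "sha256") -> str:
    """Build an SRI-style integrity string for `data` (hypothetical helper)."""
    digest = hashlib.new(algo, data).digest()
    return f"{algo}-{base64.b64encode(digest).decode('ascii')}"


def check_integrity(data: bytes, integrity: str) -> bool:
    """Return True if `data` matches the `<algo>-<base64 digest>` string."""
    # Standard base64 never contains "-", so the first "-" separates the parts.
    algo, _, b64_digest = integrity.partition("-")
    if algo not in SUPPORTED_ALGOS:
        raise ValueError(f"Unexpected hashing algorithm {algo}")
    expected = base64.b64decode(b64_digest)
    actual = hashlib.new(algo, data).digest()
    return actual == expected
```

A loader built along these lines would reject the tarball when `check_integrity` returns False rather than ingest content that does not match the declaration.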

It can also optionally receive a list of mirror URLs in case the main origin URL is no longer available. Those mirror URLs are used solely as fallbacks to retrieve the tarball.
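The fallback behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the loader's actual code: `fetch_with_fallback` and its `fetch` parameter are hypothetical, and the download function is injected so the sketch stays independent of any HTTP library.

```python
from typing import Callable, Sequence, Tuple


def fetch_with_fallback(
    origin_url: str,
    fallback_urls: Sequence[str],
    fetch: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Try the origin URL first, then each mirror in order.

    Returns the URL that succeeded together with the downloaded bytes.
    """
    errors = []
    for url in [origin_url, *fallback_urls]:
        try:
            return url, fetch(url)
        except OSError as exc:  # network errors such as urllib's URLError
            errors.append(f"{url}: {exc}")
    raise RuntimeError("Could not retrieve tarball: " + "; ".join(errors))
```

Mirrors are only consulted after the origin URL fails, which matches the "solely used as fallback" wording.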

Related to T3781

Depends on D8581


Migrated from D8584 (view on Phabricator)

Activity

  • Amend commit message and fix typo

  • Build has FAILED

    Patch application report for D8584 (id=30988)

    Could not rebase; Attempt merge onto 6299c091...

    Merge made by the 'recursive' strategy.
     .pre-commit-config.yaml                            |   2 +-
     swh/loader/core/loader.py                          | 241 ++++++++++++++++++++-
     .../project_asdf_archives_asdf-3.3.5.lisp          |   1 +
     .../https_example.org/archives_dummy-hello.tar.gz  | Bin 0 -> 221 bytes
     swh/loader/core/tests/test_loader.py               | 204 ++++++++++++++++-
     5 files changed, 440 insertions(+), 8 deletions(-)
     create mode 100644 swh/loader/core/tests/data/https_common-lisp.net/project_asdf_archives_asdf-3.3.5.lisp
     create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
    Changes applied before test
    commit 10076b690145ee3c306e923eea29b5ede907da57
    Merge: 6299c09 12da8df
    Author: Jenkins user <jenkins@localhost>
    Date:   Fri Sep 30 09:56:36 2022 +0000
    
        Merge branch 'diff-target' into HEAD
    
    commit 12da8df8ee7277b9c208fdd282be92c87cb70a2e
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 11:54:13 2022 +0200
    
        Add Directory Loader to ingest raw tarball
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to
        ingest. This commit adds a loader to ingest those. The output of the ingestion is a
        snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained
        within the tarball).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the tarball
        received out of the origin.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the tarball.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
    
    commit c5fcf4025bb878df9541bee1e8c55006ba1874df
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Sep 29 16:14:43 2022 +0200
    
        Add Content Loader to ingest raw content file
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw file to ingest.
        This commit adds a loader to ingest those. The output of the ingestion is a snapshot
        with 1 branch, one HEAD branch targetting the file content ingested.
        
        This expects to receive a mandatory 'integrity' field. It is used to check the content
        match the declaration.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the content.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/922/
    See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/922/console

  • Build has FAILED

    Patch application report for D8584 (id=30989)

    Could not rebase; Attempt merge onto 6299c091...

    Merge made by the 'recursive' strategy.
     .pre-commit-config.yaml                            |   2 +-
     swh/loader/core/loader.py                          | 240 ++++++++++++++++++++-
     .../project_asdf_archives_asdf-3.3.5.lisp          |   1 +
     .../https_example.org/archives_dummy-hello.tar.gz  | Bin 0 -> 221 bytes
     swh/loader/core/tests/test_loader.py               | 204 +++++++++++++++++-
     5 files changed, 439 insertions(+), 8 deletions(-)
     create mode 100644 swh/loader/core/tests/data/https_common-lisp.net/project_asdf_archives_asdf-3.3.5.lisp
     create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
    Changes applied before test
    commit 3de188cb01ed3e21492491bb207da019b20b5742
    Merge: 6299c09 628efbf
    Author: Jenkins user <jenkins@localhost>
    Date:   Fri Sep 30 09:59:58 2022 +0000
    
        Merge branch 'diff-target' into HEAD
    
    commit 628efbf0d9502a45acd55c49a69f1251ac093c06
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 11:54:13 2022 +0200
    
        Add Directory Loader to allow tarball ingestion as Directory
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to
        ingest. This commit adds a loader to ingest those. The output of the ingestion is a
        snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained
        within the tarball).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the tarball
        received out of the origin.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the tarball.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
    
    commit c5fcf4025bb878df9541bee1e8c55006ba1874df
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Sep 29 16:14:43 2022 +0200
    
        Add Content Loader to ingest raw content file
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw file to ingest.
        This commit adds a loader to ingest those. The output of the ingestion is a snapshot
        with 1 branch, one HEAD branch targetting the file content ingested.
        
        This expects to receive a mandatory 'integrity' field. It is used to check the content
        match the declaration.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the content.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/923/
    See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/923/console

  • Ah, ok, pytest is happy here because I still have the other diff I abandoned on swh.model... But the current version fails with:

    DEBUG    swh.loader.core.loader.DirectoryLoader:loader.py:818 Error: Unexpected hashing algorithm sha512, expected one of blake2b512, blake2s256, md5, sha1, sha1_git, sha256
  • Rebase and make tests ok

    I had a local abandoned patch (around swh.model) that made my local tests pass...

  • Build is green

    Patch application report for D8584 (id=30991)

    Could not rebase; Attempt merge onto 6299c091...

    Updating 6299c09..4eaa99e
    Fast-forward
     .pre-commit-config.yaml                            |   2 +-
     swh/loader/core/loader.py                          | 236 ++++++++++++++++++++-
     .../project_asdf_archives_asdf-3.3.5.lisp          |   1 +
     .../https_example.org/archives_dummy-hello.tar.gz  | Bin 0 -> 221 bytes
     swh/loader/core/tests/test_loader.py               | 204 +++++++++++++++++-
     5 files changed, 435 insertions(+), 8 deletions(-)
     create mode 100644 swh/loader/core/tests/data/https_common-lisp.net/project_asdf_archives_asdf-3.3.5.lisp
     create mode 100644 swh/loader/core/tests/data/https_example.org/archives_dummy-hello.tar.gz
    Changes applied before test
    commit 4eaa99ea751f49d5453dbb51e2361f9d070d3dd8
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 11:54:13 2022 +0200
    
        Add Directory Loader to allow tarball ingestion as Directory
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to
        ingest. This commit adds a loader to ingest those. The output of the ingestion is a
        snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained
        within the tarball).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the tarball
        received out of the origin.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the tarball.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')
    
    commit f774aba59e65bd3e5dd0ba9364840d8903d5706c
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Sep 29 16:14:43 2022 +0200
    
        Add Content Loader to ingest raw content file
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw file to ingest.
        This commit adds a loader to ingest those. The output of the ingestion is a snapshot
        with 1 branch, one HEAD branch targetting the file content ingested.
        
        This expects to receive a mandatory 'integrity' field. It is used to check the content
        match the declaration.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the content.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/924/ for more details.

  • Refactoring step

  • Build is green

    Patch application report for D8584 (id=30998)

    Rebasing onto f774aba5...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 497f74f3225e4ccf11adce0d6a2bb50b2a471fab
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 11:54:13 2022 +0200
    
        Add Directory Loader to allow tarball ingestion as Directory
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to
        ingest. This commit adds a loader to ingest those. The output of the ingestion is a
        snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained
        within the tarball).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the tarball
        received out of the origin.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the tarball.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/926/ for more details.

  • Antoine Lambert mentioned in merge request !437 (closed)


  • Looks good to me !

  • Merge request was accepted

  • Antoine Lambert approved this merge request


  • Rebase

  • Build is green

    Patch application report for D8584 (id=31056)

    Rebasing onto 5482a48e...

    Current branch diff-target is up to date.
    Changes applied before test
    commit dbf7f3dca0c8c2b9c364bdcdf19481ecf8421b77
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Fri Sep 30 11:54:13 2022 +0200
    
        Add Directory Loader to allow tarball ingestion as Directory
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw tarball to
        ingest. This commit adds a loader to ingest those. The output of the ingestion is a
        snapshot with 1 branch, one HEAD branch targetting the ingested directory (contained
        within the tarball).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the tarball
        received out of the origin.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the tarball.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/941/ for more details.

  • Merge request was merged
