
Add Content Loader to ingest raw content file

In some marginal listing cases (Nix or Guix for now), we can receive a raw file to ingest. This commit adds a loader to ingest those files. The output of the ingestion is a snapshot with a single branch, HEAD, targeting the ingested file content.
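As a rough illustration of that output shape, here is a minimal plain-Python sketch. It does not use the swh.model classes: the dict layout and the helper names `git_blob_sha1` and `single_branch_snapshot` are assumptions made for this example. It also assumes the content is identified by its git blob hash.

```python
import hashlib
from typing import Dict


def git_blob_sha1(data: bytes) -> str:
    """Return the git blob identifier of data (SHA-1 over a blob header plus the data)."""
    # Git hashes "blob <length>" followed by a NUL byte, then the raw bytes.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def single_branch_snapshot(data: bytes) -> Dict[str, Dict[str, str]]:
    """Build a hypothetical snapshot description with one HEAD branch."""
    # Assumption: a plain dict stands in for the real snapshot model object.
    return {"HEAD": {"target_type": "content", "target": git_blob_sha1(data)}}
```

Calling `single_branch_snapshot(b"some raw file")` returns a mapping with exactly one key, `HEAD`, whose target is the identifier of the ingested bytes.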

The loader expects a mandatory 'integrity' field, which is used to check that the content matches the declaration.
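A minimal sketch of such a check, assuming the 'integrity' value follows the "<algorithm>-<base64 digest>" shape of a hash-expression. The function names `parse_integrity` and `matches_integrity` are hypothetical and are not taken from the loader itself.

```python
import base64
import hashlib
from typing import Tuple

# Assumption: only these algorithms are accepted in an integrity value.
SUPPORTED_ALGOS = ("sha256", "sha384", "sha512")


def parse_integrity(integrity: str) -> Tuple[str, bytes]:
    """Split "<algo>-<base64 digest>" into the algorithm name and raw digest."""
    algo, sep, b64_digest = integrity.partition("-")
    if not sep or algo not in SUPPORTED_ALGOS:
        raise ValueError(f"unsupported integrity value: {integrity!r}")
    # validate=True rejects characters outside the base64 alphabet.
    return algo, base64.b64decode(b64_digest, validate=True)


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Return True when the digest of data equals the declared one."""
    algo, expected = parse_integrity(integrity)
    return hashlib.new(algo, data).digest() == expected
```

A caller would typically reject the downloaded bytes when `matches_integrity` returns False rather than ingest them.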

It can also optionally receive a list of mirror URLs in case the main origin URL is no longer available. Those mirror URLs are used solely as a fallback to retrieve the content.
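The fallback behaviour could look roughly like the sketch below. The name `fetch_with_fallback` and the injected `fetch` callable are assumptions for illustration. The sketch also assumes that an unreachable URL surfaces as an `OSError` (which covers `urllib.error.URLError`).

```python
from typing import Callable, Iterable, Optional


def fetch_with_fallback(
    origin_url: str,
    mirror_urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Try the origin URL first, then each mirror; return the first payload."""
    for url in [origin_url, *mirror_urls]:
        try:
            return fetch(url)
        except OSError:
            # This candidate is unreachable: move on to the next one.
            continue
    # No candidate URL could be retrieved.
    return None
```

Passing the download function as a parameter keeps the sketch independent of any HTTP library and makes the fallback order easy to exercise in a test.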

Note: The handling of the integrity field will need some future adaptations in that code. They are kept out of the scope of this diff to avoid depending on a new release of the model [1].

Related to T3781. Supersedes !446 (closed).


Migrated from D8581 (view on Phabricator)


Activity

  • Build is green

    Patch application report for D8581 (id=30956)

    Rebasing onto 6299c091...

    First, rewinding head to replay your work on top of it...
    Applying: Add Content Loader to ingest raw content file
    Changes applied before test
    commit 75e8a22f220083d9d4a3c1341ed5d882849f7b86
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Sep 29 16:14:43 2022 +0200
    
        Add Content Loader to ingest raw content file
        
        In some marginal listing cases (Nix or Guix for now), we can receive files to ingest.
        This creates a loader to ingest those. The output of the ingestion is a snapshot with 2
        branches, one targetting the file ingested whose branch name is the filename. The other
        is an alias branch (matching what's done in other package loader).
        
        This expects to receive a mandatory 'integrity' field. It is used to check the content
        match the declaration.
        
        This can also receive a list of mirror urls in case the main origin url is no longer
        available.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/914/ for more details.

  • please document which terminal of the grammar you are aiming for. (the current implementation is hash-expression, but I don't know if that's intentional)

    Could you use a smaller test file? That one is really big...

  • please document which terminal of the grammar you are aiming for. (the current implementation is hash-expression, but I don't know if that's intentional)

    It is intentional, yes. I'll add a link to the grammar.

    Could you use a smaller test file? That one is really big...

    Well, that's the sole one i found.

  • In !447 (closed), @ardumont wrote: Could you use a smaller test file? That one is really big...

    Well, that's the sole one i found.

    any file would do, you don't need to get one from Guix/Nix

  • Could you use a smaller test file? That one is really big...

    Well, that's the sole one i found.

    any file would do, you don't need to get one from Guix/Nix

    heh, right.

  • Address review

  • Build has FAILED

    Patch application report for D8581 (id=30978)

    Rebasing onto 6299c091...

    First, rewinding head to replay your work on top of it...
    Applying: Add Content Loader to ingest raw content file
    Changes applied before test
    commit 26d3ad52aa8c6e1223d4d0b0e3609c198bf46c7b
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Sep 29 16:14:43 2022 +0200
    
        Add Content Loader to ingest raw content file
        
        In some marginal listing cases (Nix or Guix for now), we can receive raw file to ingest.
        This commit adds a loader to ingest those. The output of the ingestion is a snapshot
        with 1 branch, one HEAD branch targetting the file content ingested.
        
        This expects to receive a mandatory 'integrity' field. It is used to check the content
        match the declaration.
        
        This can also optionally receive a list of mirror urls in case the main origin url is no
        longer available. Those mirror urls are solely used as fallback to retrieve the content.
        
        Related to [T3781](https://forge.softwareheritage.org/T3781 'view original for T3781 on Phabricator')

    Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/916/ See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/916/console

  • Fix docstring which failed the build!

  • Build is green

    Patch application report for D8581 (id=30979)

    Rebasing onto 6299c091...

    First, rewinding head to replay your work on top of it...
    Applying: Add Content Loader to ingest raw content file
    Changes applied before test
    commit 2aca780a73de24ecf7ff9227e43513acb0fb0357
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Sep 29 16:14:43 2022 +0200
    

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/917/ for more details.

  • Merge request was accepted

  • vlorentz approved this merge request


  • @vlorentz I should have started with this... from the nixguix manifest, the integrity is for now only sha256... [1] So not sure we need to touch the model after all [2], especially since that diff got a tad bigger since you reviewed it...

    [2] swh-model!322 (closed)

    $ jq . /var/tmp/sources.json | grep -c sha256
    13629
    $ jq . /var/tmp/sources.json | grep -c sha384
    0
    $ jq . /var/tmp/sources.json | grep -c sha512
    0

    Although sha512 is used in the nixpkgs manifest...

    $ jq . /var/tmp/sources-unstable.json | grep -c sha256
    58036
    $ jq . /var/tmp/sources-unstable.json | grep -c sha384
    0
    $ jq . /var/tmp/sources-unstable.json | grep -c sha512
    8162
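For illustration only, the same kind of tally can be sketched in Python. It assumes the manifest is JSON whose entries carry an 'integrity' string of the form "<algorithm>-<digest>"; the helper names `iter_integrity` and `count_algorithms` are hypothetical. Unlike `grep -c`, which counts matching lines, this counts integrity values per algorithm, so the numbers may differ slightly.

```python
import json
from collections import Counter
from typing import Any, Iterator


def iter_integrity(node: Any) -> Iterator[str]:
    """Yield every string stored under an 'integrity' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "integrity" and isinstance(value, str):
                yield value
            else:
                yield from iter_integrity(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_integrity(item)


def count_algorithms(manifest: Any) -> Counter:
    """Count integrity values by the algorithm prefix before the first dash."""
    return Counter(value.partition("-")[0] for value in iter_integrity(manifest))


def count_algorithms_in_file(path: str) -> Counter:
    """Load a JSON manifest from disk and tally its integrity algorithms."""
    with open(path, encoding="utf-8") as manifest_file:
        return count_algorithms(json.load(manifest_file))
```

For example, `count_algorithms_in_file("/var/tmp/sources.json")` would return a `Counter` keyed by algorithm name for the manifest used in the commands above.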
  • Compute expected checksum to check integrity outside the loop

  • Build is green

    Patch application report for D8581 (id=30983)

    Rebasing onto 6299c091...

    First, rewinding head to replay your work on top of it...
    Applying: Add Content Loader to ingest raw content file
    Changes applied before test
    commit 6436e2304d37812839870562f447895768d4c4a5
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Sep 29 16:14:43 2022 +0200
    

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/919/ for more details.

  • Refactoring steps

  • Build is green

    Patch application report for D8581 (id=30984)

    Rebasing onto 6299c091...

    First, rewinding head to replay your work on top of it...
    Applying: Add Content Loader to ingest raw content file
    Changes applied before test
    commit 32524ef0c03e677dbd60ec9d7aec7626c4a5322d
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Sep 29 16:14:43 2022 +0200
    

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/920/ for more details.

  • Rebase

  • Build is green

    Patch application report for D8581 (id=30992)

    Rebasing onto 6299c091...

    Current branch diff-target is up to date.
    Changes applied before test
    commit f774aba59e65bd3e5dd0ba9364840d8903d5706c
    Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
    Date:   Thu Sep 29 16:14:43 2022 +0200
    

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/925/ for more details.
