
archive, cran: Replace 'artifact_identity' with extid to detect known packages

We want to store these identifiers in the ExtID storage, which expects a (preferably short) bytearray; but the 'artifact_identity' was a list of (possibly long) strings and ints.

While this commit does not write them to the ExtID storage yet, it makes these two loaders use them internally.

Assuming no sha256 collision, this does not change their behavior when seen from the outside, with two exceptions:

  • the list of keys to use is now configured with a template string
  • configuring an unknown key now raises a KeyError instead of silently using a None value.

But we never use this configuration setting, so in practice there is no change at all.
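As a rough illustration of the two behavior changes listed above, here is a minimal sketch of how an extid could be derived from a template string. It is not the loaders' actual code: the function name `compute_extid`, the use of `string.Template`, and the `$url $length` template are assumptions made for this example. Only the sha256 hashing and the KeyError on an unknown key come from the description.

```python
import hashlib
import string
from typing import Any, Mapping


def compute_extid(template: str, artifact: Mapping[str, Any]) -> bytes:
    """Render `template` against the artifact metadata and hash the result.

    Hypothetical helper: the name, signature, and template syntax are
    assumptions for illustration, not the loaders' real API.
    """
    # substitute() raises KeyError when the template names a key that is
    # missing from `artifact`, rather than silently falling back to None.
    manifest = string.Template(template).substitute(artifact)
    # A sha256 digest is a short, fixed-size byte string (32 bytes),
    # unlike the former list of possibly long strings and ints.
    return hashlib.sha256(manifest.encode()).digest()


# Example metadata; the URL and length are made up.
artifact = {"url": "https://example.org/pkg-1.0.tar.gz", "length": 221837}
extid = compute_extid("$url $length", artifact)
assert len(extid) == 32
```

With this sketch, a template such as `"$url $unknown"` fails loudly with a KeyError, which corresponds to the second exception noted above.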


Migrated from D5289 (view on Phabricator)

Activity

  • Build is green

    Patch application report for D5289 (id=18936)

    Could not rebase; Attempt merge onto 132522e4...

    Merge made by the 'recursive' strategy.
     swh/loader/package/archive/loader.py             | 49 +++++++++++++--------
     swh/loader/package/archive/tests/test_archive.py | 32 +++++++-------
     swh/loader/package/cran/loader.py                | 14 +++---
     swh/loader/package/debian/loader.py              | 53 +++++++++++++----------
     swh/loader/package/debian/tests/test_debian.py   | 55 ++++++++++++++++++++++++
     swh/loader/package/loader.py                     | 19 ++++++--
     swh/loader/package/nixguix/loader.py             | 44 ++++++++++---------
     swh/loader/package/nixguix/tests/test_nixguix.py | 33 ++++++++------
     swh/loader/package/npm/loader.py                 | 26 ++++++-----
     swh/loader/package/pypi/loader.py                | 31 +++++++------
     swh/loader/package/tests/test_loader.py          | 11 +++--
     11 files changed, 239 insertions(+), 128 deletions(-)
    Changes applied before test
    commit 257350cc6b699d7a96080611acd02f0a524a0bb9
    Merge: 132522e d764a78
    Author: Jenkins user <jenkins@localhost>
    Date:   Fri Mar 19 13:28:53 2021 +0000
    
        Merge branch 'diff-target' into HEAD
    
    commit d764a783e772d639a34b513b3e7d2dad68d68b72
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Mar 19 10:51:43 2021 +0100
    
        archive, cran: Replace 'artifact_identity' with extid to detect known packages
        
        We want to store these identifiers in the ExtID storage, which expects
        a (preferably short) bytearray; but the 'artifact_identity' was a
        list of (possibly long) strings and ints.
        
        While this commit does not write them to the ExtID storage yet,
        it makes these two loaders use them internally.
        
        Assuming no sha256 collision, this does not change their behavior
        when seen from the outside, with two exceptions:
        
        * the list of keys to use is now configured with a template string
        * configuring an unknown key now raises a KeyError instead of silently
          using a None value.
        
        But we never use this configuration setting, so in practice there is no
        change at all.
    
    commit 3357e642e5676aa1ace50ecffe2a777f51b251f7
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 17:47:44 2021 +0100
    
        nixguix: Split 'integrity' extraction out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit 9dc175c2aef95b0e5a2a83a71ec63e2c1f075c7b
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 09:32:37 2021 +0100
    
        npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
        
        We will need it independently in a future commit
    
    commit 93c9aa8e014c19b02887e6c05a671ad7430235dd
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 16:18:40 2021 +0100
    
        debian: Split original_artifact parsing out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit 9a4991b391e6b825f9dee3008a410650d434ca6d
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 10:02:21 2021 +0100
    
        debian: Make resolve_revision_from use the sha256 of the .dsc
        
        Instead of the sha256 + name + ... of all the files of a package.
        
        This will be needed to transition to ExtID, as we can't reasonably write
        this large set in the ExtID storage; and the sha256 of the .dsc is good
        enough, as the .dsc contains hashes and names of other files.
    
    commit 455e05837dc2fb59b4adcf01df27c6f7d64781dd
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Mar 16 14:44:33 2021 +0100
    
        test_load_nixguix_one_common_artifact_from_other_loader: Test all log lines, not just the last one
        
        It's stricter and more readable.

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/407/ for more details.

  • Author Maintainer

    rebase

  • Build has FAILED

    Patch application report for D5289 (id=19000)

    Could not rebase; Attempt merge onto 132522e4...

    Updating 132522e..e61f507
    Fast-forward
     swh/loader/package/archive/loader.py             | 49 +++++++++++++--------
     swh/loader/package/archive/tests/test_archive.py | 32 +++++++-------
     swh/loader/package/cran/loader.py                | 14 +++---
     swh/loader/package/debian/loader.py              | 53 +++++++++++++----------
     swh/loader/package/debian/tests/test_debian.py   | 55 ++++++++++++++++++++++++
     swh/loader/package/loader.py                     | 19 ++++++--
     swh/loader/package/nixguix/loader.py             | 44 ++++++++++---------
     swh/loader/package/nixguix/tests/test_nixguix.py | 33 ++++++++------
     swh/loader/package/npm/loader.py                 | 26 ++++++-----
     swh/loader/package/pypi/loader.py                | 37 ++++++++++------
     swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
     swh/loader/package/tests/test_loader.py          | 11 +++--
     12 files changed, 256 insertions(+), 128 deletions(-)
    Changes applied before test
    commit e61f507cdea82a01be93cb792abe486ad0e5a596
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Mar 19 10:51:43 2021 +0100
    
        archive, cran: Replace 'artifact_identity' with extid to detect known packages
        
        We want to store these identifiers in the ExtID storage, which expects
        a (preferably short) bytearray; but the 'artifact_identity' was a
        list of (possibly long) strings and ints.
        
        While this commit does not write them to the ExtID storage yet,
        it makes these two loaders use them internally.
        
        Assuming no sha256 collision, this does not change their behavior
        when seen from the outside, with two exceptions:
        
        * the list of keys to use is now configured with a template string
        * configuring an unknown key now raises a KeyError instead of silently
          using a None value.
        
        But we never use this configuration setting, so in practice there is no
        change at all.
    
    commit fd70fe0e475a4f5d339d2a69ce3eb99aad009d76
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 17:47:44 2021 +0100
    
        nixguix: Split 'integrity' extraction out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit 6da306159137c309ea3aa987589415c4b8f81de3
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 09:32:37 2021 +0100
    
        npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
        
        We will need it independently in a future commit
    
    commit 3190406239d231d62b8583bb1aeb74def1016a08
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 16:18:40 2021 +0100
    
        debian: Split original_artifact parsing out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit 4960aa38b10b29d249b28b92e78e0308f482e551
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 10:02:21 2021 +0100
    
        debian: Make resolve_revision_from use the sha256 of the .dsc
        
        Instead of the sha256 + name + ... of all the files of a package.
        
        This will be needed to transition to ExtID, as we can't reasonably write
        this large set in the ExtID storage; and the sha256 of the .dsc is good
        enough, as the .dsc contains hashes and names of other files.
    
    commit cf8fc9e9bd19b4f303b33ddeb73dc285d835bfcc
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Mar 16 14:44:33 2021 +0100
    
        test_load_nixguix_one_common_artifact_from_other_loader: Test all log lines, not just the last one
        
        It's stricter and more readable.

    Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/415/
    See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/415/console

  • Author Maintainer

    rebase

  • Build has FAILED

    Patch application report for D5289 (id=19027)

    Could not rebase; Attempt merge onto 9a339e98...

    Updating 9a339e9..4cfb64b
    Fast-forward
     swh/loader/package/archive/loader.py             | 49 +++++++++++++--------
     swh/loader/package/archive/tests/test_archive.py | 32 +++++++-------
     swh/loader/package/cran/loader.py                | 14 +++---
     swh/loader/package/debian/loader.py              | 53 +++++++++++++----------
     swh/loader/package/debian/tests/test_debian.py   | 55 ++++++++++++++++++++++++
     swh/loader/package/loader.py                     | 19 ++++++--
     swh/loader/package/nixguix/loader.py             | 44 ++++++++++---------
     swh/loader/package/npm/loader.py                 | 26 ++++++-----
     swh/loader/package/pypi/loader.py                | 37 ++++++++++------
     swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
     swh/loader/package/tests/test_loader.py          | 11 +++--
     11 files changed, 235 insertions(+), 116 deletions(-)
    Changes applied before test
    commit 4cfb64bb8875109e77ff3ba29cb9cf4fc76bba88
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Mar 19 10:51:43 2021 +0100
    
        archive, cran: Replace 'artifact_identity' with extid to detect known packages
        
        We want to store these identifiers in the ExtID storage, which expects
        a (preferably short) bytearray; but the 'artifact_identity' was a
        list of (possibly long) strings and ints.
        
        While this commit does not write them to the ExtID storage yet,
        it makes these two loaders use them internally.
        
        Assuming no sha256 collision, this does not change their behavior
        when seen from the outside, with two exceptions:
        
        * the list of keys to use is now configured with a template string
        * configuring an unknown key now raises a KeyError instead of silently
          using a None value.
        
        But we never use this configuration setting, so in practice there is no
        change at all.
    
    commit d565e5b307dbcf5763171fcce828d4305dccf5de
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 17:47:44 2021 +0100
    
        nixguix: Split 'integrity' extraction out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit 749dd287432d116be5f4a69335ef35c244923ca6
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 09:32:37 2021 +0100
    
        npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
        
        We will need it independently in a future commit
    
    commit e7ae6364c06a4d1e1863e8c1e948cd3a82ca4286
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 16:18:40 2021 +0100
    
        debian: Split original_artifact parsing out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit 268f83e943a439958840b4d8cdcff5c9eb16429e
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 10:02:21 2021 +0100
    
        debian: Make resolve_revision_from use the sha256 of the .dsc
        
        Instead of the sha256 + name + ... of all the files of a package.
        
        This will be needed to transition to ExtID, as we can't reasonably write
        this large set in the ExtID storage; and the sha256 of the .dsc is good
        enough, as the .dsc contains hashes and names of other files.

    Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/427/
    See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/427/console

  • Author Maintainer

    rebase

  • Build has FAILED

    Patch application report for D5289 (id=19031)

    Could not rebase; Attempt merge onto 9a339e98...

    Updating 9a339e9..0226f5c
    Fast-forward
     swh/loader/package/archive/loader.py             | 49 +++++++++++++--------
     swh/loader/package/archive/tests/test_archive.py | 32 +++++++-------
     swh/loader/package/cran/loader.py                | 14 +++---
     swh/loader/package/debian/loader.py              | 53 +++++++++++++----------
     swh/loader/package/debian/tests/test_debian.py   | 55 ++++++++++++++++++++++++
     swh/loader/package/loader.py                     | 19 ++++++--
     swh/loader/package/nixguix/loader.py             | 44 ++++++++++---------
     swh/loader/package/npm/loader.py                 | 26 ++++++-----
     swh/loader/package/pypi/loader.py                | 37 ++++++++++------
     swh/loader/package/pypi/tests/test_pypi.py       | 11 +++++
     swh/loader/package/tests/test_loader.py          | 11 +++--
     11 files changed, 235 insertions(+), 116 deletions(-)
    Changes applied before test
    commit 0226f5c509bdc9d882cd310dcbd6dc0643bcb2ef
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Mar 19 10:51:43 2021 +0100
    
        archive, cran: Replace 'artifact_identity' with extid to detect known packages
        
        We want to store these identifiers in the ExtID storage, which expects
        a (preferably short) bytearray; but the 'artifact_identity' was a
        list of (possibly long) strings and ints.
        
        While this commit does not write them to the ExtID storage yet,
        it makes these two loaders use them internally.
        
        Assuming no sha256 collision, this does not change their behavior
        when seen from the outside, with two exceptions:
        
        * the list of keys to use is now configured with a template string
        * configuring an unknown key now raises a KeyError instead of silently
          using a None value.
        
        But we never use this configuration setting, so in practice there is no
        change at all.
    
    commit 589d97562923155d4e51de6367e8985bbfc92e5a
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 17:47:44 2021 +0100
    
        nixguix: Split 'integrity' extraction out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit dc197af60c60f061df3ac35d7e7f94b0a0262140
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 09:32:37 2021 +0100
    
        npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
        
        We will need it independently in a future commit
    
    commit e7ae6364c06a4d1e1863e8c1e948cd3a82ca4286
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 16:18:40 2021 +0100
    
        debian: Split original_artifact parsing out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit 268f83e943a439958840b4d8cdcff5c9eb16429e
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 10:02:21 2021 +0100
    
        debian: Make resolve_revision_from use the sha256 of the .dsc
        
        Instead of the sha256 + name + ... of all the files of a package.
        
        This will be needed to transition to ExtID, as we can't reasonably write
        this large set in the ExtID storage; and the sha256 of the .dsc is good
        enough, as the .dsc contains hashes and names of other files.

    Link to build: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/431/
    See console output for more information: https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/431/console

  • Author Maintainer

    rebase

  • Build is green

    Patch application report for D5289 (id=19035)

    Could not rebase; Attempt merge onto 9a339e98...

    Updating 9a339e9..20a3c9c
    Fast-forward
     swh/loader/package/archive/loader.py             | 49 +++++++++++++--------
     swh/loader/package/archive/tests/test_archive.py | 32 +++++++-------
     swh/loader/package/cran/loader.py                | 14 +++---
     swh/loader/package/debian/loader.py              | 53 +++++++++++++----------
     swh/loader/package/debian/tests/test_debian.py   | 55 ++++++++++++++++++++++++
     swh/loader/package/loader.py                     | 19 ++++++--
     swh/loader/package/nixguix/loader.py             | 44 ++++++++++---------
     swh/loader/package/npm/loader.py                 | 26 ++++++-----
     swh/loader/package/pypi/loader.py                | 37 ++++++++++------
     swh/loader/package/pypi/tests/test_pypi.py       | 14 ++++++
     swh/loader/package/tests/test_loader.py          | 11 +++--
     11 files changed, 238 insertions(+), 116 deletions(-)
    Changes applied before test
    commit 20a3c9c809c0ca4bacc892e3f983ac56a20f503b
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Mar 19 10:51:43 2021 +0100
    
        archive, cran: Replace 'artifact_identity' with extid to detect known packages
        
        We want to store these identifiers in the ExtID storage, which expects
        a (preferably short) bytearray; but the 'artifact_identity' was a
        list of (possibly long) strings and ints.
        
        While this commit does not write them to the ExtID storage yet,
        it makes these two loaders use them internally.
        
        Assuming no sha256 collision, this does not change their behavior
        when seen from the outside, with two exceptions:
        
        * the list of keys to use is now configured with a template string
        * configuring an unknown key now raises a KeyError instead of silently
          using a None value.
        
        But we never use this configuration setting, so in practice there is no
        change at all.
    
    commit fdb41abd76b2d683c2fd44fcef0d267ee2788a25
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 17:47:44 2021 +0100
    
        nixguix: Split 'integrity' extraction out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit f827d04cb538038cc38aceb7ee55c5465c444f52
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 09:32:37 2021 +0100
    
        npm, pypi: Split original_artifact parsing out of artifact_to_revision_id
        
        We will need it independently in a future commit
    
    commit e7ae6364c06a4d1e1863e8c1e948cd3a82ca4286
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Mar 18 16:18:40 2021 +0100
    
        debian: Split original_artifact parsing out of resolve_revision_from
        
        We will need it independently in a future commit
    
    commit 268f83e943a439958840b4d8cdcff5c9eb16429e
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Mar 17 10:02:21 2021 +0100
    
        debian: Make resolve_revision_from use the sha256 of the .dsc
        
        Instead of the sha256 + name + ... of all the files of a package.
        
        This will be needed to transition to ExtID, as we can't reasonably write
        this large set in the ExtID storage; and the sha256 of the .dsc is good
        enough, as the .dsc contains hashes and names of other files.

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/435/ for more details.

  • Merge request was accepted

  • Nicolas Dandrimont approved this merge request

  • Author Maintainer

    Merge request was merged

  • closed
