Skip to content
Snippets Groups Projects

Refactor package loaders to remove temporary revision objects

1 unresolved thread

Depends on !237 (closed).

Test Plan

No hash is changed


Migrated from D6624 (view on Phabricator)

Merge request reports

Closed by Phabricator Migration userPhabricator Migration user 3 years ago (Nov 10, 2021 1:21pm UTC)

Merge details

  • The changes were not merged into generated-differential-D6624-target.

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Build is green

    Patch application report for D6624 (id=24066)

    Could not rebase; Attempt merge onto 5063082e...

    Updating 5063082..2af673b
    Fast-forward
     docs/index.rst                                   |   3 +-
     docs/package-loader-specifications.rst           | 106 +++++++
     docs/package-loader-tutorial.rst                 |  25 ++
     swh/loader/package/archive/loader.py             |  28 +-
     swh/loader/package/archive/tests/test_archive.py |  40 +--
     swh/loader/package/cran/loader.py                |  30 +-
     swh/loader/package/cran/tests/test_cran.py       |  37 ++-
     swh/loader/package/debian/loader.py              |  30 +-
     swh/loader/package/debian/tests/test_debian.py   | 106 ++++---
     swh/loader/package/deposit/loader.py             |  48 ++-
     swh/loader/package/deposit/tests/test_deposit.py | 134 ++++++---
     swh/loader/package/loader.py                     | 225 ++++++++++-----
     swh/loader/package/nixguix/loader.py             |  24 +-
     swh/loader/package/nixguix/tests/test_nixguix.py |  61 ++--
     swh/loader/package/npm/loader.py                 |  24 +-
     swh/loader/package/npm/tests/test_npm.py         | 145 +++++-----
     swh/loader/package/opam/loader.py                |  89 ++++--
     swh/loader/package/opam/tests/test_opam.py       | 197 +++++++++++--
     swh/loader/package/pypi/loader.py                |  34 ++-
     swh/loader/package/pypi/tests/test_pypi.py       | 353 +++++++++--------------
     swh/loader/package/tests/test_loader.py          | 261 +++++++++++++----
     swh/loader/package/tests/test_loader_metadata.py |  41 +--
     swh/loader/tests/__init__.py                     |  18 +-
     swh/loader/tests/test_init.py                    |   4 +-
     24 files changed, 1294 insertions(+), 769 deletions(-)
     create mode 100644 docs/package-loader-specifications.rst
    Changes applied before test
    commit 2af673b617c5d993127c37a2831202ff478f2135
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Tue Nov 9 13:37:31 2021 +0100
    
        Refactor package loaders to remove temporary revision objects
    
    commit 05496787809354552ea9ed0043d687acabf451d0
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Mon Nov 8 13:39:17 2021 +0100
    
        Document how each package loader populates fields.
    
    commit 89417bb072aade6b68402dbc10eaee61d98e9ae4
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Fri Nov 5 13:18:01 2021 +0100
    
        Make package loaders write releases instead of revisions
        
        The artifacts they load match the semantics of a Release, but we used Revisions
        so far because of technical details (we needed the 'metadata' field of Revision
        that Release lacks) that is no longer relevant (thanks to the metadata storage).
        
        Packages that were loaded by previous versions of the package loader (as revs)
        will be converted to releases. In order to avoid fetching them from the origin,
        the loader will look for an existing extid pointing to a revision (like it used
        to), fetch that revision, extract some fields (directory id, author, date, ...)
        and build a new release using this information.
        
        This commit is unfortunately very large because of all changes in tests, mostly
        just new hashes and renaming 'revision' to 'release' (and various abbreviations
        and capitalizations).
        
        The only meaningful changes are in swh/loader/package/tests/test_loader.py and
        swh/loader/package/loader.py.
        
        To keep this commit as short as possible, I did not yet change individual loaders
        to create releases: they still create revisions, but are converted by the base
        loader. The next commit will refactor them to remove this conversion layer.
    
    commit c0a98a5c4cfe4beac5c0b03bf61459d244b6f132
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Nov 4 17:58:37 2021 +0100
    
        tests: Remove duplicate checks
        
        All the '*_missing' tests are already done automatically by check_snapshot
        (it recursively checks all objects are present in the storage).
    
    commit 2311ad9b365e23d1a3c532aa005e32db46bc06f3
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Nov 4 17:53:20 2021 +0100
    
        tests: Hide utilities from stack traces
        
        They clutter the test output because pytest prints the whole code
        of the function raising the assertionerror.
        
        With this magic variable, the error is shown as if it was raised
        directly in the caller's body.
    
    commit 551c55ff04774b2c4e50e54f701c514064607f6a
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Nov 4 15:49:45 2021 +0100
    
        package loaders: Make test failures more helpful
        
        Some tests did the following:
        
        1. build a snapshot
        2. get the snapshot from the storage
        3. compare it with the expected snapshot
        4. get the origin visit from the storage and check it
        
        If the loader built a wrong snapshot, the test fails at step 2,
        and the only information displayed is that the expected snapshot id
        does not exist, which is very unhelpful.
        
        Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong
        snapshot is build by the loader, it is detected when comparing
        the visit, and pytest shows the two hashes.
        Then, the test can be modified to use the hash that is actually
        generated to show the actual snapshot.
        
        This is consistent with what was already done in the pypi loader.
        
        Additionally, I made the following changes:
        
        1. always check stats last (because a difference in numbers is
           hardly actionable without testing other objects)
        2. add a few more snapshot id checks in visits
        3. deduplicated a hardcoded snapshot id.
    
    commit 89a0bfee48ca1663faddaa08b8fb3f163d5cfc6f
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Oct 21 14:35:47 2021 +0200
    
        deposit: Remove 'parent' deposit
        
        The parent is computed by the deposit as the revision of the latest deposit
        in the same origin before the current one.
        Therefore, it is redundant, as it can be recomputed from metadata
        + revision date.
        
        This is a preliminary change needed to make package loaders produce
        releases instead of revisions, as releases don't have parent relationships
    
    commit aeffe01a2b3918262e4bd715ea52c7d7da27807c
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Nov 4 11:45:36 2021 +0100
    
        opam: Write package definitions to the extrinsic metadata storage
    
    commit 18bbbae719fc9d165d5b543e54d449f5befc083a
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Thu Nov 4 11:43:53 2021 +0100
    
        Add missing documentation for `get_metadata_authority`.

    See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/606/ for more details.

799 774 release = self.build_release(
800 775 version, p_info, uncompressed_path, directory=directory.hash
801 776 )
777 print(release)
  • lgtm

    a couple of remarks inline.

  • Antoine R. Dumont mentioned in merge request !239 (closed)

    mentioned in merge request !239 (closed)

  • Merge request was accepted

  • Antoine R. Dumont approved this merge request

    approved this merge request

  • Author Maintainer

    Merge request was merged

  • closed

  • Please register or sign in to reply
    Loading