  1. Nov 22, 2021
    • Package loader: Uniformize author and message · 2ab367ba
      vlorentz authored
      Authors: use the empty string '' instead of placeholders
      Message: use the same message format (inspired by the Debian loader)
       for all loaders, instead of the empty string / the version /
       something else; except for PyPI and Deposit (which have a better
       format because we have more metadata available).
      
      Additionally, this commit adds tests for each release object,
      instead of only relying on its hash.
      2ab367ba
  2. Nov 10, 2021
  3. Nov 09, 2021
  4. Nov 08, 2021
    • Make package loaders write releases instead of revisions · 89417bb0
      vlorentz authored
      The artifacts they load match the semantics of a Release, but we used Revisions
      so far because of a technical detail (we needed the 'metadata' field of Revision,
      which Release lacks) that is no longer relevant (thanks to the metadata storage).
      
      Packages that were loaded by previous versions of the package loader (as revs)
      will be converted to releases. In order to avoid fetching them from the origin,
      the loader will look for an existing extid pointing to a revision (like it used
      to), fetch that revision, extract some fields (directory id, author, date, ...)
      and build a new release using this information.
      
      This commit is unfortunately very large because of all changes in tests, mostly
      just new hashes and renaming 'revision' to 'release' (and various abbreviations
      and capitalizations).
      
      The only meaningful changes are in swh/loader/package/tests/test_loader.py and
      swh/loader/package/loader.py.
      
      To keep this commit as short as possible, I did not yet change individual loaders
      to create releases: they still create revisions, but are converted by the base
      loader. The next commit will refactor them to remove this conversion layer.
      89417bb0
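      The conversion described in this commit message can be pictured with a
      minimal sketch. The two dataclasses are simplified stand-ins, not the real
      swh.model classes, and rev2rel is a hypothetical helper name; that the
      release targets the revision's directory is an assumption drawn from the
      "directory id" field mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Revision:
    # Simplified stand-in, not the real swh.model class.
    id: bytes
    directory: bytes
    author: str
    date: Optional[str]
    message: bytes


@dataclass(frozen=True)
class Release:
    # Simplified stand-in, not the real swh.model class.
    name: bytes
    target: bytes  # assumption: the release points at the directory
    target_type: str
    author: str
    date: Optional[str]
    message: bytes


def rev2rel(rev: Revision, version: str) -> Release:
    """Build a release from the fields of a previously loaded revision."""
    return Release(
        name=version.encode(),
        target=rev.directory,
        target_type="directory",
        author=rev.author,
        date=rev.date,
        message=rev.message,
    )


# A revision found through an existing extid, converted without
# fetching anything from the origin again.
old = Revision(
    id=b"\x01" * 20,
    directory=b"\x02" * 20,
    author="",
    date="2021-11-08T00:00:00+00:00",
    message=b"1.0.0",
)
new = rev2rel(old, "1.0.0")
```

      Note that the parent relationship of the revision has no counterpart in
      the release, so the sketch simply drops it.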
  5. Nov 04, 2021
    • tests: Remove duplicate checks · c0a98a5c
      vlorentz authored
      All the '*_missing' tests are already done automatically by check_snapshot
      (it recursively checks all objects are present in the storage).
      c0a98a5c
    • tests: Hide utilities from stack traces · 2311ad9b
      vlorentz authored
      They clutter the test output because pytest prints the whole code
      of the function raising the AssertionError.
      
      With this magic variable, the error is shown as if it was raised
      directly in the caller's body.
      2311ad9b
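      The commit message does not name the magic variable. Assuming it is
      pytest's `__tracebackhide__`, a test utility could be hidden roughly as
      follows; assert_same_id is a hypothetical helper, not one taken from the
      repository.

```python
def assert_same_id(actual: str, expected: str) -> None:
    """Hypothetical test helper comparing two object ids."""
    # pytest leaves out frames whose locals define this name when it
    # prints a traceback, so a failure is reported at the caller's line.
    __tracebackhide__ = True
    assert actual == expected, f"got {actual}, expected {expected}"


def test_snapshot_id() -> None:
    # On failure, pytest would show this line rather than the helper body.
    assert_same_id("abc123", "abc123")
```

      Outside pytest the variable is an ordinary unused local, so the helper
      behaves the same when called directly.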
    • package loaders: Make test failures more helpful · 551c55ff
      vlorentz authored
      Some tests did the following:
      
      1. build a snapshot
      2. get the snapshot from the storage
      3. compare it with the expected snapshot
      4. get the origin visit from the storage and check it
      
      If the loader built a wrong snapshot, the test fails at step 2,
      and the only information displayed is that the expected snapshot id
      does not exist, which is very unhelpful.
      
      Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong
      snapshot is built by the loader, it is detected when comparing
      the visit, and pytest shows the two hashes.
      Then, the test can be modified to use the hash that is actually
      generated to show the actual snapshot.
      
      This is consistent with what was already done in the pypi loader.
      
      Additionally, I made the following changes:
      
      1. always check stats last (because a difference in numbers is
         hardly actionable without testing other objects)
      2. add a few more snapshot id checks in visits
      3. deduplicate a hardcoded snapshot id.
      551c55ff
    • deposit: Remove 'parent' deposit · 89a0bfee
      vlorentz authored
      The parent is computed by the deposit as the revision of the latest deposit
      in the same origin before the current one.
      Therefore, it is redundant, as it can be recomputed from metadata
      + revision date.
      
      This is a preliminary change needed to make package loaders produce
      releases instead of revisions, as releases don't have parent relationships
      89a0bfee
    • 18bbbae7
  6. Nov 03, 2021
    • Revert "deposit: Remove 'parent' deposit" · 5063082e
      vlorentz authored
      This reverts commit f6905cdf.
      
      That commit was a first step toward making loaders write releases
      instead of revisions.
      
      Unfortunately, we will still write revisions for a non-negligible time,
      so I prefer to defer the removal of parent deposit revisions to the
      moment we actually make that switch, so we don't end up with inconsistent
      revisions.
      5063082e
  7. Oct 21, 2021
    • Remove unused 'known_artifacts' code · 9f882793
      vlorentz authored
      extids are used instead now, this is all dead code.
      9f882793
    • deposit: Remove 'parent' deposit · f6905cdf
      vlorentz authored
      The parent is computed by the deposit as the revision of the latest deposit
      in the same origin before the current one.
      Therefore, it is redundant, as it can be recomputed from metadata
      + revision date.
      
      This is a preliminary change needed to make package loaders produce
      releases instead of revisions, as releases don't have parent relationships
      f6905cdf
  8. Oct 07, 2021
  9. Sep 29, 2021
  10. Sep 28, 2021
  11. Sep 22, 2021
    • Allow opam loader to actually use multi-instance opam root · b2d17596
      Antoine R. Dumont authored
      This allows the opam loader to reuse an existing opam root with multiple instances.
      It is the complementary code that goes with the loader adaptation [1].
      
      As the currently packaged version of `opam show` (cli) [2] does not support scoping
      the metadata extraction to one opam instance (when instances share the same opam
      root), we work around this by relying on opam's internal details.
      
      [1] D6316
      
      [2] `opam show` is currently the interface we are using to extract and list information
      about a package. It works on a standalone opam root folder but falls short when
      several instances share one opam root (for now).
      b2d17596
  12. Sep 21, 2021
  13. Sep 17, 2021
  14. Sep 16, 2021
  15. Sep 15, 2021
    • pypi/loader: Filter out sdist archives not of interest · 73299984
      Antoine Lambert authored
      Some PyPI origins declare sdist archives that cannot be extracted
      by swh.core.tarball.uncompress and whose content does not match
      the standard sdist layout.
      
      This is notably the case for sdist files whose extensions are
      .deb, .egg, .rpm or .whl.
      
      As those artifacts are not of interest for archiving and generate
      errors while loading PyPI origins, filter them out from the
      sdist files to process.
      
      Related to T3575
      73299984
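      A minimal sketch of such a filter, using the four extensions listed in the
      commit message. filter_sdist_filenames is a hypothetical helper, not the
      loader's actual code, and matching case-insensitively is an assumption.

```python
from typing import Iterable, List

# Extensions named in the commit message above.
EXCLUDED_EXTENSIONS = (".deb", ".egg", ".rpm", ".whl")


def filter_sdist_filenames(filenames: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep only files worth trying to extract."""
    return [
        name
        for name in filenames
        # Assumption: extensions are compared case-insensitively.
        if not name.lower().endswith(EXCLUDED_EXTENSIONS)
    ]


candidates = [
    "pkg-1.0.tar.gz",
    "pkg-1.0.whl",
    "pkg_1.0_all.deb",
    "pkg-1.0.zip",
]
kept = filter_sdist_filenames(candidates)
```

      With the sample names above, only the .tar.gz and .zip files remain in
      kept.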
  16. Sep 14, 2021
  17. Sep 13, 2021
  18. Aug 31, 2021
    • package.tests: Fix failure caused by wrong order of visit IDs · 50b062ad
      vlorentz authored
      The in-mem/cass storage used to sort visits by (id, date).
      The latest releases now sort by (date, id), like postgresql, but
      this test did not expect it.
      
      This commit instantiates the loader *after* picking a date
      for the dummy visit, so the loader's visit always comes after
      the dummy one.
      50b062ad
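      The difference between the two sort orders can be illustrated with a small
      sketch. Visit is a simplified stand-in for an origin visit, and the ids and
      dates are invented to show a case where the two keys disagree about which
      visit comes last.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Visit(NamedTuple):
    # Simplified stand-in for an origin visit.
    id: int
    date: datetime


base = datetime(2021, 8, 31, tzinfo=timezone.utc)
# The dummy visit has the higher id but the earlier date.
dummy = Visit(id=2, date=base)
loader_visit = Visit(id=1, date=base + timedelta(seconds=1))
visits = [dummy, loader_visit]

# Old in-memory ordering: id first, then date.
by_id_then_date = sorted(visits, key=lambda v: (v.id, v.date))
# Ordering used by postgresql: date first, then id.
by_date_then_id = sorted(visits, key=lambda v: (v.date, v.id))
```

      Sorting by (id, date) puts the dummy visit last, while sorting by
      (date, id) puts the loader's visit last, so a test that assumes one order
      can fail under the other.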
  19. Aug 12, 2021
  20. Aug 05, 2021
  21. Jul 20, 2021
  22. Jul 07, 2021
  23. Jun 25, 2021
  24. Jun 16, 2021
  25. Jun 10, 2021
  26. Jun 09, 2021