  1. Feb 26, 2025
  2. Dec 18, 2024
    • package: Harmonize the way package versions are sorted · b773bc11
      Antoine Lambert authored
      Instead of implementing version sorting in each package loader,
      provide a base implementation in the swh.loader.package.PackageLoader
      class through the get_sorted_versions method. It relies on the looseversion
      module, which can handle heterogeneous version schemes and works
      well for the large majority of package loaders.
      
      The get_default_version method of the PackageLoader class now also has a
      base implementation returning the last element from the list returned by
      the get_sorted_versions method. As a consequence, each snapshot produced
      by a package loader contains a HEAD alias branch targeting the branch
      for the highest version number of a package.
      
      Both methods can be reimplemented in package loaders for special cases
      such as Debian.
      
      Also remove the use of the packaging module to parse versions, as it is
      dedicated to parsing Python package versions only.
      
      Related to swh-lister#4711.
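
A minimal sketch of the idea described in this commit, assuming a base class that sorts versions once and derives the default version from the sorted list. The class names, the `get_versions` hook, and the `loose_key` function are hypothetical; `loose_key` is a standard-library stand-in for the looseversion module and does not reproduce its exact ordering rules.

```python
import re
from typing import List, Sequence, Tuple, Union

# Hypothetical stand-in for the looseversion module: split a version string
# into numeric and alphabetic components so that "1.10" sorts after "1.9".
_COMPONENT_RE = re.compile(r"\d+|[a-zA-Z]+")


def loose_key(version: str) -> Tuple[Tuple[int, Union[int, str]], ...]:
    key = []
    for part in _COMPONENT_RE.findall(version):
        if part.isdigit():
            # Tag numbers with 1 so an int is never compared with a str.
            key.append((1, int(part)))
        else:
            key.append((0, part))
    return tuple(key)


class SketchPackageLoader:
    """Hypothetical base class, not the real swh.loader.package.PackageLoader."""

    def get_versions(self) -> Sequence[str]:
        raise NotImplementedError

    def get_sorted_versions(self) -> List[str]:
        # Base implementation shared by all loaders; subclasses may override.
        return sorted(self.get_versions(), key=loose_key)

    def get_default_version(self) -> str:
        # The highest version is the last element of the sorted list.
        return self.get_sorted_versions()[-1]


class ExampleLoader(SketchPackageLoader):
    def get_versions(self) -> Sequence[str]:
        return ["1.10.0", "1.2.0", "1.9.3"]
```

A loader with an unusual version scheme would override `get_sorted_versions` (and, if needed, `get_default_version`) rather than the shared key function.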
  3. May 22, 2024
  4. May 15, 2024
  5. Dec 04, 2023
  6. Oct 04, 2022
  7. Sep 30, 2022
  8. Apr 27, 2022
    • tasks: Simplify implementation and add tests for listed origins · 39ad939b
      Antoine Lambert authored
      Recent changes in swh-scheduler add new parameters to the celery tasks
      produced from swh.scheduler.model.ListedOrigin instances.
      
      Handle any new parameters by not hardcoding the expected
      ones in task signatures.
      
      Remove unsafe use of unnamed task parameters.
      
      Add new tests checking that task parameters produced from ListedOrigin
      instances do not raise an error when attempting to create a package loader.
      
      Related to T4187
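
The approach described in this commit can be sketched as a task function that accepts only named parameters and forwards them unchanged. `SketchLoader` and `load_package` are hypothetical names, and the assumption that the loader tolerates extra keyword arguments is made for illustration only; the real task and loader signatures are not shown in this log.

```python
from typing import Any, Dict


class SketchLoader:
    """Hypothetical loader; real package loaders take more parameters."""

    def __init__(self, url: str, **extra: Any) -> None:
        self.url = url
        # Parameters this loader does not use explicitly are kept, not rejected.
        self.extra = extra

    def load(self) -> Dict[str, str]:
        # Illustrative return value only.
        return {"status": "uneventful"}


def load_package(**kwargs: Any) -> Dict[str, str]:
    """Task body: no positional parameters, no hardcoded parameter list."""
    return SketchLoader(**kwargs).load()
```

Because the task signature names no parameters, a new field added upstream reaches the loader without any change to the task itself.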
  9. Apr 21, 2022
  10. Apr 08, 2022
  11. Jan 11, 2022
  12. Nov 22, 2021
  13. Nov 09, 2021
  14. Nov 08, 2021
    • Make package loaders write releases instead of revisions · 89417bb0
      vlorentz authored
      The artifacts they load match the semantics of a Release, but we used Revisions
      so far because of a technical detail (we needed the 'metadata' field of Revision
      that Release lacks) that is no longer relevant (thanks to the metadata storage).
      
      Packages that were loaded by previous versions of the package loader (as revs)
      will be converted to releases. In order to avoid fetching them from the origin,
      the loader will look for an existing extid pointing to a revision (like it used
      to), fetch that revision, extract some fields (directory id, author, date, ...)
      and build a new release using this information.
      
      This commit is unfortunately very large because of all changes in tests, mostly
      just new hashes and renaming 'revision' to 'release' (and various abbreviations
      and capitalizations).
      
      The only meaningful changes are in swh/loader/package/tests/test_loader.py and
      swh/loader/package/loader.py.
      
      To keep this commit as short as possible, I did not yet change individual loaders
      to create releases: they still create revisions, but are converted by the base
      loader. The next commit will refactor them to remove this conversion layer.
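
As a rough illustration of the conversion layer mentioned above, the sketch below builds a release from the fields of an existing revision. The `SketchRevision` and `SketchRelease` types, their fields, and the choice to make the release target the directory are assumptions for this example, not the actual swh.model classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchRevision:
    directory: bytes  # id of the root directory of the loaded package
    author: str
    date: Optional[str]
    message: bytes


@dataclass(frozen=True)
class SketchRelease:
    name: bytes
    target: bytes  # assumed here to be the directory id
    author: str
    date: Optional[str]
    message: bytes


def rev_to_release(rev: SketchRevision, version: str) -> SketchRelease:
    """Reuse the fields of a previously loaded revision to build a release."""
    return SketchRelease(
        name=version.encode(),
        target=rev.directory,
        author=rev.author,
        date=rev.date,
        message=rev.message,
    )
```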
  15. Nov 04, 2021
    • tests: Remove duplicate checks · c0a98a5c
      vlorentz authored
      All the '*_missing' tests are already done automatically by check_snapshot
      (it recursively checks all objects are present in the storage).
    • package loaders: Make test failures more helpful · 551c55ff
      vlorentz authored
      Some tests did the following:
      
      1. build a snapshot
      2. get the snapshot from the storage
      3. compare it with the expected snapshot
      4. get the origin visit from the storage and check it
      
      If the loader built a wrong snapshot, the test fails at step 2,
      and the only information displayed is that the expected snapshot id
      does not exist, which is very unhelpful.
      
      Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong
      snapshot is built by the loader, it is detected when comparing
      the visit, and pytest shows the two hashes.
      Then, the test can be modified to use the hash that is actually
      generated to show the actual snapshot.
      
      This is consistent with what was already done in the pypi loader.
      
      Additionally, I made the following changes:
      
      1. always check stats last (because a difference in numbers is
         hardly actionable without testing other objects)
      2. add a few more snapshot id checks in visits
      3. deduplicate a hardcoded snapshot id.
  16. Apr 06, 2021
    • package loaders: Stop reading/writing Revision.metadata · d84d68a8
      vlorentz authored
      They already write it with raw_extrinsic_metadata_add/extid_add,
      and read it with extid_get_*.
      
      This code was only kept for compatibility while we were migrating
      the extids. This is now done, so this code is useless.
  17. Mar 30, 2021
  18. Mar 23, 2021
    • cran, npm, pypi: Add the loader name in the ExtID type · 6d3545e4
      vlorentz authored
      These three loaders get intrinsic metadata from the archive, and use it
      to build the revision object (mostly authoring and date), which means
      they would not load the same revision as another loader given the
      same archive.
    • package loaders: define extid types · e9a8f986
      vlorentz authored
      This is still a purely internal change for now, but it will be needed
      to read/write ExtIDs from/to the storage.
    • Deduplicate resolve_revision_from across package loaders · eef74bbf
      vlorentz authored
      All package loaders but deposit had logic to compute some object
      from the new packageinfo, some other objects from the known artifacts,
      and compare them.
      
      This commit moves the comparison logic to the base class, and unifies
      the two computation interfaces, respectively as an extid() method
      on TPackageInfo and a method on the loader.
      
      This unified object for comparison is a byte string,
      which is internal to each loader for now, but a future commit
      will read and write it from/to the ExtID storage instead of
      computing it from the 'original_artifacts' present in
      revision metadata.
    • archive, cran: Replace 'artifact_identity' with extid to detect known packages · 20a3c9c8
      vlorentz authored
      We want to store these identifiers in the ExtID storage, which expects
      a (preferably short) bytearray; but the 'artifact_identity' was a
      list of (possibly long) strings and ints.
      
      While this commit does not write them to the ExtID storage yet,
      it makes these two loaders use them internally.
      
      Assuming no sha256 collision, this does not change their behavior
      when seen from the outside, with two exceptions:
      
      * the list of keys to use is now configured with a template string
      * configuring an unknown key now raises a KeyError instead of silently
        using a None value.
      
      But we never use this configuration setting, so in practice there is no
      change at all.
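
One way to picture the behavior described in this commit: a template string selects the artifact keys, and the rendered manifest is hashed with sha256 into a short byte string. The function name `compute_extid`, the use of `string.Template`, and the example keys are assumptions for this sketch; the commit message does not show the actual implementation.

```python
import hashlib
import string
from typing import Any, Mapping


def compute_extid(template: str, artifact: Mapping[str, Any]) -> bytes:
    """Render the configured keys into a manifest and hash it.

    string.Template.substitute raises KeyError when the template names a key
    that the artifact lacks, instead of silently using a None value.
    """
    manifest = string.Template(template).substitute(artifact)
    return hashlib.sha256(manifest.encode()).digest()


# Hypothetical artifact description and template.
artifact = {"url": "https://example.org/pkg-1.0.tar.gz", "length": 1024}
extid = compute_extid("$url $length", artifact)
```

The result is always 32 bytes, whatever the number or size of the keys, which suits a storage that prefers short identifiers.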
  19. Feb 16, 2021
    • Unify loader instantiation · 7116bb75
      Antoine R. Dumont authored
      This unifies and centralizes the instantiation the same way the lister does.
      
      This introduces a new base class swh.loader.core.loader.Loader for all loaders whose
      only concern for now is to instantiate loaders from either a configuration dict or a
      configuration file.
      
      This simplifies instantiation in celery task code and avoids duplicating the
      configuration load in each loader constructor.
      
      The end goal is to simplify the future refactoring of the configuration. With this change,
      we will only have to adapt the Loader class when we start simplifying the
      configuration uniformly.
      
      Also note that I mostly reused the equivalent `swh.lister.pattern.Lister.from_config*`.
      I did not refactor the common behavior (to avoid throwing another dependency in the
      mix). That could always be refactored later.
      
      (inspired by both the work on listers and the configuration system work)
      
      Related to T1410
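
A minimal sketch of centralized instantiation, assuming two class methods that build a loader from a configuration dict or from a configuration file. The method names `from_config` and `from_configfile`, the constructor parameters, and the use of JSON (chosen to stay within the standard library) are assumptions for illustration and may differ from the real swh.loader.core.loader.Loader.

```python
import json
from typing import Any, Dict


class SketchLoader:
    """Hypothetical base class holding the instantiation logic in one place."""

    def __init__(self, url: str, max_content_size: int = 100) -> None:
        self.url = url
        self.max_content_size = max_content_size

    @classmethod
    def from_config(cls, config: Dict[str, Any], **kwargs: Any) -> "SketchLoader":
        # Configuration values and call-time parameters are merged here only,
        # so subclasses do not reload the configuration in their constructor.
        return cls(**config, **kwargs)

    @classmethod
    def from_configfile(cls, path: str, **kwargs: Any) -> "SketchLoader":
        with open(path) as f:
            config = json.load(f)
        return cls.from_config(config, **kwargs)


class SketchNpmLoader(SketchLoader):
    """Subclasses inherit both constructors without extra code."""
```

Task code can then call `SketchNpmLoader.from_configfile(path, url=...)` without knowing how the configuration is read.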
  20. Feb 05, 2021
  21. Sep 17, 2020
  22. Jul 31, 2020
  23. Jul 24, 2020
  24. Jul 23, 2020
    • Use attributes of *PackageInfo objects instead of untyped dicts. · 14f700d0
      vlorentz authored
      This commit does the following:
      
      * Move artifact_identity to BasePackageInfo, which uses a class attribute
        (and is overridden for ArchivePackageInfo, which needs a custom behavior
        to override keys). Also move and improve its test.
      * Add attributes to *PackageInfo classes, that can be accessed instead
        of the raw metadata.
      * Add a from_metadata class method to all *PackageInfo classes, to parse
        the raw metadata and build the object from it.
      * Pass the PackageInfo object to resolve_revision_from and build_revision
        instead of untyped dicts.
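
The shape of this change can be sketched with a typed class that parses raw metadata once and exposes attributes afterwards. `SketchPackageInfo`, its fields, and the metadata keys are hypothetical, and a standard-library dataclass is used here in place of an attr class to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SketchPackageInfo:
    url: str
    filename: Optional[str]
    version: str
    # The raw metadata is kept alongside the parsed attributes.
    raw: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_metadata(cls, metadata: Dict[str, Any]) -> "SketchPackageInfo":
        """Parse the raw metadata and build a typed object from it."""
        return cls(
            url=metadata["url"],
            filename=metadata.get("filename"),
            version=metadata["version"],
            raw=metadata,
        )


def describe(p_info: SketchPackageInfo) -> str:
    # Callers read attributes instead of indexing an untyped dict.
    return f"{p_info.version} from {p_info.url}"
```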
    • Use an attr class for p_info instead of a dict. · 1050d94d
      vlorentz authored
      The benefits are minimal for now, as 'raw' still contains a lot of stuff;
      but further commits will move data out of 'raw' to a proper attribute.
  25. Jul 16, 2020
  26. Jul 15, 2020
  27. Jul 10, 2020
    • Fix branches types in tests · fcc7e61b
      David Douard authored
      Branch names and targets are expected to be bytes.
      This should make it possible to get rid of the type casts in check_snapshot().
  28. Jul 09, 2020
  29. Jul 06, 2020
  30. Jun 22, 2020
  31. Jun 03, 2020