  1. Sep 16, 2021
  2. Sep 15, 2021
    • pypi/loader: Filter out sdist archives not of interest · 73299984
      Antoine Lambert authored
      Some PyPI origins declare sdist archives that cannot be extracted
      by swh.core.tarball.uncompress and whose content does not match
      the standard sdist layout.
      
      This is notably the case for sdist files whose extensions are
      .deb, .egg, .rpm or .whl.
      
      As those artifacts are not worth archiving and generate
      errors while loading PyPI origins, filter them out of the
      sdist files to process.
      
      Related to T3575
      73299984
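The filtering described above can be sketched as follows. This is a minimal illustration, not the loader's actual code: the `filter_sdists` helper and the sample data are hypothetical, and the `packagetype` and `filename` keys are assumed to follow the shape of PyPI release file entries.

```python
from typing import Dict, Iterable, List

# Extensions named in the commit message as not being real sdist archives.
EXTENSIONS_TO_SKIP = (".deb", ".egg", ".rpm", ".whl")


def filter_sdists(files: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep declared sdist entries whose filename has no skipped extension."""
    return [
        f
        for f in files
        if f.get("packagetype") == "sdist"
        and not f["filename"].lower().endswith(EXTENSIONS_TO_SKIP)
    ]


# Hypothetical release file list: three entries are declared as sdist,
# but only one of them is an archive worth processing.
release_files = [
    {"packagetype": "sdist", "filename": "example-1.0.tar.gz"},
    {"packagetype": "sdist", "filename": "example-1.0.whl"},
    {"packagetype": "sdist", "filename": "example_1.0_all.deb"},
    {"packagetype": "bdist_wheel", "filename": "example-1.0-py3-none-any.whl"},
]
kept = filter_sdists(release_files)
```

Filtering on the file extension avoids attempting an extraction that is known to fail.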
  3. Sep 14, 2021
  4. Sep 13, 2021
  5. Aug 31, 2021
    • package.tests: Fix failure caused by wrong order of visit IDs · 50b062ad
      vlorentz authored
      The in-mem/cass storage used to sort visits by (id, date).
      The last releases now sort by (date, id) like postgresql, but
      this test did not expect it.
      
      This commit instantiates the loader *after* picking a date
      for the dummy visit, so the loader's visit always comes after
      the dummy one.
      50b062ad
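The effect of the two sort orders can be illustrated with a small sketch. The `Visit` tuple and the dates are hypothetical, and the scenario is one plausible reading of the failure: the dummy visit gets the lower ID but a later date than the loader's visit.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Visit(NamedTuple):
    id: int
    date: datetime


t0 = datetime(2021, 8, 31, tzinfo=timezone.utc)

# The dummy visit is stored first (lower id) but carries a later date,
# because its date was picked after the loader was instantiated.
dummy_visit = Visit(id=1, date=t0 + timedelta(seconds=5))
loader_visit = Visit(id=2, date=t0)
visits = [dummy_visit, loader_visit]

# Old in-memory/cassandra behavior: sort by (id, date).
by_id_then_date = sorted(visits, key=lambda v: (v.id, v.date))
# New behavior, matching postgresql: sort by (date, id).
by_date_then_id = sorted(visits, key=lambda v: (v.date, v.id))
```

With the (date, id) order the loader's visit no longer comes last, which is why picking the dummy date before instantiating the loader restores the expected order.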
  6. Aug 12, 2021
  7. Aug 05, 2021
  8. Jul 20, 2021
  9. Jul 07, 2021
  10. Jun 25, 2021
  11. Jun 16, 2021
  12. Jun 10, 2021
  13. Jun 09, 2021
  14. May 27, 2021
  15. Apr 26, 2021
    • tox: Add sphinx environments to check sane doc build · 0e4bb4bb
      Antoine Lambert authored
      Make it possible to check that the package documentation can be built
      without producing sphinx warnings.
      
      The sphinx environment is designed to be used in continuous integration
      to prevent changes from breaking the documentation build.
      
      The sphinx-dev environment is designed to be used inside a full swh
      development environment.
      
      Related to T3258
      0e4bb4bb
  16. Apr 16, 2021
  17. Apr 13, 2021
  18. Apr 08, 2021
  19. Apr 06, 2021
  20. Apr 02, 2021
  21. Apr 01, 2021
  22. Mar 30, 2021
  23. Mar 29, 2021
    • package.loader: Write to the ExtID storage · e48ced02
      vlorentz authored
      This allows future runs of a loader to know a package was already
      loaded, without querying each of the revisions individually and
      parsing their metadata.
      
      Eventually, this will allow us to get rid of the 'metadata' column
      on the 'revision' table entirely.
      e48ced02
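A rough sketch of why writing ExtIDs lets a later run skip work. Everything here is hypothetical: `InMemoryExtIDStore` is a stand-in and not the swh.storage API, the `"example-archive-sha256"` type name is invented, and the revision ID is a placeholder for real loading.

```python
import hashlib
from typing import Dict, Optional, Tuple

ExtIDKey = Tuple[str, bytes]  # (extid_type, extid); assumed shape


class InMemoryExtIDStore:
    """Hypothetical stand-in for an ExtID storage."""

    def __init__(self) -> None:
        self._targets: Dict[ExtIDKey, bytes] = {}

    def add(self, key: ExtIDKey, target: bytes) -> None:
        self._targets[key] = target

    def get(self, key: ExtIDKey) -> Optional[bytes]:
        return self._targets.get(key)


def load_package(store: InMemoryExtIDStore, archive: bytes) -> Tuple[bytes, bool]:
    """Return (revision_id, was_loaded) for the given archive bytes."""
    key = ("example-archive-sha256", hashlib.sha256(archive).digest())
    known = store.get(key)
    if known is not None:
        # Already loaded by a previous run: no revision needs to be inspected.
        return known, False
    # Placeholder for the real loading work that produces a revision.
    revision_id = hashlib.sha1(b"revision:" + archive).digest()
    store.add(key, revision_id)
    return revision_id, True


store = InMemoryExtIDStore()
first = load_package(store, b"archive bytes")
second = load_package(store, b"archive bytes")
```

The second call finds the identifier in the store and returns the same revision without redoing the work.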
  24. Mar 26, 2021
    • package.loader: Lookup packages from the ExtID storage · a32f6871
      vlorentz authored
      To check which packages are already downloaded.
      
      For now, this lookup is done in addition to checking the artifacts
      from the last snapshot's revisions' metadata, because we did not start
      writing ExtIDs yet.
      But the ExtID lookup will eventually replace the artifact-based lookup.
      
      This will finally allow us to drop the 'metadata' field of Revision
      objects.
      a32f6871
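The transitional double lookup might look roughly like this. It is a sketch under assumptions: `resolve_revision` and both dictionaries are hypothetical, and the `(extid_type, extid)` tuple shape is assumed rather than taken from the commit.

```python
from typing import Dict, Optional, Tuple

PartialExtID = Tuple[str, bytes]  # (extid_type, extid); assumed shape


def resolve_revision(
    extid: PartialExtID,
    from_extid_storage: Dict[PartialExtID, bytes],
    from_snapshot_artifacts: Dict[PartialExtID, bytes],
) -> Optional[bytes]:
    """Return the known revision ID for ``extid``, or None if it must be loaded."""
    if extid in from_extid_storage:
        return from_extid_storage[extid]
    # Transitional fallback: identifiers recomputed from the metadata of the
    # revisions referenced by the last snapshot.
    return from_snapshot_artifacts.get(extid)


extid_storage = {("example-sha256", b"\x01"): b"rev-from-extid"}
snapshot_artifacts = {("example-sha256", b"\x02"): b"rev-from-metadata"}

found_new = resolve_revision(("example-sha256", b"\x01"), extid_storage, snapshot_artifacts)
found_old = resolve_revision(("example-sha256", b"\x02"), extid_storage, snapshot_artifacts)
unknown = resolve_revision(("example-sha256", b"\x03"), extid_storage, snapshot_artifacts)
```

Once ExtIDs are written for every loaded package, the fallback branch can be dropped.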
    • test_resolve_revision_from_artifacts: Use the right type for PartialExtID. · e590b425
      vlorentz authored
      We used a string instead of a tuple. It doesn't matter much because they
      are only compared with each other, but let's not intentionally use
      the wrong types when we don't need to.
      e590b425
  25. Mar 25, 2021
  26. Mar 23, 2021
    • package.loader: Unnest loops in PackageLoader.load() · 78430078
      vlorentz authored
      In a future commit, we will need to go through all the PackageInfo
      objects before running the loop, so we can get their ExtID and
      fetch them from the storage.
      
      So, we need to fetch them all before running the load loop,
      using this listcomp.
      78430078
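The refactoring can be sketched as below. The two helper functions are simplified stand-ins modeled on the loader's version and package-info iteration; their names, return values and the example URLs are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple


def get_versions() -> List[str]:
    """Hypothetical stand-in: list the versions of a package."""
    return ["1.0", "2.0"]


def get_package_info(version: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Hypothetical stand-in: yield (branch_name, package_info) pairs."""
    yield (
        f"releases/{version}",
        {"version": version, "url": f"https://example.org/pkg-{version}.tar.gz"},
    )


# Before: nested loops, package info is only known one item at a time.
loaded_nested = []
for version in get_versions():
    for branch_name, p_info in get_package_info(version):
        loaded_nested.append(branch_name)

# After: collect everything first with a list comprehension...
packages_info = [
    (version, branch_name, p_info)
    for version in get_versions()
    for branch_name, p_info in get_package_info(version)
]
# ...so that all package info is available here, before the load loop,
# for example to query identifiers in one batch.
loaded_flat = []
for version, branch_name, p_info in packages_info:
    loaded_flat.append(branch_name)
```

Both forms visit the same items in the same order; only the second makes the full list available up front.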
    • cran, npm, pypi: Add the loader name in the ExtID type · 6d3545e4
      vlorentz authored
      These three loaders get intrinsic metadata from the archive, and use it
      to build the revision object (mostly authoring and date), which means
      they would not load the same revision as another loader given the
      same archive.
      6d3545e4
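A small sketch of the idea, with an invented naming scheme: the `extid_for` helper and the `"<loader>-archive-sha256"` type strings are hypothetical and only show how a loader-specific type keeps identical archives from resolving to each other's revisions.

```python
import hashlib
from typing import Tuple


def extid_for(loader_name: str, archive: bytes) -> Tuple[str, bytes]:
    """Build a hypothetical (extid_type, extid) pair scoped to one loader."""
    return (f"{loader_name}-archive-sha256", hashlib.sha256(archive).digest())


archive = b"same archive bytes"
pypi_extid = extid_for("pypi", archive)
npm_extid = extid_for("npm", archive)
```

The digests are equal, but the pairs differ, so a lookup made by one loader cannot return a revision built by another.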
    • Revert "package.loader: Unnest loops in PackageLoader.load()" · 5fd7619f
      vlorentz authored
      This reverts commit 6ae19e51.
      
      I didn't mean to commit it now.
      5fd7619f
    • package.loader: Unnest loops in PackageLoader.load() · 6ae19e51
      vlorentz authored
      In a future commit, we will need to go through all the PackageInfo
      objects before running the loop, so we can get their ExtID and
      fetch them from the storage.
      
      So, we need to fetch them all before running the load loop,
      using this listcomp.
      6ae19e51
    • package.loader: Simplify definition of tmp_revision · 5ac26676
      vlorentz authored
      I found the old definition to be quite confusing when refactoring this code.
      5ac26676
    • package loaders: define extid types · e9a8f986
      vlorentz authored
      This is still a purely internal change for now, but it will be needed
      to read/write ExtIDs from/to the storage.
      e9a8f986
    • Deduplicate resolve_revision_from across package loaders · eef74bbf
      vlorentz authored
      All package loaders but deposit had logic to compute some object
      from the new packageinfo, some other objects from the known artifacts,
      and compare them.
      
      This commit moves the comparison logic to the base class, and unifies
      the two computation interfaces, respectively as an extid() method
      on TPackageInfo and a method on the loader.
      
      This unified object for comparison is a byte string,
      which is internal to each loader for now, but a future commit
      will read and write it from/to the ExtID storage instead of
      computing it from the 'original_artifacts' present in
      revision metadata.
      eef74bbf
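The shape of this refactoring can be sketched as follows. The class names, the `known_artifact_to_extid` method name and the sha256-based subclass are all assumptions made for illustration; only the `extid()` method on the package info and the shared comparison in the base class come from the commit message.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional


class BasePackageInfo(ABC):
    @abstractmethod
    def extid(self) -> Optional[bytes]:
        """Byte string identifying the package to load."""


class BaseLoader(ABC):
    @abstractmethod
    def known_artifact_to_extid(self, artifact: Dict[str, Any]) -> Optional[bytes]:
        """Byte string identifying an already loaded artifact."""

    def resolve_revision_from(
        self, known_artifacts: Dict[bytes, Dict[str, Any]], p_info: BasePackageInfo
    ) -> Optional[bytes]:
        """Shared comparison logic: return the matching revision ID, if any."""
        new_extid = p_info.extid()
        if new_extid is None:
            return None
        for revision_id, artifact in known_artifacts.items():
            if self.known_artifact_to_extid(artifact) == new_extid:
                return revision_id
        return None


@dataclass
class Sha256PackageInfo(BasePackageInfo):
    sha256: str  # hex digest of the archive

    def extid(self) -> Optional[bytes]:
        return bytes.fromhex(self.sha256)


class Sha256Loader(BaseLoader):
    def known_artifact_to_extid(self, artifact: Dict[str, Any]) -> Optional[bytes]:
        digest = artifact.get("sha256")
        return bytes.fromhex(digest) if digest else None


loader = Sha256Loader()
known = {b"rev1": {"sha256": "ab" * 32}}
hit = loader.resolve_revision_from(known, Sha256PackageInfo("ab" * 32))
miss = loader.resolve_revision_from(known, Sha256PackageInfo("cd" * 32))
```

Each loader only supplies the two computations; the loop and the comparison live in one place.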
    • archive, cran: Replace 'artifact_identity' with extid to detect known packages · 20a3c9c8
      vlorentz authored
      We want to store these identifiers in the ExtID storage, which expects
      a (preferably short) bytearray; but the 'artifact_identity' was a
      list of (possibly long) strings and ints.
      
      While this commit does not write them to the ExtID storage yet,
      it makes these two loaders use them internally.
      
      Assuming no sha256 collision, this does not change their behavior
      when seen from the outside, with two exceptions:
      
      * the list of keys to use is now configured with a template string
      * configuring an unknown key now raises a KeyError instead of silently
        using a None value.
      
      But we never use this configuration setting, so in practice there is no
      change at all.
      20a3c9c8
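The two behavior changes listed above can be illustrated with a minimal sketch. The `compute_extid` helper, the template and the artifact keys are hypothetical; the sketch only assumes that the identifier is a sha256 digest of a string built from a template.

```python
import hashlib
from typing import Any, Dict


def compute_extid(template: str, artifact: Dict[str, Any]) -> bytes:
    """Format the template with the artifact's fields and hash the result.

    str.format raises KeyError when the template names a key that the
    artifact does not have, instead of silently substituting None.
    """
    manifest = template.format(**artifact)
    return hashlib.sha256(manifest.encode()).digest()


# Hypothetical artifact description.
artifact = {
    "time": "2021-03-23T00:00:00",
    "url": "https://example.org/pkg-1.0.tar.gz",
    "length": 1234,
}
extid = compute_extid("{time} {url} {length}", artifact)

try:
    compute_extid("{version}", artifact)
    unknown_key_raised = False
except KeyError:
    unknown_key_raised = True
```

The result is always a 32-byte string, whatever the number and length of the fields used to build it.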