  1. Sep 30, 2022
    • Antoine Lambert's avatar
      Use tarball checksum to check download integrity in package loaders · 5482a48e
      Antoine Lambert authored
      When one or more tarball checksums are available, either from lister
      output or from Web API calls performed by some loaders, use them to check
      the integrity of downloaded tarballs.
      5482a48e
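A minimal sketch of the kind of check described above, not the loader's actual code. The function name `verify_checksums` and the shape of the `checksums` mapping (hashlib algorithm name to hexadecimal digest) are assumptions made for illustration.

```python
import hashlib
from typing import Dict


def verify_checksums(data: bytes, checksums: Dict[str, str]) -> None:
    """Raise ValueError if any declared checksum does not match ``data``.

    ``checksums`` maps a hashlib algorithm name (e.g. "sha256") to the
    expected hexadecimal digest. Both the name and the mapping shape are
    hypothetical.
    """
    for algo, expected in checksums.items():
        actual = hashlib.new(algo, data).hexdigest()
        if actual != expected.lower():
            raise ValueError(
                f"{algo} mismatch: expected {expected}, computed {actual}"
            )


# Stand-in for the bytes of a downloaded tarball.
tarball = b"fake tarball bytes"
declared = {
    "sha256": hashlib.sha256(tarball).hexdigest(),
    "sha1": hashlib.sha1(tarball).hexdigest(),
}
verify_checksums(tarball, declared)  # matching checksums: no exception
```

Checking every declared algorithm, rather than only the first, means a single mismatching digest is enough to reject the download.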
    • Antoine R. Dumont's avatar
      Add Content Loader to ingest raw content file · f774aba5
      Antoine R. Dumont authored
      In some marginal listing cases (Nix or Guix for now), we can receive a raw file to ingest.
      This commit adds a loader to ingest those. The output of the ingestion is a snapshot
      with a single HEAD branch targeting the ingested file content.
      
      The loader expects a mandatory 'integrity' field, which is used to check that the
      content matches the declaration.
      
      This can also optionally receive a list of mirror urls in case the main origin url is no
      longer available. Those mirror urls are solely used as fallback to retrieve the content.
      
      Related to T3781
      Verified
      f774aba5
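The integrity check and mirror fallback described above could be sketched as follows. This is an illustration under stated assumptions: the 'integrity' value is assumed to be an SRI-style `<algo>-<base64 digest>` string, and `matches_integrity`, `fetch_with_fallback`, and the injected `fetch` callable are hypothetical names, not the loader's API.

```python
import base64
import hashlib
from typing import Callable, List, Optional


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Check data against an assumed '<algo>-<base64 digest>' string."""
    algo, _, b64_digest = integrity.partition("-")
    expected = base64.b64decode(b64_digest)
    return hashlib.new(algo, data).digest() == expected


def fetch_with_fallback(
    url: str,
    mirrors: List[str],
    integrity: str,
    fetch: Callable[[str], Optional[bytes]],
) -> bytes:
    """Try the main url first, then each mirror, and verify what was fetched.

    ``fetch`` is a hypothetical callable returning None when a url is
    unavailable; it is injected so the sketch runs without network access.
    """
    for candidate in [url, *mirrors]:
        data = fetch(candidate)
        if data is None:
            continue  # unavailable: fall back to the next mirror
        if not matches_integrity(data, integrity):
            raise ValueError(f"content from {candidate} does not match {integrity}")
        return data
    raise LookupError("content unavailable from origin url and all mirrors")


content = b"hello"
integrity = "sha256-" + base64.b64encode(hashlib.sha256(content).digest()).decode()
# The main url is gone; only the mirror still serves the file.
sources = {"https://mirror.example.org/file": content}
fetched = fetch_with_fallback(
    "https://origin.example.org/file",
    ["https://mirror.example.org/file"],
    integrity,
    sources.get,
)
```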
  2. Sep 29, 2022
    • Franck Bret's avatar
      Puppet: The puppet loader loads origins from https://forge.puppet.com · 6299c091
      Franck Bret authored
      For each origin it takes advantage of the 'artifacts' data sent through
      'extra_loader_arguments' from the Puppet lister, providing versions,
      archive url, last_update, filename.
      Author and description are extracted from intrinsic metadata.
      
      Related T4580
      6299c091
    • Franck Bret's avatar
      Cpan: Cpan loader loads Perl modules from cpan.org · 2db1a754
      Franck Bret authored
      For each origin it calls an HTTP API endpoint to retrieve extrinsic
      metadata for each version of a module.
      Author and package description are extracted from intrinsic metadata
      by parsing META.json or META.yml at the root of the archive.
      
      Related T2833
      2db1a754
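As a rough illustration of the intrinsic metadata extraction described above, the sketch below reads the `author` and `abstract` keys of a META.json document. The function name and the returned dictionary shape are assumptions; the loader's real parsing code may differ, and META.yml handling is left out.

```python
import json
from typing import Any, Dict


def extract_intrinsic_metadata(meta_json: str) -> Dict[str, Any]:
    """Pull name, version, author and description out of a META.json string.

    Hypothetical helper: only the META.json keys themselves come from the
    CPAN metadata format; everything else is illustrative.
    """
    meta = json.loads(meta_json)
    authors = meta.get("author") or []
    if isinstance(authors, str):
        # Tolerate a single author given as a plain string.
        authors = [authors]
    return {
        "name": meta.get("name"),
        "version": meta.get("version"),
        "author": authors[0] if authors else None,
        "description": meta.get("abstract"),
    }


# Made-up example document, not a real module.
example = json.dumps(
    {
        "name": "Example-Module",
        "version": "1.02",
        "abstract": "An example module",
        "author": ["Jane Doe <jane@example.org>"],
    }
)
metadata = extract_intrinsic_metadata(example)
```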
  3. Sep 28, 2022
    • Antoine Lambert's avatar
      discovery: Fix compatibility with storage RPC API · 7375a83c
      Antoine Lambert authored
      Software Heritage's homemade RPC layer does not know how to serialize
      set objects, so we need to pass lists as parameters of the *_missing
      methods of the storage API.
      7375a83c
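The problem can be shown with the standard `json` module, used here only as a stand-in for a serializer that has no encoding for sets. The actual RPC layer and the real `*_missing` signatures are not reproduced.

```python
import json

# Identifiers we want to ask the storage about (made-up values).
candidate_ids = {"a1", "b2", "c3"}

try:
    json.dumps({"ids": candidate_ids})
    set_is_serializable = True
except TypeError:
    # json has no encoding for set objects.
    set_is_serializable = False

# Converting to a list first (sorted here for a stable order) works.
payload = json.dumps({"ids": sorted(candidate_ids)})
```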
    • Raphaël Gomès's avatar
      Setup async interface for discovery module · 1facea3c
      Raphaël Gomès authored
      This will allow us to use this interface in async code like ``swh-scanner``.
      
      Unfortunately, this means calling ``asyncio.run`` for sync code, but the
      performance impact should be negligible.
      
      The ``swh_storage.*missing*`` APIs are inconsistent for each type, which
      requires a lot of boilerplate code. This should be addressed in a
      follow-up.
      1facea3c
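The pattern described above, an async interface driven from sync code through ``asyncio.run``, can be sketched like this. `InMemoryArchive` and `missing_sync` are hypothetical names for illustration, not the discovery module's classes.

```python
import asyncio
from typing import Iterable, List


class InMemoryArchive:
    """Hypothetical async interface answering "which ids are missing?"."""

    def __init__(self, known: Iterable[str]) -> None:
        self._known = set(known)

    async def missing(self, ids: List[str]) -> List[str]:
        # A real implementation would await a storage call here.
        return [i for i in ids if i not in self._known]


def missing_sync(archive: InMemoryArchive, ids: List[str]) -> List[str]:
    """Sync entry point: runs the coroutine to completion in a new event loop."""
    return asyncio.run(archive.missing(ids))


result = missing_sync(InMemoryArchive(["a"]), ["a", "b"])
```

Note that ``asyncio.run`` cannot be called from inside an already running event loop, so async callers should await the coroutine directly instead of going through the sync wrapper.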
  4. Sep 26, 2022
    • Raphaël Gomès's avatar
      Use a Merkle discovery algorithm with archives · 798f749e
      Raphaël Gomès authored
      "Discovery" is the term for finding the differences between two
      Merkle graphs. Using such an algorithm is useful in that it drastically
      reduces the amount of data that needs to be transferred.
      
      This commit introduces an efficient but simple algorithm that is a good
      starting point for improved performance: random sampling of directories,
      the details of which are explained in the docstrings.
      
      Mercurial uses a more sophisticated algorithm for its discovery, but it
      is quite a bit more involved and would introduce too much complexity at
      once. Also, the constraints for speed that Mercurial has (in the order
      of milliseconds) don't apply as obviously to this context without
      further investigation.
      
      Benchmarks
      ==========
      
      Setup
      -----
      - With a local PostgreSQL storage (so no network overhead) and a local
        tmpfs objstorage on a fast NVMe SSD, all of which should make this
        improvement look smaller than it will be in production
      - With a tarball of the linux kernel at commit
        d96d875ef5dd372f533059a44f98e92de9cf0d42 already loaded
      - Loading a tarball of 20 commits earlier
        (bf3f401db6cbe010095fe3d1e233a5fde54e8b78)
      - Only taking into account the loading (not the downloading of the
        tarball, or its decompression)
      
      Result
      ------
      
      before: ~30s
      after: ~17s
      
      Reproduced 4 times.
      798f749e
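The commit's docstrings are not reproduced here, so the following is only a simplified sketch of discovery by random sampling of directories. It assumes the usual Merkle property: if the archive knows a directory, it knows everything below it, and if a directory is missing, every directory containing it is missing too. All names (`discover_missing`, `query_missing`, the tree layout) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Set


def discover_missing(
    children: Dict[str, List[str]],
    query_missing: Callable[[List[str]], Set[str]],
    sample_size: int = 2,
    seed: int = 0,
) -> Set[str]:
    """Return the directory ids the archive does not have.

    ``children`` maps every directory id to its sub-directory ids (each
    directory must appear as a key). ``query_missing`` stands in for a
    storage call returning the subset of the given ids that is missing.
    """
    parents: Dict[str, List[str]] = {d: [] for d in children}
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].append(parent)

    rng = random.Random(seed)
    undecided: Set[str] = set(children)
    missing: Set[str] = set()

    def settle(start: str, edges: Dict[str, List[str]], is_missing: bool) -> None:
        # Spread one answer along ``edges`` without querying the archive again.
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in undecided:
                continue
            undecided.remove(node)
            if is_missing:
                missing.add(node)
            stack.extend(edges[node])

    while undecided:
        sample = rng.sample(sorted(undecided), min(sample_size, len(undecided)))
        absent = query_missing(sample)
        for directory in sample:
            if directory in absent:
                settle(directory, parents, True)  # ancestors are missing too
            else:
                settle(directory, children, False)  # descendants are known too
    return missing


tree = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [],
    "a2": [],
    "b1": [],
}
archived = {"a", "a1", "a2", "b1"}


def query_missing(ids: List[str]) -> Set[str]:
    return {i for i in ids if i not in archived}


result = discover_missing(tree, query_missing)
print(sorted(result))  # ['b', 'root']
```

Each answer settles a whole subtree or chain of ancestors, which is where the reduction in transferred data comes from. The number of queries depends on which directories happen to be sampled.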
    • Antoine Lambert's avatar
      26fe954b
  5. Sep 21, 2022
  6. Sep 20, 2022
  7. Sep 19, 2022
  8. Sep 13, 2022
  9. Sep 09, 2022
  10. Sep 05, 2022
  11. Aug 30, 2022
  12. Aug 29, 2022
  13. Aug 26, 2022
  14. Aug 19, 2022
  15. Aug 08, 2022
    • vlorentz's avatar
      Initialize 'status' before try block · 43597c48
      vlorentz authored
      It seems that despite setting it in the 'except BaseException' block,
      it is still occasionally undefined in the 'finally' block when
      triggered by a SystemExit exception.
      
      This should prevent UnboundLocalError from being raised from
      the 'finally' block from now on.
      43597c48
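The defensive pattern described above looks roughly like the sketch below. It does not reproduce the occasional failure itself; it only shows `status` being bound before the `try` block so that the `finally` block can always read it. The `visit` function and the status strings are illustrative, not the loader's code.

```python
from typing import Callable, List

recorded: List[str] = []


def visit(task: Callable[[], None]) -> None:
    # Bound before 'try', so 'finally' never sees an undefined name.
    status = "failed"
    try:
        task()
        status = "full"
    except BaseException:
        status = "failed"
        raise
    finally:
        recorded.append(status)


def exiting() -> None:
    raise SystemExit(1)


try:
    visit(exiting)
except SystemExit:
    pass
visit(lambda: None)
```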
  16. Aug 03, 2022
  17. Jun 29, 2022
  18. Jun 21, 2022
  19. Jun 17, 2022
    • Franck Bret's avatar
      Arch Linux loader · b6af2638
      Franck Bret authored
      Fetch Arch Linux packages from origins discovered by the lister.
      For each origin it gets versions from extra_loader_arguments['artifacts'].
      
      An Arch Linux package can come as a .xz or .zst file archive.
      Support for .zst (Zstandard compression) has been requested with D7993.
      
      Related to T4233
      b6af2638
  20. Jun 07, 2022
    • Antoine Lambert's avatar
      package/archive: Handle tarball artifact with null time · d925d06e
      Antoine Lambert authored
      An artifact without time info can be provided in the artifacts list
      parameter of the loader (for instance last modification date
      is not available for tarballs coming from github releases).
      
      That case was not handled by the archive loader, which resulted
      in a loading error, so add a fix for it.
      d925d06e
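A hedged sketch of tolerating an artifact without time info, as described above. The artifact dictionary layout, the ISO 8601 time format, and the helper name `parse_artifact_time` are assumptions; the archive loader's actual fix is not shown in the commit message.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def parse_artifact_time(artifact: Dict[str, Any]) -> Optional[datetime]:
    """Return the artifact time, or None when no time info was provided."""
    raw = artifact.get("time")
    if raw is None:
        return None
    return datetime.fromisoformat(raw)


# Made-up artifacts: the second one has a null time.
artifacts = [
    {"url": "https://example.org/pkg-1.0.tar.gz", "time": "2022-06-07T10:00:00+00:00"},
    {"url": "https://example.org/pkg-1.1.tar.gz", "time": None},
]
times = [parse_artifact_time(a) for a in artifacts]
```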
  21. May 20, 2022
  22. May 16, 2022
    • Antoine Lambert's avatar
      loader: Ensure success is False when entering exception handler · d9a6ba05
      Antoine Lambert authored
      The post_load method of a loader can raise an exception, so we must
      ensure the success variable is set back to False in that case.
      
      For instance, the subversion loader's post_load checks that the latest
      exported revision is consistent with what the official subversion
      client produces. If it is not, an exception is raised to set
      the visit status to partial.
      v3.4.2
      d9a6ba05
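The control flow described above can be sketched as follows, under the assumption that `success` is set to True before `post_load` runs. The `load` function and its callables are hypothetical and much simpler than the real loader.

```python
from typing import Callable


def load(store_data: Callable[[], None], post_load: Callable[[], None]) -> bool:
    """Return True only if both storing and post_load completed."""
    success = False
    try:
        store_data()
        success = True
        post_load()  # may raise after success was already set to True
    except Exception:
        # Without this reset, a failing post_load would leave success True.
        success = False
    return success


def failing_post_load() -> None:
    raise ValueError("exported revision differs from the reference export")


ok = load(lambda: None, lambda: None)
failed = load(lambda: None, failing_post_load)
```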
  23. May 13, 2022