  1. Sep 29, 2022
    • Cpan: Cpan loader loads Perl modules from cpan.org · 2db1a754
      Franck Bret authored
      For each origin, it calls an HTTP API endpoint to retrieve extrinsic
      metadata for each version of a module.
      The author and package description are extracted from intrinsic metadata,
      parsed from META.json or META.yml at the root of the archive.
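
As a minimal sketch of the intrinsic-metadata step, assuming the archive has already been extracted: the function names below are hypothetical and are not the loader's actual API. The `name`, `version`, `abstract` and `author` keys follow the CPAN META spec. Only META.json is handled here, because parsing META.yml would need a third-party YAML parser.

```python
import json
from pathlib import Path
from typing import Any, Dict


def parse_meta_json(text: str) -> Dict[str, Any]:
    """Extract author and description fields from META.json content."""
    meta = json.loads(text)
    authors = meta.get("author") or []
    if isinstance(authors, str):
        # Older metadata may carry a single author string instead of a list.
        authors = [authors]
    return {
        "name": meta.get("name"),
        "version": meta.get("version"),
        "description": meta.get("abstract"),
        "authors": authors,
    }


def read_intrinsic_metadata(archive_root: Path) -> Dict[str, Any]:
    """Read META.json at the root of an extracted archive, or return {}."""
    meta_path = archive_root / "META.json"
    if not meta_path.is_file():
        # A fuller version would fall back to META.yml at this point.
        return {}
    return parse_meta_json(meta_path.read_text(encoding="utf-8"))
```

Returning an empty dict when the file is absent keeps the caller free to decide whether missing intrinsic metadata is an error.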
      
      Related T2833
  2. Sep 28, 2022
    • discovery: Fix compatibility with storage RPC API · 7375a83c
      Antoine Lambert authored
      Software Heritage's homemade RPC layer does not know how to serialize
      set objects, so we need to pass lists as parameters to the *_missing
      methods of the storage API.
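
The problem can be illustrated with a small sketch. The standard `json` module stands in for the RPC serializer here, which is an assumption made only for illustration; `fake_rpc_encode` and `as_rpc_list` are hypothetical names, not part of the storage API.

```python
import json
from typing import Any, Dict, Iterable, List


def fake_rpc_encode(params: Dict[str, Any]) -> str:
    """Stand-in for an RPC serializer that, like json, rejects sets."""
    return json.dumps(params)


def as_rpc_list(ids: Iterable[str]) -> List[str]:
    """Convert any iterable of ids (possibly a set) to a sorted list."""
    return sorted(ids)


candidates = {"b2", "a1"}

try:
    fake_rpc_encode({"ids": candidates})
    set_is_serializable = True
except TypeError:
    # json raises TypeError for set objects.
    set_is_serializable = False

encoded = fake_rpc_encode({"ids": as_rpc_list(candidates)})
```

Sorting is not required for correctness; it only makes the serialized parameter deterministic.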
    • Setup async interface for discovery module · 1facea3c
      Raphaël Gomès authored
      This will allow us to use this interface in async code like ``swh-scanner``.
      
      Unfortunately, this means calling ``asyncio.run`` for sync code, but the
      performance impact should be negligible.
      
      The ``swh_storage.*missing*`` APIs are inconsistent for each type, which
      requires a lot of boilerplate code. This should be addressed in a
      follow-up.
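
A minimal sketch of the sync-over-async pattern described above. `discover_async` and `discover` are hypothetical names, and the filtering logic is a placeholder rather than the module's real behaviour.

```python
import asyncio
from typing import List


async def discover_async(ids: List[str]) -> List[str]:
    """Async entry point, usable directly from other async code."""
    await asyncio.sleep(0)  # placeholder for awaiting a storage query
    return [i for i in ids if i.startswith("new")]


def discover(ids: List[str]) -> List[str]:
    """Sync entry point; must not be called from a running event loop."""
    return asyncio.run(discover_async(ids))


result = discover(["new-1", "old-2", "new-3"])
```

`asyncio.run` creates and closes an event loop on every call, which is the overhead the message expects to be negligible.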
  3. Sep 26, 2022
    • Use a Merkle discovery algorithm with archives · 798f749e
      Raphaël Gomès authored
      "Discovery" is the term used to find out the differences between two
      Merkle graphs. Such an algorithm drastically reduces the amount of data
      that needs to be transferred.
      
      This commit introduces an efficient but simple algorithm that is a good
      starting point for improved performance: random sampling of directories,
      the details of which are explained in the docstrings.
      
      Mercurial uses a more sophisticated algorithm for its discovery, but it
      is quite a bit more involved and would introduce too much complexity at
      once. Also, the constraints for speed that Mercurial has (in the order
      of milliseconds) don't apply as obviously to this context without
      further investigation.
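
The idea of random directory sampling can be sketched as follows. This is an illustration written for this page, not the code from the commit: `discover_unknown_dirs`, its parameters and the in-memory `archived` set (standing in for the remote archive) are all assumptions. It relies on two Merkle properties: if a directory is archived, so are all its descendants, and if it is missing, so are all its ancestors.

```python
import random
from typing import Dict, List, Set


def discover_unknown_dirs(
    children: Dict[str, List[str]],
    archived: Set[str],
    sample_size: int = 2,
    seed: int = 0,
) -> Set[str]:
    """Return the directories missing from ``archived``.

    ``children`` maps every directory id to its sub-directory ids
    (leaves map to an empty list). A real implementation would send
    each sample to the archive as one batched query.
    """
    parents: Dict[str, List[str]] = {d: [] for d in children}
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].append(parent)

    rng = random.Random(seed)
    undecided = set(children)
    unknown: Set[str] = set()

    def propagate(start: str, edges: Dict[str, List[str]], missing: bool) -> None:
        # Decide ``start`` and everything reachable through ``edges``.
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in undecided:
                continue
            undecided.discard(node)
            if missing:
                unknown.add(node)
            stack.extend(edges[node])

    while undecided:
        size = min(sample_size, len(undecided))
        for directory in rng.sample(sorted(undecided), size):
            if directory in archived:
                propagate(directory, children, missing=False)  # descendants known
            else:
                propagate(directory, parents, missing=True)  # ancestors unknown
    return unknown


tree = {"root": ["a", "b"], "a": ["a1"], "b": ["b1"], "a1": [], "b1": []}
missing_dirs = discover_unknown_dirs(tree, archived={"a", "a1", "b1"})
```

Each sampled directory can settle a whole subtree or a whole chain of ancestors, which is why far fewer identifiers need to be sent than with a node-by-node check.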
      
      Benchmarks
      ==========
      
      Setup
      -----
      - With a local PostgreSQL storage (so no network overhead) and a local
        tmpfs objstorage on a fast NVMe SSD, all of which should make this
        improvement look smaller than it will be in production
      - With a tarball of the Linux kernel at commit
        d96d875ef5dd372f533059a44f98e92de9cf0d42 already loaded
      - Loading a tarball of 20 commits earlier
        (bf3f401db6cbe010095fe3d1e233a5fde54e8b78)
      - Only taking into account the loading (not the downloading of the
        tarball, or its decompression)
      
      Result
      ------
      
      before: ~30s
      after: ~17s
      
      Reproduced 4 times.
    • Antoine Lambert's avatar
      26fe954b
  4. Sep 21, 2022
  5. Sep 20, 2022
  6. Sep 19, 2022
  7. Sep 13, 2022
  8. Sep 09, 2022
  9. Sep 05, 2022
  10. Aug 30, 2022
  11. Aug 29, 2022
  12. Aug 26, 2022
  13. Aug 19, 2022
  14. Aug 08, 2022
    • Initialize 'status' before try block · 43597c48
      vlorentz authored
      It seems that, despite being set in the 'except BaseException' block,
      it is still occasionally undefined in the 'finally' block when
      triggered by a SystemExit exception.
      
      This should hopefully prevent UnboundLocalError from
      being raised in the 'finally' block from now on.
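
The defensive pattern can be sketched as below. `run_visit`, `exits` and the status strings are hypothetical and only illustrate the control flow; they are not the loader's actual code.

```python
from typing import Callable, List


def run_visit(work: Callable[[], None], recorded: List[str]) -> None:
    status = "failed"  # bound before 'try', so 'finally' can always read it
    try:
        work()
        status = "full"
    except BaseException:
        # Also covers SystemExit and KeyboardInterrupt.
        status = "failed"
        raise
    finally:
        # Cannot raise UnboundLocalError, whatever path led here.
        recorded.append(status)


def exits() -> None:
    raise SystemExit(1)


recorded: List[str] = []
run_visit(lambda: None, recorded)
try:
    run_visit(exits, recorded)
except SystemExit:
    pass
```

Choosing the pessimistic value as the default means an unexpected path through the function reports a failure rather than a success.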
  15. Aug 03, 2022
  16. Jun 29, 2022
  17. Jun 21, 2022
  18. Jun 17, 2022
    • Arch Linux loader · b6af2638
      Franck Bret authored
      Fetch Arch Linux packages from lister-discovered origins.
      For each origin, it gets versions from extra_loader_arguments['artifacts'].
      
      Arch Linux packages can come as .xz or .zst archives.
      Support for .zst (Zstandard compression) has been requested with D7993.
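
One way to tell the two archive types apart is by their leading magic bytes, sketched below. `detect_compression` is a hypothetical helper, not part of the loader. Decompressing .zst itself is left out because it needs a Zstandard library outside the standard library on most Python versions.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"        # first bytes of an .xz stream
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # first bytes of a Zstandard frame


def detect_compression(header: bytes) -> str:
    """Guess the compression format from the first bytes of a file."""
    if header.startswith(XZ_MAGIC):
        return "xz"
    if header.startswith(ZSTD_MAGIC):
        return "zst"
    return "unknown"


# lzma.compress produces the .xz container format by default.
xz_kind = detect_compression(lzma.compress(b"package payload"))
zst_kind = detect_compression(ZSTD_MAGIC + b"\x00\x00")
```

Checking content rather than the file extension also copes with artifacts whose names are misleading.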
      
      Related to T4233
  19. Jun 07, 2022
    • package/archive: Handle tarball artifact with null time · d925d06e
      Antoine Lambert authored
      An artifact without time info can be provided in the artifacts list
      parameter of the loader (for instance, the last modification date
      is not available for tarballs coming from GitHub releases).
      
      That case was not handled by the archive loader, which resulted
      in a loading error, so add a fix for it.
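
A minimal sketch of tolerating a missing timestamp, assuming each artifact is a dict whose optional `time` key holds an ISO 8601 string. The `artifact_time` helper and the example URL are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def artifact_time(artifact: Dict[str, Any]) -> Optional[datetime]:
    """Return the artifact's timestamp, or None when it has no time info."""
    raw = artifact.get("time")
    if raw is None:
        # Nothing to parse: let the caller decide how to record "no date".
        return None
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption for this sketch: naive timestamps are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


dated = artifact_time({"url": "https://example.org/a-1.0.tar.gz",
                       "time": "2022-06-07T10:00:00+00:00"})
undated = artifact_time({"url": "https://example.org/a-1.1.tar.gz",
                         "time": None})
```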
  20. May 20, 2022
  21. May 16, 2022
    • loader: Ensure success is False when entering exception handler · d9a6ba05
      Antoine Lambert authored
      The post_load method of a loader can raise an exception, so we must
      ensure the success variable is reset to False in that case.
      
      For instance, the subversion loader's post_load checks that the latest
      exported revision is consistent with what the official subversion
      client produces. If it is not, an exception is raised to set
      the visit status to partial.
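
The control flow can be sketched as follows. The `load` function, its callables and the rule used to pick "partial" versus "failed" are assumptions made for illustration, not the loader's real implementation.

```python
from typing import Callable, Dict, Union


def load(
    store_data: Callable[[], None],
    post_load: Callable[[], None],
) -> Dict[str, Union[bool, str]]:
    success = False
    status = "failed"
    try:
        store_data()
        success = True
        status = "full"
        post_load()  # may raise, e.g. on a failed consistency check
    except Exception:
        # Data stored before the failure is kept, so report "partial".
        status = "partial" if success else "failed"
        success = False  # never leave the handler with success still True
    return {"success": success, "status": status}


def ok() -> None:
    pass


def broken() -> None:
    raise ValueError("consistency check failed")


full_visit = load(ok, ok)
partial_visit = load(ok, broken)
failed_visit = load(broken, ok)
```

Without the reset in the handler, a failure in `post_load` would leave `success` set to True from the earlier step.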
      v3.4.2
  22. May 13, 2022
  23. May 06, 2022