  1. Sep 28, 2021
    • loader: Add support for dumb HTTP transfer protocol · d3976ca6
      Antoine Lambert authored
      Git supports two HTTP-based transfer protocols to exchange data
      between two repositories: the dumb protocol and the smart protocol.
      
      Nowadays, the smart protocol is the common method of transferring
      data because it is more efficient, but there are still some Git
      servers in the wild that only support the dumb protocol.
      
      Unfortunately, the dulwich package does not support the dumb protocol,
      so such Git repositories could not be loaded into the archive.
      
      This commit adds support for loading such Git repositories by fetching
      objects according to the dumb HTTP transfer protocol specification.
      
      Related to T2489
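
To illustrate what fetching objects over the dumb protocol involves, here is a minimal sketch, not the loader's actual code. With this protocol a client reads the `info/refs` file to discover references, then downloads loose objects from `objects/<2 hex chars>/<38 hex chars>` and inflates them with zlib. The function names below are hypothetical and the network layer is left out; a complete client would also have to fall back to the packs listed in `objects/info/packs` when a loose object is missing.

```python
import zlib


def parse_info_refs(text):
    """Parse the body of `info/refs` into a {ref name: hex object id} mapping.

    Each line has the form "<hex object id>\t<ref name>". Peeled tag entries
    (names ending in "^{}") are kept under their literal name.
    """
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, name = line.split("\t", 1)
        refs[name] = sha
    return refs


def loose_object_path(sha):
    """Return the path of a loose object, relative to the repository URL."""
    return f"objects/{sha[:2]}/{sha[2:]}"


def decode_loose_object(raw):
    """Inflate a loose object and split its "<type> <size>" header from its content."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("object size does not match its header")
    return obj_type.decode("ascii"), content
```

A client would start from the object ids returned by `parse_info_refs`, request each one at `loose_object_path(sha)`, and walk the commits and trees decoded by `decode_loose_object` to find further objects to fetch.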
      v1.1.0
      d3976ca6
  2. Sep 22, 2021
  3. Sep 21, 2021
  4. Sep 17, 2021
  5. Sep 16, 2021
  6. Aug 09, 2021
  7. Aug 06, 2021
  8. Aug 03, 2021
  9. Jul 30, 2021
  10. Jul 26, 2021
  11. Jun 09, 2021
  12. Jun 08, 2021
  13. May 11, 2021
  14. Apr 26, 2021
    • tox: Add sphinx environments to check sane doc build · 15e12fae
      Antoine Lambert authored
      Allow checking that the package documentation can be built without
      producing sphinx warnings.
      
      The sphinx environment is designed to be used in continuous integration
      to avoid breaking the documentation build when committing changes.
      
      The sphinx-dev environment is designed to be used inside a full swh
      development environment.
      
      Related to T3258
      15e12fae
  15. Apr 16, 2021
  16. Apr 07, 2021
  17. Apr 04, 2021
  18. Mar 16, 2021
    • Rename 'git_metadata' to 'extra_headers' · 1eb1c573
      vlorentz authored
      Because they are now stored in the 'extra_headers' field instead
      of the 'metadata' field.
      
      Motivation: consistency + keep it out of 'grep metadata */swh/ -r'
      1eb1c573
  19. Feb 25, 2021
    • Hardcode the use of the tcp transport for GitHub origins · 342f8fde
      Nicolas Dandrimont authored
      This change is necessary because of a shortcoming in the Dulwich HTTP
      transport: even if the Dulwich API lets us process the packfile in
      chunks as it's received, the HTTP transport implementation needs to
      entirely allocate the packfile in memory *twice*, once in the HTTP
      library, and once in a BytesIO managed by Dulwich, before passing it on
      to us as a chunked reader. Overall, this triples the memory usage before
      we get any chance to interrupt the loader and keep it from overrunning
      its memory limit.
      
      In contrast, the Dulwich TCP transport just gives us the read handle on
      the underlying socket, doing no processing or copying of the bytes. We
      can interrupt it as soon as we've received too many bytes.
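
The idea of interrupting a transfer as soon as too many bytes have arrived can be sketched as follows. This is an illustrative example only: `read_with_limit` and `SizeLimitExceeded` are hypothetical names, and `stream` stands in for any file-like object with a `read` method, such as a read handle on a socket. It does not reproduce the loader's or Dulwich's code.

```python
import io


class SizeLimitExceeded(Exception):
    """Raised when a stream delivers more bytes than allowed."""


def read_with_limit(stream, limit, chunk_size=65536):
    """Yield chunks from `stream`, aborting once more than `limit` bytes were read."""
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # end of stream
        received += len(chunk)
        if received > limit:
            # Stop right away: at most one extra chunk is held in memory.
            raise SizeLimitExceeded(f"received {received} bytes, limit is {limit}")
        yield chunk


# Usage with an in-memory stream standing in for the socket.
payload = io.BytesIO(b"x" * 10)
data = b"".join(read_with_limit(payload, limit=10, chunk_size=4))
```

Because the chunks are consumed as they arrive, memory use stays bounded by the chunk size rather than by the size of the whole packfile.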
      v0.9.0
      342f8fde
    • Stop processing packfiles before sending objects · 61afbc56
      Nicolas Dandrimont authored
      Since its creation, the git loader would process the packfile downloaded
      from the remote repository, to make an index of all objects, filtering
      them before sending them on to the storage. Since this functionality has
      been implemented as a filter proxy in the storage API itself, the
      built-in filtering by the git loader is now redundant.
      
      The way the filtering was implemented in the loader would run through
      the packfile six times: once for the basic object id indexing, once to
      get content ids, then once for each object type. This change removes the
      first two runs. By eschewing the double filtering, we should also reduce
      the load on the backend storage (we would call the <object_type>_missing
      endpoints twice).
      
      Finally, as this change removes the global index of objects, and sends
      the converted objects to the storage as soon as they're read, the memory
      usage decreases substantially for large loads.
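
Sending objects as soon as they are read, rather than building a global index first, can be sketched as below. This is a simplified illustration under stated assumptions: `send_in_batches` is a hypothetical helper, `objects` is any iterable of already converted objects, and `send` stands in for a storage call that filters out objects the storage already has.

```python
from itertools import islice


def send_in_batches(objects, send, batch_size=1000):
    """Send `objects` to `send` in batches, without holding them all in memory.

    Returns the number of objects that were passed to `send`.
    """
    iterator = iter(objects)
    sent = 0
    while True:
        # Pull at most `batch_size` objects from the iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return sent
        send(batch)
        sent += len(batch)


# Usage: collect the batches in a list instead of calling a real storage.
batches = []
count = send_in_batches(iter(range(5)), batches.append, batch_size=2)
```

Only one batch is alive at a time, so peak memory depends on `batch_size` and not on the total number of objects in the packfile.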
      61afbc56
    • 5e434d6f
  20. Feb 23, 2021
  21. Feb 17, 2021
  22. Feb 12, 2021
  23. Feb 11, 2021
  24. Feb 03, 2021
  25. Feb 02, 2021
  26. Nov 24, 2020
  27. Nov 23, 2020
  28. Nov 13, 2020
  29. Oct 02, 2020