  1. Oct 20, 2021
  2. Oct 11, 2021
  3. Oct 05, 2021
  4. Oct 01, 2021
  5. Sep 30, 2021
  6. Sep 28, 2021
  7. Sep 21, 2021
  8. Sep 16, 2021
  9. Aug 09, 2021
  10. Aug 06, 2021
  11. Jul 30, 2021
  12. Jul 26, 2021
  13. Jun 09, 2021
  14. May 11, 2021
  15. Apr 26, 2021
    • tox: Add sphinx environments to check sane doc build · 15e12fae
      Antoine Lambert authored
      Enable checking that the package documentation can be built without
      producing sphinx warnings.
      
      The sphinx environment is designed to be used in continuous
      integration, in order to prevent breaking the documentation build when
      committing changes.
      
      The sphinx-dev environment is designed to be used inside a full swh
      development environment.
      
      Related to T3258
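
      To make this concrete, a sphinx tox environment of this kind typically
      turns sphinx warnings into errors and delegates to the docs Makefile.
      The snippet below is only a rough sketch under that assumption: the
      environment names follow the commit, but the options, variables and
      commands shown here are illustrative, not the actual swh configuration.

        [testenv:sphinx]
        usedevelop = true
        whitelist_externals = make
        # turn sphinx warnings into errors so CI fails on a broken doc build
        setenv =
            SPHINXOPTS = -W
        commands = make -C docs html

        # same build, but meant to run from inside a full swh development
        # environment, against locally checked-out swh packages
        [testenv:sphinx-dev]
        usedevelop = true
        whitelist_externals = make
        setenv =
            SPHINXOPTS = -W
        commands = make -C docs html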
  16. Apr 07, 2021
  17. Apr 04, 2021
  18. Mar 16, 2021
    • Rename 'git_metadata' to 'extra_headers' · 1eb1c573
      vlorentz authored
      Because they are now stored in the 'extra_headers' field instead
      of the 'metadata' field.
      
      Motivation: consistency + keep it out of 'grep metadata */swh/ -r'
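
      For readers unfamiliar with the field: these are raw git commit headers
      (for instance gpgsig) that have no dedicated model field of their own
      and must be kept verbatim. The Python sketch below is purely
      illustrative; the header values are invented and the exact model layout
      is an assumption based on this message, not taken from the code.

        # Illustrative only: invented header values, assumed field layout.
        from typing import Tuple

        # Extra git headers kept verbatim as (key, value) byte pairs;
        # previously passed around as 'git_metadata', now as 'extra_headers'.
        extra_headers: Tuple[Tuple[bytes, bytes], ...] = (
            (b"gpgsig", b"-----BEGIN PGP SIGNATURE-----\n...\n-----END PGP SIGNATURE-----"),
            (b"nonstandard-header", b"some value the loader must not lose"),
        )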
  19. Feb 25, 2021
    • Hardcode the use of the tcp transport for GitHub origins · 342f8fde
      Nicolas Dandrimont authored
      This change is necessary because of a shortcoming in the Dulwich HTTP
      transport: even though the Dulwich API lets us process the packfile in
      chunks as it is received, the HTTP transport implementation allocates
      the packfile in memory in its entirety *twice*, once in the HTTP
      library and once in a BytesIO managed by Dulwich, before passing it on
      to us as a chunked reader. Overall this triples the memory usage before
      we even get a chance to interrupt the loader once it overruns its
      memory limit.
      
      In contrast, the Dulwich TCP transport just gives us the read handle on
      the underlying socket, doing no processing or copying of the bytes. We
      can interrupt it as soon as we've received too many bytes.
      v0.9.0
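
      To make the memory-bounded fetch concrete, here is a rough sketch. It
      is not the loader's actual code: MAX_PACK_SIZE, the error type and the
      full-clone graph walker are illustrative assumptions, while
      get_transport_and_path and fetch_pack are the standard Dulwich client
      entry points.

        # Sketch only: force the git:// (TCP) transport for a GitHub origin
        # and abort as soon as the received packfile exceeds a size limit.
        import tempfile

        from dulwich.client import get_transport_and_path
        from dulwich.object_store import ObjectStoreGraphWalker

        MAX_PACK_SIZE = 4 * 1024 * 1024 * 1024  # arbitrary example limit (4 GiB)


        class PackTooLargeError(Exception):
            pass


        def fetch_pack_bounded(origin_url: str):
            # https://github.com/owner/repo -> git://github.com/owner/repo, so
            # that Dulwich streams the pack straight from the socket instead
            # of buffering the whole HTTP response in memory.
            if origin_url.startswith("https://github.com/"):
                origin_url = "git://" + origin_url[len("https://"):]

            client, path = get_transport_and_path(origin_url)
            pack = tempfile.SpooledTemporaryFile(max_size=100 * 1024 * 1024)
            received = 0

            def write_pack_chunk(data: bytes) -> None:
                nonlocal received
                received += len(data)
                if received > MAX_PACK_SIZE:
                    # with the TCP transport, chunks arrive as they are read
                    # from the socket, so this fires before memory blows up
                    raise PackTooLargeError("packfile over %d bytes" % MAX_PACK_SIZE)
                pack.write(data)

            result = client.fetch_pack(
                path,
                determine_wants=lambda refs: list(
                    {sha for ref, sha in refs.items() if not ref.endswith(b"^{}")}
                ),
                # empty local history: fetch everything
                graph_walker=ObjectStoreGraphWalker([], lambda sha: []),
                pack_data=write_pack_chunk,
            )
            pack.seek(0)
            return result, pack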
    • Stop processing packfiles before sending objects · 61afbc56
      Nicolas Dandrimont authored
      Since its creation, the git loader has processed the packfile
      downloaded from the remote repository to build an index of all objects
      and filter them before sending them on to the storage. Now that this
      functionality is implemented as a filter proxy in the storage API
      itself, the built-in filtering in the git loader is redundant.
      
      As implemented in the loader, the filtering ran through the packfile
      six times: once for the basic object id indexing, once to collect
      content ids, then once for each object type. This change removes the
      first two runs. By dropping the duplicate filtering, we should also
      reduce the load on the backend storage (the <object_type>_missing
      endpoints were being called twice).
      
      Finally, as this change removes the global index of objects, and sends
      the converted objects to the storage as soon as they're read, the memory
      usage decreases substantially for large loads.
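
      As an illustration of the streaming approach, a sketch under
      assumptions: convert_commit and the batch size are hypothetical, a real
      run handles every object type rather than only commits, and
      revision_add stands for the storage endpoint receiving revisions;
      PackData and PackInflater are Dulwich's pack-reading primitives.

        # Sketch only: stream commits out of a packfile and send them to
        # storage in small batches, without building an in-memory index of
        # the whole pack first.
        from dulwich.objects import Commit
        from dulwich.pack import PackData, PackInflater

        BATCH_SIZE = 1000  # arbitrary example batch size


        def send_revisions(pack_file, pack_size, storage, convert_commit):
            # convert_commit is a hypothetical helper turning a Dulwich Commit
            # into the storage's revision representation.
            pack_data = PackData.from_file(pack_file, pack_size)
            batch = []
            for obj in PackInflater.for_pack_data(pack_data):
                if not isinstance(obj, Commit):
                    continue  # other object types are handled the same way
                batch.append(convert_commit(obj))
                if len(batch) >= BATCH_SIZE:
                    # the filter proxy on the storage side deduplicates
                    # already-known objects, so no pre-filtering happens here
                    storage.revision_add(batch)
                    batch = []
            if batch:
                storage.revision_add(batch)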
    • Nicolas Dandrimont authored · 5e434d6f
  20. Feb 23, 2021
  21. Feb 17, 2021
  22. Feb 11, 2021
  23. Feb 02, 2021
  24. Nov 24, 2020
  25. Nov 23, 2020
  26. Nov 13, 2020
  27. Oct 02, 2020