  1. Jan 10, 2022
  2. Dec 20, 2021
    • tests: Remove the SWHTag mock, use dulwich.objects.Tag instead. · 0cc96c25
      vlorentz authored
      This mock was clunky because it didn't actually behave much like
      dulwich's Tag.
      
      Additionally, a future commit will need to access the as_raw_chunks()
      method of ShaFile objects, so SWHTag isn't suitable anymore as it
      would need to diverge even more by implementing its own serialization.
  3. Dec 16, 2021
  4. Oct 28, 2021
  5. Oct 21, 2021
  6. Oct 20, 2021
  7. Oct 11, 2021
  8. Oct 05, 2021
  9. Oct 01, 2021
  10. Sep 30, 2021
  11. Sep 28, 2021
  12. Sep 21, 2021
  13. Sep 16, 2021
  14. Aug 09, 2021
  15. Aug 06, 2021
  16. Jul 30, 2021
  17. Jul 26, 2021
  18. Jun 09, 2021
  19. May 11, 2021
  20. Apr 26, 2021
    • tox: Add sphinx environments to check sane doc build · 15e12fae
      Antoine Lambert authored
      Make it possible to check that the package documentation can be built
      without producing sphinx warnings.
      
      The sphinx environment is designed to be used in continuous integration,
      to prevent commits from breaking the documentation build.
      
      The sphinx-dev environment is designed to be used inside a full swh
      development environment.
      
      Related to T3258
  21. Apr 07, 2021
  22. Apr 04, 2021
  23. Mar 16, 2021
    • Rename 'git_metadata' to 'extra_headers' · 1eb1c573
      vlorentz authored
      Because they are now stored in the 'extra_headers' field instead
      of the 'metadata' field.
      
      Motivation: consistency + keep it out of 'grep metadata */swh/ -r'
  24. Feb 25, 2021
    • Hardcode the use of the tcp transport for GitHub origins · 342f8fde
      Nicolas Dandrimont authored
      This change is necessary because of a shortcoming in the Dulwich HTTP
      transport: even if the Dulwich API lets us process the packfile in
      chunks as it's received, the HTTP transport implementation needs to
      entirely allocate the packfile in memory *twice*, once in the HTTP
      library, and once in a BytesIO managed by Dulwich, before passing it on
      to us as a chunked reader. Overall this triples the memory usage before
      we even get a chance to interrupt the loader for exceeding its memory
      limit.
      
      In contrast, the Dulwich TCP transport just gives us the read handle on
      the underlying socket, doing no processing or copying of the bytes. We
      can interrupt it as soon as we've received too many bytes.
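
The idea of interrupting a download as soon as too many bytes have arrived can be sketched as follows. This is a minimal illustration, not the loader's actual code: `read_limited`, `SizeLimitExceeded`, and the chunk size are hypothetical names and values, and an in-memory `io.BytesIO` stands in for the socket read handle.

```python
import io
from typing import BinaryIO, Iterator


class SizeLimitExceeded(Exception):
    """Raised when the stream delivers more bytes than the allowed limit."""


def read_limited(
    handle: BinaryIO, limit: int, chunk_size: int = 4096
) -> Iterator[bytes]:
    """Yield chunks read from `handle`, aborting once more than `limit` bytes arrive.

    Only one chunk is held at a time, so the reader never buffers the
    whole stream before the limit check can fire.
    """
    received = 0
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            return  # end of stream
        received += len(chunk)
        if received > limit:
            raise SizeLimitExceeded(
                f"received {received} bytes, limit is {limit}"
            )
        yield chunk


if __name__ == "__main__":
    # Stand-in for a socket handle: 100 kB of data against an 8 kB limit.
    fake_socket = io.BytesIO(b"x" * 100_000)
    try:
        for _ in read_limited(fake_socket, limit=8192):
            pass
    except SizeLimitExceeded as exc:
        print("interrupted:", exc)
```

With a handle that is read incrementally, the check fires after at most `limit + chunk_size` bytes have been consumed, which is the property the TCP transport makes possible here.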
      v0.9.0
    • Stop processing packfiles before sending objects · 61afbc56
      Nicolas Dandrimont authored
      Since its creation, the git loader would process the packfile downloaded
      from the remote repository, to make an index of all objects, filtering
      them before sending them on to the storage. Since this functionality has
      been implemented as a filter proxy in the storage API itself, the
      built-in filtering by the git loader is now redundant.
      
      The way the filtering was implemented in the loader would run through
      the packfile six times: once for the basic object id indexing, once to
      get content ids, then once for each object type. This change removes the
      first two runs. By eschewing the double filtering, we should also reduce
      the load on the backend storage (we would call the <object_type>_missing
      endpoints twice).
      
      Finally, as this change removes the global index of objects, and sends
      the converted objects to the storage as soon as they're read, the memory
      usage decreases substantially for large loads.
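
The memory argument above can be illustrated with a small sketch: instead of first collecting every object into a global index, objects are converted lazily and handed to the storage in bounded batches. The names `batched` and `send_streaming`, the batch size, and the `send` callback (a stand-in for a storage client call) are assumptions made for illustration and are not taken from the loader.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group a (possibly lazy) iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def send_streaming(
    objects: Iterable[T],
    send: Callable[[List[T]], None],
    batch_size: int = 1000,
) -> int:
    """Send objects batch by batch as they are read; return the total sent.

    At most `batch_size` objects are held in memory at once, since no
    index of all objects is built beforehand.
    """
    sent = 0
    for batch in batched(objects, batch_size):
        send(batch)  # hypothetical storage call; filtering happens there
        sent += len(batch)
    return sent


if __name__ == "__main__":
    # A generator stands in for objects converted while reading a packfile.
    converted = (f"object-{i}" for i in range(2500))
    total = send_streaming(converted, lambda batch: None)
    print(total)
```

Because the input is consumed as a generator, peak memory depends on the batch size rather than on the number of objects in the packfile.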
    • Nicolas Dandrimont's avatar
      5e434d6f
  25. Feb 23, 2021
  26. Feb 17, 2021
  27. Feb 11, 2021