  1. Feb 25, 2021
      Hardcode the use of the tcp transport for GitHub origins · 342f8fde
      Nicolas Dandrimont authored
      This change is necessary because of a shortcoming in the Dulwich HTTP
      transport: even if the Dulwich API lets us process the packfile in
      chunks as it's received, the HTTP transport implementation needs to
      entirely allocate the packfile in memory *twice*, once in the HTTP
      library, and once in a BytesIO managed by Dulwich, before passing it on
      to us as a chunked reader. Overall this triples the memory usage before
      we even get a chance to stop the loader from overrunning its memory limit.
      
      In contrast, the Dulwich TCP transport just gives us the read handle on
      the underlying socket, doing no processing or copying of the bytes. We
      can interrupt it as soon as we've received too many bytes.
      v0.9.0
      342f8fde
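To illustrate the point about interrupting a raw read handle, here is a minimal sketch of a byte-budgeted reader. It is not the loader's actual code: `PackTooLarge`, `read_with_limit`, and the `limit` and `chunk_size` parameters are hypothetical names, and an in-memory `io.BytesIO` stands in for the socket read handle.

```python
import io


class PackTooLarge(Exception):
    """Raised when the incoming stream exceeds the configured byte budget."""


def read_with_limit(read, limit, chunk_size=8192):
    """Yield chunks from `read` until EOF, aborting once more than
    `limit` bytes have been received.

    `read` is any callable taking a size and returning bytes, such as the
    `read` method of a socket file object (an assumption for this sketch).
    """
    received = 0
    while True:
        chunk = read(chunk_size)
        if not chunk:
            return
        received += len(chunk)
        if received > limit:
            # Stop immediately: nothing beyond this chunk is ever buffered.
            raise PackTooLarge(f"received {received} bytes, limit is {limit}")
        yield chunk


# Stand-in for a socket read handle (hypothetical data).
stream = io.BytesIO(b"x" * 20000)
total = sum(len(chunk) for chunk in read_with_limit(stream.read, limit=32768))
```

Because the reader only ever holds one chunk at a time, the memory cost of detecting an oversized packfile is bounded by `chunk_size` rather than by the packfile size.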
      Stop processing packfiles before sending objects · 61afbc56
      Nicolas Dandrimont authored
      Since its creation, the git loader would process the packfile downloaded
      from the remote repository, to make an index of all objects, filtering
      them before sending them on to the storage. Since this functionality has
      been implemented as a filter proxy in the storage API itself, the
      built-in filtering by the git loader is now redundant.
      
      The way the filtering was implemented in the loader would run through
      the packfile six times: once for the basic object id indexing, once to
      get content ids, then once for each object type. This change removes the
      first two runs. By eschewing the double filtering, we should also reduce
      the load on the backend storage (we would call the <object_type>_missing
      endpoints twice).
      
      Finally, as this change removes the global index of objects, and sends
      the converted objects to the storage as soon as they're read, the memory
      usage decreases substantially for large loads.
      61afbc56
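The memory argument in the commit above can be sketched as follows: instead of indexing every object first, objects are forwarded in small batches as soon as they are read. This is only an illustration under stated assumptions; `send_in_batches`, the `send` callback, and `batch_size` are hypothetical and do not reflect the loader's or the storage's real API.

```python
from collections import defaultdict


def send_in_batches(objects, send, batch_size=1000):
    """Forward objects to `send` as soon as a batch fills up.

    `objects` is an iterable of (object_type, obj) pairs, and
    `send(object_type, batch)` is a hypothetical storage callback.
    No global index is kept: at most `batch_size` objects per type
    are held in memory at any time.
    """
    pending = defaultdict(list)
    for object_type, obj in objects:
        batch = pending[object_type]
        batch.append(obj)
        if len(batch) >= batch_size:
            send(object_type, batch)
            pending[object_type] = []
    # Flush whatever is left once the input is exhausted.
    for object_type, batch in pending.items():
        if batch:
            send(object_type, batch)


# Usage with a callback that just records what it was given.
sent = []
send_in_batches(
    [("content", 1), ("content", 2), ("directory", 3), ("content", 4)],
    lambda object_type, batch: sent.append((object_type, list(batch))),
    batch_size=2,
)
```

Filtering of already-known objects is left out on purpose, since the commit message states that this is now handled by a filter proxy in the storage API.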