swh-loader-git, issue #4216
Issue created Apr 29, 2022 by Nicolas Dandrimont (@olasd, Maintainer)

git loader packfile size limit is poorly applied to HTTP(s) repositories

Quoting #3544 (closed):

Just to recap, the original (and pretty much only) problem that this hardcoding of the TCP transport was working around is that the dulwich HTTP(S) client fetches the full packfile into memory before streaming it to the user-provided do_pack function. This function is what enforces our packfile size limit.
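
To make the role of do_pack concrete, here is a minimal sketch of a chunk callback that enforces a size limit. It assumes only what the text says: the callback is invoked once per chunk of pack data and is responsible for the limit. The names LimitedPackWriter, PackSizeLimitExceeded and max_size are hypothetical illustrations, not the actual swh-loader-git code.

```python
import io


class PackSizeLimitExceeded(Exception):
    """Hypothetical error raised when received pack data exceeds the limit."""


class LimitedPackWriter:
    """Hypothetical stand-in for a do_pack-style callback.

    It is called once per chunk of pack data, stores the chunk in a sink
    (an in-memory buffer here; a temporary file would also work) and raises
    as soon as the running total exceeds max_size.
    """

    def __init__(self, max_size: int, sink=None):
        self.max_size = max_size
        self.sink = sink if sink is not None else io.BytesIO()
        self.received = 0

    def __call__(self, data: bytes) -> None:
        self.received += len(data)
        if self.received > self.max_size:
            # Refuse the chunk that pushes us over the limit.
            raise PackSizeLimitExceeded(
                f"pack exceeds {self.max_size} bytes ({self.received} received)"
            )
        self.sink.write(data)
```

The callback can only abort once it is called, so it limits what gets stored, not what the transport has already downloaded.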

We're not actually negotiating which objects to fetch in the current implementation of the git loader: we only send a static list of known object IDs.
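
For readers unfamiliar with the pack protocol, the following simplified sketch shows what sending a static list of known object IDs amounts to: every "have" line is sent up front, followed by "done", without waiting for the server's ACKs as a negotiating client would. It uses the git pkt-line framing (4 hex digits giving the line length, including those 4 bytes, and "0000" as the flush packet) but deliberately omits capabilities and other protocol details; build_upload_request is a hypothetical helper, not a dulwich or swh-loader-git function.

```python
from typing import Iterable

FLUSH_PKT = b"0000"


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a git pkt-line: 4 hex digits of total length, then data."""
    return b"%04x" % (len(payload) + 4) + payload


def build_upload_request(wants: Iterable[bytes], haves: Iterable[bytes]) -> bytes:
    """Hypothetical, simplified upload-pack request with a static "have" list.

    Capabilities (normally appended to the first "want" line) are omitted.
    All known object IDs are announced at once, then "done" tells the server
    to send the pack without further negotiation rounds.
    """
    lines = [pkt_line(b"want " + sha + b"\n") for sha in wants]
    lines.append(FLUSH_PKT)
    lines += [pkt_line(b"have " + sha + b"\n") for sha in haves]
    lines.append(pkt_line(b"done\n"))
    return b"".join(lines)
```

Because the client never adapts its "have" list to the server's responses, the pack it receives may contain objects it already knows, which is one reason the received packfile can grow large.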

In practice, this means that the dulwich HTTP client reads the entire packfile response (apparently via urllib3 and held in RAM, though both points are unconfirmed) before passing it in chunks to do_pack, which is what enforces the limit. So we fetch the full response and store it in RAM, only to drop it on the floor, rather than rejecting it mid-flight.
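
The difference between buffering the response and streaming it can be simulated without any network access. In this sketch, fake_network is a hypothetical generator standing in for an HTTP response body read in chunks, and it counts how many bytes were pulled off the "wire". buffered_fetch models the behaviour described above (read everything, then hand it to the callback in chunks), while streaming_fetch forwards each chunk as it arrives, so an exception from the callback stops the download early. None of these functions are dulwich APIs.

```python
from typing import Callable, Iterator


class PackSizeLimitExceeded(Exception):
    pass


def fake_network(total: int, chunk: int, counter: list) -> Iterator[bytes]:
    """Hypothetical response body; counter[0] tracks bytes actually downloaded."""
    sent = 0
    while sent < total:
        n = min(chunk, total - sent)
        counter[0] += n
        sent += n
        yield b"x" * n


def limited_callback(max_size: int) -> Callable[[bytes], None]:
    """do_pack-like callback that raises once more than max_size bytes arrive."""
    received = 0

    def do_pack(data: bytes) -> None:
        nonlocal received
        received += len(data)
        if received > max_size:
            raise PackSizeLimitExceeded(received)

    return do_pack


def buffered_fetch(body: Iterator[bytes], do_pack, chunk: int = 4096) -> None:
    # Whole body is read into memory first...
    data = b"".join(body)
    # ...and only then handed to the callback in chunks.
    for i in range(0, len(data), chunk):
        do_pack(data[i:i + chunk])


def streaming_fetch(body: Iterator[bytes], do_pack) -> None:
    # Each chunk goes straight to the callback, so raising stops the read.
    for piece in body:
        do_pack(piece)


def bytes_fetched(fetch, total=1_000_000, limit=100_000, chunk=4096) -> int:
    """Return how many bytes were downloaded before the fetch ended."""
    counter = [0]
    try:
        fetch(fake_network(total, chunk, counter), limited_callback(limit))
    except PackSizeLimitExceeded:
        pass
    return counter[0]


print(bytes_fetched(buffered_fetch))   # 1000000: the whole response
print(bytes_fetched(streaming_fetch))  # 102400: stops just past the limit
```

With a 100,000-byte limit, the buffered variant still downloads the full 1,000,000 bytes, whereas the streaming variant stops after the first 4,096-byte chunk that crosses the limit.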

This is a dulwich limitation; we should investigate how to patch it, to avoid wasting bandwidth and memory.


Migrated from T4216 on Phabricator.
