Closed
Open
Issue created Apr 19, 2021 by vlorentz (@vlorentz, Maintainer)

Use "fork" relationships to speed up initial load of large repositories

(I'm writing this task just so that I don't forget the idea, but I don't expect it to be actionable in the short term)

To work incrementally, VCS loaders fetch the last snapshot of the origin, which gives them a set of "heads". They pass these heads to the origin's server, so the server can detect which revisions it doesn't need to send.
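To make the mechanism concrete, here is a toy model of that exchange. It is only a sketch: the revision graph, the `HISTORY` example and the function names are all made up for illustration, and a real VCS server negotiates this over its own protocol rather than with a Python dict.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical linear history a <- b <- c <- d (revision -> list of parents).
HISTORY: Dict[str, List[str]] = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}


def ancestors(parents: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """Return the given tips plus everything reachable from them.

    Tips the server has never seen are ignored.
    """
    seen: Set[str] = set()
    stack = [tip for tip in tips if tip in parents]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(parents.get(rev, []))
    return seen


def revisions_to_send(
    parents: Dict[str, List[str]],
    wanted: Iterable[str],
    known_heads: Iterable[str],
) -> Set[str]:
    """Revisions reachable from `wanted` but not from the loader's known heads."""
    return ancestors(parents, wanted) - ancestors(parents, known_heads)


# With heads from a previous snapshot, only the new revisions are sent.
incremental = revisions_to_send(HISTORY, ["d"], ["b"])
# Without any snapshot, the whole history is sent.
full = revisions_to_send(HISTORY, ["d"], [])
```

In this model, the cost of a first visit is the size of `full`, which is the whole history of the repository.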

Unfortunately, when someone forks a large repository (such as https://github.com/chromium/chromium) and we see the fork for the first time, we don't have that snapshot, so the server needs to send all revisions. We then discard almost all of them, because they are already in the archive.

However, if we could detect that new repositories are forks (from extrinsic metadata, from heuristics based on repository names, ...), we could fetch the snapshots of the original repositories and use their heads as the base to load the fork incrementally.
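A rough sketch of what picking the base could look like, using the repository-name heuristic only. Everything here is an assumption for illustration: the `ARCHIVED` mapping, the function names, the `example.org` URL and the placeholder revision ids are hypothetical and are not an existing loader API. It also assumes the server simply ignores heads it does not recognise, so a wrong guess would cost nothing more than a full load.

```python
from typing import Dict, List, Set

# Hypothetical view of the archive: origin URL -> heads of its last snapshot.
ARCHIVED: Dict[str, Set[str]] = {
    "https://github.com/chromium/chromium": {"rev-c"},
}


def repo_name(url: str) -> str:
    """Last path component of the URL, lowercased, without a '.git' suffix."""
    name = url.rstrip("/").rsplit("/", 1)[-1].lower()
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


def candidate_parents(origin_url: str, archived: Dict[str, Set[str]]) -> List[str]:
    """Already-archived origins that share the new origin's repository name."""
    name = repo_name(origin_url)
    return sorted(
        url for url in archived if url != origin_url and repo_name(url) == name
    )


def base_heads(origin_url: str, archived: Dict[str, Set[str]]) -> Set[str]:
    """Heads to offer the server when loading `origin_url`.

    Use the origin's own last snapshot when there is one; otherwise fall
    back to the snapshots of origins that look like its fork parent.
    """
    if origin_url in archived:
        return set(archived[origin_url])
    heads: Set[str] = set()
    for parent in candidate_parents(origin_url, archived):
        heads |= archived[parent]
    return heads


# A fork seen for the first time borrows the heads of the likely original.
fork_heads = base_heads("https://example.org/alice/chromium.git", ARCHIVED)
```

Fork information from extrinsic metadata, when available, would be a more reliable source of candidates than matching on names; the same `base_heads` fallback would apply.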


Migrated from T3273 (view on Phabricator)
