Vault: Add a "git bare" tarball cooker
We currently use the git fast-import format to allow people to retrieve revisions from the archive using the vault. However, this is wildly inefficient and generally a bad idea for several reasons:
- It's a poorly documented format (man git-fast-import), non-trivial to understand, and generally niche, which makes development hard.
- It's lossy. It doesn't always handle signed tags properly, nor tagged trees (see man git-fast-export -> /LIMITATIONS). There is no guarantee of retrieving the same hashes in the output.
- The output is extremely large. The format was designed to be trivial to use as an in-memory piped interchange between a VCS and a git fast-import process; no consideration was given to space utilization. All the objects are deduplicated with no delta encoding.
- Exporting it from the archive is expensive. Only modified files are exported at each commit, so finding out which files were modified requires diffing every commit tree against its parents, which is extremely expensive and virtually impossible to parallelize.
A better option would be to create a "git bare" cooker: a bare Git repository (= with no working directory) where we put all the git objects in .git/objects directly. This is very fast and easily parallelizable on our side, and we can recompress all the objects together before caching the bundle by calling git repack.
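A minimal sketch of the idea, not the actual cooker: it writes loose objects straight into the objects directory of a bare repository layout, then lets git repack them. The function names (init_bare_layout, write_loose_object, repack) and the default branch name are hypothetical, and the sketch assumes SHA-1 object ids and a git binary on the PATH for the repack step.

```python
import hashlib
import subprocess
import zlib
from pathlib import Path


def init_bare_layout(repo: Path) -> None:
    """Create the minimal layout git accepts as a bare repository."""
    (repo / "objects").mkdir(parents=True, exist_ok=True)
    (repo / "refs" / "heads").mkdir(parents=True, exist_ok=True)
    (repo / "refs" / "tags").mkdir(parents=True, exist_ok=True)
    # Assumption: the default branch name is only illustrative.
    (repo / "HEAD").write_text("ref: refs/heads/master\n")


def write_loose_object(repo: Path, obj_type: str, payload: bytes) -> str:
    """Store one object as a loose file and return its hex id.

    A loose object is zlib("<type> <size>\\0<payload>"), and its id is the
    SHA-1 of that same uncompressed byte string.
    """
    store = f"{obj_type} {len(payload)}".encode() + b"\0" + payload
    hex_id = hashlib.sha1(store).hexdigest()
    path = repo / "objects" / hex_id[:2] / hex_id[2:]
    if not path.exists():  # objects are content-addressed, so writes are idempotent
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return hex_id


def repack(repo: Path) -> None:
    """Pack all loose objects into a single delta-compressed packfile."""
    subprocess.run(["git", "-C", str(repo), "repack", "-a", "-d"], check=True)
```

Because each object lands in its own content-addressed file, independent workers can write objects concurrently without coordinating, which is where the parallelism would come from.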
Once we have this bare repository, we could just create a tarball of it and cache this. Another option would be to investigate the git bundle format, which apparently serves a similar purpose, but could be simpler to import for the users.
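The two packaging options could look roughly like the sketch below. The helper names make_tarball and make_bundle are hypothetical; the bundle variant assumes a git binary on the PATH and a repository with at least one ref, since git bundle refuses to create an empty bundle.

```python
import subprocess
import tarfile
from pathlib import Path


def make_tarball(repo: Path, out: Path) -> Path:
    """Archive the bare repository directory as a plain tarball."""
    # No extra compression here: git objects are already zlib-compressed.
    with tarfile.open(out, "w") as tar:
        tar.add(repo, arcname=repo.name)
    return out


def make_bundle(repo: Path, out: Path) -> Path:
    """Write every ref and its history into a single git bundle file."""
    # The output path is made absolute because `git -C` changes directory first.
    subprocess.run(
        ["git", "-C", str(repo), "bundle", "create", str(out.resolve()), "--all"],
        check=True,
    )
    return out
```

On the user side, the tarball would be extracted and then cloned from like any local repository, whereas a bundle file can be passed directly to git clone.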
Migrated from T843 (view on Phabricator)