cookers/git_bare: Speed up repository cooking with multi-threading

Previously, when cooking a git bare repository, content bytes were fetched sequentially, which could take a long time for an origin with a large revision history.

To speed up the cooking process, content bytes are now retrieved in parallel using the concurrent.futures module from the Python standard library, which is well suited to making loops of I/O-bound tasks concurrent and to issuing tasks asynchronously.
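As a rough illustration of the pattern (not the actual cooker code), the sketch below runs an I/O-bound loop through a thread pool. `fetch_content`, `fetch_all`, and the `max_workers` value are hypothetical names and settings chosen for this example. The sleep stands in for a storage round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_content(content_id: str) -> bytes:
    """Hypothetical stand-in for an I/O-bound fetch of one content object."""
    time.sleep(0.01)  # simulate network or storage latency
    return content_id.encode()


def fetch_all(content_ids, max_workers=8):
    """Fetch all contents concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map schedules one call per id across the worker threads
        # and yields results in the same order as the inputs.
        return list(executor.map(fetch_content, content_ids))


if __name__ == "__main__":
    print(fetch_all(["a", "b", "c"]))
```

Threads suit this workload because each task spends most of its time waiting on I/O rather than holding the interpreter.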

Below are the timings for cooking a git bare repository for the currently archived tip revision of swh-model.

  • Without multi-threading:
$ time swh -l DEBUG vault cook -C /tmp/vault.yml --bundle-type git_bare swh:1:rev:51b5aa94f13c4bd7358475d78fb7d5684cfb6fd1 /tmp/git_repo.tar

real    16m43,282s
user    0m13,142s
sys     0m1,462s
  • With multi-threading:
$ time swh -l DEBUG vault cook -C /tmp/vault.yml --bundle-type git_bare swh:1:rev:51b5aa94f13c4bd7358475d78fb7d5684cfb6fd1 /tmp/git_repo.tar

real    2m23,676s
user    0m13,520s
sys     0m1,310s
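For a quick comparison of the two runs above, the following sketch converts the reported `real` values to seconds and computes the ratio. `to_seconds` is a hypothetical helper written for this example. It assumes the `XmY,ZZZs` format with a comma as the decimal separator, as shown in the output above.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time` value such as '16m43,282s' to seconds."""
    minutes, rest = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(rest.replace(",", "."))


sequential = to_seconds("16m43,282s")  # real time without multi-threading
threaded = to_seconds("2m23,676s")     # real time with multi-threading
speedup = sequential / threaded

print(f"{sequential:.3f}s -> {threaded:.3f}s, about {speedup:.1f}x faster")
```

The wall-clock time drops by roughly a factor of seven while the `user` and `sys` times stay nearly unchanged, which is consistent with an I/O-bound workload.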

The code that retrieves directory data in parallel has also been ported to concurrent.futures in another commit.
