
Spool large packfiles to disk instead of consuming tons of memory

This lowers memory consumption by writing packfiles above a given size threshold to disk. It reduces memory pressure on workers (at the cost of more disk churn) and also allows the git loader to run on more memory-constrained systems.

As we only hold a single temporary file open, we can rely on Python's default tempfile behaviour, which unlinks the temporary file immediately. The file is therefore reaped as soon as the process disappears, even if the process gets killed. This avoids the need for any manual tempfile cleanup.
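The spooling behaviour described above can be sketched with the standard library's `tempfile.SpooledTemporaryFile`. This is a minimal illustration under assumptions, not the loader's actual code: the helper name `spool_pack`, the `SPOOL_THRESHOLD` constant, and its value are hypothetical.

```python
import tempfile

# Hypothetical threshold for illustration; the value used by the loader
# is not stated in this description.
SPOOL_THRESHOLD = 1024 * 1024  # 1 MiB


def spool_pack(chunks, max_size=SPOOL_THRESHOLD):
    """Collect byte chunks in a buffer that stays in memory until it
    grows past max_size, then rolls over to a temporary file on disk.

    On POSIX systems the backing file has no directory entry once it is
    created, so the kernel frees it when the last open handle goes away,
    including when the process is killed.
    """
    buf = tempfile.SpooledTemporaryFile(max_size=max_size)
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)  # rewind so the caller can read the pack from the start
    return buf
```

A caller would read the packfile back from the returned object and close it when done; no explicit cleanup of the on-disk file is needed.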

Related to swh/infra/sysadm-environment#3025

Test Plan

This has been exercised on large repositories (e.g. linux.git yields a packfile that is almost 4GiB).


Migrated from D5657 (view on Phabricator)
