pypi.client: Improve tarball download time

While working on the npm loader by adapting the code from the PyPI one, I noticed that the download of tarballs was very slow.

It turned out that this is due to the use of the iter_content method from the requests response API [1]. By default, that method iterates over the response content one byte at a time, hence the slow download.

Setting the chunk_size parameter of that method to None reads data as it arrives, in whatever size the chunks are received, and greatly speeds up the download.
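
As a rough illustration of the change described above, here is a minimal sketch of a streamed download that passes chunk_size=None to iter_content. The helper names save_response_content and download_tarball are hypothetical and are not taken from the swh.loader.pypi code; the sketch also assumes the request is made with stream=True, which is what lets requests hand over data as it arrives.

```python
def save_response_content(response, filepath):
    """Write a streamed response body to filepath and return the byte count.

    Hypothetical helper: `response` is expected to behave like a
    requests.Response obtained with stream=True.
    """
    size = 0
    with open(filepath, "wb") as f:
        # The default is chunk_size=1, i.e. one byte per iteration.
        # chunk_size=None yields data in whatever size it is received.
        for chunk in response.iter_content(chunk_size=None):
            f.write(chunk)
            size += len(chunk)
    return size


def download_tarball(url, filepath):
    """Download url to filepath using a streamed request (hypothetical helper)."""
    # Imported here so the helper above can be used without requests installed.
    import requests

    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        return save_response_content(response, filepath)
```

Splitting the write loop from the HTTP call is only a convenience for this sketch: it lets the loop be exercised with any object that exposes iter_content.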

For instance, before the fix, loading all Sphinx packages took:

$ time python3 -m swh.loader.pypi.loader sphinx
...
real    53m53,489s
user    53m19,212s
sys     0m11,460s

After the fix, the same run is roughly 23 times faster in wall-clock time:

$ time python3 -m swh.loader.pypi.loader sphinx
...
real    2m21,667s
user    0m55,900s
sys     0m10,416s

Migrated from D738 (view on Phabricator)