- Feb 25, 2021
-
-
Nicolas Dandrimont authored
This change is necessary because of a shortcoming in the Dulwich HTTP transport: even though the Dulwich API lets us process the packfile in chunks as it is received, the HTTP transport implementation has to allocate the entire packfile in memory *twice*, once in the HTTP library and once in a BytesIO managed by Dulwich, before passing it on to us as a chunked reader. Overall this triples the memory usage before we can even attempt to stop the loader from overrunning its memory limit. In contrast, the Dulwich TCP transport just gives us the read handle on the underlying socket, with no processing or copying of the bytes, so we can interrupt it as soon as we have received too many bytes.
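The idea of interrupting a transfer as soon as too many bytes have arrived can be sketched as below. This is only a minimal illustration, not the loader's actual code: `read_limited`, `SizeLimitExceeded`, and the `limit` and `chunk_size` parameters are hypothetical names, and any binary file-like object stands in for the socket read handle.

```python
import io


class SizeLimitExceeded(IOError):
    """Raised when more bytes than allowed were received (hypothetical name)."""


def read_limited(stream, limit, chunk_size=8192):
    """Yield chunks from a binary stream, aborting once `limit` bytes are exceeded.

    Only one chunk is held in memory at a time, so the transfer can be
    stopped without first buffering the whole payload.
    """
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # end of stream
        received += len(chunk)
        if received > limit:
            raise SizeLimitExceeded(
                f"received {received} bytes, limit is {limit}"
            )
        yield chunk


if __name__ == "__main__":
    # An in-memory stream stands in for the socket here.
    payload = io.BytesIO(b"x" * 50_000)
    try:
        for _ in read_limited(payload, limit=20_000):
            pass
    except SizeLimitExceeded as exc:
        print("aborted:", exc)
```

With a transport that buffers the full response first, a check like this can only run after the memory has already been spent, which is the shortcoming described above.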
-
Nicolas Dandrimont authored
Since its creation, the git loader has processed the packfile downloaded from the remote repository to build an index of all objects, filtering them before sending them on to the storage. Now that this functionality is implemented as a filter proxy in the storage API itself, the built-in filtering by the git loader is redundant. The loader's filtering ran through the packfile six times: once for the basic object id indexing, once to get content ids, then once for each object type. This change removes the first two runs. By dropping the double filtering, we should also reduce the load on the backend storage (we used to call the <object_type>_missing endpoints twice). Finally, as this change removes the global index of objects and sends the converted objects to the storage as soon as they are read, memory usage decreases substantially for large loads.
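The memory benefit of sending objects on as they are read, instead of indexing everything first, can be sketched as follows. This is an illustrative sketch under stated assumptions: `FilteringStorage` is a hypothetical stand-in for a storage proxy that drops already-known objects, and `batched`, `load_objects`, and `batch_size` are made-up names, not the swh-storage API.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items without materializing the input."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class FilteringStorage:
    """Hypothetical storage proxy that filters out objects it already knows."""

    def __init__(self):
        self.known = set()
        self.add_calls = 0

    def add(self, objects):
        self.add_calls += 1
        new = [obj for obj in objects if obj not in self.known]
        self.known.update(new)
        return len(new)


def load_objects(objects, storage, batch_size=1000):
    """Stream objects to storage in batches; no global index is kept."""
    added = 0
    for batch in batched(objects, batch_size):
        # Filtering happens inside the storage proxy, not in the loader.
        added += storage.add(batch)
    return added


if __name__ == "__main__":
    storage = FilteringStorage()
    storage.known.add("b")
    print(load_objects(iter(["a", "b", "c", "a"]), storage, batch_size=2))
```

At most one batch is held by the loader at any time, so peak memory depends on `batch_size` rather than on the size of the repository.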
-
Nicolas Dandrimont authored
- Feb 23, 2021
-
-
Antoine R. Dumont authored
-
- Feb 17, 2021
-
-
Antoine R. Dumont authored
Note that this also updates some docstrings and types along the way. Related to T1410
- Feb 12, 2021
- Feb 11, 2021
-
-
Antoine R. Dumont authored
When the initial communication with the git server fails (e.g. the repository is not found), the visit status is marked as not_found. When the initial communication succeeds but a failure occurs during the fetch step (e.g. the pack file is too big), the visit status is marked as failed. Related to T3030
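The two failure cases above map to visit statuses roughly as in this sketch. It is a simplified illustration, not the loader's implementation: the exception classes, the `connect` and `fetch` callables, and the `"full"` status returned on success are assumptions made for the example; only `not_found` and `failed` come from the commit message.

```python
class RepositoryNotFound(Exception):
    """Hypothetical error raised when the initial communication fails."""


class FetchError(Exception):
    """Hypothetical error raised during the fetch step (e.g. pack too big)."""


def visit_status(connect, fetch):
    """Return the visit status for one load attempt.

    `connect` performs the initial communication with the server and
    `fetch` downloads the pack using the resulting connection.
    """
    try:
        connection = connect()
    except RepositoryNotFound:
        return "not_found"
    try:
        fetch(connection)
    except FetchError:
        return "failed"
    return "full"  # assumed name for the success status


if __name__ == "__main__":
    def missing_repo():
        raise RepositoryNotFound("no such repository")

    print(visit_status(missing_repo, lambda connection: None))
```

Keeping the two `try` blocks separate is what distinguishes a repository that cannot be reached at all from one whose fetch breaks midway.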
-
Antoine R. Dumont authored
With the new loader.core 0.17, the failed and partial statuses changed slightly. This adds the necessary tests to make those explicit. Related to T3030
-
- Feb 03, 2021
- Feb 02, 2021
-
-
Antoine R. Dumont authored
-
- Nov 24, 2020
-
-
Antoine Lambert authored
-
- Nov 23, 2020
-
-
Antoine Lambert authored
dulwich recently added PEP 561 compatibility, so ignore type checking for older dulwich versions.
-
- Nov 13, 2020
-
-
Antoine R. Dumont authored
This drops the older CLI in swh.loader.git.from_disk, which was broken and not covered by tests. Related to T2770#52497
- Oct 02, 2020
-
-
Antoine R. Dumont authored
-
Stefano Zacchiroli authored
-
Antoine R. Dumont authored
Related to T1532 T1410 D3965
- Sep 25, 2020
-
-
Nicolas Dandrimont authored
-
- Sep 18, 2020
-
-
vlorentz authored
They are an implementation detail of storage backends, and get_stats() will stop returning the 'person' count in the next version of swh-storage-core.
-
- Sep 17, 2020
-
-
Antoine Lambert authored
Related to T2610
-
Antoine Lambert authored
Related to T2610
-
Antoine Lambert authored
The flake8 hook has been removed from https://github.com/pre-commit/pre-commit-hooks, so now use the one from https://gitlab.com/pycqa/flake8
-
- Aug 25, 2020
-
-
vlorentz authored
pytest wastes a lot of time in .hypothesis and .git; this commit excludes them.
-
- Aug 10, 2020
-
-
vlorentz authored
snapshot_get is deprecated.
-
- Aug 06, 2020
-
-
Antoine R. Dumont authored
Fixes build [1] [1] https://jenkins.softwareheritage.org/job/DLDG/job/tests/768/console Related to T2517
- Jul 30, 2020
-
-
Antoine R. Dumont authored
Fixes the build [1] [1] https://jenkins.softwareheritage.org/job/DLDG/job/tests/759/console
-
- Jul 28, 2020
- Jul 26, 2020
-
-
Antoine R. Dumont authored
Related to T645
-
Antoine R. Dumont authored
Related to T2105
-
Antoine R. Dumont authored
This should fix the debian package build [1] [1] https://jenkins.softwareheritage.org/view/Debian%20packages/job/debian/job/packages/job/DLDG/job/gbp-buildpackage/47/console
- Jul 17, 2020
-
-
Antoine R. Dumont authored
Related to T2484
-
Antoine R. Dumont authored
Related to T2494
- Jul 16, 2020