- Sep 28, 2021
Antoine Lambert authored
Git supports two HTTP-based transfer protocols to exchange data between repositories: the dumb protocol and the smart protocol. Nowadays the smart protocol is the common way to transfer data because it is more efficient, but some git servers in the wild still only support the dumb protocol. Unfortunately, the dulwich package does not support that protocol, so such git repositories could not be loaded into the archive. This commit adds support for loading them by fetching objects according to the dumb HTTP transfer protocol specification. Related to T2489
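As a rough illustration of what the dumb protocol involves, the sketch below parses an `info/refs` listing and decodes a loose object fetched from `objects/<first 2 hex digits>/<remaining 38 hex digits>`. The helper names (`parse_info_refs`, `loose_object_url`, `decode_loose_object`) are hypothetical, the HTTP requests themselves are left out, and packed objects (listed in `objects/info/packs`) are not handled; this is not the loader's actual implementation.

```python
import zlib
from typing import Dict, Tuple


def parse_info_refs(text: str) -> Dict[str, str]:
    """Parse `<sha1> TAB <ref name>` lines as served at `info/refs` (hypothetical helper)."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sha1, ref_name = line.split("\t", 1)
        refs[ref_name] = sha1
    return refs


def loose_object_url(base_url: str, sha1: str) -> str:
    """Loose objects are stored under objects/, using the first two hex digits as a directory."""
    return f"{base_url.rstrip('/')}/objects/{sha1[:2]}/{sha1[2:]}"


def decode_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Inflate a loose object and split its NUL-terminated `<type> <size>` header from the body."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("corrupt loose object: size mismatch")
    return obj_type.decode(), body
```

A real client would walk from the refs to the commits, trees and blobs they reference, fetching each missing object this way and falling back to pack files when a loose object is absent.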
- Sep 22, 2021
- Sep 21, 2021
- Sep 17, 2021
- Sep 16, 2021
- Aug 09, 2021
vlorentz authored
- Aug 06, 2021
vlorentz authored
Old versions of Git didn't write them, e.g. see tags refs/tags/v2.6.11 to refs/tags/v2.6.13-rc3 in linux.git
- Aug 03, 2021
- Jul 30, 2021
vlorentz authored
Since version 0.19.10 (more specifically, this commit: <https://github.com/dulwich/dulwich/commit/72aec0c79fb395689e40f9228df93e2a39cf8fb0>), Dulwich strips GPG signatures from the 'message' attribute of Tag objects and stores them in a new attribute, 'signature'. This means we were silently dropping all signatures from releases.
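To illustrate the issue, the sketch below roughly mimics that split: everything from the `-----BEGIN PGP SIGNATURE-----` marker onward is moved out of the message, so a converter that only reads the message loses the signature unless it concatenates the two parts back together. `split_tag_message` and `full_tag_message` are hypothetical helpers written for this example, not Dulwich or swh APIs.

```python
from typing import Optional, Tuple

PGP_MARKER = b"-----BEGIN PGP SIGNATURE-----"


def split_tag_message(raw_message: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Separate a trailing PGP signature from a tag message (hypothetical helper)."""
    index = raw_message.find(PGP_MARKER)
    if index == -1:
        return raw_message, None
    return raw_message[:index], raw_message[index:]


def full_tag_message(message: bytes, signature: Optional[bytes]) -> bytes:
    """Rebuild the original message so the signature is not silently dropped."""
    return message + (signature or b"")
```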
- Jul 26, 2021
vlorentz authored
* Lazy substitution (instead of %)
* Log the actual error message in the text
* Rename variable according to PEP 8
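For readers unfamiliar with the first point: with the standard `logging` module, arguments can be passed separately and are only interpolated if the record is actually emitted, whereas `%` formats the string eagerly. A minimal sketch; the logger name, message and the `Expensive` class are made up for this example.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("example.loader")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.WARNING)
logger.propagate = False


class Expensive:
    """Counts how often it is converted to a string."""

    calls = 0

    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive"


def report_failure(url: str, error: Exception) -> None:
    # Eager: "Failed to load %s: %s" % (url, error) would build the string
    # even if the record were filtered out. Lazy: logging interpolates it
    # only when the record is emitted, and the actual error text is included.
    logger.warning("Failed to load %s: %s", url, error)


report_failure("https://example.org/repo.git", ValueError("not found"))
# DEBUG is below the WARNING level, so Expensive.__str__ is never called.
logger.debug("details: %s", Expensive())
```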
- Jun 09, 2021
Antoine Lambert authored
- Jun 08, 2021
- May 11, 2021
Nicolas Dandrimont authored
- Apr 26, 2021
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing Sphinx warnings. The sphinx environment is intended for continuous integration, to prevent commits from breaking the documentation build. The sphinx-dev environment is intended for use inside a full swh development environment. Related to T3258
- Apr 16, 2021
- Apr 07, 2021
Aastha Asthana authored
- Apr 04, 2021
Aastha Asthana authored
- Mar 16, 2021
vlorentz authored
Because they are now stored in the 'extra_headers' field instead of the 'metadata' field. Motivation: consistency, and keeping them out of the results of 'grep metadata */swh/ -r'
- Feb 25, 2021
Nicolas Dandrimont authored
This change is necessary because of a shortcoming in the Dulwich HTTP transport: even though the Dulwich API lets us process the packfile in chunks as it is received, the HTTP transport implementation allocates the entire packfile in memory *twice*, once in the HTTP library and once in a BytesIO managed by Dulwich, before passing it on to us as a chunked reader. Overall this triples the memory usage before we even get a chance to interrupt the loader when it overruns its memory limit. In contrast, the Dulwich TCP transport just gives us the read handle on the underlying socket, doing no processing or copying of the bytes, so we can interrupt it as soon as we've received too many bytes.
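The sketch below illustrates why a streaming read handle matters: a consumer that counts bytes as chunks arrive can abort as soon as a limit is exceeded, without ever holding the whole packfile in memory. `read_pack_with_limit` and `PackTooLarge` are hypothetical names, and an `io.BytesIO` stands in for the socket; the loader's real limit handling may differ.

```python
import io
from typing import BinaryIO, Callable


class PackTooLarge(Exception):
    """Raised when the received pack exceeds the configured size limit (hypothetical)."""


def read_pack_with_limit(
    stream: BinaryIO,
    consume: Callable[[bytes], None],
    max_bytes: int,
    chunk_size: int = 64 * 1024,
) -> int:
    """Forward chunks to `consume`, aborting once more than `max_bytes` have been read."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            # Abort before handing over the offending chunk; nothing else is buffered.
            raise PackTooLarge(f"pack exceeds {max_bytes} bytes")
        consume(chunk)
```

With an HTTP transport that first buffers the full response, a check like this can only run after all the memory has already been allocated, which is the problem described above.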
Nicolas Dandrimont authored
Since its creation, the git loader has processed the packfile downloaded from the remote repository to build an index of all objects, filtering them before sending them on to the storage. Now that this functionality is implemented as a filter proxy in the storage API itself, the built-in filtering in the git loader is redundant. The way the filtering was implemented in the loader ran through the packfile six times: once for the basic object id indexing, once to get content ids, then once for each object type. This change removes the first two runs. By eschewing the double filtering, we should also reduce the load on the backend storage (we used to call the <object_type>_missing endpoints twice). Finally, as this change removes the global index of objects and sends the converted objects to the storage as soon as they're read, memory usage decreases substantially for large loads.
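A simplified model of the new arrangement: the loader streams converted objects straight to the storage in small batches, and a filtering proxy in front of the storage asks which ids are missing before adding them. `FilteringProxy`, `InMemoryStorage`, `object_missing` and `object_add` are hypothetical stand-ins loosely modeled on the `<object_type>_missing` / add endpoints; they are not the swh.storage API.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Obj = Tuple[bytes, bytes]  # (object id, payload)


class InMemoryStorage:
    """Toy backend that records how many add calls it receives."""

    def __init__(self) -> None:
        self.objects: Dict[bytes, bytes] = {}
        self.add_calls = 0

    def object_missing(self, ids: List[bytes]) -> Set[bytes]:
        return {oid for oid in ids if oid not in self.objects}

    def object_add(self, objs: List[Obj]) -> None:
        self.add_calls += 1
        self.objects.update(objs)


class FilteringProxy:
    """Forwards only the objects the backend does not have yet."""

    def __init__(self, backend: InMemoryStorage) -> None:
        self.backend = backend

    def object_add(self, objs: List[Obj]) -> None:
        missing = self.backend.object_missing([oid for oid, _ in objs])
        new = [(oid, data) for oid, data in objs if oid in missing]
        if new:
            self.backend.object_add(new)


def batched(items: Iterable[Obj], size: int) -> Iterator[List[Obj]]:
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def load(objects: Iterable[Obj], storage: FilteringProxy, batch_size: int = 2) -> None:
    # No global index of objects: each batch is sent as soon as it has been read.
    for batch in batched(objects, batch_size):
        storage.object_add(batch)
```

Only one batch is held in memory at a time, and the missing-object check happens once, in the proxy, rather than both in the loader and in the storage.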
Nicolas Dandrimont authored
- Feb 23, 2021
Antoine R. Dumont authored
- Feb 17, 2021
Antoine R. Dumont authored
Note that this also updates some docstrings and types along the way. Related to T1410
- Feb 12, 2021
- Feb 11, 2021
Antoine R. Dumont authored
When the initial communication with the git server fails (e.g. the repository is not found), the visit status is marked as not_found. When the initial communication succeeds but a failure occurs during the fetch step (e.g. the pack file is too big), the visit status is marked as failed. Related to T3030
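The rule above can be summarized as a small mapping. The `Phase` enum and the `visit_status` function are hypothetical, written only to make the two cases explicit; mapping a successful visit to "full" is an assumption added for completeness, and the actual loader code is structured differently.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    INITIAL_CONNECTION = "initial_connection"  # e.g. discovering the remote refs
    FETCH = "fetch"  # e.g. downloading the pack file


def visit_status(failed_phase: Optional[Phase]) -> str:
    """Map the phase in which a failure happened to a visit status (sketch)."""
    if failed_phase is None:
        return "full"  # assumption: no failure means a complete visit
    if failed_phase is Phase.INITIAL_CONNECTION:
        return "not_found"
    return "failed"
```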
Antoine R. Dumont authored
With the new loader.core 0.17, the failed and partial statuses changed slightly. This adds the tests needed to make them explicit. Related to T3030
- Feb 03, 2021
- Feb 02, 2021
Antoine R. Dumont authored
- Nov 24, 2020
Antoine Lambert authored
- Nov 23, 2020
Antoine Lambert authored
dulwich recently added PEP 561 compatibility, so ignore type checking for older dulwich versions.
- Nov 13, 2020
Antoine R. Dumont authored
This drops the old CLI in swh.loader.git.from_disk, which was broken and not covered by tests. Related to T2770#52497
- Oct 02, 2020