- Jan 06, 2023
-
-
Antoine Lambert authored
The git loader can now discover submodules while loading a repository.

That process works the following way:

1. Before sending a new directory to archive in the storage, check if it has a ".gitmodules" file in its entries and add the tuple (directory_id, content_sha1git) in a global set if it is the case.

2. During the post_load operation, process each discovered ".gitmodules" file the following way:
   - retrieve content metadata to get sha1 checksum of file
   - retrieve .gitmodules content bytes in objstorage from sha1
   - parse .gitmodules file content
   - for each submodule definition:
     * get git commit id associated to submodule path
     * check if git commit has been archived by SWH
     * if not, add the submodule repository URL in a set
   - for each submodule detected as not archived or partially archived, create a one shot git loading task with high priority in the scheduler database

Related to T3311
Related to T3923
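The "parse .gitmodules file content" step can be sketched as follows. This is only an illustration using the standard library's configparser, not the loader's actual code; the function name `parse_gitmodules` and the example URL are hypothetical.

```python
import configparser
from typing import Dict


def parse_gitmodules(content: bytes) -> Dict[str, str]:
    """Map each submodule path to its repository URL.

    Hypothetical helper: assumes the usual .gitmodules layout of
    [submodule "name"] sections carrying `path` and `url` keys.
    """
    # Strip indentation so configparser does not mistake indented keys
    # for continuation lines of the previous value.
    text = "\n".join(
        line.strip()
        for line in content.decode("utf-8", errors="replace").splitlines()
    )
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)

    submodules: Dict[str, str] = {}
    for section in parser.sections():
        if not section.startswith("submodule"):
            continue
        path = parser.get(section, "path", fallback=None)
        url = parser.get(section, "url", fallback=None)
        # Skip incomplete definitions instead of failing the whole parse.
        if path and url:
            submodules[path] = url
    return submodules
```

Each returned path could then be looked up in the directory entries to get the commit id the submodule points to, as described in the steps above.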
-
- Feb 10, 2022
-
-
Antoine Lambert authored
To install the new hook: $ pre-commit install -t commit-msg
-
- Jan 21, 2022
-
-
Antoine R. Dumont authored
This currently fails the origin visit and updates the visit status to 'failed'. Such origins were listed by listers, but access to them is actually private, so it would probably make more sense to set the visit status to 'not_found' instead. This takes care of the most frequent issue so far (460k) [1]. [1] https://sentry.softwareheritage.org/share/issue/3a3663f8cc424a48999af28728152ef0/
-
- Jan 14, 2022
-
-
vlorentz authored
swh-model 5.0.0 removes these arguments from the constructor.
-
vlorentz authored
This allows representing git trees with disordered entries, as the "normal" data model requires them to be sorted.
-
vlorentz authored
This allows representing all git objects instead of rejecting objects that do not fit in our "normal" data model. This commit is restricted to revisions and releases for now; a future commit will add directories.
-
- Jan 11, 2022
-
-
Antoine Lambert authored
urljoin does not produce the same output if the base URL does not have a trailing slash.

>>> from urllib.parse import urljoin
>>> urljoin("https://git.example.org/repo", "info/refs")
'https://git.example.org/info/refs'
>>> urljoin("https://git.example.org/repo/", "info/refs")
'https://git.example.org/repo/info/refs'

So ensure the base URL ends with a slash to avoid generating invalid URLs that make the loading fail.
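A minimal sketch of that fix, assuming a small wrapper around urljoin; the helper name `join_repo_url` is hypothetical and not taken from the loader.

```python
from urllib.parse import urljoin


def join_repo_url(base_url: str, path: str) -> str:
    """Join `path` onto `base_url`, keeping the last path segment of the base."""
    # Without a trailing slash, urljoin replaces the last segment ("repo")
    # instead of appending to it.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, path)
```

With this wrapper, both "https://git.example.org/repo" and "https://git.example.org/repo/" yield the same joined URL.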
-
- Jan 10, 2022
-
-
vlorentz authored
instead of writing them all at once, which partially defeats the point of using a spooled buffer.
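As an illustration of the idea, here is a sketch that feeds a spooled buffer chunk by chunk, using the standard library's SpooledTemporaryFile. The function name `spool_chunks` and the default size are assumptions made for this example, not the loader's code.

```python
import tempfile
from typing import IO, Iterable


def spool_chunks(chunks: Iterable[bytes], max_size: int = 1024 * 1024) -> IO[bytes]:
    """Write chunks one at a time into a spooled buffer and rewind it."""
    buf = tempfile.SpooledTemporaryFile(max_size=max_size)
    for chunk in chunks:
        # Each chunk is written as soon as it is produced, so the full
        # payload is never assembled in memory before reaching the buffer.
        buf.write(chunk)
    buf.seek(0)
    return buf
```

Data stays in memory until the buffer grows past `max_size`, after which the file object spills to disk.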
-
vlorentz authored
'requests' does the job just fine with less complexity.
-
vlorentz authored
response.content_type is set by Dulwich, but isn't part of urllib3's HTTPResponse, so we shouldn't rely on it. (And it makes mypy complain when the 'types-urllib3' package is installed)
-
- Dec 20, 2021
-
-
vlorentz authored
This mock was clunky because it didn't actually behave much like dulwich's Tag. Additionally, a future commit will need to access the as_raw_chunks() method of ShaFile objects, so SWHTag isn't suitable anymore as it would need to diverge even more by implementing its own serialization.
-
- Dec 16, 2021
-
-
Antoine R. Dumont authored
This also drops spurious copyright headers from those files if present. Related to T3812
-
- Oct 28, 2021
-
-
Antoine R. Dumont authored
This:
  - unifies this parameter name with names similar to what's used in lister
  - also documents it better

Related to T3695
-
- Oct 21, 2021
-
-
vlorentz authored
-
- Oct 20, 2021
-
-
Antoine R. Dumont authored
-
- Oct 11, 2021
-
-
vlorentz authored
They are not serialized the same way, so they cause hash mismatches.
-
- Oct 05, 2021
-
-
Antoine Lambert authored
Some dumb git servers might reference a pack file that no longer exists, even though the repository can be fully loaded without it. So remove the bogus pack file from the global packs list when encountering such an edge case and try to continue the loading anyway. Related to T3618
-
Antoine Lambert authored
An error was raised previously when trying to fetch HEAD. Related to T3618
-
- Oct 01, 2021
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
This unifies logging instructions with swh packages.
-
- Sep 30, 2021
-
-
Antoine R. Dumont authored
-
- Sep 28, 2021
-
-
Antoine R. Dumont authored
The conversions currently done were a bit ambiguous; specifying the types clarifies what is needed.
-
Antoine Lambert authored
Git supports two HTTP-based transfer protocols to exchange data between two repositories: the dumb protocol and the smart protocol. Nowadays, the smart protocol is the common method of transferring data because it is more efficient, but there are still some git servers in the wild that only support the dumb protocol. Unfortunately, the dulwich package does not support that protocol, so this kind of git repository could not be loaded into the archive. This commit adds support for loading such git repositories by fetching objects according to the dumb HTTP transfer protocol specification. Related to T2489
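With the dumb protocol, a client starts by fetching the plain-text `info/refs` file, which lists one object id and one ref name per line, separated by a tab. The sketch below parses such a payload; the function name `parse_info_refs` is hypothetical and the code is only an illustration, not the loader's implementation.

```python
from typing import Dict


def parse_info_refs(data: bytes) -> Dict[bytes, bytes]:
    """Map ref names to hex object ids from a dumb-protocol info/refs body."""
    refs: Dict[bytes, bytes] = {}
    for line in data.splitlines():
        if not line.strip():
            continue
        # Each line has the form b"<hex sha>\t<ref name>".
        sha, sep, name = line.partition(b"\t")
        if not sep:
            # Not a "<sha>\t<ref>" line; ignore it in this sketch.
            continue
        refs[name] = sha
    return refs
```

Objects referenced by these ids are then fetched as loose objects or from pack files over plain HTTP.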
- Sep 21, 2021
-
-
vlorentz authored
Current Dulwich versions (unconditionally) add \n at the end of tag messages
-
- Sep 16, 2021
-
-
vlorentz authored
This makes sure we don't write corrupt objects to the storage, like the examples in T75.
-
vlorentz authored
-
vlorentz authored
I want to use parametrized tests in a future commit, but pytest does not support them on unittest-style classes. self.subTest() would work too, but I figured it's a good time to migrate these tests to be consistent with the rest of the codebase.
-
vlorentz authored
-
- Aug 09, 2021
-
-
vlorentz authored
-
- Aug 06, 2021
-
-
vlorentz authored
Old versions of Git didn't write them, e.g. see tags refs/tags/v2.6.11 to refs/tags/v2.6.13-rc3 in linux.git
-
- Jul 30, 2021
-
-
vlorentz authored
AFAICT that's only the empty tree, because trees are the only Dulwich object with a __len__, and no Dulwich objects have a __bool__.
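The pitfall can be reproduced without Dulwich. In this sketch, `FakeTree` is a hypothetical stand-in for an object that defines `__len__` but not `__bool__`, so an empty instance is falsy.

```python
class FakeTree:
    """Stand-in for an object with __len__ but no __bool__ (hypothetical)."""

    def __init__(self, entries=()):
        self._entries = list(entries)

    def __len__(self):
        return len(self._entries)


def is_present_truthiness(obj) -> bool:
    # Falls back to __len__ when __bool__ is missing: an empty tree
    # is wrongly treated as absent.
    return bool(obj)


def is_present_identity(obj) -> bool:
    # Only None counts as absent; an empty tree is still an object.
    return obj is not None
```

Comparing against None explicitly avoids treating the empty tree as a missing object.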
-
vlorentz authored
Since version 0.19.10 (more specifically, this commit: <https://github.com/dulwich/dulwich/commit/72aec0c79fb395689e40f9228df93e2a39cf8fb0>), Dulwich strips GPG signatures from the 'message' attribute of Tag objects, and stores it in a new attribute, 'signature'. This means we were silently dropping all signatures from releases.
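One way to recover the original text is to append the signature back to the message. The sketch below assumes only what is described above, namely bytes-valued `message` and `signature` attributes; the helper name `full_tag_message` is hypothetical, and a plain namespace object stands in for a Dulwich Tag.

```python
from types import SimpleNamespace


def full_tag_message(tag) -> bytes:
    """Return the tag message with its signature appended, if there is one."""
    message = tag.message or b""
    # Older Dulwich versions have no `signature` attribute, hence getattr.
    signature = getattr(tag, "signature", None) or b""
    return message + signature


# Stand-in object for illustration; a real Dulwich Tag is not needed here.
signed = SimpleNamespace(
    message=b"release 1.0\n",
    signature=b"-----BEGIN PGP SIGNATURE-----\n...\n-----END PGP SIGNATURE-----\n",
)
restored = full_tag_message(signed)
```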
-
- Jul 26, 2021
-
-
vlorentz authored
* Lazy substitution (instead of %)
* Log actual error message in the text
* Rename variable according to PEP 8
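A small sketch of what lazy substitution looks like with the standard logging module. The function name `log_load_failure` and the message text are made up for this example.

```python
import logging

logger = logging.getLogger(__name__)


def log_load_failure(origin_url: str, error: Exception) -> None:
    """Log a failure with lazy substitution and the actual error message."""
    # The arguments are passed separately: logging only formats the string
    # if a handler actually emits the record.
    logger.warning("Failed to load %s: %s", origin_url, error)
```

The eager form, `"Failed to load %s: %s" % (origin_url, error)`, would build the string even when the record is filtered out.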
-
- Jun 09, 2021
-
-
Antoine Lambert authored
-
- May 11, 2021
-
- Apr 26, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration, to prevent changes from breaking the documentation build. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 07, 2021
-