- Nov 04, 2022
-
-
Nicolas Dandrimont authored
With the proper implementation of packfile negotiation, remotes can return packfiles that do not contain all of our wanted objects. Consider the following two histories:

    * c1                        * c1 ← [refs/tags/original]
    ↑                           ↑
    * c2 ← [refs/heads/main]    | * c3 ← [refs/heads/main]
                                * c2 ← [refs/heads/broken]

The first visit of the origin would load commits c1 and c2, and write a snapshot referencing c2. During the second visit, the loader would tell the origin that c2 is known, and that c1 and c3 are wanted (as new heads). The origin, knowing that c1 is a parent of c2, would be allowed by the git protocol to send a packfile containing only c3. Under these circumstances, the loader cannot tell what object type the snapshot branch [refs/tags/original] should point to.

The repository in tests has a similar structure ([refs/heads/master] is in the history of [refs/tags/branch2-before-delete]), so refactor the incremental load test to exercise this specific behavior. This test can be moved to the common tests as well.
-
Nicolas Dandrimont authored
Even though this is only HEAD, we should make sure that it's filtered anyway.
-
Nicolas Dandrimont authored
In terms of mypy, this function is just doing some types-washing anyway.
-
Nicolas Dandrimont authored
As dulwich's client.fetch_pack expects a history graph walker instance initialized with the set of known heads, move the local heads caching from `determine_wants` to the RepoRepresentation initialization logic. Our previous code would always initialize the graph walker with an empty set of heads (the `graph_walker()` method is called before `determine_wants()` has run, so `self.heads` was always empty), so we would never actually fetch an incremental pack file.
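The ordering problem described above can be illustrated with a minimal sketch. The class below is a simplified stand-in, not the loader's actual RepoRepresentation: the `known_heads` constructor argument, the return types and the byte-string identifiers are assumptions made for illustration only, and no dulwich call is involved.

```python
from typing import Dict, Iterable, List, Set


class RepoRepresentationSketch:
    """Simplified, hypothetical stand-in for the loader's RepoRepresentation."""

    def __init__(self, known_heads: Iterable[bytes]) -> None:
        # Cache the local heads at construction time, so they are available
        # no matter which method the fetch code calls first.
        self.heads: Set[bytes] = set(known_heads)

    def graph_walker(self) -> List[bytes]:
        # Stand-in for building a graph walker: it only reports the heads
        # the walker would be seeded with.
        return sorted(self.heads)

    def determine_wants(self, remote_refs: Dict[bytes, bytes]) -> List[bytes]:
        # Only ask for remote targets that are not already known locally.
        return sorted(set(remote_refs.values()) - self.heads)


repo = RepoRepresentationSketch(known_heads=[b"c2"])
# graph_walker() is called before determine_wants(), as in the commit message.
walker_heads = repo.graph_walker()
wants = repo.determine_wants(
    {b"refs/heads/main": b"c3", b"refs/heads/broken": b"c2"}
)
```

Because the heads are set in the constructor, the walker is seeded with `c2` even though `determine_wants()` has not run yet. If the heads were only filled in by `determine_wants()`, the walker would be seeded with an empty set.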
-
- Nov 03, 2022
-
-
Nicolas Dandrimont authored
-
- Oct 31, 2022
-
-
Antoine Lambert authored
ShaFile.get_type was deprecated and has been removed. New typings have been added in dulwich that triggered a new mypy error.
-
- Oct 25, 2022
- Oct 19, 2022
-
-
Antoine Lambert authored
-
- Oct 18, 2022
-
-
David Douard authored
  - pre-commit from 4.1.0 to 4.3.0,
  - codespell from 2.2.1 to 2.2.2,
  - black from 22.3.0 to 22.10.0 and
  - flake8 from 4.0.1 to 5.0.4.

Also freeze flake8 dependencies. Also change flake8's repo config to github (the gitlab mirror being outdated).
-
Antoine Lambert authored
Use helper fixture loading_task_creation_for_listed_origin_test from swh-loader-core and remove redundant tests.
-
- Jul 19, 2022
-
- Jun 16, 2022
-
-
Antoine Lambert authored
Dulwich 0.20.43 dropped the double caching of HTTP responses so we can now remove comments about that issue. Related to T4311
-
- May 24, 2022
-
- May 20, 2022
- May 17, 2022
-
-
vlorentz authored
-
- May 16, 2022
-
-
vlorentz authored
base_snapshot_reverse_branches needs to contain all objects that may be a snapshot target that the remote did not send to us. Because we now use all snapshots to build the "have" list, such targets include all targets of snapshots of "parent" origins, not just the previous snapshot of the current origin. This typically happens when a forge-fork pulls branches from its parent. Resolves Sentry issue [[ https://sentry.softwareheritage.org/share/issue/1c08f5d764e7494e83ba254dc47f17af/ | SWH-LOADER-GIT-102 ]]
-
vlorentz authored
Only instances of (subclasses of) AbstractHttpGitClient have this attribute. For other instances, we can consider it to be False, because the dumb protocol only exists over HTTP(S). This issue was found by mypy, thanks to the addition of type annotations to changes in dulwich 0.20.36 affecting mypy's type inference.
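A minimal sketch of this kind of defensive attribute lookup follows. The commit message does not name the attribute, so `dumb` is an assumption here, and the two classes are hypothetical stand-ins rather than dulwich's client classes.

```python
class GitClientSketch:
    """Hypothetical stand-in for a non-HTTP client: it has no `dumb` attribute."""


class HttpGitClientSketch(GitClientSketch):
    """Hypothetical stand-in for an HTTP(S) client, which carries the attribute."""

    def __init__(self, dumb: bool) -> None:
        self.dumb = dumb


def uses_dumb_protocol(client: object) -> bool:
    # Clients that lack the attribute cannot speak the dumb protocol,
    # since it only exists over HTTP(S), so default to False.
    return bool(getattr(client, "dumb", False))


over_ssh = uses_dumb_protocol(GitClientSketch())
over_dumb_http = uses_dumb_protocol(HttpGitClientSketch(dumb=True))
over_smart_http = uses_dumb_protocol(HttpGitClientSketch(dumb=False))
```

Using `getattr` with a default keeps the check valid for every client type, which is also what lets a type checker accept it without narrowing to the HTTP subclass first.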
-
- May 13, 2022
-
-
vlorentz authored
Before this commit, determine_wants() used the origin's last snapshot if any, or the closest parent's snapshot if not. However, we noticed that many repositories that are very slow to load are forks that were already visited, but whose owner rebased them on the parent since the last visit, potentially adding many commits to the origin. This ensures we do not needlessly fetch these new commits when we already loaded the parent.
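The idea can be sketched as follows, assuming that snapshots are reduced to plain mappings from branch name to target object id. The function names and the data model are hypothetical simplifications, not the loader's actual code.

```python
from typing import Dict, Iterable, List, Mapping, Set

# Simplified snapshot model (an assumption): branch name -> target object id.
Snapshot = Mapping[bytes, bytes]


def collect_known_targets(snapshots: Iterable[Snapshot]) -> Set[bytes]:
    """Union of the targets of every available snapshot (origin and parents)."""
    known: Set[bytes] = set()
    for snapshot in snapshots:
        known.update(snapshot.values())
    return known


def determine_wants_sketch(
    remote_refs: Dict[bytes, bytes], snapshots: Iterable[Snapshot]
) -> List[bytes]:
    """Only want remote targets that no known snapshot already references."""
    known = collect_known_targets(snapshots)
    return sorted(set(remote_refs.values()) - known)


# A fork that was rebased on its parent since its last visit.
fork_previous = {b"refs/heads/main": b"f1"}
parent_latest = {b"refs/heads/main": b"p9"}
remote_refs = {b"refs/heads/main": b"p9", b"refs/heads/feature": b"f2"}

wants_fork_only = determine_wants_sketch(remote_refs, [fork_previous])
wants_all = determine_wants_sketch(remote_refs, [fork_previous, parent_latest])
```

With only the fork's previous snapshot, the parent's head `p9` is requested again; with both snapshots, only `f2` remains wanted.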
-
- May 06, 2022
-
- May 02, 2022
-
-
Pratyush authored
-
- Apr 27, 2022
-
-
Antoine Lambert authored
Recent changes in swh-scheduler add new parameters to the celery tasks produced from swh.scheduler.model.ListedOrigin instances. So handle any new parameters by not hardcoding the expected ones in task signatures. Rename the date parameter to visit_date in the from-disk loader tasks and make it optional. Add new tests checking that task parameters produced from ListedOrigin instances do not raise errors when attempting to create a git loader. Related to T4187
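A minimal sketch of a task signature that tolerates new parameters, assuming a plain function in place of the real celery task. The function name, the returned dictionary and the `new_param` argument are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, Optional


def load_git_from_disk_sketch(
    url: str, visit_date: Optional[str] = None, **kwargs: Any
) -> Dict[str, Any]:
    # Hypothetical stand-in for a loader task: instead of hardcoding the
    # expected parameters, collect any extra ones in **kwargs so that
    # parameters added later by the scheduler do not raise a TypeError.
    return {"url": url, "visit_date": visit_date, "extra": kwargs}


# visit_date is optional, and an unknown parameter is accepted.
result = load_git_from_disk_sketch("file:///tmp/repo", new_param=1)
```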
- Apr 26, 2022
-
-
vlorentz authored
-
- Apr 21, 2022
-
-
Antoine Lambert authored
That hook can be frustrating, as it can discard a long commit message if it finds a typo in it, so it is better to remove it.
-
vlorentz authored
-
- Apr 20, 2022
-
-
vlorentz authored
-
- Apr 08, 2022
-
-
Antoine Lambert authored
-
Antoine Lambert authored
Related to T3922
-
Antoine Lambert authored
black is considered stable since release 22.1.0 and the version we are currently using is quite outdated and not compatible with click 8.1.0, so it is time to bump it to its latest stable release. Please note that the E501 pycodestyle warning related to line length is replaced by the B950 warning from flake8-bugbear, as recommended by black: https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length Related to T3922
-
- Mar 22, 2022
-
-
Antoine Lambert authored
Because setuptools copies test modules into subdirectories of the build directory, pytest fails with ImportPathMismatchError exceptions when invoked from the root directory of the module. So ignore the build folder when discovering tests.
-
- Feb 10, 2022
-
-
Antoine Lambert authored
To install the new hook:

    $ pre-commit install -t commit-msg
-
- Jan 21, 2022
-
-
Antoine R. Dumont authored
This currently fails the origin visit and updates the visit status to 'failed'. Such origins got listed by listers but access to them is actually private, so it probably makes sense to set the status of the visit to not_found instead. This takes care of the most frequent issue so far (460k) [1]. [1] https://sentry.softwareheritage.org/share/issue/3a3663f8cc424a48999af28728152ef0/
-
- Jan 14, 2022
-
-
vlorentz authored
swh-model 5.0.0 removes these arguments from the constructor.
-
vlorentz authored
This allows representing git trees with disordered entries, as the "normal" data model requires them to be sorted.
-
vlorentz authored
This allows representing all git objects instead of rejecting objects that do not fit in our "normal" data model. This commit is restricted to revisions and releases for now; a future commit will add directories.
-
- Jan 11, 2022
-
-
Antoine Lambert authored
urljoin does not produce the same output if the base URL does not have a trailing slash.

    >>> from urllib.parse import urljoin
    >>> urljoin("https://git.example.org/repo", "info/refs")
    'https://git.example.org/info/refs'
    >>> urljoin("https://git.example.org/repo/", "info/refs")
    'https://git.example.org/repo/info/refs'

So ensure the base URL ends with a slash to avoid generating invalid URLs that make loading fail.
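A minimal sketch of such a guard, using only the standard library. The helper name `join_repo_url` is hypothetical and is not taken from the loader's code.

```python
from urllib.parse import urljoin


def join_repo_url(base_url: str, path: str) -> str:
    # Hypothetical helper: add the trailing slash if it is missing, so that
    # urljoin appends `path` instead of replacing the last URL segment.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, path)


without_slash = join_repo_url("https://git.example.org/repo", "info/refs")
with_slash = join_repo_url("https://git.example.org/repo/", "info/refs")
```

Both calls yield the same URL under the repository path, whether or not the base URL was given with a trailing slash.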
-