- Jan 24, 2024
Nicolas Dandrimont authored
This is useful to override the default settings of the requests Session, e.g. certificate verification or connect/read timeouts.
-
Nicolas Dandrimont authored
This is useful to override the default settings of the dulwich urllib3 adapter, e.g. certificate verification or connect/read timeouts.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
Newer versions of git create a ".rev" file next to the existing ".pack" and ".idx" files, which makes the nb_files count inconsistent.
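A minimal sketch of one way to keep such a count stable, assuming the check counts files in a pack directory: only `.pack` and `.idx` files are counted, so the `.rev` reverse index written by newer git versions is ignored. The helper `count_pack_files` is hypothetical, not the loader's actual code.

```python
from pathlib import Path

# Hypothetical helper: only these suffixes are counted, so the ".rev"
# reverse-index files written by newer git versions do not change the total.
COUNTED_SUFFIXES = {".pack", ".idx"}


def count_pack_files(filenames):
    """Return how many of the given file names are .pack or .idx files."""
    return sum(1 for name in filenames if Path(name).suffix in COUNTED_SUFFIXES)


old_git = ["pack-abc.pack", "pack-abc.idx"]
new_git = ["pack-abc.pack", "pack-abc.idx", "pack-abc.rev"]
print(count_pack_files(old_git), count_pack_files(new_git))  # 2 2
```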
-
- Jan 16, 2024
Antoine Lambert authored
A utility function was renamed in swh-loader-core.
-
Antoine Lambert authored
If the loader's submodules parameter is True but no .gitmodules file is found in the root directory of the repository, the repository path is not yielded, so its loading is discarded.
-
- Jan 08, 2024
Antoine Lambert authored
It indicates whether submodules should be retrieved after the git checkout operation, as some guix origins require it. Related to #4751.
-
- Dec 05, 2023
David Douard authored
-
- Dec 04, 2023
David Douard authored
-
- Dec 03, 2023
David Douard authored
-
- Nov 27, 2023
Jérémy Bobbio (Lunar) authored
-
- Nov 20, 2023
David Douard authored
Make it valid for PyPI.
-
David Douard authored
Convert the README from Markdown to reST to make it embeddable in docs/index.rst.
-
- Nov 17, 2023
David Douard authored
For some reason, it seems we no longer want to install the package as editable.
-
David Douard authored
-
David Douard authored
This later version changed the API of the directory filtering mechanism in BaseDirectoryLoader (paths are now expected to be bytes).
-
- Oct 09, 2023
Antoine Lambert authored
It fixes some cases where the tag of interest was not fetched. Related to #4751
-
- Oct 05, 2023
Antoine Lambert authored
Ensure the trailing slash is removed from the git URL when computing its basename, as an empty string is returned otherwise. When a shallow fetch fails, typically when the ref is a short commit hash, retry with a full fetch so that the ref checkout succeeds. Related to #4751.
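A minimal sketch of both fixes, assuming git is driven through `subprocess`: stripping trailing slashes before taking the basename, and falling back from a shallow to a full fetch. The helpers `repo_basename` and `fetch_ref`, the injectable `run` parameter, and the exact git arguments are illustrative assumptions, not the loader's actual code.

```python
import posixpath
import subprocess
from urllib.parse import urlparse


def repo_basename(git_url):
    """Return the last path component of a git URL, ignoring trailing slashes."""
    path = urlparse(git_url).path
    # Without rstrip, "https://host/org/repo.git/" would give an empty basename.
    return posixpath.basename(path.rstrip("/"))


def fetch_ref(repo_dir, ref, run=subprocess.run):
    """Fetch ref into repo_dir, falling back to a full fetch if the shallow one fails.

    run is injectable (hypothetical) so the logic can be exercised without git.
    """
    try:
        run(["git", "-C", repo_dir, "fetch", "--depth", "1", "origin", ref], check=True)
        return "shallow"
    except subprocess.CalledProcessError:
        # Shallow fetches typically fail for abbreviated commit hashes, so fetch
        # the whole history and let the later checkout resolve the short hash.
        run(["git", "-C", repo_dir, "fetch", "origin"], check=True)
        return "full"
```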
-
Antoine Lambert authored
It has been observed that the process used by SWH to check out a remote git reference can produce recursive NAR hash values that differ from those computed by guix. This seems related to CR/LF normalization. So align the process used to check out a remote git reference with the one used by guix, which also seems faster than the previous approach. Also refine the detection of repositories that are not found, as some unrelated git errors could previously be missed. Related to #4751.
-
- Sep 18, 2023
Antoine Lambert authored
The git directory loader is used to archive guix source packages whose source code is located in a git repository at a specific reference. To ensure SWH archives exactly the same set of source code files for a guix package, the recursive NAR hash of the source code directory is computed and compared against the one computed by guix. Previously, the loader always fetched git submodules if the repository had any, but guix only fetches them for a few packages, not for all git-based ones. This could result in a directory hash mismatch when the loader fetched submodules it should not have. To work around this, first compute the NAR hash without fetching submodules and, if this results in a directory hash mismatch, retry the operation with the submodules fetched. Related to #4751.
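The retry strategy described above can be sketched as a loop over the two submodule settings. This is a sketch under stated assumptions: `checkout` and `nar_hash` are hypothetical callables standing in for the loader's git checkout and recursive NAR hash computation, and the function name is illustrative.

```python
from typing import Callable


def checkout_matching_nar(
    checkout: Callable[[bool], str],
    nar_hash: Callable[[str], str],
    expected: str,
) -> str:
    """Return a checkout directory whose NAR hash matches the expected value.

    checkout(submodules) returns a directory path; nar_hash(path) returns its
    recursive NAR hash. Both are hypothetical stand-ins for the loader's code.
    """
    # Try without submodules first, since guix only fetches them for a few packages.
    for submodules in (False, True):
        path = checkout(submodules)
        if nar_hash(path) == expected:
            return path
    raise ValueError("NAR hash mismatch with and without submodules")
```

Checking the cheaper, more common case first means submodules are only fetched for the packages that actually need them.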
-
- Aug 24, 2023
Antoine R. Dumont authored
Without this, some git clones fail to be ingested because they reference submodules that are not initialized. This results in a hash mismatch, since the checked-out git tree does not match the upstream nix/guix manifest. Refs. #4751
-
- Aug 22, 2023
Antoine R. Dumont authored
-
- Aug 21, 2023
Antoine R. Dumont authored
Inspired by the pip cloning step [1]. This makes the cloning step fetch only the commit information and the tree at the current heads. A subsequent switch (checkout) then retrieves the tree at the desired reference. In effect, the tree needed to ingest the repository is retrieved much faster.
[1] pip uses a blobless clone, though.
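The commit message does not name the exact git flags, so the following is only a sketch assuming a treeless partial clone (`--filter=tree:0`) followed by a checkout of the wanted ref; it also assumes the server supports partial clone. The helpers `clone_commands` and `clone_at_ref` are hypothetical.

```python
import subprocess


def clone_commands(url, ref, dest):
    """Build the git commands for a treeless clone followed by a checkout of ref."""
    return [
        # --filter=tree:0 asks the server for commits only; the trees and blobs
        # of the default branch are then fetched on demand by the initial checkout.
        ["git", "clone", "--filter=tree:0", url, dest],
        # Checking out ref fetches only the trees and blobs that ref needs.
        ["git", "-C", dest, "checkout", ref],
    ]


def clone_at_ref(url, ref, dest, run=subprocess.run):
    """Run the clone commands in order, stopping on the first failure."""
    for cmd in clone_commands(url, ref, dest):
        run(cmd, check=True)
```

Building the command lists separately from running them keeps the sequence easy to inspect and test without network access.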
-
- Aug 07, 2023
Antoine R. Dumont authored
Refs. swh/meta#3781
-
- Jul 03, 2023
Antoine Lambert authored
The previous commit modified the dumb.check_protocol function to raise an HTTPError exception when the request checking dumb protocol support failed. As the NotFound exception inherits from ValueError, the code checking dumb protocol support was executed even when a repository was not found. An HTTPError exception with a 404 status code was therefore raised, and the NotFound exception was no longer propagated to the base loader class, resulting in a failed visit status instead of a not_found one.
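The pitfall above is an exception-ordering issue. One way to avoid it, sketched here with hypothetical stand-in classes and a hypothetical `fetch` function (not the loader's actual code), is to re-raise NotFound before the generic ValueError handler can catch it.

```python
class NotFound(ValueError):
    """Stand-in for the loader's NotFound exception, which subclasses ValueError."""


class HTTPError(Exception):
    """Stand-in for the HTTP error raised by the dumb protocol check."""


def fetch(fetch_smart, check_dumb_protocol):
    """Try the smart protocol, falling back to the dumb one on other ValueErrors."""
    try:
        return fetch_smart()
    except NotFound:
        # Must come before the ValueError handler: NotFound is a ValueError, and
        # catching it there would run the dumb protocol check on a missing repo.
        raise
    except ValueError:
        # Other smart-protocol failures: try the dumb protocol instead.
        return check_dumb_protocol()
```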
-
- Jun 14, 2023
Antoine Lambert authored
Network issues can also occur when checking whether a git repository can be cloned using the dumb protocol, so add HTTP retry support to the check_protocol function.
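A minimal sketch of an HTTP retry wrapper with exponential backoff, assuming transient failures surface as a dedicated exception type. The decorator `retry_on_transient_error` and `TransientHTTPError` are hypothetical and are not the retry helper SWH actually uses; `sleep` is injectable so the backoff can be tested without waiting.

```python
import time
from functools import wraps


class TransientHTTPError(Exception):
    """Hypothetical error type for retryable failures (e.g. 5xx, timeouts)."""


def retry_on_transient_error(attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry the wrapped call on TransientHTTPError with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except TransientHTTPError:
                    if attempt == attempts - 1:
                        raise  # out of attempts: propagate the last error
                    # Wait base_delay, then 2x, 4x, ... before the next attempt.
                    sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

Only errors marked as transient are retried, so a genuine 404 would still propagate immediately.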
-
Antoine Lambert authored
-
- Jun 09, 2023
Antoine R. Dumont authored
Most CLIs use - as the separator rather than _. `git_disk` is kept as is because it's old.
-
Antoine R. Dumont authored
This also fixes the inconsistent naming of the git checkout related loader and task. Refs. swh/infra/sysadm-environment#4906
-
Antoine Lambert authored
It makes it possible to use the loader through the following command:
$ swh loader run git_checkout <url> ref=<ref> checksums=<checksums>
-
- Jun 07, 2023
Antoine R. Dumont authored
Version 0.21.4.1 is actually broken.
-
- Jun 06, 2023
Antoine R. Dumont authored
Refs. swh/meta#4979
-
- Jun 05, 2023
Antoine R. Dumont authored
Otherwise, we'd lose the context in the snapshot. Refs. swh/meta#4979
-
Antoine R. Dumont authored
This is consistent with the other swh imports.
-
Antoine R. Dumont authored
-
- Jun 01, 2023
Antoine R. Dumont authored
This provides the method `fetch_artifact`. It clones a repository at a specific branch, tag or commit and ingests the DAG objects from the resulting directory tree. It also checks the NAR checksums, if provided. Refs. swh/meta#4979 Co-Authored: Antoine Lambert <antoine.lambert@inria.fr>
-
- May 05, 2023
vlorentz authored
`_parse_message` has a return type annotation since https://github.com/jelmer/dulwich/commit/a8df40933d5fbde613b6019c6e7eb4606756ea06
-
- Apr 26, 2023
Antoine Lambert authored
These cases have been observed when trying to clone BitBucket repositories that no longer exist. Fix #4750.
-
- Apr 20, 2023
Antoine Lambert authored
The GitHub API provides, for each repository, the size in kibibytes of the pack file corresponding to a full clone. As the metadata for a GitHub repository is fetched at the beginning of the loading process (currently only for origins discovered by the github lister), parse its raw JSON bytes and store the pack file size as a loader attribute. Then, before fetching the pack file for a GitHub origin without any base snapshot in the archive, check that the pack file size does not exceed the threshold defined by the loader. If it does, abort the loading to save network bandwidth. Related to #3652
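A minimal sketch of the size check described above, assuming the size comes from the `size` field of the GitHub repository JSON and is treated as KiB, as the commit message states. The function `should_abort_load` and its threshold parameter are hypothetical names, not the loader's attributes.

```python
import json


def should_abort_load(metadata_bytes, has_base_snapshot, max_pack_size_bytes):
    """Return True if a full clone of the repository would exceed the threshold.

    metadata_bytes is the raw JSON of the GitHub repository metadata; its "size"
    field is assumed to hold the full-clone pack size in KiB.
    """
    if has_base_snapshot:
        return False  # incremental fetch: the full-clone size is not relevant
    size_kib = json.loads(metadata_bytes).get("size")
    if size_kib is None:
        return False  # size unknown: do not abort
    return size_kib * 1024 > max_pack_size_bytes
```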
-
- Apr 14, 2023
Antoine Lambert authored
Network issues can occur when loading a git repository using the dumb protocol, so add HTTP retry support to the GitObjectsFetcher class.
-