- Jun 04, 2024
-
-
Antoine Lambert authored
The SWH data model allows an origin to have multiple visit types; in particular, a git origin can have both 'git' and 'git-checkout' visit types. The git loader must retrieve the latest snapshot produced by a 'git' visit, otherwise incremental loading breaks for a git origin having both visit types. Indeed, a 'git-checkout' visit produces a snapshot with a single branch, while a 'git' visit produces a snapshot containing all branches of the loaded repository. Previously, if the latest snapshot retrieved had been produced by a 'git-checkout' visit, the loader would refetch all branches and their associated git objects even though most of them had already been archived. Related to swh/meta#5092.
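A minimal sketch of the selection logic described above, assuming visits are plain dictionaries with hypothetical `type`, `date` and `snapshot` keys. This is not the swh storage API, only an illustration of filtering by visit type before picking the latest snapshot.

```python
from typing import Dict, List, Optional


def latest_snapshot_for_type(visits: List[Dict], visit_type: str) -> Optional[str]:
    """Return the snapshot id of the most recent visit of the given type.

    The record layout (type/date/snapshot keys) is hypothetical.
    """
    candidates = [
        visit
        for visit in visits
        if visit["type"] == visit_type and visit.get("snapshot") is not None
    ]
    if not candidates:
        return None
    # Only visits of the requested type compete for "latest"
    return max(candidates, key=lambda visit: visit["date"])["snapshot"]


# The most recent visit overall is a 'git-checkout' one, but incremental
# loading needs the snapshot produced by the latest 'git' visit.
visits = [
    {"type": "git", "date": 1, "snapshot": "snp-all-branches"},
    {"type": "git-checkout", "date": 2, "snapshot": "snp-single-branch"},
]
```

With these sample records, asking for the `git` visit type skips the more recent single-branch snapshot.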
-
- May 30, 2024
-
-
Antoine Lambert authored
Side effect of swh.loader.core v5.18.0 release.
-
- May 15, 2024
-
-
Pierre-Yves David authored
-
- Mar 29, 2024
-
-
David Douard authored
-
- Feb 26, 2024
-
-
Antoine Lambert authored
-
Antoine Lambert authored
Some dumb git servers can send a HEAD file in a legacy format that contains a commit id instead of the string "ref: <ref_name>". Handle that edge case to avoid an error when loading such a repository.
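The two HEAD formats could be told apart roughly as follows. This is an illustrative sketch, not the loader's code: the function name is hypothetical and it assumes 40-character hexadecimal SHA-1 commit ids.

```python
import re
from typing import Tuple

# Assumption: commit ids are 40 lowercase hex characters (SHA-1)
_SHA1_RE = re.compile(rb"^[0-9a-f]{40}$")


def parse_head(content: bytes) -> Tuple[str, bytes]:
    """Classify the content of a HEAD file.

    Returns ("ref", ref_name) for the symbolic form "ref: <ref_name>"
    and ("commit", hex_id) for the legacy form holding a bare commit id.
    """
    value = content.strip()
    if value.startswith(b"ref:"):
        return ("ref", value[len(b"ref:"):].strip())
    if _SHA1_RE.match(value):
        return ("commit", value)
    raise ValueError(f"unrecognized HEAD content: {value!r}")
```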
-
- Feb 22, 2024
-
-
Antoine Lambert authored
As with the smart git loader, restrict the maximum size of a pack file to download. Move the code writing pack data bytes and checking their size into a utility class to avoid code duplication. Add missing tests covering the cases where the pack size limit is reached.
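A utility class of this kind might look like the sketch below. The class and exception names are hypothetical and the actual implementation may differ; the point is that every write goes through a single size check.

```python
import io


class PackSizeLimitExceeded(Exception):
    """Raised when more bytes are written than the configured limit."""


class SizeLimitedWriter:
    """Write bytes to a buffer while enforcing a maximum total size."""

    def __init__(self, buffer, size_limit: int):
        self.buffer = buffer
        self.size_limit = size_limit
        self.written = 0

    def write(self, data: bytes) -> None:
        self.written += len(data)
        # Check before writing so the buffer never exceeds the limit
        if self.written > self.size_limit:
            raise PackSizeLimitExceeded(
                f"pack file size exceeds the limit of {self.size_limit} bytes"
            )
        self.buffer.write(data)


writer = SizeLimitedWriter(io.BytesIO(), size_limit=8)
writer.write(b"1234")
writer.write(b"5678")  # exactly at the limit, still accepted
```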
-
Antoine Lambert authored
When using the requests library to perform HTTP requests, the stream parameter must be set to True for responses that need to be streamed, so that content is downloaded in chunks. Previously, the whole HTTP response was held in memory, which could lead to OOM errors when dealing with a repository with large pack files.
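A hedged sketch of chunked downloading with requests: the helper name and default chunk size are hypothetical, while `stream=True`, `raise_for_status()` and `iter_content()` are the library's own. The session is passed in, so any object exposing the same `get` interface works.

```python
from typing import BinaryIO


def download_streamed(
    session, url: str, dest: BinaryIO, chunk_size: int = 64 * 1024
) -> int:
    """Copy an HTTP response body to dest chunk by chunk.

    `session` is expected to behave like a requests.Session.
    Returns the number of bytes written.
    """
    total = 0
    # stream=True defers downloading the body until it is iterated over
    with session.get(url, stream=True) as response:
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=chunk_size):
            dest.write(chunk)
            total += len(chunk)
    return total
```

It would be called as `download_streamed(requests.Session(), url, open(path, "wb"))`, so only one chunk at a time is held in memory.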
-
- Feb 20, 2024
-
-
Antoine Lambert authored
Related to swh/devel/swh-loader-core@c9b51f8b.
-
- Feb 05, 2024
-
-
Antoine Lambert authored
Related to swh/meta#5075.
-
- Feb 02, 2024
-
-
Nicolas Dandrimont authored
-
- Jan 29, 2024
-
-
Nicolas Dandrimont authored
Git loading tasks can take a pretty long time, and it's not easy to diagnose if it's stuck or if it's just taking a while. Instead of only logging at the end of the task, print a log line after each object type has been fully processed. Also print a log line every 3 minutes while objects are being processed.
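Periodic progress logging of this kind can be sketched as a generator wrapper. The function name, log messages and injectable `clock` parameter are hypothetical; the 180-second default mirrors the 3 minutes mentioned above.

```python
import logging
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def log_progress(
    items: Iterable[T],
    label: str,
    interval: float = 180.0,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items unchanged while logging progress.

    One line is logged at most every `interval` seconds during processing,
    and a final line once all items of this type have been processed.
    """
    count = 0
    last_log = clock()
    for item in items:
        yield item
        count += 1
        now = clock()
        if now - last_log >= interval:
            logger.info("Processed %d %s objects so far", count, label)
            last_log = now
    logger.info("Processed %d %s objects in total", count, label)
```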
-
Nicolas Dandrimont authored
The packfile fetching operation can take a long time. Send one log line every minute while it progresses.
-
Nicolas Dandrimont authored
Instead of dumping the dulwich remote communication stream to stderr, add a separate logger for remote messages, and handle the remote stream as proper log entries.
-
- Jan 24, 2024
-
-
Nicolas Dandrimont authored
This hooks into the right urllib3 and requests settings for both the smart and dumb loader.
-
Nicolas Dandrimont authored
This sets the connect and read timeout for both the smart loader (via urllib3/dulwich) and for the dumb loader (via requests).
-
Nicolas Dandrimont authored
This is useful to override the default settings of the requests Session, e.g. certificate verification or connect/read timeouts.
-
Nicolas Dandrimont authored
This is useful to override the default settings of the dulwich urllib3 adapter, e.g. certificate verification or connect/read timeouts.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
Newer versions of git create a ".rev" file next to the existing ".pack" and ".idx", making the nb_files inconsistent.
-
- Jan 16, 2024
-
-
Antoine Lambert authored
A utility function was renamed in swh-loader-core.
-
Antoine Lambert authored
If the submodules parameter of the loader is True but no .gitmodules file is found in the root directory of the repository, the repository path is not yielded and thus its loading is discarded.
-
- Jan 08, 2024
-
-
Antoine Lambert authored
It indicates if submodules should be retrieved after the git checkout operation as some guix origins require it. Related to #4751.
-
- Dec 05, 2023
-
-
David Douard authored
-
- Dec 04, 2023
-
-
David Douard authored
-
- Dec 03, 2023
-
-
David Douard authored
-
- Nov 27, 2023
-
-
Jérémy Bobbio (Lunar) authored
-
- Nov 20, 2023
-
-
David Douard authored
Make it valid for pypi.
-
David Douard authored
Convert README from markdown to ReST to make it embeddable in docs/index.rst
-
- Nov 17, 2023
-
-
David Douard authored
Seems for some reason we do not want to install the package as editable any more now...
-
David Douard authored
-
David Douard authored
This later version changed the API of the directory filtering mechanism in BaseDirectoryLoader (paths are now expected to be bytes).
-
- Oct 09, 2023
-
-
Antoine Lambert authored
It fixes some cases where the tag of interest was not fetched. Related to #4751
-
- Oct 05, 2023
-
-
Antoine Lambert authored
Remove the trailing slash from the git URL when computing its basename, as an empty string is returned otherwise. When a shallow fetch fails, typically when the ref is a commit short hash, retry with a full fetch so that the ref checkout succeeds. Related to #4751.
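The trailing slash issue can be reproduced with the standard library alone. In this small illustration the helper name and the example URL are hypothetical.

```python
import posixpath


def repo_basename(url: str) -> str:
    """Return the last path component of a git URL, ignoring trailing slashes."""
    return posixpath.basename(url.rstrip("/"))


url = "https://example.org/user/repo/"
naive = posixpath.basename(url)  # empty string because of the trailing slash
fixed = repo_basename(url)
```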
-
Antoine Lambert authored
It has been observed that the process used by SWH to checkout a remote git reference can lead to different recursive NAR hash values compared to those computed by guix. This seems related to CR/LF normalization. So prefer to align the process to checkout a remote git reference with the one used by guix. It also seems faster than the previous approach. Also refine the detection of not found repositories, as previously some unrelated git errors could be missed. Related to #4751.
-
- Sep 18, 2023
-
-
Antoine Lambert authored
The git directory loader is used to archive guix source packages where source code is located in a git repository at a specific reference. To ensure SWH archives the exact same set of source code files for a guix package, the recursive NAR hash of the source code directory is computed and compared against the one computed by guix. Previously, the loader always fetched git submodules if some were set for the git repository, but guix only fetches them for a couple of packages and not for all git-based ones. This could result in a directory hash mismatch when the loader fetched the submodules while it should not have. To work around this, first compute the NAR hash without fetching submodules and, if this results in a directory hash mismatch, retry the operation with the submodules fetched. Related to #4751.
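The retry strategy described above can be sketched as follows. The `checkout` and `nar_hash` callables are hypothetical stand-ins for the real checkout and hashing steps, which are not shown here.

```python
from typing import Callable


def checkout_matching_hash(
    checkout: Callable[[bool], str],
    nar_hash: Callable[[str], str],
    expected_hash: str,
) -> str:
    """Return the path of a checkout whose NAR hash matches expected_hash.

    `checkout(with_submodules)` returns the path of a checked out tree and
    `nar_hash(path)` returns its recursive NAR hash; both are hypothetical.
    """
    # First attempt without submodules, then retry with them fetched
    for with_submodules in (False, True):
        path = checkout(with_submodules)
        if nar_hash(path) == expected_hash:
            return path
    raise ValueError("NAR hash mismatch with and without submodules")
```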
-
- Aug 24, 2023
-
-
Antoine R. Dumont authored
Without this, some git clones fail to be ingested because they reference a submodule which is not initialized. This results in a hash mismatch since the checked out git tree does not match the upstream nix/guix manifest. Refs. #4751
-
- Aug 22, 2023
-
-
Antoine R. Dumont authored
-
- Aug 21, 2023
-
-
Antoine R. Dumont authored
Inspired by the pip cloning step [1]. This makes the cloning step only fetch the commit information and the tree at the current heads. Then a subsequent switch (checkout) retrieves the tree at the reference we want. In effect, this retrieves the tree needed to ingest the repository much faster. [1] pip uses a blobless clone, though.
-
- Aug 07, 2023
-
-
Antoine R. Dumont authored
Refs. swh/meta#3781
-