- Mar 19, 2025
-
- Feb 21, 2025
-
-
Antoine Lambert authored
When calling the content_get_data method of the storage interface, provide all content hashes as parameter to avoid an extra request to the storage server to fetch the missing hashes.
-
-
Previously, the dumb git loader only considered refs targeting commits and tags, but refs can occasionally target blobs and trees too, so support such refs as well. Fixes #4756.
-
- Feb 17, 2025
-
-
Antoine Lambert authored
This was used at the time we were building Debian packages for swh components, but we no longer do that.
-
Antoine Lambert authored
-
Antoine Lambert authored
Bump development tools: mypy, codespell, isort, ... Move all tool configuration into pyproject.toml. Remove mypy overrides that are no longer needed.
-
- Jan 08, 2025
-
-
Antoine Lambert authored
It can lead to test failures if commit signing is globally enabled in git configuration.
-
Antoine Lambert authored
-
Antoine Lambert authored
-
- Sep 10, 2024
-
-
Antoine Lambert authored
BaseLoader.load now returns a dict with an extra error field when loading fails.
-
Renaud Boyer authored
-
- Aug 30, 2024
-
-
Antoine Lambert authored
-
- Aug 27, 2024
-
-
David Douard authored
-
- Jun 28, 2024
-
-
Antoine Lambert authored
The latest tenacity release introduced internal changes that broke the mocking of sleep calls in tests. Fix it by directly mocking time.sleep (which did not work previously).
-
- Jun 06, 2024
-
-
David Douard authored
as well as in GitCheckoutLoader.
-
- Jun 04, 2024
-
-
Antoine Lambert authored
The previous implementation built an invalid pack file with REF_DELTA object types, as it used the new object to deltify as the base of the delta. This led to errors and undefined behavior after building an index for such a pack file, as the deltified objects could not be properly resolved by dulwich (observed by stsp while working on git loader improvements). The bases for deltified objects are now objects that were previously loaded into the archive. Tag objects produced in that test are now also guaranteed to be valid.
-
Antoine Lambert authored
The SWH data model allows an origin to have multiple visit types; in particular, a git origin can have the visit types 'git' and 'git-checkout'. The git loader must retrieve the latest snapshot for the 'git' visit type, otherwise incremental loading of a git origin having both visit types can break. Indeed, a 'git-checkout' visit produces a snapshot with a single branch, while a 'git' visit produces a snapshot containing all branches of the loaded repository. Previously, if the latest snapshot retrieved was produced by a 'git-checkout' visit, the loader would refetch all branches and associated git objects even though most of them had already been archived. Related to swh/meta#5092.
-
- May 30, 2024
-
-
Antoine Lambert authored
Side effect of swh.loader.core v5.18.0 release.
-
- May 15, 2024
-
-
Pierre-Yves David authored
-
- Mar 29, 2024
-
-
David Douard authored
-
- Feb 26, 2024
-
-
Antoine Lambert authored
-
Antoine Lambert authored
Some dumb git servers can send a HEAD file in a legacy format that contains a commit id instead of the string "ref: <ref_name>". Handle that edge case to avoid an error when loading such a repository.
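The two HEAD formats described above can be told apart with a small parser. This is only a minimal sketch, not the loader's actual code: the function name parse_head and its return shape are hypothetical, and it assumes 40-character hexadecimal SHA-1 object ids.

```python
import re
from typing import Optional, Tuple


def parse_head(content: bytes) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Parse the content of a HEAD file (hypothetical helper).

    Returns a (ref_name, commit_id) pair in which exactly one item is set:
    - symbolic format "ref: <ref_name>" gives (ref_name, None)
    - legacy format holding a bare commit id gives (None, commit_id)
    """
    line = content.strip()
    if line.startswith(b"ref:"):
        # Usual format: HEAD is a symbolic reference to a branch
        return line[len(b"ref:"):].strip(), None
    if re.fullmatch(rb"[0-9a-f]{40}", line):
        # Legacy format: HEAD directly contains a SHA-1 commit id (assumed)
        return None, line
    raise ValueError(f"Unsupported HEAD content: {content!r}")
```

A caller would then either resolve the returned ref name against the advertised refs or use the commit id directly as the HEAD target.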
-
- Feb 22, 2024
-
-
Antoine Lambert authored
As with the smart git loader, restrict the maximum size of a pack file to download. Move the code writing pack data bytes and checking the size into a utility class to avoid code duplication. Add missing tests covering the cases where the pack size limit is reached.
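A utility class of that kind could look roughly like the sketch below. The names SizeLimitedWriter and PackSizeLimitExceeded are hypothetical and are not taken from the loader's code; the sketch only illustrates writing pack bytes while enforcing a size limit.

```python
from typing import BinaryIO


class PackSizeLimitExceeded(Exception):
    """Raised when more bytes than allowed would be written (hypothetical)."""


class SizeLimitedWriter:
    """Write pack data to a file object while enforcing a maximum size."""

    def __init__(self, dest: BinaryIO, size_limit: int) -> None:
        self.dest = dest
        self.size_limit = size_limit
        self.written = 0

    def write(self, data: bytes) -> None:
        # Check the limit before writing so the destination never grows
        # beyond size_limit bytes
        if self.written + len(data) > self.size_limit:
            raise PackSizeLimitExceeded(
                f"Pack file too big: limit is {self.size_limit} bytes"
            )
        self.dest.write(data)
        self.written += len(data)
```

Both the smart and the dumb loader could pass the write method of such an object as the callback receiving pack data, which keeps the size check in a single place.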
-
Antoine Lambert authored
When using the requests library to perform HTTP requests, the stream parameter must be set to True if responses need to be streamed, so that content is downloaded in chunks. Previously, a whole HTTP response was cached in memory, which could lead to OOM errors when dealing with a repository with large pack files.
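A minimal sketch of chunked downloading with requests follows. The helper name stream_to_file and the chunk size are assumptions made for illustration; session is expected to be a requests.Session (or any object exposing the same get method).

```python
from typing import Any, BinaryIO


def stream_to_file(
    session: Any, url: str, dest: BinaryIO, chunk_size: int = 64 * 1024
) -> int:
    """Download url into dest chunk by chunk and return the byte count.

    With stream=True, requests does not read the response body up front;
    iter_content then yields it piece by piece.
    """
    response = session.get(url, stream=True)
    response.raise_for_status()
    total = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        dest.write(chunk)
        total += len(chunk)
    return total
```

Typical usage would be stream_to_file(requests.Session(), pack_url, open(path, "wb")), where pack_url and path are placeholders.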
-
- Feb 20, 2024
-
-
Antoine Lambert authored
Related to swh-loader-core@c9b51f8b.
-
- Feb 05, 2024
-
-
Antoine Lambert authored
Related to swh/meta#5075.
-
- Feb 02, 2024
-
-
Nicolas Dandrimont authored
-
- Jan 29, 2024
-
-
Nicolas Dandrimont authored
Git loading tasks can take a pretty long time, and it's not easy to diagnose if it's stuck or if it's just taking a while. Instead of only logging at the end of the task, print a log line after each object type has been fully processed. Also print a log line every 3 minutes while objects are being processed.
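The logging scheme described above can be sketched as a generator wrapping the objects being processed. This is an illustration under assumptions, not the loader's implementation: the function name log_progress, the message wording and the injectable clock parameter are all hypothetical.

```python
import logging
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def log_progress(
    objects: Iterable[T],
    object_type: str,
    interval: float = 180.0,  # 3 minutes, as in the description above
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield objects unchanged, logging progress at most once per interval
    and once more when the object type has been fully processed."""
    count = 0
    last_log = clock()
    for obj in objects:
        yield obj
        count += 1
        now = clock()
        if now - last_log >= interval:
            logger.info("Processed %d %s objects so far", count, object_type)
            last_log = now
    logger.info("Done processing %d %s objects", count, object_type)
```

Checking the clock only between objects keeps the sketch simple, at the cost of staying silent while a single object takes longer than the interval to process.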
-
Nicolas Dandrimont authored
The packfile fetching operation can take a long time. Send one log line every minute while it progresses.
-
Nicolas Dandrimont authored
Instead of dumping the dulwich remote communication stream to stderr, add a separate logger for remote messages, and handle the remote stream as proper log entries.
-
- Jan 24, 2024
-
-
Nicolas Dandrimont authored
This hooks into the right urllib3 and requests settings for both the smart and dumb loader.
-
Nicolas Dandrimont authored
This sets the connect and read timeout for both the smart loader (via urllib3/dulwich) and for the dumb loader (via requests).
-
Nicolas Dandrimont authored
This is useful to override the default settings of the requests Session, e.g. certificate verification or connect/read timeouts.
-
Nicolas Dandrimont authored
This is useful to override the default settings of the dulwich urllib3 adapter, e.g. certificate verification or connect/read timeouts.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
Newer versions of git create a ".rev" file next to the existing ".pack" and ".idx", making the nb_files inconsistent.
-
- Jan 16, 2024
-
-
Antoine Lambert authored
A utility function was renamed in swh-loader-core.
-
Antoine Lambert authored
If the submodules parameter of the loader is True but no .gitmodules file is found in the root directory of the repository, the repository path is not yielded and thus its loading is discarded.
-
- Jan 08, 2024
-
-
Antoine Lambert authored
It indicates if submodules should be retrieved after the git checkout operation as some guix origins require it. Related to #4751.
-