- Feb 17, 2025
Antoine Lambert authored
Antoine Lambert authored
Bump development tools: mypy, codespell, isort, ... Move all tool configuration into pyproject.toml. Remove mypy overrides that are no longer needed.
- Sep 10, 2024
Antoine Lambert authored
BaseLoader.load now returns a dict with an extra error field when loading fails.
- Aug 30, 2024
Antoine Lambert authored
Antoine Lambert authored
- Aug 27, 2024
David Douard authored
- May 30, 2024
Antoine Lambert authored
Side effect of swh.loader.core v5.18.0 release.
- May 15, 2024
Pierre-Yves David authored
Everything is a Content now, so we no longer need this complexity.
- May 03, 2024
Antoine Lambert authored
- Mar 29, 2024
David Douard authored
- Feb 05, 2024
Antoine Lambert authored
Related to swh/meta#5075.
- Jan 25, 2024
Antoine Lambert authored
Create the file to write to directly in the method implementation, using a context manager, instead of providing it as a parameter.
Antoine Lambert authored
This avoids creating too many temporary files by keeping file bytes in memory when the file size is below a cutoff value.
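Both ideas above (creating the file inside the method with a context manager, and keeping small files in memory) can be sketched with the standard library's `tempfile.SpooledTemporaryFile`, which buffers data in memory until it exceeds `max_size` and only then rolls over to a real temporary file. This is a minimal sketch under assumptions: the function name `spool_and_hash`, the 100 KiB cutoff and the SHA-1 post-processing step are hypothetical, and whether the loader uses this class is not stated in the log.

```python
import hashlib
import tempfile
from typing import Iterable

# Hypothetical cutoff: content up to this many bytes stays in memory.
MAX_IN_MEMORY_SIZE = 100 * 1024


def spool_and_hash(chunks: Iterable[bytes], max_size: int = MAX_IN_MEMORY_SIZE) -> str:
    """Buffer file chunks, then return the SHA-1 hex digest of the content.

    The buffer is created inside the function with a context manager instead
    of being passed in as a parameter. SpooledTemporaryFile keeps data in an
    in-memory buffer and only rolls over to an on-disk temporary file once
    more than max_size bytes have been written.
    """
    with tempfile.SpooledTemporaryFile(max_size=max_size) as buffer:
        for chunk in chunks:
            buffer.write(chunk)
        buffer.seek(0)
        digest = hashlib.sha1()
        # Read back in fixed-size blocks so a spooled-to-disk file is not
        # loaded into memory all at once.
        for block in iter(lambda: buffer.read(64 * 1024), b""):
            digest.update(block)
        return digest.hexdigest()
```

The temporary file, if one is created at all, is deleted automatically when the `with` block exits.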
- Dec 05, 2023
Antoine Lambert authored
When using the pserver protocol, the previous implementation could fail to check out some files, while the same files were checked out fine by git's cvsimport command. So align the pserver commands sent to check out a file with those used in the git cvsimport implementation, as this fixes the observed checkout issues.
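For orientation, the request sequence that git's `git-cvsimport` script sends to check out a single file revision looks, as best we recall its `_file` routine, roughly like the sketch below. This is an assumption-laden illustration: `build_checkout_request` is a hypothetical helper, the log does not show the loader's actual request lines, and the exact options should be checked against the git-cvsimport source.

```python
from typing import List


def build_checkout_request(repo_root: str, module: str, path: str, rev: str,
                           keyword_kk: bool = False) -> str:
    """Build CVS client-protocol request lines asking for one file revision.

    Modeled on git-cvsimport's request sequence (an assumption, not the
    loader's verified code): -N and -P options, an optional -kk to disable
    keyword expansion, the revision and path arguments, then a Directory
    request naming the server-side repository, and finally "co" (checkout).
    """
    lines: List[str] = ["Argument -N", "Argument -P"]
    if keyword_kk:
        lines.append("Argument -kk")
    lines += [
        "Argument -r",
        f"Argument {rev}",
        "Argument --",
        f"Argument {module}/{path}",
        "Directory .",
        repo_root,
        "co",
    ]
    # Each request line is newline-terminated on the wire.
    return "".join(line + "\n" for line in lines)
```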
David Douard authored
David Douard authored
- Dec 03, 2023
David Douard authored
- Nov 28, 2023
Jérémy Bobbio (Lunar) authored
Otherwise it will find `swh/loader/cvs/rcsparse/setup.py` which will annoy Sphinx as `swh.loader.cvs.rcsparse` is not a package.
Jérémy Bobbio (Lunar) authored
- Jun 09, 2023
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#4906
- May 24, 2023
Antoine Lambert authored
Previously, an assertion error was raised when encountering a CVS module with no changesets. As such a module is empty, the loader should instead produce an empty snapshot and mark the loading as uneventful (as the git loader does). Fixes #4648.
Antoine Lambert authored
When fetching CVS repository data using the rsync protocol, an rsync exit code of 23 means the repository is no longer available, so NotFound must be raised. Fixes #4572
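The exit-code mapping above can be sketched as follows. rsync(1) documents exit code 23 as "Partial transfer due to error"; treating it as a vanished repository follows the commit message. The `NotFound` class here is a local stand-in, and `check_rsync_result`, `rsync_fetch` and the `-az` options are hypothetical choices for illustration, not the loader's code.

```python
import subprocess
from typing import List


class NotFound(Exception):
    """Local stand-in for the loader's NotFound exception (hypothetical here)."""


# rsync(1) exit value 23: "Partial transfer due to error".
RSYNC_PARTIAL_TRANSFER = 23


def check_rsync_result(returncode: int, url: str) -> None:
    """Raise NotFound for exit code 23 and CalledProcessError for other failures."""
    if returncode == RSYNC_PARTIAL_TRANSFER:
        raise NotFound(f"CVS repository not found at {url}")
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, ["rsync", url])


def rsync_fetch(url: str, dest: str) -> None:
    """Mirror a CVS repository with rsync, mapping exit codes to exceptions."""
    cmd: List[str] = ["rsync", "-az", url, dest]
    result = subprocess.run(cmd)
    check_rsync_result(result.returncode, url)
```

Keeping the exit-code check in its own function makes the mapping testable without a network or an rsync binary.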
- Mar 16, 2023
Antoine Lambert authored
When using the rsync protocol, the loader parses the CVSROOT/config file to look for a custom keyword definition. However, some CVS repositories have no such file, and loading failed for them because the file's existence was not checked before opening it.
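A minimal sketch of the existence check, assuming the custom keyword is declared with a GNU CVS 1.12 style `LocalKeyword=NAME=CVSHeader` line (the loader may also recognize other directives). `parse_local_keyword` and `read_local_keyword` are hypothetical names.

```python
import os
from typing import Iterable, Optional

LOCAL_KEYWORD_PREFIX = "LocalKeyword="


def parse_local_keyword(lines: Iterable[str]) -> Optional[str]:
    """Return the custom keyword name from CVSROOT/config lines, if any."""
    for line in lines:
        line = line.strip()
        # Comment lines start with "#" and therefore never match the prefix.
        if line.startswith(LOCAL_KEYWORD_PREFIX):
            # "LocalKeyword=MyBSD=CVSHeader" -> "MyBSD"
            return line[len(LOCAL_KEYWORD_PREFIX):].split("=", 1)[0]
    return None


def read_local_keyword(cvsroot_path: str) -> Optional[str]:
    """Read CVSROOT/config if it exists; a missing file means no custom keyword."""
    config_path = os.path.join(cvsroot_path, "CVSROOT", "config")
    if not os.path.isfile(config_path):
        # Some repositories simply have no CVSROOT/config file.
        return None
    with open(config_path, encoding="utf-8", errors="replace") as config:
        return parse_local_keyword(config)
```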
- Feb 23, 2023
Jérémy Bobbio (Lunar) authored
GitLab displays the content of the README file when browsing the repository. However, if the file is a symlink, it displays the path the symlink points to instead. There is a 6-year-old issue about this: https://gitlab.com/gitlab-org/gitlab/-/issues/15093 We can work around the issue by having the content at the root of the repository and a symlink to this file in the `docs/` directory. Tested in swh/devel/swh-py-template!27
- Feb 17, 2023
Antoine Lambert authored
Related to swh/meta#4960
- Feb 16, 2023
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
- Feb 02, 2023
Antoine Lambert authored
This fixes Python 3.7 support, which broke because poetry, a dependency of isort, dropped support for that Python version in a recent release.
- Dec 19, 2022
Antoine Lambert authored
To remove warnings about /apidoc/*.rst files being included multiple times in the toc when building the full swh documentation, include module indices only when building standalone package documentation. Related to T4496
- Oct 20, 2022
Antoine Lambert authored
Some CVS servers restrict expansion of the $Log$ keyword in files to check out and might skip that operation. Nevertheless, the official CVS client still checks out the file, just without expanding the $Log$ keyword, so the loader should behave the same way. Closes T4646
- Oct 18, 2022
David Douard authored
- pre-commit from 4.1.0 to 4.3.0,
- codespell from 2.2.1 to 2.2.2,
- black from 22.3.0 to 22.10.0 and
- flake8 from 4.0.1 to 5.0.4.
Also freeze flake8 dependencies. Also change flake8's repo config to GitHub (the GitLab mirror being outdated).
Antoine Lambert authored
Use the helper fixture loading_task_creation_for_listed_origin_test from swh-loader-core and remove the redundant test.
- Oct 17, 2022
Antoine Lambert authored
This fixes debian package builds.
Antoine Lambert authored
Previously, after each revision replay, all files and directories of the CVS repository being loaded were collected and sent to the storage. This was a real bottleneck for loading performance, as it delegated the filtering of new objects to archive to the storage filtering proxy. As we know exactly the set of paths modified in a CVS revision, do that filtering on the loader side instead and only send modified objects to storage, rather than the whole set of contents and directories from the reconstructed filesystem. This should greatly improve loading performance for large repositories and also reduce loader memory consumption.
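The loader-side filtering relies on one property: when a file is added, changed or deleted, the hash of its parent directory changes, and so does every ancestor up to the root. A minimal sketch of computing that directory set from a revision's modified paths; `dirs_to_update` is a hypothetical helper, and the root is represented by the empty string.

```python
import posixpath
from typing import Iterable, Set


def dirs_to_update(modified_paths: Iterable[str]) -> Set[str]:
    """Return the directories whose hashes change when the given files change.

    Only these directories, plus the modified files themselves, would need to
    be sent to storage for the revision.
    """
    dirs: Set[str] = set()
    for path in modified_paths:
        dirs.add("")  # the root changes whenever anything below it changes
        parent = posixpath.dirname(path)
        # Walk up until the root, or until a directory whose ancestors were
        # already added by an earlier path.
        while parent and parent not in dirs:
            dirs.add(parent)
            parent = posixpath.dirname(parent)
    return dirs
```

Stopping at an already-seen directory is safe because every directory is added together with its full chain of ancestors.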
Antoine Lambert authored
Instead of creating a from_disk.Directory instance after each replayed CVS revision by recursively scanning all directories of the repository, keep a single one as a class member, synchronized with the reconstructed filesystem after each revision replay. This should improve loader performance, especially when dealing with large repositories.
Antoine Lambert authored
The CVS rlog for a given module sent by the server is a concatenation of rlog entries. Each entry has a header containing the path to an RCS file plus other information. There are cases where an rlog entry header is empty, which makes rlog parsing fail. So instead of stopping rlog parsing by raising an exception, skip that entry and process the next one. Closes T4629
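Skipping a header-less entry instead of raising can be sketched as below. Assumptions: the rlog text has already been extracted from the server response, each per-file entry ends with a line of 77 `=` characters, and a well-formed entry's first non-blank line starts with `RCS file:`. `iter_rlog_entries` is a hypothetical name, not the loader's parser.

```python
from typing import Iterable, Iterator, List

# Assumption: rlog terminates each per-file entry with 77 "=" characters.
ENTRY_SEPARATOR = "=" * 77


def _has_rcs_header(entry: List[str]) -> bool:
    # The first non-blank line of a well-formed entry starts with "RCS file:".
    header = next((line for line in entry if line.strip()), "")
    return header.startswith("RCS file:")


def iter_rlog_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Split rlog output into per-file entries, skipping entries without a header."""
    entry: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line == ENTRY_SEPARATOR:
            if _has_rcs_header(entry):
                yield entry
            # An entry with an empty header is dropped instead of raising.
            entry = []
        else:
            entry.append(line)
    if _has_rcs_header(entry):
        # Trailing entry that was not followed by a separator.
        yield entry
```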
- Oct 14, 2022
Antoine Lambert authored
Instead of using the readlines method on file objects, which reads all lines of a file and stores them in memory, read files line by line by iterating lazily over the file object. This significantly reduces loader memory consumption when processing a large rlog output stored in a file.
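The difference is simply iterating over the file object rather than calling `readlines()`. A minimal sketch: `count_revisions` is a hypothetical consumer, and any iterable of lines works, including an open file.

```python
from typing import Iterable


def count_revisions(rlog_lines: Iterable[str]) -> int:
    """Count "revision X.Y" lines in rlog output while reading lazily.

    Passing an open file object works because iterating over it yields one
    line at a time, so only the current line is held in memory; calling
    file.readlines() instead would first build a list of every line.
    Simplification: a log message line starting with "revision " would also
    be counted.
    """
    count = 0
    for line in rlog_lines:
        if line.startswith("revision "):
            count += 1
    return count
```

Typical use would be `with open(rlog_path) as rlog_file: count_revisions(rlog_file)`, where `rlog_path` points at a stored rlog output.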
- Oct 13, 2022
Antoine Lambert authored
That case was handled when using the rsync protocol but not when using the pserver or ssh protocols. Closes T4631
Antoine Lambert authored
When attempting to fetch the rlog for a path that does not exist in the repository, the CVS server responds with the line `E cvs rlog: could not read RCS file for <path>` followed by `ok`. That error case was not handled in fetch_rlog, so ensure it returns None when encountering it. The issue was spotted when the loader attempted to fetch more rlog data from Attic directories. The paths of these Attic directories are computed from those of the files in the repository, but there are cases where those directories do not exist.
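Handling that response can be sketched as follows, assuming the response has already been split into lines, `M ` lines carry rlog text, and `ok` terminates the response. The whole response is consumed before returning None so no stale lines are left behind for the next request. `parse_rlog_response` is a hypothetical name, not the loader's `fetch_rlog`.

```python
from typing import Iterable, List, Optional

RCS_FILE_ERROR = "E cvs rlog: could not read RCS file for"


def parse_rlog_response(lines: Iterable[str]) -> Optional[List[str]]:
    """Collect rlog text from server response lines, or None for a missing path."""
    rlog: List[str] = []
    missing = False
    for line in lines:
        line = line.rstrip("\n")
        if line == "ok":
            break  # end of the server response
        if line.startswith(RCS_FILE_ERROR):
            # Keep reading until "ok" so the whole response is consumed.
            missing = True
        elif line.startswith("M "):
            rlog.append(line[2:])
    return None if missing else rlog
```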
- Sep 15, 2022
Antoine Lambert authored
Since Python 3.10, using `#` formats with PyArg_ParseTuple() requires the PY_SSIZE_T_CLEAN macro to be defined.
- Jul 11, 2022
Antoine Lambert authored
Some CVS repositories use revision numbers greater than 1.x to version files. The previous loader implementation raised an error when encountering such a revision, so ensure it is processed like the other ones. Also fix tag name extraction from rlog output. Related to T4043
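A sketch of parsing that makes no assumption about the major revision number, plus tag extraction from the `symbolic names:` block of an rlog header (tab-indented `NAME: REVISION` lines). What exactly was wrong in the previous tag extraction is not described in the log; `parse_revision_number` and `parse_symbolic_names` are hypothetical helpers.

```python
from typing import Dict, Iterable, Tuple


def parse_revision_number(rev: str) -> Tuple[int, ...]:
    """Turn "2.13" into (2, 13); any major number works, not only 1.x.

    Tuples compare numerically, so "1.10" sorts after "1.9".
    """
    return tuple(int(part) for part in rev.split("."))


def parse_symbolic_names(header_lines: Iterable[str]) -> Dict[str, str]:
    """Extract a tag name -> revision mapping from an rlog entry header."""
    tags: Dict[str, str] = {}
    in_names = False
    for line in header_lines:
        if line.startswith("symbolic names:"):
            in_names = True
            continue
        if in_names:
            if not line.startswith("\t"):
                break  # the block ends at the first non-indented line
            name, _, rev = line.strip().partition(": ")
            tags[name] = rev
    return tags
```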