- Jul 18, 2024
-
-
Nicolas Dandrimont authored
For now this information is not used downstream, but it can be useful for specific analysis or one-shot scheduling.
-
- Jun 28, 2024
-
-
Antoine Lambert authored
The latest tenacity release introduced internal changes that broke the mocking of sleep calls in tests. Fix it by mocking time.sleep directly (which did not work previously).
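As a minimal sketch of what mocking time.sleep directly can look like, the example below uses only the standard library. The `fetch_with_retries` helper is hypothetical and stands in for a tenacity-decorated call; it is not the code from the test suite.

```python
import time
from unittest import mock


def fetch_with_retries(fetch, attempts=3, delay=10.0):
    """Hypothetical retry helper standing in for a tenacity-decorated call."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts:
                raise
            # sleep is looked up on the time module at call time,
            # so patching "time.sleep" replaces it here as well.
            time.sleep(delay)


def run_example():
    # The fake fetch fails twice, then succeeds on the third call.
    fetch = mock.Mock(side_effect=[ConnectionError(), ConnectionError(), "ok"])
    with mock.patch("time.sleep") as mocked_sleep:
        result = fetch_with_retries(fetch)
    return result, mocked_sleep.call_count
```

With the patch in place the test does not actually wait, and the mock records how many times a sleep was requested.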
-
- Jun 05, 2024
-
-
Antoine Lambert authored
The Gitea API returns the next pagination link with all the query parameters provided to an API request. As we were also passing a dict of fixed query parameters to the page_request method, some query parameters ended up with multiple instances in the URL used to fetch a new page of repository data. Each time a new page was requested, new instances of these parameters were appended to the URL, which could result in a very long URL when the number of pages to retrieve is high and make the request fail. Also remove a debug log already present in the http_request method.
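The duplication can be reproduced with the standard library alone. This is a sketch under assumptions: the parameter values, the example host, and the `add_params` and `page_url` helpers are all hypothetical, and attaching the fixed parameters only to the first request is one possible way to avoid the problem, not necessarily the exact fix applied.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical fixed query parameters sent with the first request.
FIXED_PARAMS = {"sort": "id", "order": "asc", "limit": "50"}


def add_params(url, params):
    """Append params to url without deduplicating, like a naive client would."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + list(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_url(url, first_page):
    """Attach the fixed parameters to the first request only.

    The "next" link returned by the API already carries them.
    """
    return add_params(url, FIXED_PARAMS) if first_page else url
```

Calling `add_params` on a next link that already contains `limit=50` yields a URL with that parameter twice, while `page_url(next_link, first_page=False)` returns the link unchanged.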
-
- May 22, 2024
-
-
Antoine Lambert authored
The oldest part of the scheduler API was updated to use model classes (based on attr package) instead of dictionaries in order to improve typing.
-
- Apr 24, 2024
-
-
Antoine Lambert authored
Redirection URLs can be long and quite obscure in some cases (GitHub CDN for instance), so make sure to use the redirected URL as the origin URL. Related to swh/meta#5090.
-
- Apr 16, 2024
-
-
Antoine Lambert authored
Because the types-beautifulsoup4 package gets installed in the swh virtualenv as a swh-scanner test dependency, mypy reported errors related to beautifulsoup4 typing. The find method of bs4 returns the union Tag | NavigableString | None, so isinstance calls are needed to ensure proper typing, which is not great. Prefer the select_one method instead: it returns Optional[Tag], so a simple None check is enough for typing to be correct. In a similar manner, replace uses of the find_all method with the select method. This also has the advantage of simplifying the code.
-
- Mar 29, 2024
-
-
David Douard authored
-
- Mar 14, 2024
-
-
Antoine Lambert authored
Some Guix packages correspond to subset exports of a subversion source tree at a given revision, typically the TeX Live ones. In that case, we must pass an extra parameter to the svn-export loader to specify the sub-paths to export, and also use a unique origin URL for each package to archive; otherwise the same URL would be used and only a single package would be archived. Related to swh/infra/sysadm-environment#5263.
-
- Mar 13, 2024
-
-
Antoine Lambert authored
Remove use of the --import-mode=importlib pytest option and use the new consider_namespace_packages option to fix test execution with the latest pytest release.
-
Antoine Lambert authored
It fixes installation of dependencies required by swh-scheduler pytest plugin.
-
- Feb 05, 2024
-
-
Antoine Lambert authored
Related to swh/meta#5075.
-
- Jan 18, 2024
-
-
Antoine Lambert authored
In addition to query parameters, also check whether any part of the URL path contains a tarball filename. This fixes the detection of some tarball URLs provided in the Guix manifest. Related to swh/meta#3781.
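A rough sketch of such a check is shown below. The extension list is an illustrative subset and the function names are hypothetical, not the lister's actual helpers.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative subset of archive extensions (assumption).
TARBALL_EXTENSIONS = (".tar.gz", ".tgz", ".tar.bz2", ".tar.xz", ".zip")


def looks_like_tarball(name):
    return name.lower().endswith(TARBALL_EXTENSIONS)


def url_contains_tarball_filename(url):
    parts = urlsplit(url)
    # Query parameter values, e.g. ?file=foo-1.0.tar.gz
    candidates = [value for _, value in parse_qsl(parts.query)]
    # Every path segment, not only the last one
    candidates += [segment for segment in parts.path.split("/") if segment]
    return any(looks_like_tarball(candidate) for candidate in candidates)
```

Checking every path segment catches URLs such as `https://example.org/releases/foo-1.0.tar.gz/download`, where the filename is not the final component.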
-
- Jan 17, 2024
-
-
David Douard authored
Link to the user documentation instead. Also add a section on required binary tools.
-
- Jan 10, 2024
-
-
Jérémy Bobbio (Lunar) authored
Commit c2402f40 renamed the entry points from `lister.*` without updating the rest of the framework. Revert the changes (and sort the list alphabetically).
-
- Jan 09, 2024
-
-
Franck Bret authored
Use another API endpoint that helps the lister to be stateful. The API endpoint used needs a ``since`` value that represents a sequential index in the history. The ``all_packages_count`` state stores a count which is used as the ``since`` argument on the next run.
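The idea can be sketched as follows. The in-memory `registry` list and the `fetch_since` function are hypothetical stand-ins for the remote API endpoint; only the ``all_packages_count`` and ``since`` names come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class ListerState:
    # Number of packages seen so far; reused as the ``since`` argument.
    all_packages_count: int = 0


def fetch_since(registry, since):
    """Hypothetical stand-in for the endpoint: entries after index ``since``."""
    return registry[since:]


def list_new_packages(registry, state):
    """Return only packages not seen in earlier runs and update the state."""
    new_packages = fetch_since(registry, state.all_packages_count)
    state.all_packages_count += len(new_packages)
    return new_packages
```

A second run with the same state returns only the entries added to the history since the first run.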
-
Franck Bret authored
'url' and 'instance' are mandatory. Add elm lister entry to pyproject.toml.
-
Franck Bret authored
The Elm lister lists Elm package origins from the Elm lang registry. It uses an HTTP API endpoint to list package origins. Origins are GitHub repositories; releases take advantage of the GitHub release API.
-
- Jan 08, 2024
-
-
Antoine Lambert authored
Guix now provides a "submodule" info in the sources.json file it produces, so exploit it to set the new "submodules" parameter of the git-checkout loader in order to retrieve submodules only when required. Related to swh/devel/swh-loader-git#4751.
-
- Dec 18, 2023
-
-
Franck Bret authored
Add a state to the lister to store the ``last_seen_commit`` as a Git commit hash. Use Dulwich to retrieve a Git commit walker since ``last_seen_commit``, if any. For each commit, detect whether it is a new package or a new package version commit and return its origin with the commit date as last_update.
-
- Dec 05, 2023
-
-
David Douard authored
-
David Douard authored
-
- Dec 03, 2023
-
-
David Douard authored
-
- Dec 01, 2023
-
-
Antoine Lambert authored
Fix hanging test when executed outside tox.
-
- Nov 29, 2023
-
-
David Douard authored
-
- Nov 16, 2023
-
-
David Douard authored
Convert README from markdown to ReST to make it embeddable in docs/index.rst
-
- Nov 15, 2023
-
-
David Douard authored
-
- Nov 14, 2023
-
-
Nicolas Dandrimont authored
-
Antoine Lambert authored
The CRAN lister improvements introduced in 91e4e33d originally used pyreadr to read an RDS file from Python instead of rpy2. As swh-lister was still packaged for Debian at the time, rpy2 was chosen instead because a Debian package is available for it while there is none for pyreadr. Now that Debian packaging has been dropped for swh-lister, we can reinstate the pyreadr-based implementation, which has the advantages of being faster and not depending on the R language runtime. Related to swh/meta#1709.
-
That fails the current loader ingestion as this must be an exact value (when provided, it's checked against the download operation). Refs. swh/infra/sysadm-environment#4746
-
- Nov 07, 2023
-
-
Antoine Lambert authored
Display the number of processed pages and listed origins after the listing process has ended.
-
Antoine Lambert authored
In order to simplify the testing of listers, allow calling the run command of the swh-lister CLI without scheduler configuration. In that case a temporary scheduler instance with a postgresql backend is created and used. This makes it easy to test a lister with the following command:

    $ swh -l DEBUG lister run <lister_name> url=<forge_url>
-
- Oct 18, 2023
-
-
Jérémy Bobbio (Lunar) authored
The implementation of `HTTPError` in `requests` does not guarantee that the `response` property will always be set. So we need to ensure it is not `None` before looking for the return code, for example. This also makes mypy checks pass again, as `types-requests` was updated in 2.31.0.9 to better match this particular aspect. See: https://github.com/python/typeshed/pull/10875
-
- Oct 12, 2023
-
-
Franck Bret authored
Ensure the registry path does not exist before cloning the repository.
-
- Oct 09, 2023
-
-
Franck Bret authored
-
Franck Bret authored
-
Franck Bret authored
-
Franck Bret authored
-
Franck Bret authored
This module introduces the Julia lister. It retrieves Julia package origins from the Julia General Registry, a Git repository made of per-package directories with TOML definition files.
-
- Oct 02, 2023
-
-
Antoine Lambert authored
Similar to cgit, there are cases where git clone URLs for projects hosted on a gitweb instance cannot be found when scraping project pages or cannot be easily derived from the gitweb instance root URL. So add an optional base_git_url parameter that makes it possible to compute correct clone URLs by appending project names to it.
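As a small illustration of the computation, assuming a hypothetical `clone_url` helper (only the base_git_url parameter name comes from the commit message):

```python
def clone_url(project_name, base_git_url):
    """Append a project name to base_git_url, avoiding doubled slashes."""
    return base_git_url.rstrip("/") + "/" + project_name.lstrip("/")
```

For instance, `clone_url("foo/bar.git", "https://git.example.org/")` gives `https://git.example.org/foo/bar.git`.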
-
Antoine Lambert authored
Some gitweb instances can have string prefixes before the displayed git clone URLs, so make sure to strip them to properly extract URLs. Related to swh/infra/sysadm-environment#5051.
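One possible way to drop such prefixes is to search for the first URL-looking token instead of using the displayed text verbatim. This is a sketch: the regular expression, the function name, and the "Pull:" prefix used in the example are assumptions, not the lister's actual code.

```python
import re

# Matches the first token that starts with a scheme git can clone from.
CLONE_URL_RE = re.compile(r"(?:https?|git|ssh)://\S+")


def extract_clone_url(displayed_text):
    """Return the clone URL found in displayed_text, or None if there is none."""
    match = CLONE_URL_RE.search(displayed_text)
    return match.group(0) if match else None
```

With this, `extract_clone_url("Pull: https://git.example.org/foo.git")` returns the bare URL, and text without any URL yields None.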
-