- Dec 01, 2023
-
-
Antoine Lambert authored
Fix hanging test when executed outside tox.
-
- Nov 29, 2023
-
-
David Douard authored
-
- Nov 16, 2023
-
-
David Douard authored
Convert README from markdown to ReST to make it embeddable in docs/index.rst
-
- Nov 15, 2023
-
-
David Douard authored
-
- Nov 14, 2023
-
-
Nicolas Dandrimont authored
-
Antoine Lambert authored
The CRAN lister improvements introduced in 91e4e33d originally used pyreadr, rather than rpy2, to read an RDS file from Python. As swh-lister was still packaged for Debian at the time, rpy2 was chosen instead because a Debian package is available for it while none exists for pyreadr. Now that Debian packaging has been dropped for swh-lister, we can reinstate the pyreadr-based implementation, which has the advantages of being faster and not depending on the R language runtime. Related to swh/meta#1709.
-
That fails the current loader ingestion as this must be an exact value (when provided, it's checked against the download operation). Refs. swh/infra/sysadm-environment#4746
-
- Nov 07, 2023
-
-
Antoine Lambert authored
Display the number of processed pages and listed origins after the listing process has ended.
-
Antoine Lambert authored
To simplify the testing of listers, allow calling the run command of the swh-lister CLI without a scheduler configuration. In that case, a temporary scheduler instance with a PostgreSQL backend is created and used. This makes it easy to test a lister with the following command: $ swh -l DEBUG lister run <lister_name> url=<forge_url>
-
- Oct 18, 2023
-
-
Jérémy Bobbio (Lunar) authored
The implementation of `HTTPError` in `requests` does not guarantee that the `response` property will always be set. So we need to ensure it is not `None` before reading the status code, for example. This also makes mypy checks pass again, as `types-requests` was updated in 2.31.0.9 to better match this particular aspect. See: https://github.com/python/typeshed/pull/10875
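A minimal sketch of the guard described above. It is not the code from the commit: `status_code_of`, `FakeHTTPError` and `FakeResponse` are hypothetical names, and the stand-in classes only mimic the `response` attribute so that the example runs without `requests` installed.

```python
from typing import Optional


def status_code_of(error: Exception) -> Optional[int]:
    """Return the HTTP status code carried by an error, or None if unavailable."""
    # The `response` attribute may be missing or None, so check before using it.
    response = getattr(error, "response", None)
    if response is None:
        return None
    return response.status_code


class FakeResponse:
    """Hypothetical stand-in for a response object with a status code."""

    status_code = 404


class FakeHTTPError(Exception):
    """Hypothetical stand-in for an HTTP error whose response is optional."""

    def __init__(self, response=None):
        super().__init__("HTTP error")
        self.response = response
```

With real `requests` exceptions the same `is None` check applies to `error.response` before accessing `status_code`.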
-
- Oct 12, 2023
-
-
Franck Bret authored
Ensure the registry path does not exist before cloning the repository.
-
- Oct 09, 2023
-
-
Franck Bret authored
-
Franck Bret authored
-
Franck Bret authored
-
Franck Bret authored
-
Franck Bret authored
This module introduces the Julia lister. It retrieves Julia package origins from the Julia General Registry, a Git repository made of per-package directories with TOML definition files.
-
- Oct 02, 2023
-
-
Antoine Lambert authored
Similar to cgit, there are cases where git clone URLs for projects hosted on a gitweb instance cannot be found when scraping project pages or cannot be easily derived from the gitweb instance root URL. So add an optional base_git_url parameter that makes it possible to compute correct clone URLs by appending project names to it.
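As an illustration of the idea, here is a sketch of deriving a clone URL from a base URL and a project name. Only the `base_git_url` parameter name comes from the commit message; the `clone_url` helper and the example host are assumptions, and the actual lister may join the parts differently.

```python
def clone_url(base_git_url: str, project_name: str) -> str:
    """Append a project name to a base git URL, avoiding doubled slashes."""
    # Stripping "/" characters is safe here: it is a single character,
    # not a multi-character suffix.
    return base_git_url.rstrip("/") + "/" + project_name.lstrip("/")
```

For example, `clone_url("https://git.example.org/", "tools/foo.git")` gives `https://git.example.org/tools/foo.git`.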
-
Antoine Lambert authored
Some gitweb instances display string prefixes before the git clone URLs, so make sure to strip them to properly extract the URLs. Related to swh/infra/sysadm-environment#5051.
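One possible way to drop such prefixes is to search for the URL itself rather than trusting the whole displayed string. This is a sketch under assumptions: the `extract_clone_url` name, the accepted schemes and the sample prefix are illustrative and not taken from the lister.

```python
import re
from typing import Optional

# Assumed set of schemes; a real instance may expose others (e.g. ssh).
_URL_RE = re.compile(r"(?:https?|git)://\S+")


def extract_clone_url(displayed: str) -> Optional[str]:
    """Return the first URL found in the displayed text, ignoring any prefix."""
    match = _URL_RE.search(displayed)
    return match.group(0) if match else None
```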
-
- Sep 28, 2023
- Sep 26, 2023
-
-
Antoine Lambert authored
rstrip does not remove a string suffix (it strips a set of trailing characters), so use another way to extract the gitweb project name. This fixes the computation of some gitweb origin URLs. Related to swh/infra/sysadm-environment#5050.
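The pitfall can be reproduced in a few lines. The `strip_suffix` helper below is a hypothetical illustration, not the fix applied in the commit; on Python 3.9+ `str.removesuffix` does the same job.

```python
def strip_suffix(value: str, suffix: str) -> str:
    """Remove an exact trailing suffix, leaving the rest of the string intact."""
    if suffix and value.endswith(suffix):
        return value[: -len(suffix)]
    return value


name = "project-git.git"

# rstrip treats ".git" as the character set {'.', 'g', 'i', 't'} and keeps
# removing trailing characters from that set, eating part of the name.
wrong = name.rstrip(".git")

# Removing the exact suffix keeps the project name whole.
right = strip_suffix(name, ".git")
```

Here `wrong` is `"project-"` while `right` is `"project-git"`.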
-
- Sep 25, 2023
-
-
Antoine Lambert authored
Extra Packages for Enterprise Linux (EPEL) is a community-maintained set of additional packages that can be installed on many Red Hat based distributions.
-
- Sep 21, 2023
-
-
Franck Bret authored
-
Franck Bret authored
This reverts commit c9e2339a
-
- Sep 20, 2023
-
-
Franck Bret authored
-
Franck Bret authored
-
Franck Bret authored
-
Franck Bret authored
-
- Sep 19, 2023
-
-
Franck Bret authored
-
Franck Bret authored
Add a dlang module that retrieves origins from an HTTP API endpoint. Each origin is a Git-based project URL on github.com, gitlab.com or bitbucket.com.
-
- Sep 14, 2023
-
- Sep 06, 2023
-
-
Antoine Lambert authored
Ensure that all lister classes have the same set of mandatory parameters in their constructors, notably: scheduler, url, instance and credentials. Add a new test checking that lister classes declare the mandatory parameters in their constructors. The purpose is to avoid deployment issues in staging or production environments, as celery tasks can fail to execute if mandatory parameters are not handled by listers. Related to swh/infra/sysadm-environment#5030.
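Such a check could be sketched with `inspect.signature`. The four parameter names come from the commit message; the helper and the two example classes are hypothetical and do not reproduce the actual test in the repository.

```python
import inspect

MANDATORY_PARAMETERS = {"scheduler", "url", "instance", "credentials"}


def missing_mandatory_parameters(lister_class: type) -> set:
    """Return the mandatory parameter names absent from the constructor."""
    declared = inspect.signature(lister_class.__init__).parameters
    return MANDATORY_PARAMETERS - set(declared)


class GoodLister:
    """Hypothetical lister declaring every mandatory parameter."""

    def __init__(self, scheduler, url=None, instance=None, credentials=None):
        self.scheduler = scheduler


class BadLister:
    """Hypothetical lister missing `instance` and `credentials`."""

    def __init__(self, scheduler, url):
        self.scheduler = scheduler
```

A test could then assert that `missing_mandatory_parameters` returns an empty set for every registered lister class.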
-
- Sep 05, 2023
-
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5030
-
- Aug 22, 2023
-
-
Antoine R. Dumont authored
This got detected when working on the deployment of the new loader-git. Refs. swh/infra/sysadm-environment#5017
-
- Aug 21, 2023
-
-
Antoine Lambert authored
Previously, the lister relied on the CRANtools R module, which has the drawback of only listing the latest version of each package registered in the CRAN registry. In order to get all possible versions of each CRAN package, prefer to exploit the content of the weekly dump of the CRAN database in RDS format. To read the content of the RDS file from Python, the rpy2 package is used as it has the advantage of being packaged in Debian. Related to swh/meta#1709.
-
- Aug 17, 2023
-
-
Antoine Lambert authored
-
- Aug 16, 2023
-
-
Antoine Lambert authored
As Red Hat based Linux distributions share the same type of package repository, rework the fedora lister into a generic one that lists RPM source packages and their versions from numerous distributions. For a given distribution, the RPM lister fetches package metadata from a list of release identifiers and a list of software components. Source packages are then processed and the relevant information is extracted to be sent to the RPM loader. Once all releases and components have been processed, the lister has collected all versions for each package name and sends that information to the scheduler, which creates RPM loading tasks afterwards. Nevertheless, as there is no generic way to list all releases and components for a given distribution, nor to guess the right URL to retrieve package metadata from, that information needs to be manually provided to the lister as input parameters. Some examples of those parameters for various distributions can be found in the config directory of the lister. Regarding the produced origin URLs, as there is no way to find valid HTTP ones for all distributions, the same behavior as with the debian lister is used and they have the following form: rpm://{instance}/packages/{package_name} where the instance variable corresponds to the name of the listed distribution such as Fedora, CentOS, or openSUSE. Related to swh/meta#5011.
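The origin URL format and the per-package grouping of versions can be sketched as follows. Only the `rpm://{instance}/packages/{package_name}` pattern comes from the commit message; the function names, the input shape (pairs of package name and version) and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def origin_url(instance: str, package_name: str) -> str:
    """Build an origin URL following the pattern given in the commit message."""
    return f"rpm://{instance}/packages/{package_name}"


def group_versions(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect every version seen for each package name, in encounter order."""
    versions: Dict[str, List[str]] = defaultdict(list)
    for package_name, version in entries:
        versions[package_name].append(version)
    return dict(versions)


# Hypothetical entries gathered across several releases and components.
entries = [("bash", "5.1-1"), ("zsh", "5.8-3"), ("bash", "5.2-2")]
grouped = group_versions(entries)
urls = [origin_url("Fedora", name) for name in grouped]
```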
-
- Aug 04, 2023
-
-
Antoine R. Dumont authored
The constructor allows it but not the celery task. This also aligns the behavior with other lister tasks.
-
Antoine R. Dumont authored
The constructor allows it but not the celery task. This also aligns the behavior with other lister tasks.
-
Antoine R. Dumont authored
Instead of sending one page with all origins listed, which is brittle. When something goes wrong during the listing, the lister currently records nothing.
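The general idea of emitting origins in several smaller pages, so that work completed before a failure can still be recorded, might look like the sketch below. The `paginate` helper and its `page_size` parameter are hypothetical; the commit message does not say how the lister splits its pages.

```python
from typing import Iterable, Iterator, List


def paginate(origins: Iterable[str], page_size: int = 100) -> Iterator[List[str]]:
    """Yield origins in pages of at most `page_size` items."""
    page: List[str] = []
    for origin in origins:
        page.append(origin)
        if len(page) == page_size:
            # Hand over a full page immediately instead of waiting for the end.
            yield page
            page = []
    if page:
        # Emit the last, possibly shorter, page.
        yield page
```

Because each page is yielded as soon as it is full, a consumer can persist it before the next one is computed.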
-
Antoine R. Dumont authored
-