- Sep 19, 2023
-
-
Franck Bret authored
Add a dlang module that retrieves origins from an http api endpoint. Each origin is a git-based project url on github.com, gitlab.com or bitbucket.com.
-
- Sep 14, 2023
-
- Sep 06, 2023
-
-
Antoine Lambert authored
Ensure that all lister classes have the same set of mandatory parameters in their constructors, notably: scheduler, url, instance and credentials. Add a new test checking that lister classes have the mandatory parameters declared in their constructors. The purpose is to avoid deployment issues on staging or production environments, as celery tasks can fail to execute if mandatory parameters are not handled by listers. Related to swh/infra/sysadm-environment#5030.
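A check of this kind could be sketched as follows. This is a minimal illustration, not the test from the repository: the `ExampleLister` and `IncompleteLister` classes and the `missing_mandatory_params` helper are hypothetical; only the four parameter names come from the commit message.

```python
import inspect

# Parameter names taken from the commit message above.
MANDATORY_PARAMS = {"scheduler", "url", "instance", "credentials"}


class ExampleLister:
    """Hypothetical lister declaring every mandatory parameter."""

    def __init__(self, scheduler, url=None, instance=None, credentials=None):
        self.scheduler = scheduler
        self.url = url
        self.instance = instance
        self.credentials = credentials


class IncompleteLister:
    """Hypothetical lister that forgot the credentials parameter."""

    def __init__(self, scheduler, url=None, instance=None):
        self.scheduler = scheduler
        self.url = url
        self.instance = instance


def missing_mandatory_params(lister_class):
    """Return the mandatory parameter names absent from the constructor."""
    declared = set(inspect.signature(lister_class.__init__).parameters)
    return MANDATORY_PARAMS - declared
```

A test could then assert that `missing_mandatory_params` returns an empty set for every registered lister class.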
-
- Sep 05, 2023
-
-
Antoine R. Dumont authored
Refs. swh/infra/sysadm-environment#5030
-
- Aug 22, 2023
-
-
Antoine R. Dumont authored
This got detected when working on the deployment of the new loader-git. Refs. swh/infra/sysadm-environment#5017
-
- Aug 21, 2023
-
-
Antoine Lambert authored
Previously, the lister relied on the CRANtools R module, which has the drawback of only listing the latest version of each package registered in the CRAN registry. In order to get all possible versions of each CRAN package, exploit instead the content of the weekly dump of the CRAN database in RDS format. To read the content of the RDS file from Python, the rpy2 package is used as it has the advantage of being packaged in debian. Related to swh/meta#1709.
-
- Aug 17, 2023
-
-
Antoine Lambert authored
-
- Aug 16, 2023
-
-
Antoine Lambert authored
As Red Hat based linux distributions share the same type of package repository, rework the fedora lister into a generic one to list RPM source packages and their versions from numerous distributions. For a given distribution, the RPM lister fetches package metadata from a list of release identifiers and a list of software components. Source packages are then processed and relevant info is extracted to be sent to the RPM loader. Once all releases and components have been processed, the lister has collected all versions for each package name and sends that info to the scheduler, which will create RPM loading tasks afterwards. Nevertheless, as there is no generic way to list all releases and components for a given distribution, nor to guess the right URL to retrieve package metadata from, that info needs to be manually provided to the lister as input parameters. Some examples of those parameters for various distributions can be found in the config directory of the lister. Regarding the produced origin URLs, as there is no way to find valid HTTP ones for all distributions, the same behavior as with the debian lister is used and they have the following form: rpm://{instance}/packages/{package_name} where the instance variable corresponds to the name of the listed distribution such as Fedora, CentOS, or openSUSE. Related to swh/meta#5011.
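The origin URL form and the per-package grouping of versions described above can be illustrated with a short sketch. The function names and the `(name, version)` tuple input are assumptions made for this example; only the `rpm://{instance}/packages/{package_name}` form comes from the commit message.

```python
from collections import defaultdict


def rpm_origin_url(instance: str, package_name: str) -> str:
    """Build an origin URL of the form given in the commit message."""
    return f"rpm://{instance}/packages/{package_name}"


def group_versions(instance, packages):
    """Group package versions by origin URL.

    `packages` is a hypothetical iterable of (name, version) tuples
    standing in for the metadata extracted from each release/component.
    """
    versions_by_origin = defaultdict(list)
    for name, version in packages:
        versions_by_origin[rpm_origin_url(instance, name)].append(version)
    return dict(versions_by_origin)
```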
-
- Aug 04, 2023
-
-
Antoine R. Dumont authored
The constructor allows it but not the celery task. This also aligns the behavior with other lister tasks.
-
Antoine R. Dumont authored
The constructor allows it but not the celery task. This also aligns the behavior with other lister tasks.
-
Antoine R. Dumont authored
Instead of sending one page with all origins listed, which is brittle. When something goes wrong during the listing, the lister currently records nothing.
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
This allows configuring smaller batches when testing from docker & cli.
-
Antoine R. Dumont authored
With or without retry (for a future version of swh.core). This skips the origin when this sporadically happens. It should get picked up by another listing eventually. The listing is currently failing to finish when the github server hangs up on the process. Adding this behavior allows to skip the issue without breaking the listing.
-
Antoine R. Dumont authored
To avoid always going through the packages list in the same order when problems occurred in a previous listing.
-
Antoine R. Dumont authored
-
- Aug 02, 2023
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
The current lister implementation lists very little metadata with the hard-coded /p/ base url (404 on almost all packages). The packagist api must have evolved since the initial implementation of the lister (and its first deployment on staging). Following the upstream documentation [1], it's sensible to first use the /p2/ url scheme as it's performant from the packagist api side. It then falls back to the /p2/+~dev url scheme, then the /p/ scheme and finally the /packages/ base url if the previous results are either not found or empty (which is different from no modification since the last visit). It keeps the initial implementation behavior of stopping immediately if a 304 NotModifiedSince is returned by the server. [1] https://repo.packagist.org/apidoc
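The fallback order described above could look roughly like the sketch below. It is not the lister's code: the exact URL templates, the `NotModified` exception and the injected `fetch` callable are all assumptions made so the example stays self-contained.

```python
from typing import Callable, Optional

# Candidate URL patterns, tried in the order given in the commit message.
# The exact path templates are assumptions made for this sketch.
URL_PATTERNS = [
    "https://repo.packagist.org/p2/{name}.json",
    "https://repo.packagist.org/p2/{name}~dev.json",
    "https://repo.packagist.org/p/{name}.json",
    "https://repo.packagist.org/packages/{name}.json",
]


class NotModified(Exception):
    """Raised by the (hypothetical) fetcher on an HTTP 304 response."""


def fetch_metadata(
    name: str, fetch: Callable[[str], Optional[dict]]
) -> Optional[dict]:
    """Try each URL pattern in turn and return the first non-empty result.

    `fetch` returns parsed metadata, or None when the URL is not found.
    NotModified is deliberately not caught here, so a 304 stops the
    lookup immediately instead of triggering a fallback.
    """
    for pattern in URL_PATTERNS:
        metadata = fetch(pattern.format(name=name))
        if metadata:  # skip "not found" (None) and empty results
            return metadata
    return None
```

Passing the fetcher as a parameter keeps the fallback logic testable without any network access.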
-
- Aug 01, 2023
-
-
Antoine R. Dumont authored
This adds a test around the batch recording behavior to ensure it's not dropped by mistake.
-
Antoine R. Dumont authored
Prior to this commit, the newly introduced check on url validity was consuming the stream of origins. In effect, origin records were no longer written regularly. For all listers, this translated to flushing origins only at the end of the listing, which could take a while for some (e.g. the packagist lister has currently been running for more than 12h without writing anything in the scheduler).
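To illustrate the difference, here is a minimal sketch of validating origins lazily so that batches can still be flushed while the listing is in progress. The helper names and the simplistic validity check are assumptions for this example, not the lister's actual code.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def is_valid_url(url: str) -> bool:
    """Simplistic validity check, used only for this illustration."""
    return url.startswith(("http://", "https://"))


def valid_origins(origins: Iterable[str]) -> Iterator[str]:
    """Filter origins lazily, without materializing the whole stream."""
    return (url for url in origins if is_valid_url(url))


def batches(origins: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` origins as soon as they are available."""
    iterator = iter(origins)
    while batch := list(islice(iterator, size)):
        yield batch
```

Because both helpers pull from the underlying iterator only on demand, the first batch is available after reading just enough origins to fill it.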
-
- Jul 13, 2023
-
-
Antoine R. Dumont authored
That lister is really close to the cgit & gitweb implementations. The dom data is again structured differently though, so this implementation stands on its own. Refs. swh/meta#5048
-
Antoine R. Dumont authored
Gitiles instances deliberately return a malformed json output (json prefixed with ``)]}'\n``) [2]. The lister deals with it to properly parse the json response nonetheless: it drops the prefix and then parses the json. If at some point they drop this prefix to return json directly, the lister will be able to deal with it too. There are 2 tests, one with the 'standard' gitiles format and another with standard json, to account for both cases. Refs. swh/meta#5045 [2] https://github.com/google/gitiles/issues/263
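A minimal sketch of that parsing approach, assuming the response body is already available as a string (the function name is hypothetical):

```python
import json

# Prefix described in the commit message: the characters )]}' and a newline.
GITILES_PREFIX = ")]}'\n"


def parse_gitiles_json(text: str):
    """Parse a response body, dropping the prefix when it is present."""
    if text.startswith(GITILES_PREFIX):
        text = text[len(GITILES_PREFIX):]
    return json.loads(text)
```

Checking for the prefix rather than stripping it unconditionally is what lets the same function handle plain json as well.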
-
- Jul 10, 2023
-
-
Antoine R. Dumont authored
Depending on the instance, some specific heuristics apply. Some instances: - have summary pages which do not list metadata_url (so some computation happens to list git:// origins which are cloneable) - have summary pages which reference metadata_url as multiple comma-separated urls - list relative urls of the repository, so we need to join them with the main instance url to have complete cloneable origins (or summary pages) - list "down" http origins (cloning those won't work), so those are listed as cloneable https ones (when the main url is behind https). Refs. #1800
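Three of these heuristics (comma-separated values, relative urls, and http origins behind an https instance) can be sketched with the standard library. The function name and its inputs are assumptions for illustration; the real lister works on scraped page data.

```python
from typing import List
from urllib.parse import urljoin, urlparse


def normalize_origin_urls(base_url: str, raw_value: str) -> List[str]:
    """Turn a raw, possibly comma-separated or relative value into absolute URLs."""
    urls = []
    for candidate in raw_value.split(","):
        candidate = candidate.strip()
        if not candidate:
            continue
        # Relative urls are resolved against the main instance url.
        url = urljoin(base_url, candidate)
        # http origins are rewritten to https when the instance uses https.
        if urlparse(base_url).scheme == "https" and url.startswith("http://"):
            url = "https://" + url[len("http://"):]
        urls.append(url)
    return urls
```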
-
- Jul 04, 2023
-
-
Antoine Lambert authored
This fixes a tests hang when building the package for debian buster.
-
- Jun 29, 2023
-
-
Antoine Lambert authored
When relisting an opam instance while the opam root directory is already populated, the '--set-default' parameter must be provided, otherwise the following error is reported: "No switch is currently set. Please use 'opam switch' to set or install a switch". Related to swh/infra/sysadm-environment#4971.
-
Antoine Lambert authored
Ensure opam errors are displayed when attempting to list all packages in order to ease debugging. Related to swh/infra/sysadm-environment#4971.
-
Antoine Lambert authored
Use subprocess.run instead of subprocess.call and subprocess.Popen to call opam commands and set the check parameter to True in order to raise a CalledProcessError exception when an opam command fails. This should help spotting issues with the opam lister. Related to swh/infra/sysadm-environment#4971.
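The behavior of `check=True` can be shown with a small sketch. The `run_command` wrapper is hypothetical, and the Python interpreter stands in for the opam binary so the example runs anywhere.

```python
import subprocess
import sys


def run_command(args):
    """Run a command, raising CalledProcessError on a non-zero exit code."""
    return subprocess.run(args, check=True, capture_output=True, text=True)


try:
    # Stand-in for a failing opam command.
    run_command([sys.executable, "-c", "import sys; sys.exit(2)"])
except subprocess.CalledProcessError as exc:
    failure_code = exc.returncode
```

With `capture_output=True`, the raised exception also carries the command's `stdout` and `stderr`, which is convenient when the error output needs to be logged.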
-
- Jun 26, 2023
-
-
Antoine Lambert authored
Unlike gitea listing, which does not require the q query parameter, gogs listing requires it. After reading the gogs source code, the /repos/search endpoint generates a sql request of the form: "SELECT * FROM repos WHERE name LIKE '%{q}%'". By setting the q parameter value to "_", which matches any single character in a LIKE clause, the pattern acts as a wildcard and all repositories are ensured to be returned. Fixes #4698.
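The effect of the `_` wildcard can be checked with a small experiment. This sketch uses an in-memory SQLite database purely for illustration (the database engine and the sample repository names are assumptions; gogs itself may run on a different backend).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repos (name TEXT)")
conn.executemany(
    "INSERT INTO repos (name) VALUES (?)",
    [("swh-lister",), ("a",), ("my_repo",)],
)

# In a LIKE pattern, "_" matches exactly one arbitrary character, so
# '%_%' matches every name containing at least one character.
q = "_"
rows = conn.execute(
    "SELECT name FROM repos WHERE name LIKE ?", (f"%{q}%",)
).fetchall()
names = sorted(name for (name,) in rows)
```

All three sample rows are returned, including the names that contain no literal underscore.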
-
- Jun 23, 2023
-
-
Antoine Lambert authored
A missing docstring prevents the task type from being registered in the scheduler database.
-
Antoine Lambert authored
Pagure is a git-centered forge, python based, using pygit2. Its REST API makes it easy to list all projects hosted in an instance, so the lister implementation is quite simple. Related to swh/meta#5043.
-
- Jun 21, 2023
-
-
Nicolas Dandrimont authored
The default behavior of subprocess is to pull executables from a hardcoded list, which doesn't work when opam is installed manually in the user's home directory.
-
Nicolas Dandrimont authored
mypy doesn't catch that multiple uses of `self.listed_origins[origin_url]` in the same statement should be identical. Using a temporary local variable for it seems to help.
-
- Jun 20, 2023
-
-
vlorentz authored
The files we use weigh 440MB, and there are ~600MB of files we don't use
-
- Jun 08, 2023
-
-
Antoine R. Dumont authored
For the ones coming from a tarball. This matches the change that happened in the associated directory loader. Refs. swh/infra/sysadm-environment#4906
-
- Jun 07, 2023
-
-
Antoine R. Dumont authored
Without this, the loader will fail. Refs. swh/meta#4979
-
- Jun 05, 2023
-
-
Antoine R. Dumont authored
Prior to this, it was sending only 'directory' types for all vcs trees. Multiple directory loaders now exist whose visit types are currently diverging, so the scheduling would not happen correctly without it. This commit is the required adaptation for the scheduling to work appropriately. Refs. swh/meta#4979
-
- May 31, 2023
-
-
Antoine R. Dumont authored
Those will be ingested by the loader as "directory" with "nar" checksum layouts. Refs. swh/infra/sysadm-environment#4868 Refs. swh/meta#4979
-
- May 23, 2023
-
-
Antoine R. Dumont authored
Some cgit instances are at a domain's root path so we can build their url directly from their 'instance' parameter. This unifies further the cli to register a lister and the cli to schedule the listed origins from a forge. [1]
```
https://git.kernel.org
https://source.codeaurora.org
https://git.trueelena.org
https://dev.sanctum.geek.nz
https://git.trueelena.org
https://git.dpkg.org
https://anongit.mindrot.org
https://git.aurel32.net
https://gitweb.gentoo.org
https://git.joeyh.name
https://git.adrian.geek.nz
```
Refs. swh/devel/swh-lister#4693
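Deriving the url from the instance name could be as simple as the sketch below. The `cgit_url` helper and its signature are hypothetical, and the sketch assumes the instance is served over https at the domain's root path, as in the listed examples.

```python
from typing import Optional


def cgit_url(instance: Optional[str] = None, url: Optional[str] = None) -> str:
    """Return the explicit url when given, else derive it from the instance."""
    if url:
        return url
    if not instance:
        raise ValueError("either 'url' or 'instance' must be provided")
    # Assumption: the instance is served over https at the domain's root path.
    return f"https://{instance}"
```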
-
- May 19, 2023
-
-
Antoine R. Dumont authored
This pushes the rather elementary logic within the lister's scope. This will simplify and unify cli calls between the lister and scheduler clis. This will also help reduce erroneous operations which can happen, for example, in the add-forge-now. With the following, we will only have to provide the type and the instance, then everything will be scheduled properly. Refs. swh/devel/swh-lister#4693
-
- May 10, 2023
-
-
vlorentz authored
```
$ swh lister run
Traceback (most recent call last):
  File "/home/dev/.local/bin/swh", line 33, in <module>
    sys.exit(load_entry_point('swh.core', 'console_scripts', 'swh')())
  File "/home/dev/swh-environment/swh-core/swh/core/cli/__init__.py", line 144, in main
    return swh(auto_envvar_prefix="SWH")
  File "/home/dev/.local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/dev/.local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/dev/.local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dev/.local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/dev/.local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/dev/.local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/dev/.local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/dev/swh-environment/swh-lister/swh/lister/cli.py", line 68, in run
    get_lister(lister, **config).run()
  File "/home/dev/swh-environment/swh-lister/swh/lister/__init__.py", line 75, in get_lister
    raise ValueError(
ValueError: Invalid lister None: only supported listers are ['arch', 'aur', 'bitbucket', 'bower', 'cgit', 'conda', 'cpan', 'cran', 'crates', 'debian', 'fedora', 'gitea', 'github', 'gitlab', 'gnu', 'gogs', 'golang', 'hackage', 'hex', 'launchpad', 'maven', 'nixguix', 'npm', 'nuget', 'opam', 'packagist', 'phabricator', 'pubdev', 'puppet', 'pypi', 'rubygems', 'sourceforge', 'tuleap']
```
-