- Jan 21, 2022
-
-
vlorentz authored
-
- Jan 19, 2022
-
-
Antoine R. Dumont authored
Related to D6967
-
- Dec 16, 2021
-
-
Antoine R. Dumont authored
This also drops spurious copyright headers from those files if present. Related to T3812
-
- Dec 08, 2021
-
-
Antoine Lambert authored
Now that we have packaged tenacity 6.2 for Debian buster and use it in production, we can remove the workarounds that supported tenacity < 5.
-
- Dec 07, 2021
-
-
vlorentz authored
I would like to use it as the metadata authority URI in the loader, instead of '{p_url.scheme}://{p_url.netloc}/', which I do not think is accurate, as it is possible to have multiple Maven instances at the same netloc.
-
- Dec 06, 2021
-
-
Antoine Lambert authored
A Debian package can have sources coming from multiple suites, so we need to make sure the last_update field in the ListedOrigin model is updated when the suite currently being processed has a more recent modification time for its sources index. Related to T2400
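A minimal sketch of the update rule described above, assuming modification times are available as timezone-aware datetimes. The helper name `merge_last_update` is hypothetical and is not taken from the lister's code.

```python
from datetime import datetime, timezone
from typing import Optional


def merge_last_update(current: Optional[datetime], suite_mtime: datetime) -> datetime:
    """Keep the most recent modification time seen across suites.

    `current` is the value already recorded for the origin (None if unset),
    `suite_mtime` is the sources index modification time of the suite
    currently being processed.
    """
    if current is None or suite_mtime > current:
        return suite_mtime
    return current


older = datetime(2021, 12, 1, tzinfo=timezone.utc)
newer = datetime(2021, 12, 5, tzinfo=timezone.utc)
last_update = merge_last_update(None, older)
last_update = merge_last_update(last_update, newer)
last_update = merge_last_update(last_update, older)  # an older suite does not win
```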
-
- Dec 03, 2021
-
-
Antoine Lambert authored
Use the value of the "Last-Modified" header from the HTTP response to the Debian sources index request. This avoids creating loading tasks for Debian packages with no changes since the last listing. Related to T2400
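As an illustration only, a "Last-Modified" header value can be turned into a datetime with the standard library. The function name and the sample header value below are assumptions, not taken from the lister.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def last_modified_from_headers(headers: Mapping[str, str]) -> Optional[datetime]:
    """Parse the Last-Modified header of an HTTP response, if present."""
    value = headers.get("Last-Modified")
    if value is None:
        return None
    # HTTP dates use the RFC 5322 style format handled by email.utils
    return parsedate_to_datetime(value)


# Hypothetical header value, for illustration
sample_headers = {"Last-Modified": "Wed, 01 Dec 2021 10:00:00 GMT"}
parsed = last_modified_from_headers(sample_headers)
```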
-
Antoine Lambert authored
Not all Debian suites have the same set of components, so log that a component is missing for a suite instead of raising an exception that would stop the listing.
-
Antoine Lambert authored
Remove the no longer used date parameter in extra_loader_arguments. Related to T2400
-
- Dec 02, 2021
-
-
Antoine Lambert authored
For a given package, the Debian lister generates a dictionary mapping distribution and version to a list of files to be processed by the Debian loader. For each file to process, the Debian loader expects to find a URI in order to download it and then use its content to ingest the package source code into the archive. However, it turns out these URIs were not computed by the lister in its current implementation, making any Debian loading task fail due to the missing information. So add the computation of these URIs and ensure they are provided in the Debian loader input parameters. Related to T2400
-
- Dec 01, 2021
-
-
Nicolas Dandrimont authored
In some circumstances, GitHub will return two separate repos with the same html_url in the same page. This makes the lister fail with a cardinality error.
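One possible way to guard against such duplicates is to keep only the first repository seen for each html_url within a page. This is a sketch under that assumption; the commit message does not say how the lister actually handles it, and `dedupe_by_html_url` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_html_url(repos: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return repos in their original order, dropping repeated html_url values."""
    seen = set()
    unique = []
    for repo in repos:
        url = repo["html_url"]
        if url in seen:
            continue  # same origin URL already yielded for this page
        seen.add(url)
        unique.append(repo)
    return unique


# Made-up page content, for illustration
page = [
    {"id": 1, "html_url": "https://github.com/example/repo"},
    {"id": 2, "html_url": "https://github.com/example/repo"},
    {"id": 3, "html_url": "https://github.com/example/other"},
]
deduped = dedupe_by_html_url(page)
```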
-
- Nov 29, 2021
-
-
Boris Baldassari authored
The Maven lister retrieves the Maven Central indexes, exports them in a convenient text format, and parses them to identify all src archives and pom files in the Maven repository. Then the pom files are downloaded and analysed to find and yield any scm reference. Note: This is a new version of the Maven lister diff D6133 which takes into account the initial round of reviews. Related to T1724
-
- Nov 23, 2021
-
-
Antoine R. Dumont authored
This fixes the master build [1].
[1] https://jenkins.softwareheritage.org/view/swh-draft/job/DLS/job/tests/1625/console
-
- Nov 10, 2021
-
-
Antoine R. Dumont authored
-
- Nov 09, 2021
-
-
vlorentz authored
It will be used to create a synthetic release message that contains the package's name, like the Debian loader does.
-
- Oct 22, 2021
-
-
Antoine Lambert authored
Related to T3645
-
Antoine Lambert authored
CRAN origins must be loaded with the cran visit type and not the tar one. Related to T3675
-
- Oct 11, 2021
-
-
Antoine R. Dumont authored
Related to T3470
-
- Oct 08, 2021
-
-
Antoine R. Dumont authored
Related to T3629
-
Antoine R. Dumont authored
This does not yet enter into the registration of a new lister. Related to T3629
-
- Sep 24, 2021
-
-
Antoine R. Dumont authored
That avoids having multiple distinct opam root directories per opam lister instance. The opam commands currently used by the lister already list packages per instance. Related to P1171
-
- Sep 21, 2021
-
-
Antoine R. Dumont authored
Any extra state initialization (outside the scheduler scope) is to happen in the get_pages method.
-
Antoine R. Dumont authored
Related to T3590
-
Antoine R. Dumont authored
This matches how it's done for all other multi instances listers. Related to T3590
-
Antoine R. Dumont authored
We should avoid side effects in the constructor as much as possible. That avoids surprising behavior at object instantiation time. The state, if needed, must be initialized in the `swh.lister.pattern.Lister.get_pages` method, as recommended in the class docstring. This also fixes the current test, which actually bootstraps a real opam local "clone" in /tmp. Related to T3590
-
- Sep 17, 2021
-
-
Antoine R. Dumont authored
This will allow heptapod instances to have their own dedicated stats. Related to T3581
-
Antoine R. Dumont authored
Related to T3581#70593
-
- Sep 16, 2021
-
-
Antoine R. Dumont authored
This will allow listing the foss.heptapod.net instance, for example. Related to T3581
-
- Jul 23, 2021
-
-
Antoine Lambert authored
The GitLab API can return 500 errors when listing projects (see https://gitlab.com/gitlab-org/gitlab/-/issues/262629). To avoid ending the listing prematurely, skip buggy URLs and move on to the next pages. Related to T3442
-
Antoine Lambert authored
Increase the number of origins per page to the maximum value allowed by the GitLab API (100) to send fewer requests. Ask for simple responses to reduce the size of the JSON data.
-
Antoine Lambert authored
Temporary server failures can happen when listing a GitLab instance; HTTP status codes 502, 503 or 520 are returned in that case. So adapt the lister's request retry policy to execute requests again when such errors are encountered. Related to T3442
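The retry rule can be sketched with the standard library alone. This is not the lister's actual retry code: the names `is_retryable` and `fetch_with_retry`, the attempt limit, and the wait time are all assumptions made for illustration.

```python
import time
from typing import Callable

# Status codes named in the commit message as temporary server failures
RETRYABLE_STATUS_CODES = {502, 503, 520}


def is_retryable(status_code: int) -> bool:
    """Tell whether a response status signals a temporary server failure."""
    return status_code in RETRYABLE_STATUS_CODES


def fetch_with_retry(
    send: Callable[[], int], max_attempts: int = 3, wait_seconds: float = 0.0
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` stands in for the HTTP request and returns the status code.
    """
    status = send()
    attempts = 1
    while is_retryable(status) and attempts < max_attempts:
        time.sleep(wait_seconds)
        status = send()
        attempts += 1
    return status


# Simulated sequence of responses: two temporary failures, then success
responses = iter([502, 503, 200])
final_status = fetch_with_retry(lambda: next(responses))
```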
-
- Jul 20, 2021
-
-
Antoine R. Dumont authored
This aligns the behavior with the opam loader. Related to T3358
-
- Jul 13, 2021
-
-
Antoine Lambert authored
Make the instance parameter of the base pattern lister optional and set the lister name to the URL network location when it is not provided. It simplifies lister creation when the associated forge type has a lot of instances in the wild (e.g. gitlab or cgit) while giving more details about the listed forge instance. Also update the listers for forges with multiple instances (cgit, gitea, gitlab, phabricator and tuleap) to ensure the URL network location is used when the instance parameter is not provided. Related to T3403
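The fallback described above amounts to something like the following sketch. `resolve_instance` is a hypothetical helper and the URL is a placeholder; the real logic lives in the base pattern lister and may differ in detail.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_instance(url: str, instance: Optional[str] = None) -> str:
    """Return the explicit instance name, else the URL's network location."""
    if instance is not None:
        return instance
    return urlparse(url).netloc


default_name = resolve_instance("https://gitlab.example.org/api/v4/")
explicit_name = resolve_instance("https://gitlab.example.org/api/v4/", "my-gitlab")
```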
-
- Jul 09, 2021
-
-
Antoine R. Dumont authored
This rewrites the current implementation to use PyPI's XML-RPC API, which allows the listing to be incremental. It also allows fetching the last release date per package. This last part makes it possible to update the "last_update" entry in the ListedOrigin model. Related to T3399
-
Antoine R. Dumont authored
-
- Jul 06, 2021
-
-
zapashcanon authored
-
- Jun 09, 2021
-
-
Antoine Lambert authored
-
David Douard authored
-
- Jun 04, 2021
-
-
Raphaël Gomès authored
See inline comment as to why. This change also adds a Mercurial repo to the test data.
-
- Jun 03, 2021
-
-
Raphaël Gomès authored
I previously forgot to add the `https://` prefix to the cloning URL. Whoops.
-