- Mar 21, 2025
Pierre-Yves David authored
- Feb 10, 2025
Antoine Lambert authored
The latest beautifulsoup4 release (4.13) seems to have fixed issues related to unexpected encodings in XML files, so a test that previously passed now fails. Update that test to check that the origin URL and visit type can be successfully extracted from a POM file with an unexpected encoding.
- Sep 04, 2024
Antoine Lambert authored
This new, special-purpose lister verifies a list of origins to archive that were submitted by users (for instance, through the Web API). Its purpose is to avoid polluting the scheduler database with origins that cannot be loaded into the archive. Each origin is identified by a URL and a visit type. For a given visit type, the lister checks whether the origin URL can be found and whether the visit type is valid. The supported visit types are those for VCSs (bzr, cvs, hg, git and svn), plus the one for loading tarball content into the archive. Accepted origins are inserted or updated (upserted) in the scheduler database. Rejected origins are stored in the lister state. Related to #4709
- Aug 27, 2024
Antoine Lambert authored
packaging.version.parse is dedicated to parsing Python package version numbers, but crate versions do not necessarily follow Python version number conventions, so some crate versions cannot be parsed. Use looseversion.LooseVersion2 instead, which is a drop-in replacement for the deprecated distutils.version.LooseVersion and can parse all kinds of version numbers.
- Jun 28, 2024
Antoine Lambert authored
The latest tenacity release includes internal changes that broke the mocking of sleep calls in tests. Fix this by mocking time.sleep directly (which did not work previously).
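A minimal, standard-library-only sketch of the technique: `unittest.mock.patch("time.sleep")` replaces the `sleep` attribute of the `time` module, so code that looks up `time.sleep` at call time is intercepted and the test does not actually wait. The helper `wait_and_fetch` is hypothetical and only stands in for retrying code that sleeps between attempts; it does not reproduce tenacity's internals.

```python
import time
from unittest import mock


def wait_and_fetch(fetch, delay=10.0):
    """Hypothetical helper standing in for retrying code that sleeps between attempts."""
    # time.sleep is resolved on the time module at call time, so patching it works.
    time.sleep(delay)
    return fetch()


def test_wait_and_fetch_does_not_really_sleep():
    # Patch sleep directly on the time module rather than on a library-internal alias.
    with mock.patch("time.sleep") as mock_sleep:
        assert wait_and_fetch(lambda: "data") == "data"
    mock_sleep.assert_called_once_with(10.0)
```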
- Nov 14, 2023
Antoine Lambert authored
The CRAN lister improvements introduced in 91e4e33d originally used pyreadr, rather than rpy2, to read an RDS file from Python. As swh-lister was still packaged for Debian at the time, rpy2 was chosen instead because a Debian package is available for it but not for pyreadr. Now that Debian packaging has been dropped for swh-lister, we can reinstate the pyreadr-based implementation, which has the advantages of being faster and not depending on the R language runtime. Related to swh/meta#1709.
- Oct 09, 2023
Franck Bret authored
This module introduces the Julia lister. It retrieves Julia package origins from the Julia General Registry, a Git repository made of one directory per package containing TOML definition files.
- Aug 21, 2023
Antoine Lambert authored
Previously, the lister relied on the CRANtools R module, which has the drawback of listing only the latest version of each package registered in CRAN. To get all available versions of each CRAN package, use the content of the weekly dump of the CRAN database in RDS format instead. To read the RDS file from Python, the rpy2 package is used, as it has the advantage of being packaged in Debian. Related to swh/meta#1709.
- Aug 17, 2023
Antoine Lambert authored
- Jul 10, 2023
Antoine R. Dumont authored
Some instances require specific heuristics. Some instances: - have summary pages which do not list metadata_url (so some computation is needed to list the cloneable git:// origins) - have summary pages which reference metadata_url as multiple comma-separated URLs - list relative repository URLs, which must be joined with the main instance URL to get complete cloneable origins (or summary pages) - list "down" http origins (cloning those won't work), which are then listed as cloneable https ones (when the main URL is behind https). Refs. #1800
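Three of these heuristics (splitting a comma-separated metadata_url, joining relative URLs with the instance URL, and upgrading http origins to https when the instance is served over https) can be sketched with `urllib.parse`. This is an illustration under those assumptions, not the lister's code; both function names are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def split_metadata_urls(metadata_url):
    """Hypothetical helper: split a comma-separated metadata_url into individual URLs."""
    return [url.strip() for url in metadata_url.split(",") if url.strip()]


def normalize_origin_url(instance_url, repo_url):
    """Hypothetical helper: build a complete, cloneable origin URL."""
    # Relative repository URLs are resolved against the main instance URL;
    # absolute URLs are returned unchanged by urljoin.
    url = urljoin(instance_url, repo_url)
    # When the instance itself is behind https, list http origins as https ones.
    if urlparse(instance_url).scheme == "https" and urlparse(url).scheme == "http":
        url = urlparse(url)._replace(scheme="https").geturl()
    return url
```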
- Nov 15, 2022
Kumar Shivendu authored
Summary: Lister to ingest Fedora mirrors (.rpm) Reviewers: #reviewers, vlorentz Subscribers: vlorentz, olasd Maniphest Tasks: T4448 Differential Revision: https://forge.softwareheritage.org/D8386
- Oct 07, 2022
Antoine Lambert authored
Instead of using an undocumented rubygems HTTP endpoint that only returns gem names, use the daily PostgreSQL dump of the rubygems.org database. It makes it possible to list not only all gems but also all versions of each gem and their release artifacts. For each release artifact, the following information is extracted: version, download URL, sha256 checksum and release date, plus a few extra metadata fields. The lister now sets the list of artifacts and the list of metadata as extra loader arguments when sending a listed origin to the scheduler database. A last_update date is also computed, which should ensure that rubygems loading tasks are scheduled only when new releases have become available since the last loading. Note that the lister spawns a temporary PostgreSQL instance, so the initdb executable from a PostgreSQL server installation must be available in the execution environment. Related to T1777
- Aug 09, 2022
Antoine Lambert authored
xmltodict cannot parse POM files with a multi-byte encoding, so use BeautifulSoup's lxml-based XML parser instead. Also drop the xmltodict requirement, as it is no longer used in the swh-lister codebase.
- Aug 05, 2022
Franck Bret authored
Add incremental mode support based on a 'last_commit' state, which is used to get new package versions from a git diff over a range of commits.
- Apr 21, 2022
Antoine Lambert authored
Fix the SourceForge origin URL for bzr projects: http://project.bzr.sourceforge.net/bzrroot/project redirects to http://project.bzr.sourceforge.net/bzr/project. Handle bzr projects with multiple branches; one listed origin must be created per branch. Discard bzr projects that no longer exist from the listing.
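A minimal sketch of the URL rewrite described above, using `urllib.parse`. The function name is hypothetical, and the sketch only covers the `/bzrroot/` to `/bzr/` redirect; per-branch origins and discarding dead projects are not shown.

```python
from urllib.parse import urlsplit, urlunsplit


def sourceforge_bzr_origin_url(url):
    """Hypothetical helper: rewrite a redirecting bzrroot URL to its bzr target."""
    parts = urlsplit(url)
    if parts.netloc.endswith(".bzr.sourceforge.net") and parts.path.startswith("/bzrroot/"):
        # /bzrroot/<project> redirects to /bzr/<project>, so list the target directly.
        path = "/bzr/" + parts.path[len("/bzrroot/"):]
        return urlunsplit(parts._replace(path=path))
    # Any other URL is left untouched.
    return url
```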
- Dec 08, 2021
Antoine Lambert authored
Now that we have packaged tenacity 6.2 for Debian buster and use it in production, we can remove the workarounds supporting tenacity < 5.
- Nov 29, 2021
Boris Baldassari authored
The Maven lister retrieves the Maven Central indexes, exports them in a convenient text format, and parses them to identify all source archives and POM files in the Maven repository. The POM files are then downloaded and analysed to find and yield any SCM reference. Note: this is a new version of the Maven lister diff D6133, which takes into account the initial round of reviews. Related to T1724
- Feb 05, 2021
Antoine Lambert authored
xmltodict now raises an error when trying to parse the HTML content of the https://pypi.org/simple/ page. So use the BeautifulSoup HTML parser instead, as it is already a requirement of swh-lister and it parses the PyPI HTML page without failing. Also drop the no longer used xmltodict from the requirements.
- Feb 02, 2021
Antoine Lambert authored
Legacy Lister classes from the swh.lister.core module are no longer used in the swh-lister codebase, so it is time to remove them. Also remove the lister CLI options related to the legacy Lister API. As a consequence, the following requirements are no longer needed: arrow, SQLAlchemy, sqlalchemy-stubs and testing.postgresql. Closes T2442
-
Antoine Lambert authored
The UTC timezone can be obtained from datetime.timezone.utc in the Python standard library, so remove the dependency on the external pytz module.
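For illustration, a minimal sketch of the standard-library replacement for `pytz.utc` (the date value is arbitrary):

```python
from datetime import datetime, timezone

# Previously (external dependency): datetime.now(tz=pytz.utc)
# Now (standard library only): the datetime.timezone.utc singleton
now = datetime.now(tz=timezone.utc)
stamp = datetime(2021, 2, 2, tzinfo=timezone.utc).isoformat()
print(stamp)  # 2021-02-02T00:00:00+00:00
```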
- Jan 18, 2021
Antoine Lambert authored
Add the swh.lister.utils.throttling_retry decorator, which retries a function performing an HTTP request that can return a 429 status code. The implementation is based on the tenacity module, and it assumes that the requests library is used to query URLs. The default wait strategy is exponential backoff. The default maximum number of attempts is 5, after which the HTTPError exception is re-raised. All tenacity.retry parameters can also be overridden in client code.
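The decorator's behaviour can be sketched without tenacity or requests. This is a hypothetical, standard-library-only illustration, not the swh.lister.utils implementation: it retries on `urllib.error.HTTPError` with code 429 (standing in for the requests `HTTPError`), waits with exponential backoff and re-raises after 5 attempts by default. The `base_delay` and injectable `sleep` parameters are assumptions added to keep the sketch testable.

```python
import functools
import time
import urllib.error


def throttling_retry(max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Hypothetical sketch: retry on HTTP 429 with exponential backoff."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except urllib.error.HTTPError as exc:
                    # Only throttling responses are retried; after the last
                    # attempt (or for any other status) the error is re-raised.
                    if exc.code != 429 or attempt == max_attempts:
                        raise
                    # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

Injecting `sleep` lets a test record the backoff delays instead of waiting; with `base_delay=0.5`, two throttled calls followed by a success sleep 0.5 and then 1.0 seconds.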
- Apr 11, 2020
Léni Gauffier authored
Summary: Related to T1734 From abandoned D2799 Reviewers: ardumont Reviewed By: ardumont Differential Revision: https://forge.softwareheritage.org/D2974
- Nov 04, 2019
Antoine R. Dumont authored
Related T2023
- Jun 28, 2019
Archit Agrawal authored
Implemented a lister to list the repos for a given CGit instance. Closes T1659
- Feb 01, 2019
David Douard authored
- Oct 30, 2017
Nicolas Dandrimont authored
- Sep 05, 2017
Stefano Zacchiroli authored
- Apr 12, 2017
Antoine Pietri authored
- Mar 06, 2017
Avi Kelman authored
Streamline the production of new listers by aggressively moving core functionality into progressively inherited (A->B->C) base classes, with the transport layer abstracted. This should make common individual forge listers straightforward to produce with minimal customization. The GitHub and Bitbucket listers can be used as examples of the indexing type.
- Feb 09, 2017
Antoine Pietri authored
- Dec 15, 2016
Antoine R. Dumont authored
Related T613
- Oct 20, 2016
Nicolas Dandrimont authored
- Oct 19, 2016
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
- Sep 13, 2016
Nicolas Dandrimont authored
- Mar 17, 2016
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
- Mar 09, 2016
Nicolas Dandrimont authored
- Sep 21, 2015
Stefano Zacchiroli authored