- Aug 03, 2022
-
-
Kumar Shivendu authored
-
- Jun 15, 2022
-
-
Franck Bret authored
After a first attempt with D7812, this one uses a different strategy to retrieve origins. Fetch and extract "core.files.tar.gz", "extra.files.tar.gz" and "community.files.tar.gz" from archives.archlinux.org. That step ensures that we have a list of "official" packages. Parse metadata from the 'desc' file to build origin URLs. Scrape the origin URL to get artifacts metadata that lists all versions of a package. It also fetches and extracts unofficial 'arm' packages from archlinuxarm.org, but in this case we cannot get all versions of an arm package. Related T4233
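The 'desc' parsing step could look roughly like the sketch below. It assumes the usual layout of such files, where a `%KEY%` header line is followed by one or more value lines; the function name `parse_desc` and the lower-cased keys are illustrative choices, not the lister's actual API.

```python
from typing import Dict, List


def parse_desc(text: str) -> Dict[str, List[str]]:
    """Parse a 'desc' file into a mapping of lower-cased key to value lines.

    Hypothetical helper: assumes each field starts with a %KEY% header line.
    """
    fields: Dict[str, List[str]] = {}
    key = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if len(line) > 2 and line.startswith("%") and line.endswith("%"):
            # New field header, e.g. %NAME% or %VERSION%
            key = line[1:-1].lower()
            fields[key] = []
        elif line and key is not None:
            # Value line belonging to the current field
            fields[key].append(line)
    return fields
```

The resulting dictionary can then be used to assemble an origin URL from fields such as the package name; how that URL is built is left out here because the commit message does not spell it out.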
-
- May 23, 2022
-
-
Antoine R. Dumont authored
That means detected github urls {https,git,http}://github.com/${user_repo}(.git) are canonicalized to https://github.com/${user_repo} format. This avoids duplication of origins. Related to T4232
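A minimal sketch of such a canonicalization, assuming a single regular expression is enough to cover the three schemes and the optional `.git` suffix. The names `GITHUB_URL` and `canonicalize_github_url` are hypothetical and not taken from the lister code.

```python
import re

# Hypothetical pattern: http, https or git scheme, optional ".git" suffix
# and optional trailing slash.
GITHUB_URL = re.compile(
    r"^(?:https?|git)://github\.com/(?P<user_repo>[^/]+/[^/]+?)(?:\.git)?/?$"
)


def canonicalize_github_url(url: str) -> str:
    """Return the https://github.com/<user>/<repo> form of a GitHub URL.

    URLs that do not look like GitHub repository URLs are returned unchanged.
    """
    match = GITHUB_URL.match(url.strip())
    if match is None:
        return url
    return f"https://github.com/{match.group('user_repo')}"
```

Because every variant maps to the same string, two detections of the same repository under different schemes end up as one origin.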
-
- May 20, 2022
-
-
Antoine R. Dumont authored
Related to T4232
-
- May 09, 2022
-
- May 02, 2022
-
-
Antoine Lambert authored
Pass the raw bytes of the pom file content to xmltodict.parse and let it do the string decoding based on the encoding declared in the pom file. If string decoding fails due to an invalid declared encoding, xml.parsers.expat.ExpatError is raised and caught by the lister, which ignores the pom file and continues listing. Related to T3874
-
- Apr 29, 2022
-
-
Antoine Lambert authored
There are cases where the modification time of a jar archive in a maven index is null, which led to a processing error in the lister. Handle that case to avoid a premature exit of the listing process. Related to T3874
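One way to tolerate a missing modification time is sketched below. The input format is an assumption made for illustration (a string holding epoch milliseconds, or `None`, an empty string or the literal `"null"`), and `parse_mtime` is a hypothetical helper rather than the lister's real function.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_mtime(raw: Optional[str]) -> Optional[datetime]:
    """Convert a raw modification time to an aware datetime, or None.

    Assumption: `raw` holds epoch milliseconds when it is present.
    """
    if raw is None or raw in ("", "null"):
        # No usable modification time: report "unknown" instead of failing
        return None
    return datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)
```

Callers then have to accept `None` as "last update unknown" rather than aborting the whole listing.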
-
Antoine Lambert authored
When parsing pom files, we are only interested in extracting a VCS URL (git, hg, svn) in order to create the associated loading tasks. The groupId and artifactId are not used by the lister, so remove their extraction; this also prevents errors when that info is missing from pom files.
-
Antoine Lambert authored
Previously the maven lister was creating an origin for each source archive (jar, zip) it discovered during the listing process. This is not the way Software Heritage decided to archive sources coming from package managers. Instead, one origin should be created per package and all its versions should be found as releases in the snapshot produced by the package loader. So modify the maven lister to create one origin per package, grouping all its versions. This change also modifies the way incremental listing is handled: ListedOrigin instances are yielded only if new versions of a package were discovered since the last listing. Tests have been updated to reflect these changes. Related to T3874
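The grouping and the incremental rule can be illustrated with plain data structures. This is only a sketch: the real lister works with ListedOrigin instances and its own state, whereas `group_versions`, `origins_to_yield` and the `known` mapping below are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def group_versions(archives: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (package_url, version) pairs into one entry per package."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for package_url, version in archives:
        grouped[package_url].add(version)
    return dict(grouped)


def origins_to_yield(
    grouped: Dict[str, Set[str]], known: Dict[str, Set[str]]
) -> Iterator[Tuple[str, List[str]]]:
    """Yield a package only if it has versions unseen in the last listing."""
    for package_url, versions in grouped.items():
        if versions - known.get(package_url, set()):
            yield package_url, sorted(versions)
```

With this shape, a package whose set of versions has not changed since the previous run produces no output at all, which is the incremental behavior described above.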
-
- Apr 28, 2022
-
-
Franck Bret authored
Previously we had as many origins as versions for a crate package, and the url was a link to a specific crate version package. Refactor to have one origin per package name and add an 'artifacts' entry to extra_loader_arguments that lists all versions with their package url and checksum. The origin url is now a link to the related http api endpoint for a package name. Related to T4104
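A rough sketch of how such an 'artifacts' list might be assembled from the lines of one crate's index file, assuming each line is a JSON object with `name`, `vers` and `cksum` keys. The output keys, the `url_template` parameter and the example URL used with it are hypothetical and only serve the illustration.

```python
import json
from typing import Dict, Iterable, List


def artifacts_from_index(lines: Iterable[str], url_template: str) -> List[Dict[str, str]]:
    """Build one artifact entry per version found in a crate index file.

    Assumptions: one JSON object per line; `url_template` is a caller-supplied
    format string using {name} and {version}.
    """
    artifacts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        artifacts.append(
            {
                "version": entry["vers"],
                "checksum": entry["cksum"],
                "url": url_template.format(name=entry["name"], version=entry["vers"]),
            }
        )
    return artifacts
```

All versions of a crate then travel together in a single list attached to one origin, instead of being spread over one origin per version.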
-
- Apr 26, 2022
- Apr 25, 2022
-
-
Antoine Lambert authored
That processing is already handled in the base Lister class constructor.
-
- Apr 21, 2022
-
-
vlorentz authored
Authentication is handled directly in the session
-
Antoine Lambert authored
Fix the SourceForge origin URL for bzr projects: http://project.bzr.sourceforge.net/bzrroot/project redirects to http://project.bzr.sourceforge.net/bzr/project. Handle bzr projects with multiple branches: one listed origin must be created per branch. Discard bzr projects that no longer exist from the listing.
-
Antoine Lambert authored
The Attic folder that can sometimes be found in a CVS repository is a special one used by CVS to store RCS files and should not be considered a valid module name when listing CVS projects.
-
Antoine Lambert authored
That hook can be frustrating, as it can discard a long commit message if it finds a typo in it, so remove it.
-
- Apr 14, 2022
-
-
Antoine R. Dumont authored
Related to T3874
-
- Apr 13, 2022
-
-
Antoine R. Dumont authored
This aligns the behavior with other listers (e.g. sourceforge, ...) to continue listing if some information is not retrievable at all. Related to T3874
-
Antoine R. Dumont authored
Without this, the lister legitimately cannot list anything.
-
- Apr 08, 2022
-
-
Antoine Lambert authored
-
Antoine Lambert authored
Related to T3922
-
Antoine Lambert authored
black is considered stable since release 22.1.0 and the version we are currently using is quite outdated and not compatible with click 8.1.0, so it is time to bump it to its latest stable release. Please note that E501 pycodestyle warning related to line length is replaced by B950 one from flake8-bugbear as recommended by black. https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length Related to T3922
-
- Apr 06, 2022
-
-
Antoine Lambert authored
pytest-postgresql 3.1.3 and pytest-redis 2.4.0 added support for pytest >= 7 so we can now drop the pytest pinning.
-
- Mar 28, 2022
-
-
Franck Bret authored
The Crates lister retrieves crates package for Rust lang. It basically fetches https://github.com/rust-lang/crates.io-index.git to a temp directory and then walks through each file to get the crate's info.
-
- Mar 22, 2022
-
-
Antoine Lambert authored
Because setuptools copies test modules into subdirectories of the build directory, pytest fails with ImportPathMismatchError exceptions when invoked from the root directory of the module. So ignore the build folder when discovering tests.
-
- Mar 11, 2022
-
-
Antoine Lambert authored
Commit 6a747955 modified the origin URLs for CVS projects hosted on SourceForge but it also broke incremental listing due to a no longer valid assertion, so fix that issue.
-
- Feb 18, 2022
-
-
Antoine R. Dumont authored
The decorator is dropped on `get_origins_from_page` as we cannot retry an iterator consumption anyway. Related to T3948
-
- Feb 17, 2022
-
-
Antoine Lambert authored
CVS projects are different from other VCS ones: they use the rsync protocol, a list of modules needs to be fetched from an info page, and multiple origin URLs can be produced for the same project. Related to T3789
-
Antoine R. Dumont authored
as the scheduler is now able to deduplicate it when recording listed origins. Related to T3945
-
Antoine R. Dumont authored
Related to T3945
-
Antoine R. Dumont authored
Prior to this commit, the listing could fail when either reading a page or the page of results (the launchpad api raises RestfulError). This now retries when those kinds of exceptions happen. If the error persists (after multiple attempts and exponential backoff), the listing continues nonetheless (with warning logs). Note that if the page ends up being empty, it's no longer accounted for. This actually allows the listing to finish in case of issues. Related to T3945
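The retry-then-continue behavior can be sketched without any third-party library. This is not the lister's implementation: `call_with_retries`, its parameters and the broad `Exception` catch are assumptions made for the example, and real code would catch only the specific errors it knows how to retry.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> Optional[T]:
    """Call `fetch`, retrying with exponential backoff; return None on failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # hypothetical: narrow this in real code
            if attempt == attempts - 1:
                # Give up on this page but let the caller carry on listing
                logger.warning("Giving up after %d attempts: %s", attempts, exc)
                return None
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2**attempt)
    return None
```

A caller that receives `None` can skip the page and move on, which is what lets the listing finish despite persistent errors.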
-
- Feb 16, 2022
-
-
Antoine R. Dumont authored
Related to T3945
-
- Feb 14, 2022
-
-
Raphaël Gomès authored
Bazaar support was removed a long time ago and predates a lot of the new mechanisms in place in the API. Unfortunately, it looks like a lot of the URLs are offline now, but there are still a few projects that can be listed, and this is pretty low-effort.
-
- Feb 10, 2022
-
-
Antoine Lambert authored
-
Antoine Lambert authored
To install the new hook: $ pre-commit install -t commit-msg
-
- Feb 09, 2022
-
-
Antoine R. Dumont authored
We need to avoid using naive datetimes as this fails during conversion. Related to T3746 Related to P1280
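A small sketch of how a naive datetime can be made timezone-aware before conversion. Treating naive values as UTC is an assumption made for this example, and `ensure_aware` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def ensure_aware(dt: datetime) -> datetime:
    """Return `dt` unchanged if it is aware, else attach UTC (assumed)."""
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        # Naive datetime: no usable offset, so attach one explicitly
        return dt.replace(tzinfo=timezone.utc)
    return dt
```

Normalizing at the boundary where dates enter the lister keeps later comparisons and conversions from mixing naive and aware values.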
-
- Feb 08, 2022
-
-
Boris Baldassari authored
-