- Dec 14, 2022
- Dec 06, 2022
- Dec 05, 2022
-
-
Nicolas Dandrimont authored
Hopefully one day we'll be able to replace all of this mess with PEP 692 TypedDict kwargs, but that's only on track for Python 3.12.
-
Nicolas Dandrimont authored
Some GitLab instances use specific namespaces for transient repositories that it doesn't make sense to archive. For example, gitlab.org has a set of QA namespaces used for integration testing of their production deployments; drupal has an `issues/` namespace with forks of repos that are only used for collaboration on merge requests, and are not worth archiving.
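A minimal sketch of this kind of namespace filtering, assuming a plain prefix match on the repository path. The function name `is_ignored` and the `ignored_prefixes` parameter are hypothetical and not taken from the lister itself:

```python
from typing import Iterable
from urllib.parse import urlparse


def is_ignored(repo_url: str, ignored_prefixes: Iterable[str]) -> bool:
    """Return True if the repository path starts with an ignored namespace.

    Hypothetical helper: the actual lister option and matching rules may differ.
    """
    path = urlparse(repo_url).path.lstrip("/")
    return any(path.startswith(prefix) for prefix in ignored_prefixes)


# Only repositories located under the `issues/` namespace are skipped.
urls = [
    "https://git.example.org/issues/some-fork",
    "https://git.example.org/project/some-repo",
]
kept = [url for url in urls if not is_ignored(url, ["issues/"])]
```

Matching on the path prefix rather than on a substring keeps a repository such as `project/issues` from being dropped by accident.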
-
Nicolas Dandrimont authored
This cuts down one more manual step in the add forge now validation process: we can add the relevant origins to the staging scheduler without enabling them at all.
-
Nicolas Dandrimont authored
This will allow more automation of the staging add forge now process: for known-good listers, we can limit the number of origins being processed and reduce the number of manual steps taken for each instance.
-
Nicolas Dandrimont authored
The SQL dump contains ownership instructions that can't be run if you don't have the right users in your database cluster. When someone has a psqlrc with ON_ERROR_STOP, this fails the load of the dump. Use the opportunity to raise an exception when psql returns a non-zero exit code, rather than continuing with an empty/inconsistent database.
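The "raise on non-zero exit code" part can be sketched as follows. The helper name `run_checked` is hypothetical, and the sketch makes no claim about how the real code invokes psql; it only shows the exit-code check:

```python
import subprocess
import sys
from typing import Sequence


def run_checked(cmd: Sequence[str]) -> str:
    """Run a command and raise if it exits with a non-zero code.

    Hypothetical helper illustrating the idea; not the project's actual code.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with code {completed.returncode}: "
            f"{completed.stderr.strip()}"
        )
    return completed.stdout


# Demonstration with the Python interpreter standing in for psql.
output = run_checked([sys.executable, "-c", "print('ok')"])
```

Failing loudly here means the caller never proceeds with a partially loaded database.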
-
- Nov 21, 2022
-
-
Antoine Lambert authored
In a similar way to the debian lister, use the following versions in the packages dictionary provided to the generic rpm loader:
- dict keys are package versions prefixed by the fedora release and edition in which they were found (fedora{release}/{edition}/{version}); they are used as branch names targeting releases in the snapshot created by the rpm loader
- version fields in dict values are the package intrinsic versions parsed from package repository metadata, excluding any ".fcXY" suffixes so that the loader does not create multiple releases targeting the same directory; they are used as release names in the snapshot created by the rpm loader

Related to T4448
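The two kinds of version strings described above could be derived roughly as follows. This is a sketch under assumptions: the helper names, the regular expression for the ".fcXY" suffix, and the sample values are illustrative only.

```python
import re


def branch_key(release: int, edition: str, version: str) -> str:
    """Build a dict key of the form fedora{release}/{edition}/{version}."""
    return f"fedora{release}/{edition}/{version}"


def intrinsic_version(version: str) -> str:
    """Strip a trailing ".fcXY" suffix (assumed to be ".fc" plus digits)."""
    return re.sub(r"\.fc\d+$", "", version)


# Illustrative values, not taken from real repository metadata.
key = branch_key(36, "Everything", "1.2.3-4")
release_name = intrinsic_version("1.2.3-4.fc36")
```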
-
- Nov 18, 2022
-
-
Franck Bret authored
Use the http api lastUpload argument in the search query to retrieve new or updated origins since the last run. Related to T4597
-
- Nov 15, 2022
-
-
Kumar Shivendu authored
Summary: Lister to ingest fedora mirrors (.rpm)
Reviewers: #reviewers, vlorentz
Subscribers: vlorentz, olasd
Maniphest Tasks: T4448
Differential Revision: https://forge.softwareheritage.org/D8386
-
- Nov 14, 2022
-
-
Franck Bret authored
The lister is incremental and based on the value of ``commitTimeStamp`` retrieved from the index http api endpoint. Related to T1718
-
- Nov 08, 2022
-
-
Franck Bret authored
Use the with_release_since api argument to retrieve modules that have been updated since the last time the lister was executed. Related to T4519
-
- Nov 07, 2022
- Nov 04, 2022
-
-
vlorentz authored
- Oct 28, 2022
-
-
Antoine Lambert authored
-
- Oct 26, 2022
-
-
Antoine R. Dumont authored
Deploying the nixguix lister, I realized that even though the credentials configuration is properly set for all listers, the listers actually requiring github origin canonicalization do not have access to the github credentials: they are lost in the constructor, which only keeps the lister's own credentials. This currently translates to those listers being rate-limited. This commit fixes it by pushing the self.github_session instantiation into the constructor when the lister explicitly requires the github session, hence lifting the rate limit for the maven, packagist, nixguix, and github listers. Related to infra/sysadm-environment#4655
-
Antoine R. Dumont authored
As a last fallback after the content-type check, instead of raising immediately. Related to T3781
-
Antoine R. Dumont authored
Prior to this, some urls were detected as files because their version number was wrongly detected as an extension, hence not matching tarball extensions. Related to T3781
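The pitfall can be illustrated with a sketch that matches known tarball extensions against the end of the url path instead of treating everything after the first dot as an extension. The extension list and the function name are assumptions, not the lister's actual implementation:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative list; the real set of supported extensions may differ.
TARBALL_EXTENSIONS = (".tar.gz", ".tgz", ".tar.bz2", ".tar.xz", ".zip")


def looks_like_tarball(url: str) -> bool:
    """Return True if the url path ends with a known tarball extension."""
    path = urlparse(url).path.lower()
    return path.endswith(TARBALL_EXTENSIONS)


# Naive suffix splitting mistakes part of the version for an extension:
suffixes = PurePosixPath("foo-1.2.tar.gz").suffixes  # ['.2', '.tar', '.gz']
```

Because `"".join(suffixes)` gives `.2.tar.gz`, an exact lookup in a set of tarball extensions misses it, whereas the `endswith` check does not.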
- Oct 25, 2022
-
-
Franck Bret authored
-
Antoine R. Dumont authored
Those extensions can be extended through configuration. They default to some binary formats already encountered during docker runs. Related to T3781
-
Antoine R. Dumont authored
Next step is to add some extensions filtering so might as well harden the test dataset first. Related to T3781
-
Antoine Lambert authored
swh-scheduler deduplicates listed origins according to their URL and visit type, but not according to their extra loader arguments. Previously, listed origins were yielded after each processed artifact in a page, so we could lose some package version info due to the deduplication process. So make sure to yield listed origins only once all artifacts in a page have been processed.
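A minimal sketch of the "accumulate, then yield" approach, assuming artifacts are plain dicts. The field names (`url`, `visit_type`, `version`) and the shape of the yielded records are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def origins_for_page(artifacts: Iterable[dict]) -> Iterator[dict]:
    """Group artifacts by (url, visit_type) and yield one origin per group.

    Yielding only after the whole page is processed keeps every version
    attached to its origin, even if a later step deduplicates on
    (url, visit_type).
    """
    versions: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for artifact in artifacts:
        versions[(artifact["url"], artifact["visit_type"])].append(
            artifact["version"]
        )
    for (url, visit_type), found in versions.items():
        yield {"url": url, "visit_type": visit_type, "versions": found}


page = [
    {"url": "https://example.org/pkg", "visit_type": "maven", "version": "1.0"},
    {"url": "https://example.org/pkg", "visit_type": "maven", "version": "1.1"},
]
origins = list(origins_for_page(page))
```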
-
Antoine R. Dumont authored
This requires those extensions to be supported by loaders too (in swh.core.tarball). Related to T3781
-
- Oct 24, 2022
- Oct 21, 2022
-
-
Antoine R. Dumont authored
Prior to this commit, the lister assumed authentication was required. There are public gogs instances which do not require it. This also updates the documentation to mention the usual api location, which is useful when people want to actually trigger a listing as a pre-flight check. This drops a repetitive instruction in the gitea lister as well. Co-authored with Antoine Lambert (@anlambert) <anlambert@softwareheritage.org>. Related to infra/sysadm-environment#4644
-
- Oct 19, 2022
-
-
Antoine Lambert authored
-
- Oct 18, 2022
-
-
David Douard authored
- pre-commit from 4.1.0 to 4.3.0,
- codespell from 2.2.1 to 2.2.2,
- black from 22.3.0 to 22.10.0 and
- flake8 from 4.0.1 to 5.0.4.

Also freeze flake8 dependencies. Also change flake8's repo config to github (the gitlab mirror being outdated).
-
- Oct 13, 2022
-
-
vlorentz authored
-
vlorentz authored
By using set equality, pytest can diff both operands, whereas failures of other equality comparisons are harder to read.
-
vlorentz authored
In particular, there seems to be a negligible number of origins using SSH instead of HTTPS, which the git loader cannot deal with.
-
Antoine Lambert authored
-
vlorentz authored
Tests implemented roughly the same algorithm as the lister, and compared both values...
- Oct 11, 2022
-
-
Antoine Lambert authored
The CPAN API can return versions that are not of str type: either int or float. When the version equals 0, it means that CPAN failed to parse it, so in that case we try to extract it from the release name. Otherwise we make sure to convert the version to str. Related to T2833
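The normalization described above might look like the following sketch. The function name, the regular expression used to pull a version out of a release name, and the fallback when nothing matches are all assumptions:

```python
import re
from typing import Union


def normalize_version(version: Union[int, float, str], release_name: str) -> str:
    """Return the version as a str, falling back on the release name.

    A version equal to 0 is treated as "failed to parse", as described in
    the commit message. The regex below is an illustrative guess at the
    release name format (e.g. "Foo-Bar-1.23").
    """
    if version == 0:
        match = re.search(r"-v?(\d[\w.]*)$", release_name)
        if match:
            return match.group(1)
    return str(version)


parsed = normalize_version(0, "Foo-Bar-1.23")
```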
-
Antoine Lambert authored
Instead of querying the metacpan distribution endpoint to list origins, use the release endpoint, which makes it possible to list all artifacts associated with CPAN packages by scrolling through results. Compared to the previous implementation, this makes it possible to compute a last_update date for all CPAN packages and to obtain artifact sha256 checksums that will be used by the CPAN loader to check download integrity. As the multiple versions of a module are spread across multiple pages from the CPAN API, origins are sent to the scheduler once all pages have been processed; it is also faster to proceed that way. Related to T2833
-