- Oct 26, 2022
-
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '5.1.0' with Debian dir 46568b9c977eac5642d6fcc7022a69c5c13455b2
- Oct 25, 2022
-
-
Franck Bret authored
As a follow-up to the Puppet lister evolution (D8762), manage artifacts as lists. Remove description from release message. Related T4580
-
- Oct 21, 2022
-
-
Franck Bret authored
For each origin, it takes advantage of the 'artifacts' data sent through 'extra_loader_arguments' by the conda lister, providing versions, archive url, checksum, etc. The author is extracted from intrinsic metadata. Related T4579
-
- Oct 18, 2022
-
-
David Douard authored
- pre-commit from 4.1.0 to 4.3.0, - codespell from 2.2.1 to 2.2.2, - black from 22.3.0 to 22.10.0 and - flake8 from 4.0.1 to 5.0.4. Also freeze flake8 dependencies. Also change flake8's repo config to github (the gitlab mirror being outdated).
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '5.0.0' with Debian dir b72f205236de754324a7a58c3523bcdb4814d109
-
- Oct 17, 2022
-
-
Antoine Lambert authored
Fetch extrinsic metadata by computing URLs from the metadata provided by the lister and store them as release extrinsic metadata. Related to T2833
-
- Oct 11, 2022
-
-
Antoine Lambert authored
Parsing perl module metadata files triggers a lot of errors due to badly formatted JSON or YAML, and module author info is already provided by the cpan lister as extra loader arguments, so remove that no-longer-needed metadata parsing step. Related to T2833
-
Antoine Lambert authored
Artifacts info for a package is now provided as loader arguments, so there is no need to query the metacpan Web API anymore to get the list of versions and their related info. Related to T2833
-
Antoine Lambert authored
The module description is not related to a particular release, so we should not add it to the release message.
-
Franck Bret authored
The loader gets enough information from extrinsic metadata to build a release object; checking intrinsic metadata was more error-prone than useful. It should fix some Sentry-reported errors. Remove 'information' and adapt release message. Adapt loader specifications documentation. Related T4465, T4530, T4583
-
- Oct 07, 2022
-
-
Antoine R. Dumont authored
"nar" computation checks can happen on files too. This also deduplicate tests code on content and directory ones. Related to T3781
-
- Oct 05, 2022
-
-
Antoine R. Dumont authored
Prior to this commit, there was a discrepancy between how hash mismatches were handled for "standard" and "nar" computations. This commit closes that gap. When a hash mismatch occurs, whether "nar" or "standard", the issue is caught and the next mirror url is checked. In the end, if nothing was loaded and errors occurred, the error is raised, which fails the visit. This also adds the missing tests. Related to T3781
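The fallback behaviour described in this message can be sketched roughly as follows. This is only an illustration under assumptions, not the loader's code: `fetch_with_fallback`, `download` and `verify` are hypothetical names, and the exception types are chosen for the example.

```python
from typing import Callable, List


def fetch_with_fallback(
    urls: List[str],
    download: Callable[[str], bytes],
    verify: Callable[[bytes], None],
) -> bytes:
    """Try each url in order; return the first payload that passes `verify`.

    `download` and `verify` are hypothetical callables supplied by the caller.
    `verify` is expected to raise ValueError on a hash mismatch.
    """
    errors: List[Exception] = []
    for url in urls:
        try:
            data = download(url)
            verify(data)  # a mismatch is caught below and the next url is tried
            return data
        except (OSError, ValueError) as exc:
            errors.append(exc)
    # Nothing was loaded and errors were collected: fail the whole attempt.
    raise RuntimeError(f"all {len(urls)} urls failed: {errors}")
```

The main origin url would come first in `urls`, followed by any mirror urls, so mirrors are only used when the earlier candidates fail.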
-
Antoine R. Dumont authored
The lister now provides the "checksums_computation". This is either "standard" (for most cases, i.e. bare checksums on the retrieved object) or "nar" for some edge cases. In the latter case, the computation is delegated to the "nix-store" command (which should be present on the system running the loading). This adapts the directory loader to deal with this case. No work has been done for the ContentLoader yet, besides failing if it is called with such a case. Related to T3781
-
- Oct 04, 2022
-
-
Antoine Lambert authored
Add a dedicated fixture implementing loader task creation check for a given lister and listed origin and use it in tasks tests for available loaders. Also remove redundant tests performing the same checks as that new fixture.
-
- Oct 03, 2022
-
-
Antoine R. Dumont authored
Related to T3781
-
Antoine Lambert authored
Previous regexp does not seem to work anymore so use a simpler one.
-
Antoine Lambert authored
Also fix a debug log template.
-
Antoine Lambert authored
This function downloads a file and computes hashes on it, there is no archive extraction step.
-
Antoine R. Dumont authored
This adapts the content/directory loader implementations to directly use a checksums dict, which is now sent by the listers. This improves the loader so that it checks those checksums when retrieving the artifact (content or tarball). Thanks to a bump in the swh.model version, this is now able to deal with sha512 checksum checks as well. This also aligns with the current package loaders, which now also check the integrity of the tarballs they ingest. Related to T3781
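As an illustration of checking an artifact against a checksums dict, here is a minimal sketch using only the standard library. It assumes the dict maps hashlib algorithm names to hex digests; `check_checksums` is a hypothetical helper and does not reflect the swh.model API.

```python
import hashlib
import io
from typing import BinaryIO, Dict


def check_checksums(
    stream: BinaryIO, checksums: Dict[str, str], chunk_size: int = 65536
) -> None:
    """Read `stream` once and compare every requested digest.

    Raises ValueError on the first mismatch (hypothetical error type).
    """
    hashers = {algo: hashlib.new(algo) for algo in checksums}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    for algo, expected in checksums.items():
        actual = hashers[algo].hexdigest()
        if actual != expected:
            raise ValueError(f"{algo} mismatch: expected {expected}, got {actual}")


# Usage: several algorithms, including sha512, are checked in a single pass.
payload = b"example tarball bytes"
expected = {
    "sha256": hashlib.sha256(payload).hexdigest(),
    "sha512": hashlib.sha512(payload).hexdigest(),
}
check_checksums(io.BytesIO(payload), expected)  # no exception: digests match
```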
-
Antoine R. Dumont authored
In some marginal listing cases (Nix or Guix for now), we can receive a raw tarball to ingest. This commit adds a loader to ingest those. The output of the ingestion is a snapshot with a single HEAD branch targeting the ingested directory (contained within the tarball). This expects to receive a mandatory 'integrity' field, which is used to check the tarball retrieved from the origin. This can also optionally receive a list of mirror urls in case the main origin url is no longer available. Those mirror urls are solely used as fallback to retrieve the tarball. Related to T3781
-
- Sep 30, 2022
-
-
Antoine Lambert authored
When one or multiple tarball checksums are available, either from listers output or from Web API calls performed by some loaders, use them to check the integrity of downloaded tarballs.
-
Antoine R. Dumont authored
In some marginal listing cases (Nix or Guix for now), we can receive a raw file to ingest. This commit adds a loader to ingest those. The output of the ingestion is a snapshot with a single HEAD branch targeting the ingested file content. This expects to receive a mandatory 'integrity' field, which is used to check that the content matches the declaration. This can also optionally receive a list of mirror urls in case the main origin url is no longer available. Those mirror urls are solely used as fallback to retrieve the content. Related to T3781
-
- Sep 29, 2022
-
-
https://forge.puppet.com
Franck Bret authored
For each origin, it takes advantage of the 'artifacts' data sent through 'extra_loader_arguments' by the Puppet lister, providing versions, archive url, last_update, filename. Author and description are extracted from intrinsic metadata. Related T4580
-
Franck Bret authored
For each origin, it calls an HTTP API endpoint to retrieve extrinsic metadata for each version of a module. Author and package description are extracted from intrinsic metadata, by parsing META.json or META.yml at the root of the archive. Related T2833
-
- Sep 28, 2022
-
-
Antoine Lambert authored
The Software Heritage homemade RPC layer does not know how to serialize set objects, so we need to pass lists as parameters of the *_missing methods of the storage API.
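The constraint can be illustrated with a small stand-in. The sketch below uses the standard `json` module purely as an example of a serializer that rejects sets; it is not the Software Heritage RPC layer, and `encode_call` is a hypothetical helper.

```python
import json


def encode_call(method: str, ids) -> str:
    """Encode a fake RPC call; json stands in for a serializer without set support."""
    return json.dumps({"method": method, "args": [ids]})


wanted = {"dir-b", "dir-a"}

try:
    encode_call("directory_missing", wanted)
    set_was_accepted = True
except TypeError:
    # json cannot serialize a set
    set_was_accepted = False

# Converting to a list (sorted here, for a stable order) makes the call encodable.
encoded = encode_call("directory_missing", sorted(wanted))
```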
-
Raphaël Gomès authored
This will allow us to use this interface in async code like ``swh-scanner``. Unfortunately, this means calling ``asyncio.run`` for sync code, but the performance impact should be negligible. The ``swh_storage.*missing*`` APIs are inconsistent for each type, which requires a lot of boilerplate code. This should be addressed in a follow-up.
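The pattern of exposing an async interface while keeping a sync entry point can be sketched as below. The function names are hypothetical and the awaited call is a placeholder; only `asyncio.run` and `asyncio.sleep` are real APIs here.

```python
import asyncio
from typing import Iterable, List, Set


async def filter_missing_async(ids: Iterable[str], known: Set[str]) -> List[str]:
    """Async variant, usable directly from async code."""
    await asyncio.sleep(0)  # placeholder for an awaited storage or network call
    return [i for i in ids if i not in known]


def filter_missing(ids: Iterable[str], known: Set[str]) -> List[str]:
    """Sync wrapper: starts an event loop just for this call."""
    return asyncio.run(filter_missing_async(ids, known))
```

Note that `asyncio.run` cannot be called while another event loop is already running in the same thread, so async callers should await the async variant rather than go through the wrapper.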
-
- Sep 26, 2022
-
-
Raphaël Gomès authored
"Discovery" is the term used to find out the differences between two Merkle graphs. Using such an algorithm is useful in that it drastically reduces the amount of data that needs to be transferred. This commit introduces an efficient but simple algorithm that is a good starting point for improved performance: random sampling of directories, the details of which are explained in the docstrings. Mercurial uses a more sophisticated algorithm for its discovery, but it is quite a bit more involved and would introduce too much complexity at once. Also, the constraints for speed that Mercurial has (in the order of milliseconds) don't apply as obviously to this context without further investigation. Benchmarks ========== Setup ----- - With a local postgresql storage (so no network overhead), a local tmpfs obstorage on a fast NVME SSD, all of which should make this improvement look less good than it will be in production - With a tarball of the linux kernel at commit d96d875ef5dd372f533059a44f98e92de9cf0d42 already loaded - Loading a tarball of 20 commits earlier (bf3f401db6cbe010095fe3d1e233a5fde54e8b78) - Only taking into account the loading (not the downloading of the tarball, or its decompression) Result ------ before: ~30s after: ~17s Reproduced 4 times.
-
Antoine Lambert authored
-
- Sep 21, 2022
-
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '4.2.0' with Debian dir 348bbeb02c820262034bb57ec89213fd01d30581