- Sep 29, 2022
-
-
Franck Bret authored
For each origin, it calls an HTTP API endpoint to retrieve extrinsic metadata for each version of a module. Author and package description are extracted from intrinsic metadata, parsed from META.json or META.yml at the root of the archive. Related T2833
-
- Sep 28, 2022
-
-
Antoine Lambert authored
The Software Heritage homemade RPC layer does not know how to serialize set objects, so we need to pass lists as parameters of the *_missing methods from the storage API.
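A minimal sketch of the set-to-list conversion described above. The names `to_rpc_param` and `fake_content_missing` are hypothetical stand-ins, not part of the storage API; the fake method only mimics a layer that rejects anything but lists.

```python
from typing import Iterable, List


def to_rpc_param(ids: Iterable[bytes]) -> List[bytes]:
    """Turn any iterable of identifiers (for example a set) into a list.

    Sorting is not needed for serialization; it only makes calls reproducible.
    """
    return sorted(set(ids))


def fake_content_missing(ids: List[bytes]) -> List[bytes]:
    """Hypothetical stand-in for a storage ``*_missing`` method."""
    if not isinstance(ids, list):
        raise TypeError("this layer can only serialize lists")
    known = {b"a"}  # identifiers assumed to be archived already
    return [i for i in ids if i not in known]


# The caller holds a set, but hands a list to the RPC-facing method.
missing = fake_content_missing(to_rpc_param({b"a", b"b", b"c"}))
```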
-
Raphaël Gomès authored
This will allow us to use this interface in async code like ``swh-scanner``. Unfortunately, this means calling ``asyncio.run`` for sync code, but the performance impact should be negligible. The ``swh_storage.*missing*`` APIs are inconsistent for each type, which requires a lot of boilerplate code. This should be addressed in a follow-up.
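As an illustration of the trade-off described above, here is a sketch of an async interface wrapped for synchronous callers with ``asyncio.run``. The function names and the in-memory `known` set are assumptions made for the example; they are not the interface introduced by this commit.

```python
import asyncio
from typing import List


async def filter_missing(ids: List[str]) -> List[str]:
    """Hypothetical async interface: return the ids unknown to the archive."""
    await asyncio.sleep(0)  # stands in for an awaited network call
    known = {"dir1"}
    return [i for i in ids if i not in known]


def filter_missing_sync(ids: List[str]) -> List[str]:
    """Synchronous entry point: runs the coroutine in a fresh event loop."""
    return asyncio.run(filter_missing(ids))
```

Async callers can ``await filter_missing(...)`` directly, while sync callers pay the cost of one event loop per call. Note that ``asyncio.run`` cannot be called from a thread that already runs an event loop.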
-
- Sep 26, 2022
-
-
Raphaël Gomès authored
"Discovery" is the term used to find out the differences between two Merkle graphs. Using such an algorithm is useful in that it drastically reduces the amount of data that needs to be transferred.

This commit introduces an efficient but simple algorithm that is a good starting point for improved performance: random sampling of directories, the details of which are explained in the docstrings.

Mercurial uses a more sophisticated algorithm for its discovery, but it is quite a bit more involved and would introduce too much complexity at once. Also, the constraints for speed that Mercurial has (in the order of milliseconds) don't apply as obviously to this context without further investigation.

Benchmarks
==========

Setup
-----

- With a local postgresql storage (so no network overhead), a local tmpfs obstorage on a fast NVME SSD, all of which should make this improvement look less good than it will be in production
- With a tarball of the linux kernel at commit d96d875ef5dd372f533059a44f98e92de9cf0d42 already loaded
- Loading a tarball of 20 commits earlier (bf3f401db6cbe010095fe3d1e233a5fde54e8b78)
- Only taking into account the loading (not the downloading of the tarball, or its decompression)

Result
------

before: ~30s
after: ~17s

Reproduced 4 times.
-
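The random directory sampling mentioned in the discovery commit above can be sketched as follows. This is a toy model under stated assumptions, not the code from that commit: directories are plain string ids, `tree` maps each directory to its sub-directories, and `remote_missing` is a hypothetical callback returning the sampled ids the archive does not know. It relies on two Merkle properties: a known directory implies all its descendants are known, and a missing directory implies all its ancestors are missing.

```python
import random
from typing import Callable, Dict, Iterable, List, Set

Tree = Dict[str, List[str]]  # directory id -> ids of direct sub-directories


def _closure(start: str, edges) -> Set[str]:
    """All nodes reachable from ``start`` by following ``edges``."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        cur = stack.pop()
        for nxt in edges[cur]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def discover_missing(
    tree: Tree,
    remote_missing: Callable[[List[str]], Iterable[str]],
    sample_size: int = 2,
    seed: int = 0,
) -> Set[str]:
    """Return the directories of ``tree`` that the remote does not have."""
    rng = random.Random(seed)
    parents: Dict[str, Set[str]] = {node: set() for node in tree}
    for parent, children in tree.items():
        for child in children:
            parents[child].add(parent)

    undecided = set(tree)
    missing: Set[str] = set()
    while undecided:
        sample = rng.sample(sorted(undecided), min(sample_size, len(undecided)))
        unknown = set(remote_missing(sample))
        for node in sample:
            if node in unknown:
                # Missing node: every ancestor hashes over it, so is missing too.
                decided = {node} | _closure(node, parents)
                missing |= decided
            else:
                # Known node: its whole subtree was archived along with it.
                decided = {node} | _closure(node, tree)
            undecided -= decided
    return missing


tree = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
archived = {"b", "c"}  # what the hypothetical remote already holds
result = discover_missing(tree, lambda ids: [i for i in ids if i not in archived])
```

Each answer from the remote settles a whole subtree or a whole chain of ancestors, which is why sampling can need far fewer queries than asking about every directory.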
Antoine Lambert authored
-
- Sep 21, 2022
-
-
Antoine Lambert authored
This naming is more explicit about what this function is doing.
-
- Sep 20, 2022
-
-
Antoine Lambert authored
This function is used in multiple package loaders, so add a throttling retry policy and a debug log about the fetched URL.
-
Antoine Lambert authored
Some package loaders might encounter errors while attempting to get package info for a given version (HTTP error for instance). So handle that case and ensure partial visit when such an error occurs. Related to T4124
-
Antoine Lambert authored
Those methods were not reimplemented and could return incorrect statuses once the load method had been called. As they are useful for tests, implement them in the PackageLoader class.
-
Antoine Lambert authored
Such a test was missing, so add a couple of tiny tarballs to the tests data directory and simulate a successful visit for an example origin. Also update the origin URL and remove a couple of hardcoded literals.
-
- Sep 19, 2022
-
-
Antoine Lambert authored
Some Go packages only have a development version that is not listed by the @v/list endpoint but is returned by the @latest endpoint. So make sure to return it in the get_versions method, or it will be missed by the loader. Related to T4124
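A sketch of the merge described above, working on response bodies rather than live HTTP calls. It assumes the GOPROXY protocol shapes: `@v/list` returns one version per line, and `@latest` returns a JSON object with a `Version` field. The function name and signature are hypothetical and do not reproduce the loader's `get_versions` method.

```python
import json
from typing import List


def merge_versions(list_body: str, latest_body: str) -> List[str]:
    """Combine the versions from ``@v/list`` with the one from ``@latest``."""
    versions = list_body.split()  # an empty body yields an empty list
    latest = json.loads(latest_body)["Version"]
    if latest not in versions:
        # Development-only packages end up here: nothing tagged, one pseudo-version.
        versions.append(latest)
    return versions


dev_only = merge_versions(
    "",
    '{"Version": "v0.0.0-20200101000000-abcdef123456", "Time": "2020-01-01T00:00:00Z"}',
)
```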
-
Antoine Lambert authored
When a go package name contains uppercase characters in it, associated goproxy URLs need to be case-encoded by replacing every uppercase letter with an exclamation mark followed by the corresponding lower-case letter. This fixes the loading of such packages. See https://go.dev/ref/mod#goproxy-protocol. Related to T4124
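The case-encoding rule described above fits in a few lines. The helper names below are hypothetical, and the use of `https://proxy.golang.org` as the base URL is an assumption for illustration; only the escaping rule itself comes from the GOPROXY protocol.

```python
def goproxy_escape(path: str) -> str:
    """Replace each ASCII uppercase letter with '!' plus its lowercase form."""
    return "".join(
        "!" + char.lower() if "A" <= char <= "Z" else char for char in path
    )


def goproxy_list_url(module: str, base: str = "https://proxy.golang.org") -> str:
    """Build the ``@v/list`` URL for a module (base URL is an assumption)."""
    return f"{base}/{goproxy_escape(module)}/@v/list"


escaped = goproxy_escape("github.com/Azure/azure-sdk-for-go")
```

The encoding exists so that module paths differing only by case stay distinct on case-insensitive file systems.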
-
- Sep 13, 2022
-
-
Franck Bret authored
Make use of packaging.version instead of the distutils version classes, which are deprecated. Resolves T4528. Related T4465
-
- Sep 09, 2022
-
-
Antoine Lambert authored
It makes it possible to search Sentry events related to a particular loaded origin URL or to a given visit type from the Sentry Web UI.
-
- Sep 05, 2022
-
-
Antoine Lambert authored
Origin URL for a package listed by the pubdev lister is now in the form https://pub.dev/packages/<package_name> so we need to reconstruct the API URL to get package versions from it.
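A sketch of that reconstruction. The API URL pattern `https://pub.dev/api/packages/<package_name>` is an assumption here, as is the helper name; the commit text only states that the API URL must be rebuilt from the origin URL.

```python
from urllib.parse import urlparse

# Assumed API URL pattern; not stated in the commit message.
PUBDEV_API_PATTERN = "https://pub.dev/api/packages/{pkgname}"


def pubdev_api_url(origin_url: str) -> str:
    """Derive the package API URL from an origin URL of the pubdev lister."""
    # The package name is the last path component of the origin URL.
    pkgname = urlparse(origin_url).path.rstrip("/").rsplit("/", 1)[-1]
    return PUBDEV_API_PATTERN.format(pkgname=pkgname)


api_url = pubdev_api_url("https://pub.dev/packages/http")
```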
-
Antoine Lambert authored
Some HTTP download requests might be throttled by remote servers, so add a retry mechanism with exponential backoff to fix tarball downloads in some loaders.
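For readers unfamiliar with the technique, here is a standard-library sketch of retrying with exponential backoff. It is not the mechanism added by this commit: `Throttled` is a hypothetical exception standing in for an HTTP 429 response, and the delays and attempt count are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Throttled(Exception):
    """Hypothetical error standing in for a throttled HTTP response."""


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``func``, doubling the wait after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")


# Usage: a download that is throttled twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky_download() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise Throttled()
    return "tarball"


outcome = retry_with_backoff(flaky_download, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example testable without real waiting.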
-
- Aug 30, 2022
-
-
Raphaël Gomès authored
This uses the Golang proxy since that's what `go get` uses, and since it probably offers better performance than most direct sources.
-
vlorentz authored
-
- Aug 29, 2022
-
-
Franck Bret authored
Fix an issue where get_versions failed when version names do not follow semver conventions. Add a related test.
-
- Aug 26, 2022
-
-
Franck Bret authored
The method name was wrong; replace it with 'load_pubdev'.
-
Franck Bret authored
Call an HTTP API endpoint per package to get all its versions and artifacts metadata.
-
- Aug 19, 2022
-
-
Franck Bret authored
Adapt code, test fixture and tests after the lister split its artifacts data into artifacts and arch_metadata (see D8259). Related T4233
-
Franck Bret authored
Add 'aur' package loader
-
- Aug 08, 2022
-
-
vlorentz authored
It seems that despite setting it in the 'except BaseException' block, it is still occasionally undefined in the 'finally' block when triggered by a SystemExit exception. This should hopefully prevent UnboundLocalError from being raised from the 'finally' block from now on.
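The usual defensive pattern against this kind of UnboundLocalError is to bind the variable before entering the 'try' block, so the 'finally' block can always read it. The sketch below assumes a hypothetical `status` variable and `run_visit` function; the commit message does not name the variable involved.

```python
from typing import Callable, List


def run_visit(task: Callable[[], None], reported: List[str]) -> str:
    # Bound before the try block, so it exists on every path into `finally`.
    status = "failed"
    try:
        task()
        status = "full"
    except BaseException:
        status = "failed"
        raise
    finally:
        # Reads `status` no matter how the try block was left.
        reported.append(status)
    return status


def interrupted() -> None:
    raise SystemExit(1)


reported: List[str] = []
try:
    run_visit(interrupted, reported)
except SystemExit:
    pass
```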
-
- Aug 03, 2022
-
-
vlorentz authored
-
- Jun 29, 2022
-
-
Franck Bret authored
This fixes an issue discovered while testing the loader in the docker environment; the error was (load_arch() got an unexpected keyword argument 'lister_name'). Related T4233
-
vlorentz authored
-
Franck Bret authored
named args
This fixes an issue discovered while testing the loader in the docker environment; the error was (load_crates() got an unexpected keyword argument 'lister_name'). Related T4104
-
- Jun 21, 2022
- Jun 17, 2022
-
-
Franck Bret authored
Fetch Arch Linux packages from lister-discovered origins. For each origin, it gets versions from extra_loader_arguments['artifacts']. An Arch Linux package can come as a .xz or .zst file archive. Support for .zst (Zstandard compression) has been requested with D7993. Related to T4233
-
- Jun 07, 2022
-
-
Antoine Lambert authored
An artifact without time info can be provided in the artifacts list parameter of the loader (for instance, the last modification date is not available for tarballs coming from GitHub releases). That case was not handled by the archive loader, which resulted in a loading error, so add a fix for it.
-
- May 20, 2022
-
-
vlorentz authored
It will be used by the git loader to run a graph traversal on received packfiles; and we want to monitor its runtime as a separate metric.
-
- May 16, 2022
-
-
Antoine Lambert authored
The post_load method of a loader can raise an exception, so we must ensure that the success variable is set back to False in that case. For instance, the subversion loader's post_load checks that the latest exported revision is consistent with what the official subversion client produces. If it is not, an exception is raised so that the visit status is set to partial.
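The control flow described above can be sketched as follows. The `load` function and its callbacks are hypothetical simplifications; the real loader has more steps and statuses.

```python
from typing import Callable


def load(store_data: Callable[[], None], post_load: Callable[[bool], None]) -> str:
    """Return the visit status for a simplified load sequence."""
    success = False
    try:
        store_data()
        success = True
        post_load(success)
    except Exception:
        # post_load (or store_data) failed: the visit must not count as full.
        success = False
    return "full" if success else "partial"


def failing_post_load(success: bool) -> None:
    # Stands in for a consistency check that does not pass.
    raise ValueError("exported revision mismatch")


status_ok = load(lambda: None, lambda success: None)
status_bad = load(lambda: None, failing_post_load)
```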
-
- May 13, 2022
-
-
vlorentz authored
-
Antoine R. Dumont authored
Related to T4236
-
- May 06, 2022