- Nov 09, 2021
- Nov 08, 2021
vlorentz authored
The artifacts they load match the semantics of a Release, but we have used Revisions so far because of technical details (we needed the 'metadata' field of Revision, which Release lacks) that are no longer relevant (thanks to the metadata storage). Packages that were loaded by previous versions of the package loader (as revisions) will be converted to releases. To avoid fetching them from the origin again, the loader looks for an existing extid pointing to a revision (as it used to), fetches that revision, extracts some fields (directory id, author, date, ...) and builds a new release from this information. This commit is unfortunately very large because of all the changes in tests, mostly just new hashes and renaming 'revision' to 'release' (and various abbreviations and capitalizations). The only meaningful changes are in swh/loader/package/tests/test_loader.py and swh/loader/package/loader.py. To keep this commit as short as possible, I did not yet change individual loaders to create releases: they still create revisions, which are converted by the base loader. The next commit will refactor them to remove this conversion layer.
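A minimal sketch of the conversion described above, assuming simplified in-memory stand-ins: `LegacyRevision`, `PackageRelease` and `release_from_extid` are hypothetical names, not the swh.model or swh.loader API, and the extid and revision stores are plain dicts.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LegacyRevision:
    """Hypothetical subset of a revision written by an older package loader."""
    directory: bytes  # id of the artifact's root directory
    author: str
    date: Optional[str]
    message: bytes


@dataclass(frozen=True)
class PackageRelease:
    """Hypothetical release targeting the same root directory."""
    name: bytes
    target: bytes
    target_type: str
    author: str
    date: Optional[str]
    message: bytes


def release_from_extid(
    extids: Dict[bytes, bytes],
    revisions: Dict[bytes, LegacyRevision],
    extid: bytes,
    version: bytes,
) -> Optional[PackageRelease]:
    """Build a release from a previously archived revision, or None if unknown."""
    revision_id = extids.get(extid)
    if revision_id is None:
        return None  # never loaded before: the artifact must be fetched from the origin
    rev = revisions[revision_id]
    # Reuse the fields already in the archive instead of downloading the artifact again.
    return PackageRelease(
        name=version,
        target=rev.directory,
        target_type="directory",
        author=rev.author,
        date=rev.date,
        message=rev.message,
    )
```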
-
- Nov 04, 2021
vlorentz authored
All the '*_missing' tests are already done automatically by check_snapshot (it recursively checks that all objects are present in the storage).
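The kind of traversal check_snapshot performs can be sketched on a toy object graph. This is an illustration only: the `missing_objects` helper and the dict-based storage (object id mapped to child ids) are hypothetical, not the swh API.

```python
from typing import Dict, List, Set


def missing_objects(storage: Dict[str, List[str]], root_id: str) -> Set[str]:
    """Return the ids reachable from root_id that are absent from storage."""
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = [root_id]
    while stack:  # iterative depth-first walk: snapshot -> release -> directory -> content
        obj_id = stack.pop()
        if obj_id in seen:
            continue
        seen.add(obj_id)
        if obj_id not in storage:
            missing.add(obj_id)  # cannot descend into an object that is not stored
            continue
        stack.extend(storage[obj_id])
    return missing
```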
-
vlorentz authored
They clutter the test output because pytest prints the whole code of the function raising the AssertionError. With this magic variable, the error is shown as if it were raised directly in the caller's body.
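The commit does not name the magic variable; it is presumably pytest's `__tracebackhide__`, which hides the frame of any function that sets it to True. A sketch under that assumption, with a hypothetical helper name:

```python
def assert_same_snapshot(actual_id: str, expected_id: str) -> None:
    # Assumed to be the "magic variable" above: pytest omits this function's
    # frame from tracebacks, so the failure appears to come from the caller.
    # Outside pytest it is just an unused local.
    __tracebackhide__ = True
    assert actual_id == expected_id, f"snapshot {actual_id} != expected {expected_id}"
```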
-
vlorentz authored
Some tests did the following:
1. build a snapshot
2. get the snapshot from the storage
3. compare it with the expected snapshot
4. get the origin visit from the storage and check it
If the loader built a wrong snapshot, the test failed at step 2, and the only information displayed was that the expected snapshot id did not exist, which is very unhelpful. Instead, I reordered them as: 1, 4, 2, 3. This way, if the loader builds a wrong snapshot, it is detected when comparing the visit, and pytest shows the two hashes. The test can then be updated to use the hash that is actually generated, in order to display the actual snapshot. This is consistent with what was already done in the PyPI loader. Additionally, I made the following changes:
1. always check stats last (because a difference in numbers is hardly actionable without testing other objects)
2. add a few more snapshot id checks in visits
3. deduplicate a hardcoded snapshot id
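A sketch of the reordered checks, using plain dicts as a hypothetical storage (origin URL mapped to its latest visit's snapshot id, snapshot id mapped to branches); `check_visit_then_snapshot` is an illustrative name, not an existing test utility.

```python
from typing import Dict


def check_visit_then_snapshot(
    visits: Dict[str, str],
    snapshots: Dict[str, dict],
    origin: str,
    expected_snapshot_id: str,
    expected_branches: dict,
) -> None:
    """Hypothetical test helper following the 1, 4, 2, 3 order described above."""
    # Step 4 first: a wrong snapshot id fails here, and both hashes are reported.
    actual_id = visits[origin]
    assert actual_id == expected_snapshot_id, (actual_id, expected_snapshot_id)
    # Steps 2 and 3: only now fetch the snapshot and compare its content.
    assert snapshots[expected_snapshot_id] == expected_branches
```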
-
vlorentz authored
The parent is computed by the deposit as the revision of the latest deposit in the same origin before the current one. It is therefore redundant, as it can be recomputed from the metadata and the revision date. This is a preliminary change needed to make package loaders produce releases instead of revisions, as releases don't have parent relationships.
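Recomputing the parent from deposit dates could look like the following sketch; the `(date, revision_id)` pairs and the `parent_revision` helper are assumptions made for illustration.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def parent_revision(
    deposits: List[Tuple[datetime, str]], current_date: datetime
) -> Optional[str]:
    """Revision of the latest deposit on the same origin strictly before current_date."""
    earlier = [d for d in deposits if d[0] < current_date]
    if not earlier:
        return None  # first deposit on this origin: no parent
    return max(earlier, key=lambda d: d[0])[1]
```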
-
vlorentz authored
-
vlorentz authored
-
- Nov 03, 2021
vlorentz authored
This reverts commit f6905cdf. That commit was a first step toward making loaders write releases instead of revisions. Unfortunately, we will still write revisions for a non-negligible time, so I prefer to defer the removal of parent deposit revisions until we actually make that switch, so that we don't end up with inconsistent revisions.
-
- Oct 21, 2021
vlorentz authored
extids are now used instead; this is all dead code.
-
vlorentz authored
The parent is computed by the deposit as the revision of the latest deposit in the same origin before the current one. It is therefore redundant, as it can be recomputed from the metadata and the revision date. This is a preliminary change needed to make package loaders produce releases instead of revisions, as releases don't have parent relationships.
-
- Oct 07, 2021
vlorentz authored
-
- Sep 29, 2021
Antoine R. Dumont authored
-
- Sep 28, 2021
Antoine R. Dumont authored
-
Antoine R. Dumont authored
When running in production, the workers should expect the opam root directory to be present (it is maintained externally). The production task should do nothing (initialize_opam_root defaults to False). When running the standalone loader, it should be instantiated with initialize_opam_root set to True; it then creates the opam root folder if it is not present, so that it works out of the box (e.g. in a docker worker). Related to T3590
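A sketch of the two modes, assuming a hypothetical `OpamRootSetup` class; `os.makedirs` stands in for the real opam initialization, which is not shown here. Filesystem work happens in `prepare()` rather than in the constructor.

```python
import os


class OpamRootSetup:
    """Hypothetical holder for the opam root handling described above."""

    def __init__(self, opam_root: str, initialize_opam_root: bool = False) -> None:
        # Only store settings: no filesystem or network work in the constructor.
        self.opam_root = opam_root
        self.initialize_opam_root = initialize_opam_root

    def prepare(self) -> None:
        if os.path.isdir(self.opam_root):
            return  # already provisioned (externally, or by an earlier run)
        if not self.initialize_opam_root:
            # Production: the root is expected to be maintained externally,
            # so this sketch reports its absence instead of creating it.
            raise FileNotFoundError(f"missing opam root: {self.opam_root}")
        # Standalone: create it so the loader works out of the box.
        # Stand-in only; a real setup would initialize the root with opam itself.
        os.makedirs(self.opam_root)
```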
-
- Sep 22, 2021
Antoine R. Dumont authored
It allows the opam loader to reuse an existing opam root with multiple instances. It is the complementary code that goes with the loader adaptation [1]. As the currently packaged version of `opam show` (CLI) [2] does not provide a way to scope metadata extraction to a single opam instance (when sharing the same opam root), we work around this by relying on opam internal details.
[1] D6316
[2] `opam show` is currently the interface we use to extract and list information about a package. It works on a standalone opam root folder, but falls short when multiple instances share one opam root (for now).
-
- Sep 21, 2021
Antoine R. Dumont authored
If it's required at all, this will use the network to fetch and install it. This should be done outside the constructor. Related to T3590
-
- Sep 17, 2021
Antoine R. Dumont authored
Otherwise, the filename may end up being too long [1]:
```
OSError: [Errno 36] File name too long:
```
Related to T3468
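The commit title, and thus the exact fix, is missing here. One common way to avoid this error, shown as an assumption rather than what this commit does, is to cap the name at 255 bytes (the usual NAME_MAX on Linux filesystems) while keeping the last extension. `shorten_filename` is a hypothetical helper.

```python
import os

MAX_NAME_BYTES = 255  # typical NAME_MAX on Linux filesystems (assumption)


def shorten_filename(name: str, limit: int = MAX_NAME_BYTES) -> str:
    """Truncate name so its UTF-8 encoding fits in limit bytes, keeping the last extension."""
    if len(name.encode("utf-8")) <= limit:
        return name
    stem, ext = os.path.splitext(name)  # ".tar.gz" yields ext ".gz"
    ext_bytes = ext.encode("utf-8")
    if len(ext_bytes) >= limit:
        # Pathological extension: truncate the whole name instead.
        stem, ext_bytes = name, b""
    budget = limit - len(ext_bytes)
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    truncated = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return truncated + ext_bytes.decode("utf-8")
```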
-
- Sep 16, 2021
Antoine Lambert authored
The implementations of those tests are quite similar, so let's put the generic test code in a function and use some global variables.
-
Antoine Lambert authored
Some URLs for downloading a file do not contain any filename but instead provide it in the "content-disposition" response header. So extract the filename from that response header when available, to avoid file processing issues later on.
-
Antoine Lambert authored
requests follows URL redirections by default for GET requests, so replace the input URL with the response URL to ensure the correct filename is extracted from it.
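The two changes above can be combined into one lookup order, sketched here with a hypothetical `filename_from_response` helper: prefer the Content-Disposition filename (parsed with the standard library's `email.message.Message.get_filename`), else fall back to the last path component of the final URL after redirects.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlparse


def filename_from_response(final_url: str, headers) -> str:
    """Pick a download filename: Content-Disposition first, then the final URL path."""
    disposition = headers.get("content-disposition")
    if disposition:
        msg = Message()
        msg["content-disposition"] = disposition
        name = msg.get_filename()  # handles quoting and RFC 2231 parameters
        if name:
            # Keep only the basename so the header cannot inject directories.
            return os.path.basename(name)
    # Fall back to the last path component of the URL reached after redirects.
    return os.path.basename(unquote(urlparse(final_url).path))
```

With requests, the final URL is `response.url` and `response.headers` is a case-insensitive mapping, so a call like `filename_from_response(resp.url, resp.headers)` after `resp = requests.get(url, stream=True)` covers both cases.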
-
- Sep 15, 2021
Antoine Lambert authored
Some PyPI origins declare sdist archives that cannot be extracted by swh.core.tarball.uncompress and whose content does not match the standard sdist layout. This is notably the case for sdist files with the extensions .deb, .egg, .rpm or .whl. As those artifacts are not worth archiving and generate errors while loading PyPI origins, filter them out of the sdist files to process. Related to T3575
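A sketch of the filter, assuming file entries shaped like those of the PyPI JSON API (`packagetype` and `filename` keys); `sdist_files_to_load` is a hypothetical name, and the case-insensitive match is a choice made here, not something the commit states.

```python
from typing import Dict, List

IGNORED_SDIST_EXTENSIONS = (".deb", ".egg", ".rpm", ".whl")


def sdist_files_to_load(release_files: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep sdist entries, dropping those whose extension marks a non-sdist archive."""
    return [
        f
        for f in release_files
        if f["packagetype"] == "sdist"
        and not f["filename"].lower().endswith(IGNORED_SDIST_EXTENSIONS)
    ]
```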
-
- Sep 14, 2021
Antoine Lambert authored
Add support for downloading files over the FTP protocol, using the urllib.request.urlopen function from the Python standard library. Related to T2687
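A minimal sketch using only the standard library; `download_ftp` is a hypothetical helper, and HTTP(S) downloads would presumably keep going through requests.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def download_ftp(url: str, dest_path: str) -> int:
    """Stream an ftp:// URL to dest_path and return the number of bytes written."""
    if urlparse(url).scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    # urlopen handles ftp:// URLs natively (anonymous login when the URL
    # carries no credentials); copy in chunks to avoid loading it in memory.
    with urlopen(url, timeout=60) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return os.path.getsize(dest_path)
```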
-
- Sep 13, 2021
Antoine Lambert authored
Some PKG-INFO files are malformed or lack the Version field, which causes errors when building the revision associated with a package version. So handle that edge case to fix these loading issues.
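PKG-INFO uses RFC 822-style headers, so the standard library's `email.parser.HeaderParser` can read it leniently. In this sketch the fallback value (e.g. the version reported by the PyPI API) is an assumption, as is the `pkginfo_version` name.

```python
from email.parser import HeaderParser


def pkginfo_version(pkginfo_text: str, fallback_version: str) -> str:
    """Return the Version field of a PKG-INFO file, or fallback_version if unusable."""
    # HeaderParser is lenient: it records defects on malformed input rather than raising.
    headers = HeaderParser().parsestr(pkginfo_text)
    version = (headers.get("Version") or "").strip()
    return version or fallback_version
```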
-
Antoine Lambert authored
Some Debian source package metadata lack md5 sums for their files, for instance those from the buster-proposed-updates suite. So make the md5sum field of the DebianFileMetadata model optional, to avoid loading errors. Related to T3547
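A sketch of the optional field with a dataclass; `DebianFileMetadataSketch` and its other fields are assumptions, not the actual model definition.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DebianFileMetadataSketch:
    """Hypothetical stand-in for DebianFileMetadata; field names are assumptions."""
    name: str
    size: int
    sha256: str
    md5sum: Optional[str] = None  # absent for some suites, e.g. buster-proposed-updates

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "DebianFileMetadataSketch":
        # .get() tolerates a missing md5sum instead of raising KeyError.
        return cls(name=d["name"], size=d["size"], sha256=d["sha256"], md5sum=d.get("md5sum"))
```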
-
- Aug 31, 2021
vlorentz authored
The in-memory and Cassandra storage backends used to sort visits by (id, date). Their latest releases now sort by (date, id), like PostgreSQL, but this test did not expect it. This commit instantiates the loader *after* picking a date for the dummy visit, so the loader's visit always comes after the dummy one.
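A small illustration of why the test broke, assuming the loader picks its visit date when it is instantiated: the loader's visit gets a higher id but an earlier date than the dummy visit, so the two orderings disagree on which visit is latest. `Visit` and both helpers are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple


class Visit(NamedTuple):
    visit_id: int
    date: datetime


def latest_by_id_then_date(visits: List[Visit]) -> Visit:
    # Old in-memory/Cassandra behavior: sort key (id, date).
    return max(visits, key=lambda v: (v.visit_id, v.date))


def latest_by_date_then_id(visits: List[Visit]) -> Visit:
    # New behavior, like PostgreSQL: sort key (date, id).
    return max(visits, key=lambda v: (v.date, v.visit_id))
```

Picking the dummy date before instantiating the loader makes the loader's date the later one, so both orderings agree.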
-
- Aug 12, 2021
- Aug 05, 2021
Antoine R. Dumont authored
This reverts commit dbb18628, which creates issues when uploading to PyPI [1]. The x-rst content type does not cause problems in, for example, swh.core, but there the README is not a symlink. I'm not investigating this right now, so a simple revert will do; my focus for now is on deploying the opam loader on staging. [1] https://jenkins.softwareheritage.org/view/swh-draft/job/DLDBASE/job/pypi-upload/111/console
-
- Jul 20, 2021
Antoine R. Dumont authored
This simplifies the parsing logic of the `opam_read` function, so that it avoids raising outside of the `opam_read` call. Related to T3425
-
zapashcanon authored
Summary: added an opam loader
Related to T3425
Test Plan: will add tests later using a local opam repository, as is done in the lister
Reviewers: #reviewers, vlorentz, ardumont
Reviewed By: #reviewers, ardumont
Subscribers: ardumont, vlorentz
Maniphest Tasks: T3425
Differential Revision: https://forge.softwareheritage.org/D5975
-
- Jul 07, 2021
Antoine R. Dumont authored
-
- Jun 25, 2021
- Jun 16, 2021
Nicolas Dandrimont authored
-
- Jun 10, 2021
Antoine Lambert authored
There are cases where a tarball to download is marked as gzipped in the Content-Encoding HTTP response header while in fact it is not. So handle the ContentDecodingError exception that can be raised by the download method: try to download the tarball's raw bytes again, without attempting to decompress the input stream.
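A sketch of the fallback with requests, whose `ContentDecodingError` is raised when a body cannot be decoded according to its Content-Encoding; `fetch_artifact_bytes` is a hypothetical name, and reading `response.raw` with `decode_content=False` is one way to get the undecoded bytes.

```python
import requests
from requests.exceptions import ContentDecodingError


def fetch_artifact_bytes(url: str, timeout: float = 60.0) -> bytes:
    """Download url, falling back to raw bytes if the body is not really compressed."""
    with requests.Session() as session:
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()
            return response.content
        except ContentDecodingError:
            # The server claimed e.g. "Content-Encoding: gzip" but the body does not
            # decode: fetch it again and read the undecoded bytes from the socket.
            with session.get(url, stream=True, timeout=timeout) as response:
                response.raise_for_status()
                return response.raw.read(decode_content=False)
```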
-
- Jun 09, 2021
Antoine Lambert authored
-
- May 27, 2021
Antoine Lambert authored
It makes the loader append the content of the origin's latest snapshot each time it is invoked. The purpose is to keep track of all the origin artifacts loaded so far in each new visit of the origin. Closes T3347
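A sketch of the append semantics on plain dicts of branches; `append_snapshot_branches` is hypothetical, and letting the newly loaded target win on a branch-name clash is an assumption.

```python
from typing import Dict, Optional

Branches = Dict[bytes, Optional[bytes]]  # branch name -> target id (hypothetical shape)


def append_snapshot_branches(previous: Branches, current: Branches) -> Branches:
    """Keep every branch from the previous visit and add (or override) the new ones."""
    merged = dict(previous)  # artifacts loaded in earlier visits stay referenced
    merged.update(current)  # on a name clash, the newly loaded target wins (assumption)
    return merged
```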
-
- Apr 26, 2021
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing Sphinx warnings. The sphinx environment is designed to be used in continuous integration, to prevent changes from breaking the documentation build. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 16, 2021
Antoine Lambert authored
It makes it possible to successfully invoke make in the docs folder.