- Sep 16, 2021
-
-
Antoine Lambert authored
requests follows URL redirections by default for GET requests, so update the input URL to the response URL to ensure the correct filename is extracted from it.
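A minimal sketch of the idea, assuming the filename is taken from the last path segment of the URL. `filename_from_url` and `effective_url` are hypothetical helpers written for illustration, not functions from this repository; `response.url` is the requests attribute holding the final URL after redirects, and a stand-in object is used here so the example runs offline.

```python
from pathlib import PurePosixPath
from types import SimpleNamespace
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL (hypothetical helper)."""
    return PurePosixPath(unquote(urlparse(url).path)).name


def effective_url(requested_url: str, response) -> str:
    """Prefer the URL the response was actually served from.

    With requests, ``response.url`` holds the final URL once redirects
    have been followed; fall back to the requested URL otherwise.
    """
    return getattr(response, "url", None) or requested_url


# Stand-in for a requests.Response obtained after a redirect (made-up URLs).
requested = "https://example.org/download?id=42"
fake_response = SimpleNamespace(url="https://example.org/files/pkg-1.0.tar.gz")

name_before = filename_from_url(requested)
name_after = filename_from_url(effective_url(requested, fake_response))
```

Here the requested URL alone would yield an unhelpful name, while the redirected URL carries the real archive filename.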
-
- Sep 15, 2021
-
-
Antoine Lambert authored
Some PyPI origins declare sdist archives that cannot be extracted by swh.core.tarball.uncompress and whose content does not match the standard sdist layout. This is notably the case for sdist files whose extensions are .deb, .egg, .rpm or .whl. As those artifacts are not of interest for archiving and generate errors while loading PyPI origins, filter them out from the sdist files to process. Related to T3575
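The filtering described above can be sketched as follows. The extension list comes from the commit message; the helper name `is_loadable_sdist`, the case-insensitive comparison and the sample filenames are assumptions made for illustration.

```python
# Extensions named in the commit message as not being real sdists.
EXCLUDED_EXTENSIONS = (".deb", ".egg", ".rpm", ".whl")


def is_loadable_sdist(filename: str) -> bool:
    """Return False for files declared as sdist but using an excluded extension.

    Lower-casing the name is an assumption, not something the commit states.
    """
    return not filename.lower().endswith(EXCLUDED_EXTENSIONS)


# Made-up list of files declared as sdist by a PyPI origin.
declared = ["foo-1.0.tar.gz", "foo-1.0.whl", "foo_1.0_all.deb", "foo-1.1.zip"]
to_process = [name for name in declared if is_loadable_sdist(name)]
```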
-
- Sep 14, 2021
-
-
Antoine Lambert authored
Add support for downloading files over the FTP protocol, using the urllib.request.urlopen function from the Python standard library. Related to T2687
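A minimal sketch of a urlopen-based download, assuming the caller just wants the bytes written to a local path. The `download` helper and its parameters are hypothetical; `urllib.request.urlopen` accepts `ftp://` URLs as well as `http(s)://` and `file://` ones, which is what makes a single code path possible.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download(url: str, dest: Path, chunk_size: int = 64 * 1024) -> Path:
    """Stream the resource at ``url`` into ``dest`` (hypothetical helper).

    urlopen handles ftp:// URLs through the standard library FTP handler,
    so no third-party HTTP client is involved.
    """
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest
```

Usage would look like `download("ftp://ftp.example.org/pub/pkg-1.0.tar.gz", Path("pkg-1.0.tar.gz"))`, where the host and path are made up.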
-
- Sep 13, 2021
-
-
Antoine Lambert authored
Some PKG-INFO files are malformed or are missing the Version field, which causes an error when trying to build the revision associated with a package version. Handle that edge case to fix loading issues.
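As an illustration of the edge case, PKG-INFO uses RFC 822 style headers, so the standard library email parser can read it leniently. The helper `pkginfo_version` is hypothetical, and how the loader actually reacts to a missing version is not stated in the commit message; this sketch simply returns `None`.

```python
from email.parser import Parser
from typing import Optional


def pkginfo_version(text: str) -> Optional[str]:
    """Return the Version field of a PKG-INFO document, or None.

    None is returned when the field is absent or empty, so callers can
    handle the case explicitly instead of failing later.
    """
    headers = Parser().parsestr(text, headersonly=True)
    version = headers.get("Version")
    if version is None:
        return None
    return version.strip() or None


complete = "Metadata-Version: 1.0\nName: foo\nVersion: 1.2\n"
incomplete = "Metadata-Version: 1.0\nName: foo\n"
```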
-
Antoine Lambert authored
Some Debian source package metadata are missing md5 sums for their files, for instance those from the buster-proposed-updates suite. So make the md5sum field of the DebianFileMetadata model optional in order to avoid loading errors. Related to T3547
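A rough sketch of what an optional checksum field looks like, using a plain dataclass. The class name `FileMetadataSketch` and every field other than `md5sum` are assumptions; the real DebianFileMetadata model may be declared differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileMetadataSketch:
    """Illustrative stand-in for a per-file metadata model."""

    name: str
    sha256: str
    size: int
    # Optional: some suites publish files without an md5 sum.
    md5sum: Optional[str] = None


# Made-up values; the second entry has no md5 sum and still validates.
with_md5 = FileMetadataSketch("foo_1.0.dsc", "ab" * 32, 1234, md5sum="cd" * 16)
without_md5 = FileMetadataSketch("foo_1.0.orig.tar.gz", "ef" * 32, 5678)
```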
-
- Aug 31, 2021
-
-
vlorentz authored
The in-mem/cass storage used to sort visits by (id, date). The latest releases now sort by (date, id) like postgresql, but this test did not expect it. This commit instantiates the loader *after* picking a date for the dummy visit, so the loader's visit always comes after the dummy one.
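The difference between the two sort orders can be seen on a toy example. Visits are modelled here as plain `(id, date)` tuples with made-up values; this is not the storage's actual data model.

```python
from datetime import datetime, timezone

early = datetime(2021, 8, 1, tzinfo=timezone.utc)
late = datetime(2021, 8, 31, tzinfo=timezone.utc)

# Two visits whose ids and dates disagree on the ordering.
visits = [(1, late), (2, early)]

# Old behaviour: order by id first, then date.
by_id_date = sorted(visits, key=lambda v: (v[0], v[1]))
# New behaviour: order by date first, then id.
by_date_id = sorted(visits, key=lambda v: (v[1], v[0]))
```

When ids and dates disagree, the "last" visit differs between the two orders, which is why a test that relies on visit order has to control the dates.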
-
- Aug 12, 2021
-
- Aug 05, 2021
-
-
Antoine R. Dumont authored
This reverts commit dbb18628. This creates issues when uploading to pypi [1]. The type x-rst is not posing problem for example in swh.core. But in that repository the readme is not a symlink... I'm not investigating this right now so a simple revert should do. My focus is on deploying the opam loader on staging for now. [1] https://jenkins.softwareheritage.org/view/swh-draft/job/DLDBASE/job/pypi-upload/111/console
-
- Jul 20, 2021
-
-
Antoine R. Dumont authored
This simplifies the logic behind the parsing of the `opam_read` function to avoid raising outside of the `opam_read` call. Related to T3425
-
zapashcanon authored
Summary: added an opam loader
Related to T3425
Test Plan: will add tests later using a local opam repository as it's done in the lister
Reviewers: #reviewers, vlorentz, ardumont
Reviewed By: #reviewers, ardumont
Subscribers: ardumont, vlorentz
Maniphest Tasks: T3425
Differential Revision: https://forge.softwareheritage.org/D5975
-
- Jul 07, 2021
-
-
Antoine R. Dumont authored
-
- Jun 25, 2021
-
- Jun 16, 2021
-
-
Nicolas Dandrimont authored
-
- Jun 10, 2021
-
-
Antoine Lambert authored
There are cases where a tarball to download is marked as gzipped in the Content-Encoding HTTP response header while in fact it is not. So handle the ContentDecodingError exception that can be raised by the download method: try to download the tarball's raw bytes again without attempting to uncompress the input stream.
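The same "try to decode, fall back to raw bytes" pattern can be shown offline with the standard library. This is an analogue using `gzip` on an in-memory payload, not the requests-based code the commit changes; `maybe_gunzip` is a hypothetical helper.

```python
import gzip
import zlib


def maybe_gunzip(payload: bytes) -> bytes:
    """Decompress a payload announced as gzip, or return it untouched.

    If the payload turns out not to be valid gzip data, keep the raw
    bytes instead of failing, mirroring the fallback described above.
    """
    try:
        return gzip.decompress(payload)
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile is a subclass of OSError.
        return payload


really_gzipped = gzip.compress(b"tarball bytes")
mislabelled = b"tarball bytes sent without compression"
```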
-
- Jun 09, 2021
-
-
Antoine Lambert authored
-
- May 27, 2021
-
-
Antoine Lambert authored
This makes it possible to append the latest snapshot content of an origin each time the loader is invoked. The purpose is to keep track of all the origin artifacts loaded so far in each new visit of the origin. Closes T3347
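Conceptually, appending the previous snapshot amounts to merging two branch mappings. The sketch below models a snapshot as a dict from branch name to target; the helper name, the branch names and the rule that newly loaded branches win on conflict are all assumptions for illustration.

```python
from typing import Dict


def extend_snapshot(
    previous: Dict[bytes, str], new: Dict[bytes, str]
) -> Dict[bytes, str]:
    """Return the branches of ``previous`` plus those of ``new``.

    Branches loaded during the current visit take precedence when a
    name exists in both mappings (an assumption of this sketch).
    """
    merged = dict(previous)
    merged.update(new)
    return merged


# Made-up branch names and targets.
last_snapshot = {b"releases/1.0": "rev-a", b"HEAD": "rev-a"}
this_visit = {b"releases/1.1": "rev-b", b"HEAD": "rev-b"}
snapshot = extend_snapshot(last_snapshot, this_visit)
```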
-
- Apr 26, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 16, 2021
-
-
Antoine Lambert authored
This makes it possible to successfully invoke make in the docs folder.
-
- Apr 13, 2021
-
- Apr 08, 2021
-
-
vlorentz authored
-
- Apr 06, 2021
-
-
Nicolas Dandrimont authored
It has existed, at least at some point in the past, even though I'm currently unable to reproduce a dsc with that field in it. (Hence the lack of test fixture...)
-
vlorentz authored
They already write it with raw_extrinsic_metadata_add/extid_add, and read it with extid_get_*. This code was only kept for compatibility while we were migrating the extids. This is now done, so this code is useless.
-
vlorentz authored
-
vlorentz authored
-
- Apr 02, 2021
- Apr 01, 2021
-
-
vlorentz authored
-
- Mar 30, 2021
-
- Mar 29, 2021
-
-
vlorentz authored
This allows future runs of a loader to know a package was already loaded, without querying each of the revisions individually and parsing their metadata. Eventually, this will allow us to get rid of the 'metadata' column on the 'revision' table entirely.
-
- Mar 26, 2021
-
-
vlorentz authored
To check which packages are already downloaded. For now, this lookup is done in addition to checking the artifacts from the last snapshot's revisions' metadata, because we did not start writing ExtIDs yet. But the ExtID lookup will eventually replace the artifact-based lookup. This will finally allow us to drop the 'metadata' field of Revision objects.
-
vlorentz authored
We used a string instead of a tuple. It doesn't matter much because they are only compared with each other, but let's not intentionally use the wrong types when we don't need to.
-
- Mar 25, 2021
-
-
vlorentz authored
A future commit will introduce resolve_revision_from_extid, so this commit preemptively renames it to avoid any confusion.
-
- Mar 23, 2021
-
-
vlorentz authored
In a future commit, we will need to go through all the PackageInfo objects before running the loop, so we can get their ExtID and fetch them from the storage. So, we need to fetch them all before running the load loop, using this listcomp.
-
vlorentz authored
These three loaders get intrinsic metadata from the archive, and use it to build the revision object (mostly authoring and date), which means they would not load the same revision as another loader given the same archive.
-
vlorentz authored
In a future commit, we will need to go through all the PackageInfo objects before running the loop, so we can get their ExtID and fetch them from the storage. So, we need to fetch them all before running the load loop, using this listcomp.
-
vlorentz authored
I found the old definition to be quite confusing when refactoring this code.
-
vlorentz authored
This is still a purely internal change for now, but it will be needed to read/write ExtIDs from/to the storage.
-
vlorentz authored
All package loaders but deposit had logic to compute some object from the new packageinfo, some other objects from the known artifacts, and compare them. This commit moves the comparison logic to the base class, and unifies the two computation interfaces, respectively as an extid() method on TPackageInfo and a method on the loader. This unified object for comparison is a byte string, which is internal to each loader for now, but a future commit will read and write it from/to the ExtID storage instead of computing it from the 'original_artifacts' present in revision metadata.
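The unified comparison can be sketched as follows: each package info exposes an `extid()` byte string, and the shared logic only keeps packages whose identifier is not already known. Class and function names, the fields, and the way the identifier is derived are hypothetical; the commit only states that the identifier is a byte string internal to each loader.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class PackageInfoSketch:
    """Illustrative stand-in for a loader's package info object."""

    url: str
    version: str

    def extid(self) -> bytes:
        """Byte string identifying this package (made-up derivation)."""
        return hashlib.sha256(f"{self.url} {self.version}".encode()).digest()


def select_new_packages(
    candidates: Iterable[PackageInfoSketch], known_extids: Set[bytes]
) -> List[PackageInfoSketch]:
    """Shared comparison step: drop packages whose extid is already known."""
    return [p for p in candidates if p.extid() not in known_extids]


old = PackageInfoSketch("https://example.org/foo-1.0.tar.gz", "1.0")
new = PackageInfoSketch("https://example.org/foo-1.1.tar.gz", "1.1")
to_load = select_new_packages([old, new], known_extids={old.extid()})
```

Keeping the comparison in one place means each loader only has to decide how its identifier is computed.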
-
vlorentz authored
We want to store these identifiers in the ExtID storage, which expects a (preferably short) bytearray; but the 'artifact_identity' was a list of (possibly long) strings and ints. While this commit does not write them to the ExtID storage yet, it makes these two loaders use them internally. Assuming no sha256 collision, this does not change their behavior when seen from the outside, with two exceptions:
* the list of keys to use is now configured with a template string
* configuring an unknown key now raises a KeyError instead of silently using a None value.
But we never use this configuration setting, so in practice there is no change at all.
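A minimal sketch of a template-driven identifier, assuming the template is a `str.format` style string filled from the artifact's fields and then hashed with sha256. The function name, the template syntax and the field names are assumptions; only the use of a template string, sha256 and the KeyError behaviour come from the commit message.

```python
import hashlib
from typing import Any, Dict


def artifact_extid(template: str, artifact: Dict[str, Any]) -> bytes:
    """Render ``template`` with the artifact's fields and hash the result.

    str.format raises KeyError when the template names a field that the
    artifact does not have, instead of silently substituting None.
    """
    rendered = template.format(**artifact)
    return hashlib.sha256(rendered.encode()).digest()


# Made-up artifact description and template.
artifact = {"url": "https://example.org/foo-1.0.tar.gz", "length": 221837}
extid = artifact_extid("{url} {length}", artifact)

try:
    artifact_extid("{url} {checksum}", artifact)
    unknown_key_raised = False
except KeyError:
    unknown_key_raised = True
```

The digest is always 32 bytes, however long the original strings are, which fits a storage that prefers short byte identifiers.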
-