- Jan 06, 2023
-
-
Kumar Shivendu authored
Some of the URLs lack a scheme (e.g. http), which prevents the loader from downloading the corresponding repositories. This diff should fix the issue. Related to T3294
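A minimal sketch of the kind of normalization described above, not the actual diff: the helper name `ensure_scheme` and the `https` default are assumptions made for illustration.

```python
def ensure_scheme(url: str, default_scheme: str = "https") -> str:
    """Return `url` unchanged if it already carries a scheme,
    otherwise prefix it with `default_scheme` (hypothetical default)."""
    if "://" in url:
        return url
    # Drop leading slashes so "//host/path" does not become "https:////host/path".
    return f"{default_scheme}://{url.lstrip('/')}"


if __name__ == "__main__":
    print(ensure_scheme("example.org/repo.git"))
    print(ensure_scheme("http://example.org/repo.git"))
```

The check is deliberately simple (presence of `://`); a real loader may need stricter validation of the resulting URL.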
-
- Aug 12, 2021
-
- Aug 05, 2021
-
-
Antoine R. Dumont authored
This reverts commit dbb18628, which causes issues when uploading to PyPI [1]. The x-rst type poses no problem in swh.core, for example, but in that repository the readme is not a symlink... I'm not investigating this right now, so a simple revert should do. My focus is on deploying the opam loader on staging for now. [1] https://jenkins.softwareheritage.org/view/swh-draft/job/DLDBASE/job/pypi-upload/111/console
-
- Jul 20, 2021
-
-
Antoine R. Dumont authored
This simplifies the logic behind the parsing of the `opam_read` function to avoid raising outside of the `opam_read` call. Related to T3425
-
zapashcanon authored
Summary: added an opam loader. Related to T3425
Test Plan: will add tests later using a local opam repository as it's done in the lister
Reviewers: #reviewers, vlorentz, ardumont
Reviewed By: #reviewers, ardumont
Subscribers: ardumont, vlorentz
Maniphest Tasks: T3425
Differential Revision: https://forge.softwareheritage.org/D5975
-
- Jul 07, 2021
-
-
Antoine R. Dumont authored
-
- Jun 25, 2021
-
- Jun 16, 2021
-
-
Nicolas Dandrimont authored
-
- Jun 10, 2021
-
-
Antoine Lambert authored
There are cases where a tarball to download is marked as gzipped in the Content-Encoding HTTP response header when in fact it is not. So handle the ContentDecodingError exception that can be raised by the download method: try to download the tarball's raw bytes again without attempting to uncompress the input stream.
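The fallback idea can be illustrated with a standard-library stand-in. This is not the loader's code (which handles ContentDecodingError raised by its download method and downloads again); it only shows the same "try to decode, else keep the raw bytes" logic on an in-memory body. The function name `decode_body` is hypothetical.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str = "") -> bytes:
    """Decode `body` according to the advertised Content-Encoding,
    falling back to the raw bytes when the header turns out to be wrong."""
    if content_encoding.lower() != "gzip":
        return body
    try:
        return gzip.decompress(body)
    except (OSError, EOFError, zlib.error):
        # The header claimed gzip but the payload is not gzipped:
        # keep the bytes exactly as received.
        return body


if __name__ == "__main__":
    raw = b"plain tarball bytes"
    print(decode_body(raw, "gzip") == raw)
    print(decode_body(gzip.compress(raw), "gzip") == raw)
```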
-
- Jun 09, 2021
-
-
Antoine Lambert authored
-
- May 27, 2021
-
-
Antoine Lambert authored
This makes it possible to append the latest snapshot content of an origin each time the loader is invoked. The purpose is to keep track of all the origin artifacts loaded so far in each new visit of the origin. Closes T3347
-
- Apr 26, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 16, 2021
-
-
Antoine Lambert authored
This makes it possible to invoke make successfully in the docs folder.
-
- Apr 13, 2021
-
- Apr 08, 2021
-
-
vlorentz authored
-
- Apr 06, 2021
-
-
Nicolas Dandrimont authored
It has existed, at least at some point in the past, even though I'm currently unable to reproduce a dsc with that field in it. (Hence the lack of test fixture...)
-
vlorentz authored
They already write it with raw_extrinsic_metadata_add/extid_add, and read it with extid_get_*. This code was only kept for compatibility while we were migrating the extids. This is now done, so this code is useless.
-
vlorentz authored
-
vlorentz authored
-
- Apr 02, 2021
- Apr 01, 2021
-
-
vlorentz authored
-
- Mar 30, 2021
-
- Mar 29, 2021
-
-
vlorentz authored
This allows future runs of a loader to know a package was already loaded, without querying each of the revisions individually and parsing their metadata. Eventually, this will allow us to get rid of the 'metadata' column on the 'revision' table entirely.
-
- Mar 26, 2021
-
-
vlorentz authored
To check which packages are already downloaded. For now, this lookup is done in addition to checking the artifacts from the last snapshot's revisions' metadata, because we did not start writing ExtIDs yet. But the ExtID lookup will eventually replace the artifact-based lookup. This will finally allow us to drop the 'metadata' field of Revision objects.
-
vlorentz authored
We used a string instead of a tuple. It doesn't matter much because they are only compared with each other, but let's not intentionally use the wrong types when we don't need to.
-
- Mar 25, 2021
-
-
vlorentz authored
A future commit will introduce resolve_revision_from_extid, so this commit preemptively renames it to avoid any confusion.
-
- Mar 23, 2021
-
-
vlorentz authored
In a future commit, we will need to go through all the PackageInfo objects before running the loop, so we can get their ExtID and fetch them from the storage. So, we need to fetch them all before running the load loop, using this listcomp.
-
vlorentz authored
These three loaders get intrinsic metadata from the archive, and use it to build the revision object (mostly authoring and date), which means they would not load the same revision as another loader given the same archive.
-
vlorentz authored
In a future commit, we will need to go through all the PackageInfo objects before running the loop, so we can get their ExtID and fetch them from the storage. So, we need to fetch them all before running the load loop, using this listcomp.
-
vlorentz authored
I found the old definition to be quite confusing when refactoring this code.
-
vlorentz authored
This is still a purely internal change for now, but it will be needed to read/write ExtIDs from/to the storage.
-
vlorentz authored
All package loaders but deposit had logic to compute some object from the new packageinfo, some other objects from the known artifacts, and compare them. This commit moves the comparison logic to the base class, and unifies the two computation interfaces, respectively as an extid() method on TPackageInfo and a method on the loader. This unified object for comparison is a byte string, which is internal to each loader for now, but a future commit will read and write it from/to the ExtID storage instead of computing it from the 'original_artifacts' present in revision metadata.
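The shape of this refactoring might look roughly like the sketch below. Apart from `extid()`, which the commit message names, every class, method, and field name here is hypothetical, and hashing `url` plus `version` is only an illustrative choice of identity.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PackageInfo:
    """Hypothetical description of one package artifact to load."""
    url: str
    version: str

    def extid(self) -> bytes:
        # Illustrative identity: hash the fields that identify the artifact.
        return hashlib.sha256(f"{self.url} {self.version}".encode()).digest()


class BasePackageLoader:
    """Base class holding the comparison logic shared by all loaders."""

    def known_artifact_to_extid(self, artifact: dict) -> Optional[bytes]:
        # Compute the same byte string from a previously loaded artifact.
        try:
            manifest = f"{artifact['url']} {artifact['version']}"
        except KeyError:
            return None
        return hashlib.sha256(manifest.encode()).digest()

    def resolve_revision(
        self, known_artifacts: Dict[str, dict], p_info: PackageInfo
    ) -> Optional[str]:
        # Return the id of the revision already loaded for p_info, if any.
        new_extid = p_info.extid()
        for revision_id, artifact in known_artifacts.items():
            if self.known_artifact_to_extid(artifact) == new_extid:
                return revision_id
        return None
```

Subclasses would override only the two computation methods, while the comparison itself stays in the base class.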
-
vlorentz authored
We want to store these identifiers in the ExtID storage, which expects a (preferably short) bytearray; but the 'artifact_identity' was a list of (possibly long) strings and ints. While this commit does not write them to the ExtID storage yet, it makes these two loaders use them internally. Assuming no sha256 collision, this does not change their behavior when seen from the outside, with two exceptions:
* the list of keys to use is now configured with a template string
* configuring an unknown key now raises a KeyError instead of silently using a None value.
But we never use this configuration setting, so in practice there is no change at all.
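One way to picture a template-configured identifier is the sketch below. It assumes `string.Template` as the templating mechanism and uses made-up key names (`url`, `length`); the function name `compute_extid` is hypothetical, and the loaders' actual code may differ.

```python
import hashlib
import string


def compute_extid(manifest_format: string.Template, artifact: dict) -> bytes:
    """Render the configured template with the artifact's fields and
    hash the result into a short, fixed-size byte string."""
    # substitute() raises KeyError when the template names an unknown key,
    # instead of silently substituting None.
    manifest = manifest_format.substitute(artifact)
    return hashlib.sha256(manifest.encode()).digest()


if __name__ == "__main__":
    fmt = string.Template("$url $length")
    artifact = {"url": "https://example.org/p-1.0.tar.gz", "length": 221837}
    print(len(compute_extid(fmt, artifact)))  # sha256 digests are 32 bytes
```

Whatever the size of the rendered manifest, the digest stays at 32 bytes, which fits the "preferably short" constraint mentioned above.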
-
vlorentz authored
We will need it independently in a future commit
-
vlorentz authored
We will need it independently in a future commit
-
vlorentz authored
We will need it independently in a future commit
-
vlorentz authored
Instead of the sha256 + name + ... of all the files of a package. This will be needed to transition to ExtID, as we can't reasonably write this large set in the ExtID storage; and the sha256 of the .dsc is good enough, as the .dsc contains hashes and names of other files.
-
vlorentz authored
It's stricter and more readable.
-