- Feb 26, 2025
-
-
This was used at the time we were building debian packages for swh components but we no longer do that.
-
- Dec 18, 2024
-
-
Antoine Lambert authored
Instead of implementing version sorting in each package loader, provide a base implementation in the swh.loader.package.PackageLoader class through the get_sorted_versions method. It relies on the looseversion module, which can compare heterogeneous version schemes and works well for a large majority of package loaders. The get_default_version method of the PackageLoader class now also has a base implementation returning the last element of the list returned by the get_sorted_versions method. As a consequence, each snapshot produced by a package loader contains a HEAD alias branch targeting the branch for the highest version number of a package. Both methods can be reimplemented in package loaders for special cases, Debian for instance. Also remove the use of the packaging module to parse versions, as it is dedicated to parsing Python package versions only. Related to swh-lister#4711.
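As a rough illustration of the two base methods described above, here is a minimal sketch. It does not use the looseversion module: loose_key is a stdlib-only stand-in for its comparison, and PackageLoaderSketch and its constructor are hypothetical names, not the actual swh.loader.package code.

```python
import re
from typing import List, Tuple, Union


def loose_key(version: str) -> List[Tuple[int, Union[int, str]]]:
    """Stand-in for a loose version comparison (assumption, not looseversion).

    Splits the version into numeric and alphabetic components and tags each
    one so that integers and strings are never compared directly.
    """
    parts = re.findall(r"\d+|[a-zA-Z]+", version)
    return [(1, int(p)) if p.isdigit() else (0, p) for p in parts]


class PackageLoaderSketch:
    """Hypothetical loader exposing the two methods named in the commit."""

    def __init__(self, versions: List[str]) -> None:
        self._versions = versions

    def get_versions(self) -> List[str]:
        return list(self._versions)

    def get_sorted_versions(self) -> List[str]:
        # Base implementation: sort every known version, lowest first
        return sorted(self.get_versions(), key=loose_key)

    def get_default_version(self) -> str:
        # Base implementation: the highest version is the last sorted element
        return self.get_sorted_versions()[-1]
```

A loader with an unusual version scheme would override get_sorted_versions (and, if needed, get_default_version) rather than change the key function.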
-
- May 22, 2024
-
-
David Douard authored
This is needed to make swh.loader.core not depend on swh.loader.package.
-
- May 15, 2024
-
-
Pierre-Yves David authored
-
- Dec 04, 2023
-
-
David Douard authored
-
- Oct 04, 2022
-
-
Antoine Lambert authored
Add a dedicated fixture implementing loader task creation check for a given lister and listed origin and use it in tasks tests for available loaders. Also remove redundant tests performing the same checks as that new fixture.
-
- Sep 30, 2022
-
-
Antoine Lambert authored
When one or multiple tarball checksums are available, either from lister output or from Web API calls performed by some loaders, use them to check the integrity of downloaded tarballs.
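The integrity check can be pictured with the following sketch. The function name check_integrity and the shape of the checksums mapping (algorithm name to hex digest) are assumptions made for illustration; only hashlib from the standard library is used.

```python
import hashlib
from typing import Dict


def check_integrity(data: bytes, checksums: Dict[str, str]) -> None:
    """Raise ValueError if any expected checksum does not match the data.

    `checksums` maps a hashlib algorithm name (e.g. "sha256") to the
    expected hexadecimal digest. This layout is an assumption.
    """
    for algo, expected in checksums.items():
        actual = hashlib.new(algo, data).hexdigest()
        if actual != expected.lower():
            raise ValueError(
                f"{algo} mismatch: expected {expected}, computed {actual}"
            )
```

With an empty mapping the function does nothing, which matches the case where no checksum is available for a tarball.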
-
- Apr 27, 2022
-
-
Antoine Lambert authored
Recent changes in swh-scheduler add new parameters to the celery tasks produced from swh.scheduler.model.ListedOrigin instances. Handle any new parameters by not hardcoding the expected ones in task signatures. Remove unsafe use of unnamed task parameters. Add new tests checking that task parameters produced from ListedOrigin instances do not raise an error when attempting to create a package loader. Related to T4187
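A minimal sketch of the idea, without celery: the task body takes only named parameters and forwards them unchanged, so a parameter added upstream does not break the task signature. HypotheticalPackageLoader, load_package, and the parameter name new_parameter are all illustrative assumptions.

```python
from typing import Any, Dict


class HypotheticalPackageLoader:
    """Stand-in loader: takes the origin URL and tolerates extra options."""

    def __init__(self, url: str, **kwargs: Any) -> None:
        self.url = url
        self.extra = kwargs  # parameters this sketch does not interpret

    def load(self) -> Dict[str, str]:
        return {"status": "uneventful"}


def load_package(**kwargs: Any) -> Dict[str, str]:
    """Task body: no hardcoded or positional parameters, everything is named."""
    return HypotheticalPackageLoader(**kwargs).load()
```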
-
- Apr 21, 2022
-
-
vlorentz authored
1. Pass **kwargs to the base loader, instead of repeating the args 2. Remove redundant attribute initialization
-
- Apr 08, 2022
-
-
Antoine Lambert authored
Related to T3922
-
- Jan 11, 2022
-
-
vlorentz authored
A future release of swh-model will change its constructor's signature (replace 'offset' and 'negative_utc_offset' with 'offset_bytes'). This leaves one occurrence of a direct use of the constructor, as from_datetime() does not allow negative UTC.
-
- Nov 22, 2021
-
-
vlorentz authored
To be consistent with Git.
-
vlorentz authored
Authors: use the empty string '' instead of placeholders.
Message: use the same message format (inspired by the Debian loader) for all loaders, instead of the empty string / the version / something else; except for PyPI and Deposit (which have a better format because we have more metadata available).
Additionally, this commit adds a test of each release object, instead of only relying on its hash.
-
- Nov 09, 2021
- Nov 08, 2021
-
-
vlorentz authored
The artifacts they load match the semantics of a Release, but we used Revisions so far because of technical details (we needed the 'metadata' field of Revision that Release lacks) that are no longer relevant (thanks to the metadata storage).
Packages that were loaded by previous versions of the package loader (as revs) will be converted to releases. In order to avoid fetching them from the origin, the loader will look for an existing extid pointing to a revision (like it used to), fetch that revision, extract some fields (directory id, author, date, ...) and build a new release using this information.
This commit is unfortunately very large because of all the changes in tests, mostly just new hashes and renaming 'revision' to 'release' (and various abbreviations and capitalizations). The only meaningful changes are in swh/loader/package/tests/test_loader.py and swh/loader/package/loader.py.
To keep this commit as short as possible, I did not yet change individual loaders to create releases: they still create revisions, which are converted by the base loader. The next commit will refactor them to remove this conversion layer.
-
- Nov 04, 2021
-
-
vlorentz authored
All the '*_missing' tests are already done automatically by check_snapshot (it recursively checks all objects are present in the storage).
-
vlorentz authored
Some tests did the following:
1. build a snapshot
2. get the snapshot from the storage
3. compare it with the expected snapshot
4. get the origin visit from the storage and check it
If the loader built a wrong snapshot, the test fails at step 2, and the only information displayed is that the expected snapshot id does not exist, which is very unhelpful.
Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong snapshot is built by the loader, it is detected when comparing the visit, and pytest shows the two hashes. Then, the test can be modified to use the hash that is actually generated to show the actual snapshot. This is consistent with what was already done in the pypi loader.
Additionally, I made the following changes:
1. always check stats last (because a difference in numbers is hardly actionable without testing other objects)
2. add a few more snapshot id checks in visits
3. deduplicate a hardcoded snapshot id.
-
- Apr 06, 2021
-
-
vlorentz authored
They already write it with raw_extrinsic_metadata_add/extid_add, and read it with extid_get_*. This code was only kept for compatibility while we were migrating the extids. This is now done, so this code is useless.
-
- Mar 30, 2021
-
-
vlorentz authored
Like the PyPI and NPM loader. It allows scripts to use the method without creating a loader instance.
-
- Mar 23, 2021
-
-
vlorentz authored
These three loaders get intrinsic metadata from the archive, and use it to build the revision object (mostly authoring and date), which means they would not load the same revision as another loader given the same archive.
-
vlorentz authored
This is still a purely internal change for now, but it will be needed to read/write ExtIDs from/to the storage.
-
vlorentz authored
All package loaders but deposit had logic to compute some object from the new packageinfo, some other objects from the known artifacts, and compare them. This commit moves the comparison logic to the base class, and unifies the two computation interfaces, respectively as an extid() method on TPackageInfo and a method on the loader. This unified object for comparison is a byte string, which is internal to each loader for now, but a future commit will read and write it from/to the ExtID storage instead of computing it from the 'original_artifacts' present in revision metadata.
-
vlorentz authored
We want to store these identifiers in the ExtID storage, which expects a (preferably short) bytearray; but the 'artifact_identity' was a list of (possibly long) strings and ints. While this commit does not write them to the ExtID storage yet, it makes these two loaders use them internally. Assuming no sha256 collision, this does not change their behavior when seen from the outside, with two exceptions:
* the list of keys to use is now configured with a template string
* configuring an unknown key now raises a KeyError instead of silently using a None value.
But we never use this configuration setting, so in practice there is no change at all.
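One way to picture the template-string approach is the sketch below. The function name compute_extid, the template syntax, and the metadata keys are assumptions for illustration; the commit only states that a template string selects the keys, that the result is hashed with sha256, and that an unknown key raises a KeyError.

```python
import hashlib
from typing import Any, Dict


def compute_extid(template: str, metadata: Dict[str, Any]) -> bytes:
    """Render the template from artifact metadata and hash the result.

    str.format_map raises KeyError when the template names a key that is
    absent from `metadata`, instead of silently substituting None.
    """
    manifest = template.format_map(metadata)
    # A sha256 digest is a short, fixed-size (32 bytes) byte string
    return hashlib.sha256(manifest.encode("utf-8")).digest()
```

For example, compute_extid("{url} {length}", {"url": "https://example.org/a.tar.gz", "length": 42}) always yields the same 32 bytes for the same metadata, while a template naming a missing key fails loudly.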
-
- Feb 16, 2021
-
-
Antoine R. Dumont authored
This unifies and centralizes the instantiation the same way the lister does. This introduces a new base class swh.loader.core.loader.Loader for all loaders, whose only concern for now is to instantiate loaders from either a configuration dict or a configuration file. This simplifies instantiation in celery task code and avoids duplicating the configuration load in each loader constructor. The end goal is to simplify the future refactoring of the configuration. With this change, we will only have to adapt the Loader class when we start uniformly simplifying the configuration. Also note that I mostly reused the equivalent `swh.lister.pattern.Lister.from_config*`. I did not refactor the common behavior (to avoid throwing another dependency in the mix). That could always be refactored later. (inspired by both the work on listers and the configuration system work) Related to T1410
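The pattern can be sketched as two class methods on a base class. This is an assumption-laden illustration, not the swh.loader.core.loader.Loader code: HypotheticalLoader, its constructor parameters, and the use of a JSON file (chosen here only to stay within the standard library) are all hypothetical.

```python
import json
from typing import Any, Dict


class HypotheticalLoader:
    """Stand-in base class centralizing loader instantiation."""

    def __init__(self, storage: Dict[str, Any], url: str) -> None:
        self.storage_config = storage
        self.url = url

    @classmethod
    def from_config(cls, **config: Any) -> "HypotheticalLoader":
        # Instantiate from a configuration dict passed as named parameters
        return cls(**config)

    @classmethod
    def from_configfile(cls, path: str, **kwargs: Any) -> "HypotheticalLoader":
        # Read the configuration from a file, then let explicit keyword
        # arguments (e.g. the origin URL) complete or override it
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
        return cls.from_config(**{**config, **kwargs})
```

Subclasses inherit both methods, so a task only needs to call the class method with its named parameters instead of loading configuration itself.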
-
- Feb 05, 2021
-
-
Antoine R. Dumont authored
When:
  - failure to communicate internally with the storage
  - absolutely no revision got loaded during a visit
Related to T3030
-
- Sep 17, 2020
-
-
vlorentz authored
The deduplication logic of 'person' objects is an internal detail of storage backends, so it's better not to rely on it.
-
Antoine Lambert authored
Related to T2610
-
- Jul 31, 2020
-
-
vlorentz authored
It errors in test_cran_parse_date; I don't understand why it worked so far.
-
- Jul 24, 2020
-
-
vlorentz authored
The rename is to disambiguate with 'raw metadata', which may differ from the raw info. And the base PackageLoader doesn't need to access this field, so removing it from BasePackageInfo.
-
- Jul 23, 2020
-
-
vlorentz authored
This commit does the following:
* Move artifact_identity to BasePackageInfo, which uses a class attribute (and is overridden for ArchivePackageInfo, which needs a custom behavior to override keys). Also moved/improved its test
* Add attributes to *PackageInfo classes, that can be accessed instead of the raw metadata.
* Add a from_metadata class method to all *PackageInfo classes, to parse the raw metadata and build the object from it.
* Pass the PackageInfo object to resolve_revision_from and build_revision instead of untyped dicts.
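The from_metadata idea above can be sketched with a small dataclass. The class name HypotheticalPackageInfo, its fields, and the metadata keys are assumptions for illustration and do not reproduce the actual *PackageInfo classes.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class HypotheticalPackageInfo:
    """Typed view over raw package metadata (field names are assumptions)."""

    url: str
    version: str
    filename: Optional[str]
    raw: Dict[str, Any]  # original metadata, kept for anything not yet typed

    @classmethod
    def from_metadata(cls, metadata: Dict[str, Any]) -> "HypotheticalPackageInfo":
        # Parse the untyped dict once; callers then use attributes
        return cls(
            url=metadata["url"],
            version=str(metadata["version"]),
            filename=metadata.get("filename"),
            raw=metadata,
        )
```

Passing such an object around, instead of the dict it was built from, lets a type checker catch misspelled or missing fields.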
-
vlorentz authored
The benefits are minimal for now, as 'raw' still contains a lot of stuff; but further commits will move data out of 'raw' to a proper attribute.
-
- Jul 16, 2020
-
-
Antoine R. Dumont authored
Related to T2494
-
- Jul 15, 2020
-
-
David Douard authored
-
- Jul 10, 2020
-
-
David Douard authored
Branch names and targets are expected to be bytes. This should make it possible to get rid of the type castings in check_snapshot().
-
- Jul 09, 2020
-
- Jul 06, 2020
-
-
Antoine R. Dumont authored
It's shared amongst both package and core loaders. Related to T2481
-
Antoine R. Dumont authored
It's shared amongst both package and core loaders. Related to T2481
-
- Jun 22, 2020
-
-
Antoine R. Dumont authored
Related to T2310
-
- Jun 03, 2020
-
-
Antoine R. Dumont authored
Related to D3177
-