- Dec 09, 2021
-
- Dec 08, 2021
-
-
vlorentz authored
This solves two problems: 1. if the URL changes but the content doesn't, then the new snapshot would keep using the release with the old URL in its name. 2. if there are two URLs pointing to the same content, the base loader would crash because it cannot know which one to pick.
-
vlorentz authored
-
vlorentz authored
instead of just its netloc, as it is possible to have multiple Maven instances hosted under the same domain but at different paths. The code is also simpler this way.
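A minimal sketch of why the netloc alone is ambiguous. The two URLs are hypothetical and do not refer to real Maven instances:

```python
from urllib.parse import urlparse

# Two hypothetical Maven instances hosted under the same domain.
url_a = "https://repo.example.org/maven-a/"
url_b = "https://repo.example.org/maven-b/"

# The netloc is identical, so it cannot tell the two instances apart.
same_netloc = urlparse(url_a).netloc == urlparse(url_b).netloc

# The full base URLs differ, so they identify each instance unambiguously.
same_base_url = url_a == url_b
```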
-
- Dec 07, 2021
-
-
vlorentz authored
Snapshots should only record versions that currently exist, even if other versions existed in previous visits. If readers of the archive want to access deleted versions, they can look up older snapshots.
-
vlorentz authored
-
vlorentz authored
We don't need it to be ordered; and '.keys()' is redundant.
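As an illustration of the second point only, with a hypothetical mapping (the object touched by the commit is not shown here): iterating over a dict already yields its keys, so calling '.keys()' adds nothing.

```python
# Hypothetical mapping; the names are illustrative only.
versions = {"1.0.0": "a.tar.gz", "1.1.0": "b.tar.gz"}

# Iterating over a dict already iterates over its keys.
explicit = [v for v in versions.keys()]
implicit = [v for v in versions]

# Membership tests behave the same way.
found_explicit = "1.0.0" in versions.keys()
found_implicit = "1.0.0" in versions
```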
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
It was copied from the Archive Loader, but is not needed here.
-
vlorentz authored
Use only the intrinsic version (e.g. 1.0.0) instead of the extrinsic version (e.g. stretch/contrib/1.0.0). Releases should only contain data from the DSC, not external 'pointers' to them. Additionally, having extrinsic data in releases means the same dsc-sha256 extid can point to different releases, which means the loader may reuse a release mentioning a specific suite as a release in a different suite. With this commit, this won't be a problem anymore, as releases won't mention the suite at all, so suites can safely share extids.
-
vlorentz authored
'version' was documented as the intrinsic version (e.g. '0.7.2-3') and 'full_version' as the one containing the suite name (e.g. 'stretch/contrib/0.7.2-3'). In practice, it was the opposite, except in a few incorrect tests. This commit fixes said tests, and renames 'full_version' to 'intrinsic_version'. This is only a refactoring; the behavior is unchanged for now, but a future commit will remove the 'version' (which is extrinsic) from the release name (which should contain only data intrinsic to the DSC).
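The relation between the two forms can be sketched as follows. The helper below is hypothetical (it is not the loader's code) and assumes the extrinsic form is always 'suite/component/version', as in the examples above:

```python
def intrinsic_version(extrinsic_version: str) -> str:
    """Return the part after the last '/' (hypothetical helper).

    Assumes the intrinsic version itself never contains a '/'.
    """
    return extrinsic_version.rsplit("/", 1)[-1]


# The same intrinsic version can appear under several suites.
from_stretch = intrinsic_version("stretch/contrib/0.7.2-3")
from_buster = intrinsic_version("buster/contrib/0.7.2-3")
```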
-
- Dec 06, 2021
-
-
Antoine Lambert authored
In order to check the successful download of a package file, the Debian loader compares the sha256 or sha1 checksum of the file with the one located in the Debian dsc file. However, for old Debian-based distributions (some old Ubuntu releases, for instance), the only available checksum in the dsc file is an md5 sum. So add a fallback to use the md5 sum to check successful download when the sha* checksum is missing in the dsc file. Related to T2400
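A minimal sketch of such a fallback, not the loader's actual implementation. The function name, the preference order and the shape of the 'expected' mapping are assumptions made for illustration:

```python
import hashlib
from typing import Mapping

# Assumed preference order: strongest checksum first, md5 as a last resort.
CHECKSUM_PREFERENCE = ("sha256", "sha1", "md5")


def verify_download(data: bytes, expected: Mapping[str, str]) -> bool:
    """Check data against the strongest checksum present in 'expected'."""
    for algo in CHECKSUM_PREFERENCE:
        if algo in expected:
            return hashlib.new(algo, data).hexdigest() == expected[algo]
    raise ValueError("no supported checksum available")


payload = b"hello"
# An old dsc file that only carries an md5 sum.
md5_only = {"md5": hashlib.md5(payload).hexdigest()}
```

When a sha256 or sha1 sum is present, it is used and the md5 sum is ignored.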
-
Boris Baldassari authored
The Maven loader loads jar and zip files as Maven artefacts into the Software Heritage archive. Note: Supersedes D6158 and addresses the review done in that diff. Related to T1724
-
- Dec 03, 2021
-
-
Antoine R. Dumont authored
Related to T3763
-
Antoine R. Dumont authored
So package loaders can actually finish their ingestion even when multiple releases target the same directory. Related to T3763
-
Antoine Lambert authored
The loading task function must be named load_{visit_type} in order for the scheduler to successfully create loading tasks. The visit type name for Debian packages is deb, so the loading task function must be renamed to load_deb. Related to T2400
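The naming convention can be written down as a one-line helper. This is only an illustration of the pattern; 'task_function_name' is a hypothetical name, not a scheduler API:

```python
def task_function_name(visit_type: str) -> str:
    """Build the task function name expected for a visit type (hypothetical helper)."""
    return f"load_{visit_type}"


deb_task = task_function_name("deb")
```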
-
Antoine Lambert authored
Some debian source package metadata have extra sha1 sums for their files, for instance those from the ubuntu hirsute suite. So add an optional sha1 field in DebianFileMetadata model in order to avoid loading errors. Related to T2400
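A sketch of what an optional field looks like, using a simplified stand-in written with dataclasses. The class name 'FileMetadata' and the fields other than sha1 are assumptions; the real DebianFileMetadata model may differ:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileMetadata:
    """Simplified, hypothetical stand-in for the model."""

    name: str
    size: int
    sha256: str
    # Optional: only some suites provide it, so it defaults to None.
    sha1: Optional[str] = None


# Metadata without a sha1 sum still loads.
without_sha1 = FileMetadata(name="pkg_1.0.orig.tar.gz", size=42, sha256="ab" * 32)
# Metadata with the extra sha1 sum is accepted as well.
with_sha1 = FileMetadata(
    name="pkg_1.0.orig.tar.gz", size=42, sha256="ab" * 32, sha1="cd" * 20
)
```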
-
Antoine Lambert authored
-
- Dec 02, 2021
-
-
vlorentz authored
To match the current version of the code.
-
- Dec 01, 2021
- Nov 22, 2021
-
-
vlorentz authored
Authors: use the empty string '' instead of placeholders. Message: use the same message format (inspired by the Debian loader) for all loaders, instead of the empty string / the version / something else; except for PyPI and Deposit (which have a better format because we have more metadata available). Additionally, this commit adds tests of each release object, instead of only relying on its hash.
- Nov 10, 2021
-
-
Antoine R. Dumont authored
- Nov 09, 2021
- Nov 08, 2021
-
-
vlorentz authored
The artifacts they load match the semantics of a Release, but we used Revisions so far because of technical details (we needed the 'metadata' field of Revision that Release lacks) that are no longer relevant (thanks to the metadata storage). Packages that were loaded by previous versions of the package loader (as revs) will be converted to releases. In order to avoid fetching them from the origin, the loader will look for an existing extid pointing to a revision (like it used to), fetch that revision, extract some fields (directory id, author, date, ...) and build a new release using this information. This commit is unfortunately very large because of all the changes in tests, mostly just new hashes and renaming 'revision' to 'release' (and various abbreviations and capitalizations). The only meaningful changes are in swh/loader/package/tests/test_loader.py and swh/loader/package/loader.py. To keep this commit as short as possible, I did not yet change individual loaders to create releases: they still create revisions, which are converted by the base loader. The next commit will refactor them to remove this conversion layer.
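The conversion step can be sketched as below. The two classes are simplified, hypothetical stand-ins (not the swh.model classes), and 'release_from_revision' is an illustrative name; only the copied fields (directory id, author, date) come from the description above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Revision:
    """Hypothetical stand-in for a previously loaded revision."""

    directory: bytes
    author: str
    date: Optional[str]


@dataclass(frozen=True)
class Release:
    """Hypothetical stand-in for the release built from it."""

    name: bytes
    target: bytes  # id of the directory the release points to
    author: str
    date: Optional[str]


def release_from_revision(revision: Revision, name: bytes) -> Release:
    """Build a release that reuses the revision's directory, author and date."""
    return Release(
        name=name,
        target=revision.directory,
        author=revision.author,
        date=revision.date,
    )


old_revision = Revision(directory=b"\x01" * 20, author="", date="2021-11-08")
new_release = release_from_revision(old_revision, name=b"1.0.0")
```

Because the directory id is reused, nothing needs to be fetched from the origin again.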
-
- Nov 04, 2021
-
-
vlorentz authored
All the '*_missing' tests are already done automatically by check_snapshot (it recursively checks all objects are present in the storage).
-
vlorentz authored
They clutter the test output because pytest prints the whole code of the function raising the AssertionError. With this magic variable, the error is shown as if it was raised directly in the caller's body.
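The variable is not named here; it is presumably pytest's `__tracebackhide__`, which hides a helper's frame from the traceback pytest prints. A minimal sketch with a hypothetical helper:

```python
def assert_same_id(actual: str, expected: str) -> None:
    """Hypothetical test helper comparing two identifiers."""
    # pytest omits frames whose local variables set __tracebackhide__ to True,
    # so a failure is reported at the call site instead of inside this helper.
    __tracebackhide__ = True
    assert actual == expected, f"{actual!r} != {expected!r}"


def helper_fails_on_mismatch() -> bool:
    """Return True if the helper raises AssertionError on differing ids."""
    try:
        assert_same_id("abc", "def")
    except AssertionError:
        return True
    return False
```

Outside of pytest the variable has no effect; the assertion is raised as usual.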
-
vlorentz authored
Some tests did the following: 1. build a snapshot; 2. get the snapshot from the storage; 3. compare it with the expected snapshot; 4. get the origin visit from the storage and check it. If the loader built a wrong snapshot, the test failed at step 2, and the only information displayed was that the expected snapshot id does not exist, which is very unhelpful. Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong snapshot is built by the loader, it is detected when comparing the visit, and pytest shows the two hashes. Then, the test can be modified to use the hash that is actually generated, to show the actual snapshot. This is consistent with what was already done in the PyPI loader. Additionally, I made the following changes: 1. always check stats last (because a difference in numbers is hardly actionable without testing other objects); 2. add a few more snapshot id checks in visits; 3. deduplicate a hardcoded snapshot id.
-
vlorentz authored
The parent is computed by the deposit as the revision of the latest deposit in the same origin before the current one. Therefore, it is redundant, as it can be recomputed from metadata + revision date. This is a preliminary change needed to make package loaders produce releases instead of revisions, as releases don't have parent relationships.
-
vlorentz authored
-
vlorentz authored
-
- Nov 03, 2021
-
-
vlorentz authored
This reverts commit f6905cdf. That commit was a first step toward making loaders write releases instead of revisions. Unfortunately, we will still write revisions for a non-negligible time, so I prefer to defer the removal of parent deposit revisions to the moment we actually make that switch, so we don't end up with inconsistent revisions.
-
- Oct 21, 2021
-
-
vlorentz authored
extids are used instead now; this is all dead code.
-