Commits · v2.1.1 · vlorentz / Generic VCS and Package Loader

Dec 09, 2021
- nixguix: Fix crash when filtering extids on archives that were already loaded,... · 01aaf1bd
  vlorentz authored 3 years ago
```
nixguix: Fix crash when filtering extids on archives that were already loaded, but only from different URLs
```
  v2.1.1
  
  01aaf1bd
Dec 08, 2021

nixguix: Filter out releases with URLs different from the expected one · b211232a

vlorentz authored 3 years ago

This solves two problems:

1. if the URL changes but the content doesn't, then the new snapshot would keep
   using the release with the old URL in its name.
2. if there are two URLs pointing to the same content, the base loader would
   crash because it cannot know which one to pick.

b211232a

nixguix: Deduplicate test data · 636c1ddb
vlorentz authored 3 years ago

636c1ddb

maven: Use the instance base_url as metadata authority URL · 98506af0

vlorentz authored 3 years ago

instead of just its netloc, as it is possible to have multiple maven instances
hosted under the same domain but at different paths.

The code is also simpler this way.
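
A minimal sketch of why the netloc alone is ambiguous. The two instance URLs below are made up for illustration; only `urllib.parse.urlsplit` is a real API here.

```python
from urllib.parse import urlsplit

# Hypothetical Maven instances hosted under one domain at different paths.
instances = [
    "https://repo.example.org/maven-a/",
    "https://repo.example.org/maven-b/",
]

# Keyed by netloc, both instances collapse into a single authority.
netlocs = {urlsplit(url).netloc for url in instances}

# Keyed by full base URL, each instance keeps its own authority.
base_urls = set(instances)

print(netlocs)
print(sorted(base_urls))
```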

98506af0

Dec 07, 2021

maven: Don't carry deleted versions over to the next snapshot · a96389f5

vlorentz authored 3 years ago

Snapshots should only record versions that currently exist,
not versions that only existed in previous visits.

If readers of the archive want to access deleted versions,
they can look up older snapshots.

a96389f5

maven: Make MavenPackageInfo.from_metadata more concise · e8b6ed5a
vlorentz authored 3 years ago

e8b6ed5a
maven: Simplify definition of the 'version_artifact' dict · 5da115b6
vlorentz authored 3 years ago
```
We don't need it to be ordered; and '.keys()' is redundant.
```
5da115b6
maven: Simplify build_extrinsic_directory_metadata. · ccf71383
vlorentz authored 3 years ago

ccf71383
maven: Add typing to the artifacts dict · a76ab288
vlorentz authored 3 years ago

a76ab288
docs: Fix inconsistent example versions · 79b1075e
vlorentz authored 3 years ago

79b1075e
maven: Remove dead code for extid computation · 11ea5f51
vlorentz authored 3 years ago
```
It was copied from the Archive Loader, but is not needed here.
```
11ea5f51
package-loaders: Add support for extid versions, and bump it for Debian · 0fe66b71
vlorentz authored 3 years ago
```
The previous commit updated the format of releases, and we must not reuse releases
created before, hence the bump.
```
v2.0.0

0fe66b71

debian: Remove the extrinsic version from release names · c87c8c79

vlorentz authored 3 years ago

Use only the intrinsic version (eg. 1.0.0) instead of the extrinsic version
(eg. stretch/contrib/1.0.0).
Releases should only contain data from DSC, not external 'pointers' to them.

Additionally, having extrinsic data in releases means the same
dsc-sha256 extid can point to different releases, so the loader
could reuse a release mentioning a specific suite as a release in a
different suite.
With this commit, this won't be a problem anymore, as releases won't
mention the suite at all, so suites can safely share extids.
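
As an illustration of the two version forms, the following sketch derives the intrinsic version from an extrinsic one. The helper name `intrinsic_version` is hypothetical and is not taken from the loader's code; it only assumes the `suite/component/version` layout shown in the examples above.

```python
def intrinsic_version(extrinsic_version: str) -> str:
    """Drop any 'suite/component/' prefix and keep the last path segment.

    Illustrative helper only; it assumes the version itself contains no '/'.
    """
    return extrinsic_version.rsplit("/", 1)[-1]


print(intrinsic_version("stretch/contrib/1.0.0"))
```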

c87c8c79

debian: Fix confusion between the two versions · 26996ef3

vlorentz authored 3 years ago

'version' was documented as the intrinsic version (eg. '0.7.2-3') and
'full_version' as the one containing the suite name (eg. 'stretch/contrib/0.7.2-3').

In practice, it was the opposite, except in a few incorrect tests.

This commit fixes said tests, and renames 'full_version' to
'intrinsic_version'.

This is only a refactoring; the behavior is unchanged for now.
A future commit will remove the 'version' (which is extrinsic) from
the release name (which should contain only data intrinsic to the DSC).

26996ef3

Dec 06, 2021

debian: Add md5 sum fallback when sha* checksum is missing in metadata · 2d9e93a2

Antoine Lambert authored 3 years ago

In order to check that a package file was downloaded successfully, the debian
loader compares the sha256 or sha1 checksum of the file with the one found in
the debian dsc file.

However, for old debian-based distributions (some old ubuntu releases, for
instance), the only checksum available in the dsc file is an md5 sum.

So add a fallback to use md5 sum to check successful download when sha*
checksum is missing in the dsc file.
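
The fallback could look roughly like the sketch below. The function names and the shape of the `checksums` mapping are assumptions made for illustration, not the loader's actual interface; only the `hashlib` calls are real.

```python
import hashlib
from typing import Mapping, Tuple

# Strongest algorithm first; md5 is only used when nothing better is listed.
PREFERRED_ALGOS = ("sha256", "sha1", "md5")


def pick_checksum(checksums: Mapping[str, str]) -> Tuple[str, str]:
    """Return (algorithm, expected hex digest) for the best available checksum."""
    for algo in PREFERRED_ALGOS:
        if algo in checksums:
            return algo, checksums[algo]
    raise ValueError("no supported checksum in metadata")


def verify_download(data: bytes, checksums: Mapping[str, str]) -> bool:
    """Check downloaded bytes against the best checksum found in the metadata."""
    algo, expected = pick_checksum(checksums)
    return hashlib.new(algo, data).hexdigest() == expected
```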

Related to T2400

2d9e93a2

loader: add new maven-jar loader · 89f5ccc7

Boris Baldassari authored 3 years ago

The maven loader loads jar and zip files as Maven artifacts into the Software Heritage archive.

Note:
Supersedes D6158 and addresses the review done in that diff.

Related to T1724

89f5ccc7

Dec 03, 2021

package.loader: Deduplicate extid target · 5d22455c
Antoine R. Dumont authored 3 years ago
```
Related to T3763
```
v1.2.1 Verified

5d22455c

package.loader: Deduplicate target SWHIDs · b3d7632d

Antoine R. Dumont authored 3 years ago

So package loaders can actually finish their ingestion even when multiple releases
target the same directory.
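
One order-preserving way to deduplicate such targets is sketched here. The SWHID strings are placeholders and the helper is hypothetical; the commit itself may deduplicate differently.

```python
from typing import Iterable, List


def deduplicate(swhids: Iterable[str]) -> List[str]:
    """Drop repeated entries while keeping the first occurrence of each."""
    return list(dict.fromkeys(swhids))


# Placeholder identifiers: two releases pointing at the same directory.
targets = [
    "swh:1:dir:" + "a" * 40,
    "swh:1:dir:" + "b" * 40,
    "swh:1:dir:" + "a" * 40,
]
unique_targets = deduplicate(targets)
print(len(unique_targets))
```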

Related to T3763

Verified

b3d7632d

debian/tasks: Rename loading task function to fix scheduling · 3e675d0d

Antoine Lambert authored 3 years ago

The loading task function must be named load_{visit_type} in order for
the scheduler to successfully create loading tasks.

The visit type name for debian packages is deb, so the loading task
function must be renamed to load_deb.
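
The naming convention can be pictured with the sketch below. The registry and the lookup function are hypothetical stand-ins for however the scheduler resolves tasks; only the `load_{visit_type}` naming rule comes from the message above.

```python
from typing import Callable, Dict


def load_deb(url: str) -> str:
    """Placeholder loading task for Debian packages (visit type 'deb')."""
    return f"loading {url}"


# Hypothetical registry mapping task function names to callables.
REGISTERED_TASKS: Dict[str, Callable[[str], str]] = {"load_deb": load_deb}


def find_loading_task(visit_type: str) -> Callable[[str], str]:
    """Resolve a loading task from the visit type, following load_{visit_type}."""
    name = f"load_{visit_type}"
    if name not in REGISTERED_TASKS:
        raise LookupError(f"no loading task named {name!r}")
    return REGISTERED_TASKS[name]
```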

Related to T2400

3e675d0d

package/debian: Handle extra sha1 sum in source package metadata · 22478fa8

Antoine Lambert authored 3 years ago

Some debian source package metadata have extra sha1 sums for their
files, for instance those from the ubuntu hirsute suite.

So add an optional sha1 field in DebianFileMetadata model in order
to avoid loading errors.

Related to T2400

22478fa8

debian: Remove unused date parameter of DebianLoader · b423b682
Antoine Lambert authored 3 years ago

b423b682

Dec 02, 2021
- package-loader-tutorial: Update to mention releases instead of revisions · 99490933
  vlorentz authored 3 years ago
```
To match the current version of the code.
```
  99490933
Dec 01, 2021
- package-loader-tutorial: Add a checklist · 6ae5f54c
  vlorentz authored 3 years ago
  
  6ae5f54c
- package-loader-tutorial: Highlight the recommendation to submit the loader early. · 8a3df137
  vlorentz authored 3 years ago
  
  8a3df137
Nov 22, 2021

Package loaders: Add a newline at the end of the message · 021880d0
vlorentz authored 3 years ago
```
To be consistent with Git.
```
v1.1.0

021880d0

Package loader: Uniformize author and message · 2ab367ba

vlorentz authored 3 years ago

Authors: use the empty string '' instead of placeholders
Message: use the same message format (inspired by the Debian loader)
 for all loaders, instead of the empty string / the version /
 something else; except for PyPI and Deposit (which have a better
 format because we have more metadata available).

Additionally, this commit adds tests of each release object,
instead of only relying on its hash.

2ab367ba

Nov 10, 2021
- Fix tests when run by gbp on Sid. · fbc8ee14
  vlorentz authored 3 years ago
  
  v1.0.1
  
  fbc8ee14
- utils: Add types and let log instruction do the formatting · 1f1bdad8
  Antoine R. Dumont authored 3 years ago
  
  Verified
  
  1f1bdad8
Nov 09, 2021
- Refactor package loaders to make the version part of BasePackageInfo · 059c71b6
  vlorentz authored 3 years ago
```
Half the loaders already had a version field in their PackageInfo class;
and the version name needed to be passed almost everywhere p_info already did.

This removes some duplication and inconsistencies between loaders.
```
  v1.0.0
  
  059c71b6
- Refactor package loaders to remove temporary revision objects · 2af673b6
  vlorentz authored 3 years ago
  
  2af673b6
- Document how each package loader populates fields. · 05496787
  vlorentz authored 3 years ago
  
  05496787
Nov 08, 2021

Make package loaders write releases instead of revisions · 89417bb0

vlorentz authored 3 years ago

The artifacts they load match the semantics of a Release, but we used Revisions
so far because of technical details (we needed the 'metadata' field of Revision
that Release lacks) that are no longer relevant (thanks to the metadata storage).

Packages that were loaded by previous versions of the package loader (as revs)
will be converted to releases. In order to avoid fetching them from the origin,
the loader will look for an existing extid pointing to a revision (like it used
to), fetch that revision, extract some fields (directory id, author, date, ...)
and build a new release using this information.

This commit is unfortunately very large because of all changes in tests, mostly
just new hashes and renaming 'revision' to 'release' (and various abbreviations
and capitalizations).

The only meaningful changes are in swh/loader/package/tests/test_loader.py and
swh/loader/package/loader.py.

To keep this commit as short as possible, I did not yet change individual loaders
to create releases: they still create revisions, but are converted by the base
loader. The next commit will refactor them to remove this conversion layer.

89417bb0

Nov 04, 2021

tests: Remove duplicate checks · c0a98a5c

vlorentz authored 3 years ago

All the '*_missing' tests are already done automatically by check_snapshot
(it recursively checks all objects are present in the storage).

c0a98a5c

tests: Hide utilities from stack traces · 2311ad9b

vlorentz authored 3 years ago

They clutter the test output because pytest prints the whole code
of the function raising the AssertionError.

With this magic variable, the error is shown as if it was raised
directly in the caller's body.
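
The message does not name the variable; pytest's `__tracebackhide__` behaves this way, so it is presumably the one meant. A minimal sketch with a made-up helper and test:

```python
def assert_same_id(actual: str, expected: str) -> None:
    """Hypothetical test utility comparing two identifiers."""
    # pytest omits frames that define this local from the printed traceback,
    # so a failure is reported at the call site in the test.
    __tracebackhide__ = True
    assert actual == expected, f"{actual!r} != {expected!r}"


def test_ids_match() -> None:
    assert_same_id("abc", "abc")
```

Outside pytest the variable has no effect; the helper is an ordinary function that raises AssertionError on mismatch.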

2311ad9b

package loaders: Make test failures more helpful · 551c55ff

vlorentz authored 3 years ago

Some tests did the following:

1. build a snapshot
2. get the snapshot from the storage
3. compare it with the expected snapshot
4. get the origin visit from the storage and check it

If the loader built a wrong snapshot, the test fails at step 2,
and the only information displayed is that the expected snapshot id
does not exist, which is very unhelpful.

Instead, I reordered them as: 1, 4, 2, 3. This way, if a wrong
snapshot is built by the loader, it is detected when comparing
the visit, and pytest shows the two hashes.
Then, the test can be modified to use the hash that is actually
generated to show the actual snapshot.

This is consistent with what was already done in the pypi loader.

Additionally, I made the following changes:

1. always check stats last (because a difference in numbers is
   hardly actionable without testing other objects)
2. add a few more snapshot id checks in visits
3. deduplicate a hardcoded snapshot id.

551c55ff

deposit: Remove 'parent' deposit · 89a0bfee

vlorentz authored 3 years ago

The parent is computed by the deposit as the revision of the latest deposit
in the same origin before the current one.
Therefore, it is redundant, as it can be recomputed from metadata
+ revision date.

This is a preliminary change needed to make package loaders produce
releases instead of revisions, as releases don't have parent relationships.

89a0bfee

opam: Write package definitions to the extrinsic metadata storage · aeffe01a
vlorentz authored 3 years ago

aeffe01a
Add missing documentation for `get_metadata_authority`. · 18bbbae7
vlorentz authored 3 years ago

18bbbae7

Nov 03, 2021

Revert "deposit: Remove 'parent' deposit" · 5063082e

vlorentz authored 3 years ago

This reverts commit f6905cdf.

That commit was a first step toward making loaders write releases
instead of revisions.

Unfortunately, we will still write revisions for a non-negligible time,
so I prefer to defer the removal of parent deposit revisions to the
moment we actually make that switch, so we don't end up with inconsistent
revisions.

5063082e

Oct 21, 2021
- Remove unused 'known_artifacts' code · 9f882793
  vlorentz authored 3 years ago
```
extids are used instead now, this is all dead code.
```
  9f882793