Make package loaders create release objects instead of revisions
Migrated from T3638 (view on Phabricator)
- swh/infra/sysadm-environment #3722
- swh/infra/sysadm-environment #3745
Activity
- vlorentz mentioned in issue #4297 (closed)
- vlorentz mentioned in issue #3636 (closed)
- Phabricator Migration user marked this issue as related to #3636 (closed)
- vlorentz added Archive content Data Model Package Loader priority:Normal labels
Here is an overview of the fields (+ internal version name + branch name) used by each package loader, after swh/devel/swh-loader-core!237 (closed):
| Loader | version | branch | name | message | target | target_type | synthetic | author | date | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| archive | passed as arg | `release_name(version)` | =version | "swh-loader-package: synthetic revision message" | ... | dir | true | SWH robot | passed as arg | |
| cran | metadata.get("Version", passed as arg) | `release_name(version)` | =version | =version | ... | dir | true | `metadata.get("Maintainer", "")` | `metadata.get("Date")` | metadata is intrinsic |
| debian | passed as arg (e.g. `stretch/contrib/0.7.2-3`) | `release_name(version)` | =version | "Synthetic revision for Debian source package %s version %s" | ... | dir | true | `metadata.changelog.person` | `metadata.changelog.date` | metadata is intrinsic. no more `RevisionType.DSC` |
| deposit | HEAD only | HEAD | HEAD | `{client}: Deposit {id} in collection {collection}` | ... | dir | true | SWH robot | `<codemeta: dateCreated>` from SWORD XML | revisions had parents |
| nixguix | URL | URL | URL | "" | ... | dir | true | "" | None | it's the URL of the artifact referenced by the derivation |
| npm | `metadata["version"]` | `release_name(version)` | =version | =version | ... | dir | true | from int metadata or "" | from ext metadata or None | |
| opam | as given by opam | `"{opam_package}.{version}"` | =version | =version | ... | dir | true | from metadata | None | `"{self.opam_package}.{version}"` matches the version names used by opam's backend. metadata is extrinsic |
| pypi | `metadata["version"]` | `release_name(version)` or `release_name(version, filename)` | =version | `"{version}: {metadata['comment_text']}"` or just version | ... | dir | true | from int metadata or "" | from ext metadata or None | metadata is intrinsic |

using this function:
```python
from typing import Optional


def release_name(version: str, filename: Optional[str] = None) -> str:
    # Include the filename only when one is given.
    if filename:
        return "releases/%s/%s" % (version, filename)
    return "releases/%s" % version
```
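As a quick illustration of the branch names this function produces, here is a minimal sketch. The function body is repeated from the snippet above so the example runs on its own; the version string and filename are hypothetical values chosen for illustration, not taken from any real package.

```python
from typing import Optional


def release_name(version: str, filename: Optional[str] = None) -> str:
    # Same logic as the function quoted above, repeated so this example is self-contained.
    if filename:
        return "releases/%s/%s" % (version, filename)
    return "releases/%s" % version


# Hypothetical inputs, for illustration only.
simple_branch = release_name("1.2.3")
per_file_branch = release_name("1.2.3", "example-1.2.3.tar.gz")

print(simple_branch)    # releases/1.2.3
print(per_file_branch)  # releases/1.2.3/example-1.2.3.tar.gz
```

The two-argument form corresponds to the `release_name(version, filename)` variant listed in the pypi row. An empty filename falls back to the one-argument form, since the function only checks truthiness.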
- Phabricator Migration user marked this issue as related to swh/infra/sysadm-environment#3722 (closed)
- Phabricator Migration user marked this issue as related to #3728 (closed)
- Phabricator Migration user marked this issue as related to swh/infra/sysadm-environment#3745 (closed)
- vlorentz closed
Copy of an email I sent on 2021-11-17:
Context
Since their creation, SWH's package loaders have created "revision" objects to represent packages rather than "release" objects, even though releases match their meaning more closely (see https://docs.softwareheritage.org/devel/swh-model/data-model.html#software-artifacts).
This was due to technicalities that prevented them from storing some of the metadata they needed on objects other than revisions. Thanks to recent work (the "extrinsic metadata storage" and "extids"), this is no longer a problem, so we are ready to make them write releases instead.
The change
So last Wednesday, we pushed an update to SWH's staging environment to finally do the switch to releases; you can see the results at https://webapp.staging.swh.network/ by searching for packages in your favorite package loader (NPM, OPAM, PyPI, ...) and looking for one visited within the last 7 days. For example: https://webapp.staging.swh.network/browse/origin/directory/?origin_url=https://www.npmjs.com/package/steam-market-manager
The update has four parts:
- new packages will be written as releases;
- the deposit loader will no longer write "parent" relationships between revisions; clients should list visits instead;
- existing revisions are automatically updated to releases (without re-fetching the package from the origin);
- (not deployed yet) we will use the opportunity to tweak the values of fields populated in release objects so that they are more consistent across package loaders: https://forge.softwareheritage.org/swh/devel/swh-loader-core!420
Existing visits will remain unchanged, and their snapshot will keep pointing to revision objects.
VCS loaders (Git, Mercurial, SVN, ...) also remain unchanged.
What's next
Our tests show this is all working as intended, so we are going to make the same changes to (the loaders of) the main archive early next week.
This does not fundamentally change Software Heritage's data model. From the API point of view, this just means that the /api/1/snapshot/ endpoint will return more releases, and may now return snapshots that are made of only releases (we did not have any so far, as far as I know).
If you wrote an API client using this endpoint, please make sure this is not an issue.
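One way a client could check that it copes with release-only snapshots is to group branches by target type without assuming any particular type is present. The sketch below works on a hard-coded dictionary shaped like the `branches` mapping of a `/api/1/snapshot/` response; the branch names, identifiers, and the helper name `branches_by_target_type` are all hypothetical, and skipping `None` branches is a defensive assumption for dangling branches.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical response body, shaped like a /api/1/snapshot/ result.
# Branch names and target identifiers are made up for illustration.
sample_snapshot = {
    "id": "0" * 40,
    "branches": {
        "HEAD": {"target": "releases/1.0.1", "target_type": "alias"},
        "releases/1.0.0": {"target": "a" * 40, "target_type": "release"},
        "releases/1.0.1": {"target": "b" * 40, "target_type": "release"},
    },
}


def branches_by_target_type(snapshot: dict) -> Dict[str, List[str]]:
    """Group branch names by target type, without assuming any type is present."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, branch in sorted(snapshot.get("branches", {}).items()):
        if branch is None:  # assumption: a dangling branch has no target
            continue
        grouped[branch["target_type"]].append(name)
    return dict(grouped)


grouped = branches_by_target_type(sample_snapshot)

# Use .get() with a default: a snapshot may contain no revisions at all.
revisions = grouped.get("revision", [])
releases = grouped.get("release", [])

print(revisions)  # []
print(releases)   # ['releases/1.0.0', 'releases/1.0.1']
```

A client that indexes `grouped["revision"]` directly would raise a `KeyError` on this sample, which is the kind of assumption worth checking for.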