lookup ingested tarballs (or similar source code containers) by container checksum
Package repositories (PyPI and Hackage, for instance) provide a checksum for each of their packages. Unfortunately, this checksum is computed on the tarball itself, and not on its content. A direct consequence is that the checksum of a release downloaded from Software Heritage is not equal to the checksum of the same release exposed by the package repository (because SWH doesn't preserve file permissions, timestamps, etc.).
In the context of Nix, we often compare the checksum provided by a package repository to the checksum of the downloaded artifact. In this case, the Nix checksum verification fails if we download the artifact from SWH. It would actually be really nice if package repositories could expose a checksum on the content and not on the container (the tarball)!
Do you think it would be possible/pertinent to create a SWHID for tarballs?
I'm thinking of something such as swh:1:tar:XXXX. To compute the hash, the file would first be unpacked and the checksum would be computed on the content. To verify this hash, we would know that we have to unpack the file before computing the hash.
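To make the idea concrete, here is a minimal sketch of what "unpack, then hash the content" could look like. It is only an illustration: the hashing scheme (SHA-256 over sorted paths and file contents) and the names content_hash and make_tarball are assumptions made up for this example, not the algorithm Software Heritage uses for its existing identifiers, and not a proposal for the exact swh:1:tar format.

```python
import hashlib
import io
import tarfile


def content_hash(tar_bytes: bytes) -> str:
    """Hypothetical content checksum: hash the regular files of a tarball
    while ignoring container metadata (timestamps, permissions, owners)."""
    digest = hashlib.sha256()
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz).
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        # Sort by path so the order of entries in the archive does not matter.
        for member in sorted(members, key=lambda m: m.name):
            data = tar.extractfile(member).read()
            name = member.name.encode("utf-8", "surrogateescape")
            # Length-prefix the path so (path, content) pairs stay unambiguous.
            digest.update(len(name).to_bytes(8, "big") + name)
            digest.update(hashlib.sha256(data).digest())
    return digest.hexdigest()


def make_tarball(files: dict, mtime: int, mode: int) -> bytes:
    """Helper for the example: build an uncompressed tarball in memory
    with the given timestamp and permission bits on every file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime
            info.mode = mode
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

With this sketch, two tarballs built from the same files but with different timestamps and permissions have different container checksums and the same content_hash. Note that it keeps the full path of each file, including any top-level directory, and it skips symlinks and empty directories, so those cases would still need a decision.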
Note that there are corner cases that could be hard to manage, such as archives without any top-level directory.
Migrated from T2430 (view on Phabricator)