Issue created Jun 02, 2020 by Antoine Eiche (@lewo)

lookup ingested tarballs (or similar source code containers) by container checksum

Package repositories (PyPI and Hackage, for instance) provide a checksum for each package. Unfortunately, this checksum is computed on the tarball itself, not on its content. As a direct consequence, the checksum of a release downloaded from Software Heritage does not match the checksum of the same release published by the package repository (because SWH doesn't preserve file permissions, timestamps, ...).
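To illustrate the mismatch, here is a minimal Python sketch. It builds two in-memory tarballs with identical file content but different timestamps and permissions, then compares their SHA-256 checksums. The `make_tarball` helper, the file name and the metadata values are made up for the example; this is not how SWH or any package repository produces its archives.

```python
import hashlib
import io
import tarfile
from typing import Mapping


def make_tarball(files: Mapping[str, bytes], mtime: int, mode: int) -> bytes:
    """Build an uncompressed in-memory tarball (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # timestamp stored in the tar header
            info.mode = mode    # permissions stored in the tar header
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


files = {"pkg-1.0/setup.py": b"print('hello')\n"}

# Same content, different metadata (values chosen arbitrarily).
original = make_tarball(files, mtime=1591056000, mode=0o755)
regenerated = make_tarball(files, mtime=0, mode=0o644)

same = hashlib.sha256(original).hexdigest() == hashlib.sha256(regenerated).hexdigest()
print(same)  # prints False: the container checksums differ
```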

In the context of Nix, we often compare the checksum provided by package repositories to the checksum of the downloaded artifact. In this case, the Nix checksum verification fails if we download an artifact from SWH. It would actually be really nice if package repositories could expose a checksum on the content and not on the container (the tarball)!

Do you think it would be possible/pertinent to create a SWHID for tarballs?

I'm thinking of something like swh:1:tar:XXXX. To compute the hash, the file would first be unpacked and the checksum would be computed on the content. To verify this hash, we would know that we have to unpack the file before computing the hash.
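A rough Python sketch of the idea, to make it concrete. It is not the SWHID algorithm and does not produce a real swh:1:tar identifier: the `content_checksum` function and its hashing scheme (SHA-256 over sorted file paths and file bytes) are assumptions made for illustration only. It reads members directly from the archive instead of unpacking to disk, and it ignores directories, symlinks and all metadata.

```python
import hashlib
import io
import tarfile
from typing import Mapping


def content_checksum(tarball: bytes) -> str:
    """Hash only the paths and bytes of regular files (illustrative scheme)."""
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        # Sort by name so the order of entries in the archive does not matter.
        for member in sorted(members, key=lambda m: m.name):
            data = tar.extractfile(member).read()
            name = member.name.encode("utf-8", "surrogateescape")
            # Length prefixes keep the (name, data) boundaries unambiguous.
            digest.update(len(name).to_bytes(8, "big"))
            digest.update(name)
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def make_tarball(files: Mapping[str, bytes], mtime: int, mode: int) -> bytes:
    """Build an uncompressed in-memory tarball (hypothetical test helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime
            info.mode = mode
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


files = {"pkg-1.0/setup.py": b"print('hello')\n"}
upstream = make_tarball(files, mtime=1591056000, mode=0o755)
archived = make_tarball(files, mtime=0, mode=0o644)

print(content_checksum(upstream) == content_checksum(archived))  # prints True
```

With such a scheme, a verifier only needs to know that the identifier refers to the unpacked content, so timestamps and permissions lost during archival no longer affect the result.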

Note there are corner cases that could be hard to manage, such as archives without any top level directory.


Migrated from T2430 (view on Phabricator)
