Issue created Sep 23, 2022 by Stefano Zacchiroli (@zack, Maintainer)

dataset: document the AWS S3 bucket for content objects

The public Amazon S3 bucket at s3://softwareheritage/content/ contains a copy of every content object in the archive. The layout is one file per blob: each file is named after the SHA1 of the blob's content (the plain SHA1, not the git-salted one) and holds the blob's actual byte sequence, gzip-compressed. We should document this as a dataset, side by side with the graph dataset, at https://docs.softwareheritage.org/devel/swh-dataset/
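For the documentation, a small example of the naming scheme might help. Below is a minimal sketch, not the archive's actual tooling: it contrasts the plain SHA1 with the git-salted one, builds the object location, and decodes a gzipped object. It assumes the file name is the lowercase hex digest, which the description above does not spell out; all function names are hypothetical, and fetching from S3 is deliberately left out.

```python
import gzip
import hashlib

# Bucket location as given in the issue description.
BUCKET_PREFIX = "s3://softwareheritage/content/"


def plain_sha1(data: bytes) -> str:
    """SHA1 of the raw bytes, with no header (the 'not git salted' variant)."""
    return hashlib.sha1(data).hexdigest()


def git_salted_sha1(data: bytes) -> str:
    """SHA1 as git computes it for blobs: a 'blob <size>\\0' header is prepended."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def object_url(data: bytes) -> str:
    """Location of a blob in the bucket.

    Assumption: the object name is the lowercase hex form of the plain SHA1.
    """
    return BUCKET_PREFIX + plain_sha1(data)


def decode_object(raw: bytes) -> bytes:
    """Decompress the gzipped bytes of a downloaded object into the blob content."""
    return gzip.decompress(raw)


def verify_object(name: str, raw: bytes) -> bool:
    """Check that a downloaded object's decompressed content matches its name."""
    return plain_sha1(decode_object(raw)) == name
```

A consumer that only knows the git blob identifier of a file cannot derive the object name directly from it, since the two hashes differ for the same bytes; it would need a mapping from the git-salted SHA1 to the plain SHA1.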


Migrated from T4550 (view on Phabricator)
