Forked from Platform / Development / swh-docs
272 commits behind the upstream repository.
    Remove second-level headings from the dev doc + hide TOC · edda8b3a
    vlorentz authored and committed
    The only two titles at the second level were 'Important documentation links'
    and 'Content', which look weird and are not very useful.
    
    And this level prevented headings inside package documentation from being
    shown in the TOC at all, which means the TOC couldn't be used to navigate
    within a package's documentation.
index.rst 7.07 KiB

Software Heritage - Development Documentation

Getting started

  • :ref:`getting-started` → deploy a local copy of the Software Heritage software stack in less than 5 minutes, or
  • :ref:`developer-setup` → get a working development setup that lets you hack on the Software Heritage software stack
  • :ref:`faq`

Contributing

  • :ref:`patch-submission` → learn how to submit your patches to the Software Heritage codebase
  • :ref:`code-review` → rules and guidelines to review code in Software Heritage
  • :ref:`python-style-guide` → how to format the Python code you write

Architecture

  • :ref:`architecture-overview` → get a glimpse of the Software Heritage software architecture
  • :ref:`Metadata workflow <architecture-metadata>` → learn how Software Heritage stores and handles metadata

Data Model and Specifications

  • :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID).
  • :ref:`data-model` Documentation of the main |swh| archive data model.
  • :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive.
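
To give a feel for what the persistent identifiers listed above look like, here is a minimal sketch of how the SWHID of a single file content (object type ``cnt``) can be derived. It assumes the Git-compatible blob hashing described in the SWHID specification; the function name ``content_swhid`` is a hypothetical helper for illustration, not part of the ``swh.model`` API.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file content (hypothetical helper).

    Assumption: the intrinsic identifier of a content object is the
    Git-style blob hash, i.e. SHA-1 over b"blob <length>\\0" + data.
    """
    header = b"blob %d\x00" % len(data)  # Git blob header
    digest = hashlib.sha1(header + data).hexdigest()
    # "swh" scheme, version 1, object type "cnt" (content)
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The empty file hashes to the well-known Git empty-blob identifier.
    print(content_swhid(b""))
```

For real use, refer to :ref:`persistent-identifiers` and to the tooling in :ref:`swh.model <swh-model>`, which also cover directories, revisions, releases and snapshots.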

Tutorials

  • :ref:`testing-guide`
  • :doc:`tutorials/issue-debugging-monitoring`
  • :ref:`Listing the content of your favorite forge <lister-tutorial>` and :ref:`running a lister in Docker <run-lister-tutorial>`
  • :ref:`Add a new swh package <tutorial-new-package>`
  • :ref:`doc-contribution`

Roadmap

  • Current roadmap: :ref:`roadmap-current`
  • Previous roadmaps
    • :ref:`roadmap-2021`

System Administration

  • :ref:`Network Infrastructure <infrastructure>`
  • :ref:`mirror` → learn what a Software Heritage mirror is and how to set one up
  • :ref:`Keycloak <keycloak>` → learn how to use Keycloak, the authentication system used by |swh|'s web interface and public APIs

Components

Here is a brief overview of the most relevant software components in the Software Heritage stack, in alphabetical order. For a better introduction to the architecture, see the :ref:`architecture-overview`, which presents each of them in a didactic order.

Each component name is linked to the development documentation of the corresponding Python module.

:ref:`swh.auth <swh-auth>`
low-level library used by modules needing Keycloak authentication
:ref:`swh.core <swh-core>`
low-level utilities and helpers used by almost all other modules in the stack
:ref:`swh.counters <swh-counters>`
service providing efficient estimates of the number of objects in the SWH archive, using Redis's HyperLogLog
:ref:`swh.dataset <swh-dataset>`
public datasets and periodic data dumps of the archive released by Software Heritage
:ref:`swh.deposit <swh-deposit>`
push-based deposit of software artifacts to the archive
swh.docs
developer documentation (used to generate the documentation you are reading)
:ref:`swh.fuse <swh-fuse>`
Virtual file system to browse the Software Heritage archive, based on FUSE
:ref:`swh.graph <swh-graph>`
Fast, compressed, in-memory representation of the archive, with tooling to generate and query it.
:ref:`swh.graphql <swh-graphql>`
GraphQL API to request archive data offering more precise and flexible queries than the REST API.
:ref:`swh.indexer <swh-indexer>`
tools and workers used to crawl the content of the archive and extract derived information from any artifact stored in it
:ref:`swh.journal <swh-journal>`
persistent logger of changes to the archive, with publish-subscribe support
:ref:`swh.lister <swh-lister>`
collection of listers for all sorts of source code hosting and distribution places (forges, distributions, package managers, etc.)
:ref:`swh.loader-core <swh-loader-core>`
low-level loading utilities and helpers used by all other loaders
:ref:`swh.loader-bzr <swh-loader-bzr>`
loader for Bazaar and Breezy repositories
:ref:`swh.loader-git <swh-loader-git>`
loader for Git repositories
:ref:`swh.loader-mercurial <swh-loader-mercurial>`
loader for Mercurial repositories
:ref:`swh.loader-metadata <swh-loader-metadata>`
pseudo-loader, which fetches :term:`extrinsic metadata` from forges instead of software artifacts
:ref:`swh.loader-svn <swh-loader-svn>`
loader for Subversion repositories
:ref:`swh.loader-cvs <swh-loader-cvs>`
loader for CVS repositories
:ref:`swh.model <swh-model>`
implementation of the :ref:`data-model` to archive source code artifacts
:ref:`swh.objstorage <swh-objstorage>`
content-addressable object storage
:ref:`swh.objstorage.replayer <swh-objstorage-replayer>`
Object storage replication tool
:ref:`swh.perfecthash <swh-perfecthash>`
Low-level management for read-only content-addressable object storage indexed with a perfect hash table
:ref:`swh.scanner <swh-scanner>`
source code scanner to analyze code bases and compare them with source code artifacts archived by Software Heritage
:ref:`swh.scheduler <swh-scheduler>`
task manager for asynchronous/delayed tasks, used for recurrent (e.g., listing a forge, loading new stuff from a Git repository) and one-off activities (e.g., loading a specific version of a source package)
:ref:`swh.scrubber <swh-scrubber>`
Tooling to check integrity of various data stores (swh.journal, swh.objstorage, swh.storage) and fix corrupt objects they contain.
:ref:`swh.search <swh-search>`
search engine for the archive
:ref:`swh.storage <swh-storage>`
abstraction layer over the archive, giving access to all stored source code artifacts as well as their metadata
:ref:`swh.vault <swh-vault>`
implementation of the vault service, which retrieves parts of the archive as self-contained bundles (e.g., individual releases or entire repository snapshots)
:ref:`swh.web <swh-web>`
Web application(s) to browse the archive, for both interactive (HTML UI) and mechanized (REST API) use
:ref:`swh.web.client <swh-web-client>`
Python client for :ref:`swh.web <swh-web>`
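
Several of the components above (``swh.objstorage``, ``swh.perfecthash``) revolve around content-addressable storage. The following sketch illustrates the general idea only: objects are stored under an identifier derived from their own bytes, so identical contents are deduplicated and integrity can be re-checked at any time. The class ``InMemoryObjStorage``, its methods, and the choice of SHA-256 are assumptions made for this example and do not reflect the actual ``swh.objstorage`` API.

```python
import hashlib


class InMemoryObjStorage:
    """Toy content-addressable store (illustrative, not the swh API)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        """Store content and return its identifier (hex SHA-256)."""
        obj_id = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same key, so it is stored once.
        self._objects.setdefault(obj_id, content)
        return obj_id

    def get(self, obj_id: str) -> bytes:
        """Return the content stored under obj_id (KeyError if absent)."""
        return self._objects[obj_id]

    def check(self, obj_id: str) -> bool:
        """Re-hash the stored bytes and compare with the identifier."""
        return hashlib.sha256(self._objects[obj_id]).hexdigest() == obj_id


if __name__ == "__main__":
    store = InMemoryObjStorage()
    first = store.add(b"print('hello')\n")
    second = store.add(b"print('hello')\n")
    print(first == second, store.check(first))
```

The ``check`` method hints at the kind of integrity verification that a tool such as ``swh.scrubber`` performs on real data stores, although the actual implementation differs.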

Dependencies

The dependency relationships among the various modules are depicted below.

Dependencies among top-level Python modules (click to zoom).

Archive

  • :ref:`Archive ChangeLog <archive-changelog>`: notable changes to the archive over time