From 5e339b44255d6a1e4e9a83e39f90cab8cf495a35 Mon Sep 17 00:00:00 2001
From: David Douard <david.douard@sdfa3.org>
Date: Mon, 12 Nov 2018 16:49:46 +0100
Subject: [PATCH] Add a glossary and begin to use it in the getting-started
 page

---
 docs/getting-started.rst |   6 +-
 docs/glossary.rst        | 169 +++++++++++++++++++++++++++++++++++++++
 docs/index.rst           |   1 +
 3 files changed, 173 insertions(+), 3 deletions(-)
 create mode 100644 docs/glossary.rst

diff --git a/docs/getting-started.rst b/docs/getting-started.rst
index b969b212..2ce6891d 100644
--- a/docs/getting-started.rst
+++ b/docs/getting-started.rst
@@ -119,7 +119,7 @@ Step 3 --- set up storage
 
 Then you will need a local storage service that will archive and serve source
 code artifacts via a REST API. The Software Heritage storage layer comes in two
-parts: a content-addressable object storage on your file system (for file
+parts: a content-addressable :term:`object storage` on your file system (for file
 contents) and a Postgres database (for the graph structure of the archive). See
 the :ref:`data-model` for more information. The storage layer is configured via
 a YAML configuration file, located at
@@ -137,13 +137,13 @@ a YAML configuration file, located at
           root: /srv/softwareheritage/objects/
           slicing: 0:2/2:4
 
-Make sure that the object storage root exists on the filesystem and is writable
+Make sure that the :term:`object storage` root exists on the filesystem and is writable
 to your user, e.g.::
 
   sudo mkdir -p /srv/softwareheritage/objects
   sudo chown "${USER}:" /srv/softwareheritage/objects
 
-You are done with object storage setup! Let's setup the database::
+You are done with :term:`object storage` setup! Let's set up the database::
 
   swh-db-init storage -d softwareheritage-dev
 
diff --git a/docs/glossary.rst b/docs/glossary.rst
new file mode 100644
index 00000000..596f8ffa
--- /dev/null
+++ b/docs/glossary.rst
@@ -0,0 +1,169 @@
+:orphan:
+
+.. _glossary:
+
+Glossary
+========
+
+.. glossary::
+
+   archive
+
+     An instance of the |swh| data store.
+
+   archiver
+
+     A component dedicated to replicating an :term:`archive` and ensuring there
+     are enough copies of each element to guarantee resiliency.
+
+   ark
+
+     `Archival Resource Key`_ (ARK) is a Uniform Resource Locator (URL) that is
+     a multi-purpose persistent identifier for information objects of any type.
+
+   artifact
+   software artifact
+
+     An artifact is one of many kinds of tangible by-products produced during
+     the development of software.
+
+   content
+   blob
+
+     A (specific version of a) file stored in the archive, identified by its
+     cryptographic hashes (SHA1, "git-like" SHA1, SHA256) and its size. Also
+     known as: :term:`blob`. Note: it is incorrect to refer to Contents as
+     "files", because files are usually considered to be named, whereas
+     Contents are nameless. It is only in the context of specific
+     :term:`directories <directory>` that :term:`contents <content>` acquire
+     (local) names.
+
+   directory
+
+     A set of named pointers to contents (file entries), directories (directory
+     entries) and revisions (revision entries). Each entry is associated with
+     its local name (i.e., a relative path without any path
+     separator) and permission metadata (e.g., ``chmod`` value or equivalent).
+
+   doi
+
+     A Digital Object Identifier or DOI_ is a persistent identifier or handle
+     used to uniquely identify objects, standardized by the International
+     Organization for Standardization (ISO).
+
+   journal
+
+     The :ref:`journal <swh-journal>` is the persistent logger of the |swh| architecture in charge
+     of logging changes to the archive, with publish-subscribe_ support.
+
+   lister
+
+     A :ref:`lister <swh-lister>` is a component of the |swh| architecture that is in charge of
+     enumerating the :term:`software origins <software origin>` (e.g., VCS, packages)
+     available at a source code distribution place.
+
+   loader
+
+     A :ref:`loader <swh-loader-core>` is a component of the |swh| architecture
+     responsible for reading a source code :term:`origin` (typically a git
+     repository) and importing or updating its content in the :term:`archive` (i.e.,
+     adding new file contents into :term:`object storage` and repository structure
+     into the :term:`storage database`).
+
+   hash
+   cryptographic hash
+   checksum
+   digest
+
+     A fixed-size "summary" of a stream of bytes that is easy to compute, and
+     hard to reverse. See the Wikipedia article on cryptographic hash functions.
+     Also known as: :term:`checksum`, :term:`digest`.
+
+   indexer
+
+     A component of the |swh| architecture dedicated to producing metadata
+     linked to the known :term:`blobs <blob>` in the :term:`archive`.
+
+   objstore
+   objstorage
+   object store
+   object storage
+
+     Content-addressable object storage. It is the place where the actual
+     :term:`blob` objects are stored.
+
+   origin
+   software origin
+   data source
+
+     A location from which a coherent set of sources has been obtained, like a
+     git repository, a directory containing tarballs, etc.
+
+   person
+
+     An entity referenced by a revision as either the author or the committer
+     of the corresponding change. A person is associated with a full name and/or
+     an email address.
+
+   release
+   tag
+   milestone
+
+     A revision that has been marked as noteworthy with a specific name (e.g.,
+     a version number), together with associated development metadata (e.g.,
+     author, timestamp, etc.).
+
+   revision
+   commit
+   changeset
+
+     A point-in-time snapshot of the content of a directory, together with
+     associated development metadata (e.g., author, timestamp, log message,
+     etc).
+
+   scheduler
+
+     The component of the |swh| architecture dedicated to the management and
+     prioritization of the many tasks.
+
+   snapshot
+
+     The state of all visible branches during a specific visit of an origin.
+
+   storage
+   storage database
+
+     The main database of the |swh| platform, in which all the elements of
+     the :ref:`data-model` except the :term:`content` are stored as a :ref:`Merkle
+     DAG <swh-merkle-dag>`.
+
+   type of origin
+
+     Information about the kind of hosting, e.g., whether it is a forge, a
+     collection of repositories, a homepage publishing tarballs, or a one-shot
+     source code repository. For all kinds of repositories, please specify which
+     VCS is in use (Git, SVN, CVS, etc.).
+
+   vault
+   vault service
+
+     User-facing service that allows retrieving parts of the :term:`archive`
+     as self-contained bundles (e.g., individual releases, entire repository
+     snapshots, etc.).
+
+   visit
+
+     The passage of |swh| on a given :term:`origin`, to retrieve all source
+     code and metadata available there at the time. A visit object stores the
+     state of all visible branches (if any) available at the origin at visit
+     time; each of them points to a revision object in the archive. Future
+     visits of the same origin will create new visit objects, without removing
+     previous ones.
+
+
+
+.. _blob: https://en.wikipedia.org/wiki/Binary_large_object
+.. _DOI: https://www.doi.org
+.. _`persistent identifier`: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers
+.. _`Archival Resource Key`: http://n2t.net/e/ark_ids.html
+.. _publish-subscribe: https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
diff --git a/docs/index.rst b/docs/index.rst
index 696e94b0..b572672e 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -116,6 +116,7 @@ Indices and tables
 * :ref:`modindex`
 * `URLs index <http-routingtable.html>`_
 * :ref:`search`
+* :ref:`glossary`
 
 
 .. ensure sphinx does not complain about index files not being included
-- 
GitLab