Compare revisions

Showing with 14747 additions and 642 deletions
......@@ -12,6 +12,81 @@ in this document for historical reasons.
2023
----
* **2023-04-18** Completed first archival of `annas-software.org gitea
forge <https://annas-software.org/>`_. Regular crawling of their
repositories enabled (tracking: `#4855
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4855>`_)
* **2023-04-18** Completed first archival of `Internet Systems Consortium's gitlab
forge <https://gitlab.isc.org/>`_. Regular crawling of their
repositories enabled (tracking: `#4854
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4854>`_)
* **2023-04-13** Completed first archival of `dev.sanctum.geek.nz cgit forge
<https://dev.sanctum.geek.nz/>`_. Regular crawling of their
repositories enabled (tracking: `#4852
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4852>`_)
* **2023-04-13** Completed first archival of `trueelena.org cgit forge
<https://git.trueelena.org/>`_. Regular crawling of their
repositories enabled (tracking: `#4851
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4851>`_)
* **2023-04-12** Completed first archival of `Epita infra gitlab forge
<https://gitlab.cri.epita.fr/cri>`_. Regular crawling of their
repositories enabled (tracking: `#4845
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4845>`_)
* **2023-04-11** Completed first archival of `INRAE MathNum department gitlab forge
<https://forgemia.inra.fr/>`_. Regular crawling of their
repositories enabled (tracking: `#4842
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4842>`_)
* **2023-04-11** Completed first archival of `Montpellier Bioinformatics Biodiversity
platform gitlab forge <https://gitlab.mbb.cnrs.fr/>`_. Regular crawling of their
repositories enabled (tracking: `#4843
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4843>`_)
* **2023-04-07** Completed first archival of `Garbaye gitea forge
<https://git.garbaye.fr/>`_. Regular crawling of their
repositories enabled (tracking: `#4841
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4841>`_)
* **2023-04-07** Completed first archival of `Alarsyo's personal projects gitea forge
<https://git.alarsyo.net/>`_. Regular crawling of their
repositories enabled (tracking: `#4833
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4833>`_)
* **2023-04-05** Completed first archival of `Replicant git repositories (split
in 7 forges) <https://git.replicant.us/>`_. Regular crawling of their
repositories enabled (tracking: `#4685
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4685>`_)
* **2023-04-03** Completed first archival of `Software Heritage gitlab forge
<https://gitlab.softwareheritage.org/>`_. Regular crawling of their
repositories enabled (tracking: `#4683
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4683>`_)
* **2023-03-30** Completed first archival of `CodeAurora, The Global Gathering for
Mobile Open Source <https://source.codeaurora.org>`_. Regular crawling of its
repositories enabled (tracking: `#4813
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4813>`_)
* **2023-02-13** Completed first archival of `AFPY (Association Francophone Python)
git repositories <https://git.afpy.org>`_. Regular crawling of its
repositories enabled (tracking: `#4674
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4674>`_)
* **2023-01-05** Completed first archival of `the University of Stuttgart gitlab forge
<https://git.iws.uni-stuttgart.de/>`_. Regular crawling of
their repositories enabled (tracking: `#4712
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4712>`_)
* **2023-01-05** Completed first archival of `FFDN (Fédération des Fournisseurs
d'Accès Internet Associatifs) <https://code.ffdn.org/>`_. Regular crawling of
their repositories enabled (tracking: `#4687
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4687>`_)
* **2023-01-03** Completed first archival of `DeuxFleurs's gitea forge
<https://git.deuxfleurs.fr>`_, as requested via `Add forge now`_.
Regular crawling of its repositories enabled
......@@ -25,6 +100,11 @@ in this document for historical reasons.
2022
----
* **2022-12-14** Completed first archival of `Université Gustave Eiffel git repositories
<https://gitlab.univ-eiffel.fr/>`_, as requested via `Add forge now`_.
Regular crawling of its repositories enabled
(tracking: `#4675 <https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4675>`_)
* **2022-11-13** Completed first archival of `Joey Hess's git repositories
<https://git.joeyh.name/>`_, as requested via `Add forge now`_.
Regular crawling of its repositories enabled
......
......@@ -4,18 +4,29 @@ set -e
create_links () {
mkdir -p sources
for pymodule in $(cd ../../../ && bin/ls-py-modules) ; do
if [ "$pymodule" = 'swh-docs' ] ; then
continue
fi
case "$pymodule" in
"swh-docs"|"swh-icinga-plugins") continue;;
esac
if [ ! -e "$pymodule" -a -d "../../../${pymodule}/docs" ] ; then
ln -s "../../../${pymodule}/docs" "$pymodule"
fi
if [ -f "$pymodule/images/Makefile" ] ; then
make -C $pymodule images
echo "Build images in $pymodule"
make -I $PWD/../.. -C $pymodule images
fi
if [ -d "../../../${pymodule}/swh" ] ; then
cp -r -f --symbolic-link $(realpath ../../../${pymodule}/swh/*) sources/swh/
elif [ -d "../../../${pymodule}/src/swh" ] ; then
cp -r -f --symbolic-link $(realpath ../../../${pymodule}/src/swh/*) sources/swh/
fi
pushd ../../../${pymodule}
for EXT in rst md; do
if [ -f README.$EXT -a ! -f docs/README.$EXT ] ; then
ln -s ../README.$EXT docs
break
fi
done
popd
done
}
......@@ -25,7 +36,7 @@ remove_links () {
continue
fi
if [ -L "$pymodule" ] ; then
make -C $pymodule clean
make -I $PWD/../.. -C $pymodule clean
rm "$pymodule"
fi
done
......
.. _cli-config:
Configuration reference
=======================
.. highlight:: yaml
|swh| components are all configured with a YAML file, made of multiple blocks,
most of which describe how to connect to other components/services.
Most services are composable, so they can be either instantiated locally or
accessed through |swh|'s HTTP-based RPC protocol (``cls: remote``).
For example, a possible configuration for swh-vault is::
graph:
url: http://graph.internal.softwareheritage.org:5009/
storage:
cls: pipeline
steps:
- cls: retry
- cls: remote
url: http://webapp.internal.staging.swh.network:5002/
objstorage:
cls: s3
compression: gzip
container_name: softwareheritage
path_prefix: content
All URLs in this document are examples, see :ref:`service-url` for actual values.
.. _cli-config-celery:
celery
------
The :ref:`scheduler <swh-scheduler>` uses Celery for running some tasks. This
configuration key is used for parameters passed directly to Celery, e.g. the URI
of the RabbitMQ broker used for distribution of tasks, for both scheduler
commands and Celery workers.
The contents of this configuration key follow the `"lowercase settings" schema from
Celery upstream
<https://docs.celeryq.dev/en/stable/userguide/configuration.html#new-lowercase-settings>`_.
Some default values can be found in :mod:`swh.scheduler.celery_backend.config`.
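For instance, a minimal block might look like this (a sketch; the broker URL is
illustrative, and the keys are standard Celery lowercase settings)::

    celery:
        broker_url: amqp://guest:guest@rabbitmq.example.org:5672//
        task_acks_late: true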
.. _cli-config-graph:
graph
-----
The :ref:`graph <swh-graph>` can only be accessed as a remote service, and
its configuration block is a single key: ``url``, which is the URL to its
HTTP endpoint; usually on port 5009 or at the path ``/graph/``.
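For example::

    graph:
        url: http://graph.internal.softwareheritage.org:5009/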
.. _cli-config-journal:
journal
-------
The :ref:`journal <swh-journal>` can only be locally instantiated to consume
directly from Kafka::
journal:
brokers:
- broker1.journal.softwareheritage.org:9093
- broker2.journal.softwareheritage.org:9093
- broker3.journal.softwareheritage.org:9093
- broker4.journal.softwareheritage.org:9093
prefix: swh.journal.objects
sasl.mechanism: "SCRAM-SHA-512"
security.protocol: "sasl_ssl"
sasl.username: "..."
sasl.password: "..."
privileged: false
group_id: "..."
.. _cli-config-metadata_fetcher_credentials:
metadata_fetcher_credentials
----------------------------
Nested dictionary of strings.
The first level identifies a :term:`metadata <extrinsic metadata>` fetcher's name
(e.g. ``gitea``, ``github``), the second level the lister instance (e.g. ``codeberg.org``
or ``github``). The final level is a list of dicts containing the expected API
credentials for the given instance of that fetcher. For example::
metadata_fetcher_credentials:
github:
github:
- username: ...
password: ...
- ...
.. _cli-config-scheduler:
scheduler
---------
The :ref:`scheduler <swh-scheduler>` can only be accessed as a remote service, and
its configuration block is a single key: ``url``, which is the URL to its
HTTP endpoint; usually on port 5008 or at the path ``/scheduler/``::
scheduler:
cls: remote
url: http://saatchi.internal.softwareheritage.org:5008
.. _cli-config-storage:
storage
-------
Backends
^^^^^^^^
The :ref:`storage <swh-storage>` has four possible classes:
* ``cassandra``, see :class:`swh.storage.cassandra.storage.CassandraStorage`::
storage:
cls: cassandra
hosts: [...]
keyspace: swh
port: 9042
journal_writer:
# ...
# ...
* ``postgresql``, which takes a `libpq connection string <https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING>`_::
storage:
cls: postgresql
db: service=swh
journal_writer:
# ...
For optional arguments, see :class:`swh.storage.postgresql.storage.Storage`
* ``memory``, which stores data in-memory instead of persisting it somewhere;
this should only be used for debugging::
storage:
cls: memory
journal_writer:
# ...
* ``remote``, which takes a URL to a remote service's HTTP endpoint;
usually on port 5002 or at the path ``/storage/``::
storage:
cls: remote
url: http://webapp.internal.staging.swh.network:5002/
The ``journal_writer`` key is optional. If provided, it will be used to write all
additions to some sort of log (usually Kafka) before any write to the main database.
:mod:`swh.journal.writer.kafka`::
cls: kafka
brokers:
- broker1.journal.softwareheritage.org:9093
- broker2.journal.softwareheritage.org:9093
- broker3.journal.softwareheritage.org:9093
- broker4.journal.softwareheritage.org:9093
prefix: swh.journal.objects
anonymize: true
client_id: ...
producer_config: ...
:mod:`swh.journal.writer.stream`, which writes directly to a file
(or stdout if set to ``-``)::
cls: stream
output_stream: /tmp/messages.msgpack
:mod:`swh.journal.writer.inmemory`, which does not actually persist anywhere,
and should only be used for tests::
cls: memory
anonymize: false
Proxies
^^^^^^^
In addition to these backends, "storage proxies" can be used and chained in order
to change the behavior of accesses to the storage. They usually do not change the semantics,
but perform optimizations such as batching calls, stripping redundant operations,
and retrying on error.
They are invoked through the special ``pipeline`` class, which takes as its parameter
a list of proxy configurations, ending with a backend configuration as seen above::
storage:
cls: pipeline
steps:
- cls: buffer
min_batch_size:
content: 10000
directory: 5000
- cls: filter
- cls: retry
- cls: remote
url: http://webapp1.internal.softwareheritage.org:5002/
which is equivalent to this nested configuration::
storage:
cls: buffer
min_batch_size:
content: 10000
directory: 5000
storage:
cls: filter
storage:
cls: retry
storage:
cls: remote
url: http://webapp1.internal.softwareheritage.org:5002/
See :mod:`swh.storage.proxies` for the list of proxies.
......@@ -12,7 +12,7 @@ Good commit messages are essentials in a project as large as Software Heritage.
They are crucial to those who will review your changes and important to anyone else
who will interact with the codebase at a later time. This includes your future self!
Make sure to follow the recommandations from `How to write a Git
Make sure to follow the recommendations from `How to write a Git
commit message <http://chris.beams.io/posts/git-commit/>`_
Closing or referencing issues
......
......@@ -96,6 +96,7 @@ Run the script by
$ cd swh-environment
$ bin/update # Used to update all the repos under the environment to their latest version
$ bin/fork-gitlab-repo -g swh swh-objstorage
$ bin/fork-gitlab-repo -g swh . # To contribute to swh-environment itself (eg. add a repository)
This will create a new fork of the SWH repository in your namespace and
add a jenkins user to perform automatic builds. You can view the forked
......@@ -152,3 +153,42 @@ If you plan to
you may also want to
`upload your GPG key <https://gitlab.softwareheritage.org/-/profile/gpg_keys>`__
as well.
Make a release
--------------
.. warning:: Only staff members are allowed to make new releases
Releases are made automatically by Jenkins when a tag is pushed to a module repository.
We are using the `semantic versioning <https://semver.org>`_ scheme to name our
releases, please ensure that the name of your tag correctly indicates its compatibility
with the previous version.
Tags themselves should be signed and provide a meaningful annotation with, for example,
an itemized summary of changes (rather than rehashing the whole git log), breaking
changes in a separate section, etc.
First, create the tag:
.. code-block::
# get the latest version number
git describe --tags # returns v1.2.3-x-yyy
# list changes between master and v1.2.3
git range-diff v1.2.3...master
# use the output to write your annotation and create a new signed tag, here for a
# minor version upgrade
git tag -a -s v1.3.0
# push it
git push origin tag v1.3.0
Then you'll see jobs on Jenkins (Incoming tag, GitLab builds, Upload to PyPI)
indicating that the release process is ongoing.
Next, deployment container images are updated.
And finally a new merge request will automatically be created in
`Helm charts for swh packages`_ so that the devops team can proceed with deployment.
.. _Helm charts for swh packages: https://gitlab.softwareheritage.org/swh/infra/sysadm-environment
:orphan:
.. highlight:: bash
.. admonition:: Intended audience
:class: important
Contributors
Important
=========
We have moved our development from Phabricator to a GitLab instance at
https://gitlab.softwareheritage.org/
The content below is no longer relevant and will be updated soon.
Submitting patches
==================
`Phabricator`_ is the tool that Software Heritage uses as its
coding/collaboration forge.
Software Heritage's Phabricator instance can be found at
https://forge.softwareheritage.org/
.. _Phabricator: http://phabricator.org/
Code Review in Phabricator
--------------------------
We use the Differential application of Phabricator to perform
:ref:`code reviews <code-review>` in the context of Software Heritage.
* we use Git and ``history.immutable=true``
(but beware as that is partly a Phabricator misnomer, read on)
* when code reviews are required, developers will be allowed to push
directly to master once an accepted Differential diff exists
Configuration
+++++++++++++
.. _arcanist-configuration:
Arcanist configuration
^^^^^^^^^^^^^^^^^^^^^^
Authentication
~~~~~~~~~~~~~~
First, you should install Arcanist and authenticate it to Phabricator::
sudo apt-get install arcanist
arc set-config default https://forge.softwareheritage.org/
arc install-certificate
arc will prompt you to log in to Phabricator via the web
(which will ask for your personal Phabricator credentials).
You will then have to copy-paste the API token from the web page into arc,
and hit Enter to complete the certificate installation.
Immutability
~~~~~~~~~~~~
When using git, Arcanist by default messes with the local history,
rewriting commits at the time of first submission.
To avoid that, we use so-called `history immutability`_.
.. _history immutability: https://secure.phabricator.com/book/phabricator/article/arcanist_new_project/#history-mutability-git
To that end, you shall configure your ``arc`` accordingly::
arc set-config history.immutable true
Note that this does **not** mean that you are forbidden to rewrite
your local branches (e.g., with ``git rebase``).
Quite the contrary: you are encouraged to locally rewrite branches
before pushing to ensure that commits are logically separated
and your commit history easy to bisect.
The above setting just means that *arc* will not rewrite commit
history under your nose.
Enabling ``git push`` to our forge
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The way we've configured our review setup for continuous integration
needs you to configure git to allow pushes to our forge.
There are two ways you can do this: setting up an SSH key to push over SSH,
or setting a specific password for git pushes over HTTPS.
SSH key for pushes
~~~~~~~~~~~~~~~~~~
In your forge User settings page (On the top right, click on your avatar,
then click *Settings*), you have access to an *Authentication* >
*SSH Public Keys* section (Direct link:
``https://forge.softwareheritage.org/settings/user/<your username>/page/ssh/``).
You then have the option to upload an SSH public key,
which will authenticate your pushes.
You then need to configure ssh/git to use that key pair,
for instance by editing the ``~/.ssh/config`` file.
Finally, you should configure git to push over ssh when pushing to
https://forge.softwareheritage.org, by running the following command::
git config --global url.git@forge.softwareheritage.org:.pushInsteadOf https://forge.softwareheritage.org
This lets git know that it should use ``git@forge.softwareheritage.org:``
as a base url when pushing repositories cloned from
forge.softwareheritage.org over https.
VCS password for pushes
~~~~~~~~~~~~~~~~~~~~~~~
.. warning:: Please only use this if you're completely unable to use SSH.
As a fallback to the ssh setup, you have the option of setting a VCS password. This
password, *separate from your account password*, allows Phabricator to authenticate your
uploads over HTTPS.
In your forge User settings page (On the top right, click on your avatar, then click
*Settings*), you need to use the *Authentication* > *VCS Password* section to set your
VCS password (Direct link: ``https://forge.softwareheritage.org/settings/user/<your
username>/page/vcspassword/``).
If you still get a 403 error on push, this means you need a forge administrator to
enable HTTPS pushes for the repository (which wasn't done by default in historical
repositories). Please drop by on IRC and let us know!
Workflow
++++++++
* work in a feature branch: ``git checkout -b my-feat``
* initial review request: hack/commit/hack/commit ;
``arc diff origin/master``
* react to change requests: hack/commit/hack/commit ;
``arc diff --update Dxx origin/master``
* landing change: ``git checkout master ; git merge my-feat ; git push``
Starting a new feature and submit it for review
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Use a **one branch per feature** workflow, with well-separated
**logical commits** (:ref:`following those conventions <git-style-guide>`).
Please open one diff per logical commit to keep the diff size to a minimum.
.. code-block::
git checkout -b my-shiny-feature
... hack hack hack ...
git commit -m 'architecture skeleton for my-shiny-feature'
... hack hack hack ...
git commit -m 'my-shiny-feature: implement module foo'
... etc ...
To **submit your code for review** the first time::
arc diff origin/master
arc will prompt for a **code review message**. Provide the following information:
* first line: *short description* of the overall work
(i.e., the feature you're working on).
This will become the title of the review
* *Summary* field (optional): *long description* of the overall work;
the field can continue in subsequent lines, up to the next field.
This will become the "Summary" section of the review
* *Test Plan* field (optional): write here if something special is needed
to test your change
* *Reviewers* field (optional): the (Phabricator) name(s) of
desired reviewers.
If you don't specify one (recommended) the default reviewers will be chosen
* *Subscribers* field (optional): the (Phabricator) name(s) of people that
will be notified about changes to this review request.
In most cases it should be left empty
For example::
mercurial loader
Summary: first stab at a mercurial loader (T329)
The implementation follows the plan detailed in F2F discussion with @foo.
Performances seem decent enough for a first trial (XXX seconds for YYY repository
that contains ZZZ patches).
Test plan:
Reviewers:
Subscribers: foo
After completing the message arc will submit the review request
and tell you its number and URL::
[...]
Created a new Differential revision:
Revision URI: https://forge.softwareheritage.org/Dxx
.. _arc-update:
Updating your branch to reflect requested changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Your feature might get accepted as is, YAY!
Or, reviewers might request changes; no big deal!
Use the Differential web UI to follow up on received comments, if needed.
To implement requested changes in the code, hack on your branch as usual by:
* adding new commits, and/or
* rewriting old commits with git rebase (to preserve a nice, easy to bisect history)
* pulling on master and rebasing your branch against it if meanwhile someone
landed commits on master:
.. code-block::
git checkout master
git pull
git checkout my-shiny-feature
git rebase master
When you're ready to **update your review request**::
arc diff --update Dxx HEAD~
Arc will prompt you for a message: **describe what you've changed
w.r.t. the previous review request**, free form.
This means you should not repeat the title of your diff (which is
often the default if you squashed/amended your commits)
Your message will become the changelog entry in Differential
for this new version of the diff, and will help reviewers
understand what changes you made since they last read your diff.
Differential only cares about the code diff, not about the commits
or their order.
Therefore each "update" can be a completely different series of commits,
possibly rewritten from the previous submission.
Dependencies between diffs
^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that you can manage diff dependencies within the same module
with the following keyword in the diff description::
Depends on Dxx
That allows you to keep a logical view in your diff.
It's not strictly necessary (because the tooling now deals with it properly),
but it might help reviewers, or yourself, to do so.
Landing your change onto master
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once your change has been approved in Differential,
you will be able to land it onto the master branch.
Before doing so, you're encouraged to **clean up your git commit history**,
reordering/splitting/merging commits as needed to have separate
logical commits and an easy to bisect history.
Update the diff :ref:`following the prior section <arc-update>`
(It'd be good to let the CI build finish to make sure everything is still green).
Once you're happy you can **push to origin/master** directly, e.g.::
git checkout master
git merge --ff-only my-shiny-feature
git push
``--ff-only`` is optional, and makes sure you don't unintentionally
create a merge commit.
Optionally you can then delete your local feature branch::
git branch -d my-shiny-feature
Reviewing locally / landing someone else's changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can do local reviews of code with arc patch::
arc patch Dxyz
This will create a branch **arcpatch-Dxyz** containing the changes
on your local checkout.
You can then merge those changes upstream with::
git checkout master
git merge --ff arcpatch-Dxyz
git push origin master
or, alternatively::
arc land --squash
See also
--------
* :ref:`code-review` for guidelines on how code is reviewed
when developing for Software Heritage
......@@ -3,7 +3,12 @@
Sphinx gotchas
==============
Here is a list of common gotchas when formatting Python docstrings for `Sphinx <https://www.sphinx-doc.org/en/stable/>`_ and the `Napoleon <https://www.sphinx-doc.org/en/stable/ext/napoleon.html>`_ style.
Here is a list of common gotchas when formatting Python docstrings for `Sphinx
<https://www.sphinx-doc.org/en/stable/>`_ and the `Napoleon
<https://www.sphinx-doc.org/en/stable/ext/napoleon.html>`_ style.
.. highlight:: rst
Sphinx
------
......@@ -11,12 +16,12 @@ Sphinx
Lists
+++++
All sorts of `lists <https://www.sphinx-doc.org/en/stable/rest.html#lists-and-quote-like-blocks>`_
require an empty line before the first bullet and after the last one,
to be properly interpreted as list.
No indentation is required for list elements w.r.t. surrounding text,
and line continuations should be indented like the first character
after the bullet.
All sorts of `lists
<https://www.sphinx-doc.org/en/stable/rest.html#lists-and-quote-like-blocks>`_
require an empty line before the first bullet and after the last one, to be
properly interpreted as a list. No indentation is required for list elements
w.r.t. surrounding text, and line continuations should be indented like the
first character after the bullet.
Bad::
......@@ -177,10 +182,13 @@ Good::
Args:
foo (int): first argument
bar: second argument, which happen to have a fairly
long description of what it does
long description of what it does
baz (bool): third argument
Returns
+++++++
......@@ -232,6 +240,7 @@ Good::
ValueError: if you botched it
RuntimeError: if we botched it
See also
--------
......
......@@ -99,7 +99,7 @@ and/or `Daniele Procida's presentation <https://www.youtube.com/watch?v=t4vKPhjc
.. note::
We propose using the following naming scheme depending on the type of document:
* Tutorial: Tutorial name]
* Tutorial: [Tutorial name]
* How to ...
* Reference: [Reference name]
* Explanation: [Explanation name]
......
......@@ -22,58 +22,152 @@ Install required dependencies
-----------------------------
Software Heritage requires some dependencies that are usually packaged by your
package manager. On Debian/Ubuntu-based distributions::
sudo apt install lsb-release wget apt-transport-https
sudo wget https://www.postgresql.org/media/keys/ACCC4CF8.asc -O /etc/apt/trusted.gpg.d/postgresql.asc
echo "deb https://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" | sudo tee -a /etc/apt/sources.list.d/pgdg.list
sudo wget https://downloads.apache.org/cassandra/KEYS -O /etc/apt/trusted.gpg.d/cassandra.asc
echo "deb https://downloads.apache.org/cassandra/debian 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.list
sudo apt update
sudo apt install \
build-essential pkg-config lzip rsync \
python3 python3-pip python3-venv virtualenvwrapper \
libpython3-dev libsystemd-dev libsvn-dev libffi-dev librdkafka-dev \
fuse3 libfuse3-dev libcmph-dev libleveldb-dev \
git myrepos \
graphviz plantuml inkscape \
postgresql libpq-dev cassandra
.. Note:: Python 3.7 or newer is required
package manager.
.. tab-set::
.. tab-item:: Debian/Ubuntu
.. code-block:: console
sudo apt install lsb-release wget apt-transport-https
sudo wget https://www.postgresql.org/media/keys/ACCC4CF8.asc -O /etc/apt/trusted.gpg.d/postgresql.asc
echo "deb https://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" | sudo tee -a /etc/apt/sources.list.d/pgdg.list
sudo wget https://downloads.apache.org/cassandra/KEYS -O /etc/apt/trusted.gpg.d/cassandra.asc
echo "deb https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.list
sudo apt update
sudo apt install \
build-essential pkg-config lzip rsync \
python3 python3-pip python3-venv virtualenvwrapper \
libpython3-dev libsystemd-dev libsvn-dev libffi-dev librdkafka-dev \
fuse3 libfuse3-dev libcmph-dev libleveldb-dev \
git myrepos \
graphviz plantuml inkscape \
postgresql libpq-dev cassandra redis-server
.. tab-item:: Fedora
.. code-block:: console
sudo dnf install java-17-openjdk-headless
# Make sure the path is correct. If not, choose the alternative corresponding to java-17
sudo update-alternatives --set java /usr/lib/jvm/java-17-openjdk-17.0.13.0.11-3.fc41.x86_64/bin/java
sudo rpm --import https://downloads.apache.org/cassandra/KEYS
echo "[cassandra]
name=Apache Cassandra
baseurl=https://redhat.cassandra.apache.org/50x/
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://downloads.apache.org/cassandra/KEYS" | sudo tee /etc/yum.repos.d/cassandra.repo
sudo dnf -y update
sudo dnf -y install cassandra
sudo dnf -y group install c-development
sudo dnf -y install \
pkgconf-pkg-config lzip rsync python3.11 python3-virtualenvwrapper \
python3.11-devel systemd-devel subversion-devel libffi-devel \
librdkafka fuse3 fuse3-devel leveldb-devel git myrepos graphviz \
plantuml inkscape postgresql-server postgresql-contrib libpq \
libpq-devel redis
# You will also need to install CMPH manually, as it is not (yet?) included in the Fedora repositories
wget https://sourceforge.net/projects/cmph/files/v2.0.2/cmph-2.0.2.tar.gz
tar -xvf cmph-2.0.2.tar.gz
cd cmph-2.0.2
./configure && make && sudo make install
cd ..
.. Note:: Python 3.10 or newer is required
This installs basic system utilities, Python library dependencies, development tools,
documentation tools and our main database management systems.
Cassandra and PostgreSQL will be started by tests when they need it, so you
don't need them started globally (this will save you some RAM)::
don't need them started globally (this will save you some RAM):
.. code-block:: console
sudo systemctl disable --now cassandra postgresql
If you intend to hack on the frontend part of |swh| Web Applications, you will also
need to have ``nodejs >= 14`` in your development environment. If the version in your
Debian-based distribution is lower, you can install node 14 using these commands::
You must also have ``nodejs >= 20`` in your development environment.
You can install node 20 using these commands:
.. tab-set::
.. tab-item:: Debian/Ubuntu
.. code-block:: console
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo bash -
sudo apt install -y nodejs
.. tab-item:: Fedora
.. code-block:: console
sudo wget https://deb.nodesource.com/gpgkey/nodesource.gpg.key -O /etc/apt/trusted.gpg.d/nodesource.asc
echo "deb https://deb.nodesource.com/node_14.x $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/nodesource.list
sudo apt update
sudo apt install nodejs
sudo dnf -y install nodejs
Also related to Web Applications development, |swh| uses the ``yarn`` package manager
to retrieve frontend dependencies and development tools. It is recommended to install its
latest classic version using these commands::
|swh| uses the ``yarn`` package manager to retrieve frontend dependencies and development tools.
You must install its latest classic version using this command:
sudo wget https://dl.yarnpkg.com/debian/pubkey.gpg -O /etc/apt/trusted.gpg.d/yarn.asc
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update
sudo apt install yarn
.. tab-set::
.. tab-item:: Debian/Ubuntu
.. code-block:: console
sudo corepack enable
.. tab-item:: Fedora
.. code-block:: console
sudo dnf -y install yarnpkg
If you intend to work on |swh| archive search features, Elasticsearch must also be
present in your development environment. Proceed as follows to install it::
present in your development environment. Proceed as follows to install it:
.. tab-set::
.. tab-item:: Debian/Ubuntu
.. code-block:: console
sudo wget https://artifacts.elastic.co/GPG-KEY-elasticsearch -O /etc/apt/trusted.gpg.d/elasticsearch.asc
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
sudo apt update
sudo apt install elasticsearch
.. tab-item:: Fedora
sudo wget https://artifacts.elastic.co/GPG-KEY-elasticsearch -O /etc/apt/trusted.gpg.d/elasticsearch.asc
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
sudo apt update
sudo apt install elasticsearch
.. code-block:: console
echo "[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
autorefresh=1
type=rpm-md" | sudo tee /etc/yum.repos.d/elasticsearch.repo
sudo dnf -y update
sudo dnf -y install elasticsearch
If you intend to build the full |swh| documentation, the ``postgresql-autodoc`` utility must
also be installed; follow these `instructions <https://github.com/cbbrowne/autodoc#installation>`_
......@@ -84,30 +178,34 @@ to do so.
Checkout the source code
------------------------
Clone the |swh| environment repository::
Clone the |swh| environment repository:
.. code-block:: console
~$ git clone https://gitlab.softwareheritage.org/swh/devel/swh-environment.git
[...]
~$ cd swh-environment
~/swh-environment$
Create a virtualenv::
Create a virtualenv:
.. code-block:: console
~/swh-environment$ source /usr/share/virtualenvwrapper/virtualenvwrapper.sh
~/swh-environment$ mkvirtualenv -p /usr/bin/python3 -a $PWD swh
[...]
(swh) ~/swh-environment$
Checkout all the swh packages source repositories::
Checkout all the swh packages source repositories:
.. code-block:: console
(swh) ~/swh-environment$ pip install pre-commit
(swh) ~/swh-environment$ ./bin/update
Use the same mypy version our tox containers use::
(swh) ~/swh-environment$ pip install mypy==1.0.1
In the future you can re-activate the created virtualenv with:
In the future you can re-activate the created virtualenv with::
.. code-block:: console
$ workon swh
(swh) ~/swh-environment$
......@@ -122,19 +220,11 @@ In the future you can re-activate the created virtualenv with::
.. _pipenv: https://pipenv.readthedocs.io/
Install all the swh packages (in development mode, with testing dependencies)::
Install all the swh packages (in development mode, with testing dependencies):
(swh) ~/swh-environment$ bin/install
.. note::
.. code-block:: console
If you experience issues with :program:`pip` dependency resolution, try with
``bin/install --use-deprecated=legacy-resolver`` (the flag will be passed on
to ``pip install``). The same flag can also be set globally in
:file:`~/.config/pip/pip.conf`::
[install]
use-deprecated=legacy-resolver
(swh) ~/swh-environment$ bin/install
Executing unit tests
......@@ -151,7 +241,9 @@ tox_. The main difference between these 2 test execution environments is:
current virtualenv, installed from the git repositories: you test your
modification against the HEAD of every swh package.
For example, running unit tests for the swh-loader-git_ package::
For example, running unit tests for the swh-loader-git_ package:
.. code-block:: console
(swh) ~/swh-environment$ cd swh-loader-git
(swh) ~/swh-environment/swh-loader-git$ pytest
......@@ -171,7 +263,9 @@ For example, running unit tests for the swh-loader-git_ package::
[...]
================== 25 passed, 12 warnings in 6.66 seconds ==================
Running the same test, plus code linting and static analysis, using tox::
Running the same test, plus code linting and static analysis, using tox:
.. code-block:: console
(swh) ~/swh-environment/swh-loader-git$ tox
GLOB sdist-make: ~/swh-environment/swh-loader-git/setup.py
......@@ -235,7 +329,9 @@ Running the same test, plus code linting and static analysis, using tox::
Beware that some swh packages require a postgresql server properly configured
to execute the tests. In this case, you will want to use pifpaf_, which will
spawn a temporary instance of postgresql, to encapsulate the call to pytest.
For example, running pytest in the swh-core package::
For example, running pytest in the swh-core package:
.. code-block:: console
(swh) ~/swh-environment$ cd swh-core
(swh) ~/swh-environment/swh-core$ pifpaf run postgresql -- pytest
......
......@@ -29,7 +29,7 @@ specific skills needed to work on any topic of your interest.
What are the minimum system requirements (hardware/software) to run SWH locally?
--------------------------------------------------------------------------------
Python 3.7 or newer is required. See the :ref:`developer setup documentation
Python 3.10 or newer is required. See the :ref:`developer setup documentation
<developer-setup>` for more details.
......@@ -126,8 +126,8 @@ Getting sample datasets
Is there a way to connect to the SWH archive (production) database from my local machine?
-------------------------------------------------------------------------------------------
We provide the archive as a dataset on public clouds, see the :ref:`swh-dataset
documentation <swh-dataset>`. We can
We provide the archive as a dataset on public clouds, see the :ref:`swh-export
documentation <swh-export>`. We can
also provide read access to one of the main databases on request, `contact us`_.
.. _faq_error_bugs:
......
......@@ -23,20 +23,30 @@ Dependencies
The easiest way to run a Software Heritage instance is to use Docker.
Please `ensure that you have a working recent installation first
<https://docs.docker.com/engine/install/>`_ (including the
`Compose <https://docs.docker.com/compose/>`_ plugin.
`Compose <https://docs.docker.com/compose/>`_ plugin).
Quick start
-----------
First, retrieve Software Heritage development environment to get the
Docker configuration::
Docker configuration:
~$ git clone https://gitlab.softwareheritage.org/swh/devel/swh-environment.git
~$ cd swh-environment/docker
.. code-block:: console
Then, start containers::
~$ git clone https://gitlab.softwareheritage.org/swh/devel/docker.git swh-docker
~$ cd swh-docker
~/swh-environment/docker$ docker compose up -d
.. note::
If you intend to hack on Software Heritage source code and test your changes with docker,
you should rather follow the instructions in section :ref:`checkout-source-code` to
install the full Software Heritage development environment that includes Docker configuration.
Then, start containers:
.. code-block:: console
~/swh-docker$ docker compose up -d
[...]
Creating docker_amqp_1 ... done
Creating docker_zookeeper_1 ... done
......@@ -46,9 +56,11 @@ Then, start containers::
[...]
This will build Docker images and run them. Check everything is running
fine with::
fine with:
.. code-block:: console
~/swh-environment/docker$ docker compose ps
~/swh-docker$ docker compose ps
Name Command State Ports
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
docker_amqp_1 docker-entrypoint.sh rabbi ... Up 15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp
......@@ -63,9 +75,11 @@ dependency-related problems. If some containers failed to start, just
run the ``docker compose up -d`` command again.
If a container really refuses to start properly, you can check why using
the ``docker compose logs`` command. For example::
the ``docker compose logs`` command. For example:
~/swh-environment/docker$ docker compose logs swh-lister
.. code-block:: console
~/swh-docker$ docker compose logs swh-lister
Attaching to docker_swh-lister_1
[...]
swh-lister_1 | Processing /src/swh-scheduler
......@@ -77,19 +91,37 @@ the ``docker compose logs`` command. For example::
For details on the various Docker images and how to work with them,
see the full :ref:`docker-environment` documentation.
Once all containers are running, you can use the web interface by
opening http://localhost:5080/ in your web browser.
Once all containers are running, you can use the web interface by opening
http://localhost:<nginx-port>/ in your web browser. ``<nginx-port>`` is the
port on which nginx is exposed to the host. By default, it is randomly
assigned by Docker. To find which port is actually used, run:

.. code-block:: console

   ~/swh-docker$ docker compose port nginx 80
.. note::
Please read the "Exposed Ports" section of the README file in the
`swh-docker`_ repository for more details and options on this topic.
.. _`swh-docker`: https://gitlab.softwareheritage.org/swh/devel/docker.git
At this point, the archive is empty and needs to be filled with some
content. The simplest way to start loading software is to use the
*Save Code Now* feature of the archive web interface:
http://localhost:5080/browse/origin/save/
http://localhost:<nginx-port>/browse/origin/save/
You can also use the command line interface to inject code. For
example to retrieve projects hossted on the https://0xacab.org GitLab forge::
example to retrieve projects hosted on the https://0xacab.org GitLab forge:
.. code-block:: console
~/swh-environment/docker$ docker compose exec swh-scheduler \
~/swh-docker$ docker compose exec swh-scheduler \
swh scheduler task add list-gitlab-full \
-p oneshot url=https://0xacab.org/api/v4
......@@ -108,17 +140,21 @@ This task will scrape the forge’s project list and register origins to the sch
This takes at most a couple of minutes.
Then, you must tell the scheduler to create loading tasks for these origins.
For example, to create tasks for 100 of these origins::
For example, to create tasks for 100 of these origins:
~/swh-environment/docker$ docker compose exec swh-scheduler \
.. code-block:: console
~/swh-docker$ docker compose exec swh-scheduler \
swh scheduler origin schedule-next git 100
This will take a bit of time to complete.
To increase the speed at which git repositories are imported, you can
spawn more ``swh-loader-git`` workers::
spawn more ``swh-loader-git`` workers:
.. code-block:: console
~/swh-environment/docker$ docker compose exec swh-scheduler \
~/swh-docker$ docker compose exec swh-scheduler \
celery status
listers@50ac2185c6c9: OK
loader@b164f9055637: OK
......@@ -126,18 +162,20 @@ spawn more ``swh-loader-git`` workers::
vault@c9fef1bbfdc1: OK
4 nodes online.
~/swh-environment/docker$ docker compose exec swh-scheduler \
~/swh-docker$ docker compose exec swh-scheduler \
celery control pool_grow 3 -d loader@b164f9055637
-> loader@b164f9055637: OK
pool will grow
~/swh-environment/docker$ docker compose exec swh-scheduler \
~/swh-docker$ docker compose exec swh-scheduler \
celery inspect -d loader@b164f9055637 stats | grep prefetch_count
"prefetch_count": 4
Now there are 4 workers ingesting git repositories. You can also
increase the number of ``swh-loader-git`` containers::
increase the number of ``swh-loader-git`` containers:
.. code-block:: console
~/swh-environment/docker$ docker compose up -d --scale swh-loader=4
~/swh-docker$ docker compose up -d --scale swh-loader=4
[...]
Creating docker_swh-loader_2 ... done
Creating docker_swh-loader_3 ... done
......@@ -153,24 +191,28 @@ Heritage. When new versions of these components are released, the docker
image will not be automatically updated. In order to update all Software
Heritage components to their latest version, the docker image needs to
be explicitly rebuilt by issuing the following command from within the
``docker`` directory::
``docker`` directory:
~/swh-environment/docker$ docker build --no-cache -t swh/stack .
.. code-block:: console
~/swh-docker$ docker build --no-cache -t swh/stack .
Monitor your local installation
-------------------------------
You can monitor your local installation by looking at:
- http://localhost:5080/rabbitmq to access the rabbitmq dashboard (guest/guest),
- http://localhost:5080/grafana to explore the platform's metrics (admin/admin),
- http://localhost:<nginx-port>/rabbitmq to access the rabbitmq dashboard (guest/guest),
- http://localhost:<nginx-port>/grafana to explore the platform's metrics (admin/admin),
Shut down your local installation
---------------------------------
To shut down your SoftWare Heritage, just run::
To shut down your Software Heritage instance, just run:
.. code-block:: console
~/swh-environment/docker$ docker compose down
~/swh-docker$ docker compose down
Hacking the archive
-------------------
......
......@@ -22,6 +22,14 @@ Glossary
An artifact is one of many kinds of tangible by-products produced during
the development of software.
bulk on-demand archival
A |swh| service allowing a partner to request the archival of a (possibly
large) number of origins. It consists of an authenticated API endpoint
allowing the user to upload a list of origins (as a CSV file) to be
ingested as soon as possible. The service allows the user to get feedback from the
|swh| archive about the ongoing ingestion process.
content
blob
......@@ -94,6 +102,13 @@ Glossary
add new file contents into :term:`object storage` and repository structure
in the :term:`storage database`).
loading task
A celery_ task performing the actual ingestion process; its implementation is
provided by a :term:`loader`, and it is executed by celery_ workers. Loading
tasks used to be backed by Scheduler Task instances in the :term:`scheduler`
database, but this is no longer the case (for performance reasons).
hash
cryptographic hash
checksum
......@@ -149,6 +164,25 @@ Glossary
of the corresponding change. A person is associated to a full name and/or
an email address.
raw extrinsic metadata
REMD
A piece of metadata concerning an object stored in the |swh| archive that
is not part of the source code from an :term:`origin`. It can come from a
software forge (information about a project that is not the source code
repository for this project), a deposited metadata file (for a
:term:`deposit`), etc. These pieces of information are kept in their
original raw format -- for archiving purposes -- but are also converted
into a minimal format (currently a subset of CodeMeta) allowing them to be
indexed and searched.
raw extrinsic metadata storage
REMD Storage
The |swh| storage dedicated to storing all the gathered extrinsic metadata
documents verbatim, in their original format. Currently, this service is
part of the main :term:`storage`.
release
tag
milestone
......@@ -165,11 +199,27 @@ Glossary
associated development metadata (e.g., author, timestamp, log message,
etc).
save code now
A publicly accessible service allowing users to ask for the immediate save of
a given source code origin. The request can be automatically accepted and
processed if the origin is from a well-known domain, or may require manual
validation. Note that a save code now request can only concern a supported
origin type.
scheduler
The component of the |swh| architecture dedicated to the management and
the prioritization of the many tasks.
Scheduler Task
:py:class:`The object <swh.scheduler.model.Task>` (stored in the
:term:`scheduler` database) representing a background (celery_) task to be
regularly scheduled for execution. Note that not all background tasks
are backed by a Scheduler Task instance; one-shot :term:`loading tasks
<loading task>` are most of the time not represented or modeled as Scheduler Tasks.
snapshot
the state of all visible branches during a specific visit of an origin
......@@ -211,3 +261,4 @@ Glossary
.. _`persistent identifier`: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers
.. _`Archival Resource Key`: http://n2t.net/e/ark_ids.html
.. _publish-subscribe: https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
.. _celery: https://docs.celeryq.dev
......@@ -2209,7 +2209,7 @@
id="g16490"
transform="translate(-210.0923,-72.740013)">
<title
id="title31653">The loader Celery worker execute loader Celery tasks, retrieveing software artifacts from the origin and inserting them in the Storage, including OriginVisit and OriginVisitStatus storage objects.</title>
id="title31653">The loader Celery worker execute loader Celery tasks, retrieving software artifacts from the origin and inserting them in the Storage, including OriginVisit and OriginVisitStatus storage objects.</title>
<g
transform="translate(4.8328727,82.923866)"
id="g25463">
......
......@@ -11,8 +11,9 @@ Development
contributing/index
tutorials/index
faq/index
roadmap/roadmap-2022
roadmap/roadmap-2024
roadmap/index
configuration
api-reference
archive-changelog
journal
......@@ -47,7 +48,7 @@ Architecture
Data Model and Specifications
-----------------------------
* :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID).
* :ref:`persistent-identifiers` Specifications of the SoftWare Hash IDentifiers (SWHID).
* :ref:`data-model` Documentation of the main |swh| archive data model.
* :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive.
......@@ -67,6 +68,7 @@ Roadmap
* Current roadmap: :ref:`roadmap-current`
* Previous roadmaps
* :ref:`roadmap-2022`
* :ref:`roadmap-2021`
System Administration
......@@ -86,11 +88,14 @@ Components
Here is brief overview of the most relevant software components in the Software
Heritage stack, in alphabetical order.
For a better introduction to the architecture, see the :ref:`architecture-overview`,
which presents each of them in a didactical order.
which presents each of them in a didactic order.
Each component name is linked to the development documentation
of the corresponding Python module.
:ref:`swh.alter <swh-alter>`
archive alteration facilities
:ref:`swh.auth <swh-auth>`
low-level library used by modules needing keycloak authentication
......@@ -102,9 +107,8 @@ of the corresponding Python module.
service providing efficient estimates of the number of objects in the SWH archive,
using Redis's Hyperloglog
:ref:`swh.dataset <swh-dataset>`
public datasets and periodic data dumps of the archive released by Software
Heritage
:ref:`swh.datasets <swh-datasets>`
datasets derived from periodic data dumps created by swh.export
:ref:`swh.deposit <swh-deposit>`
push-based deposit of software artifacts to the archive
......@@ -112,6 +116,10 @@ of the corresponding Python module.
swh.docs
developer documentation (used to generate this doc you are reading)
:ref:`swh.export <swh-export>`
public datasets and periodic data dumps of the archive released by Software
Heritage
:ref:`swh.fuse <swh-fuse>`
Virtual file system to browse the Software Heritage archive, based on
`FUSE <https://github.com/libfuse/libfuse>`_
......@@ -171,6 +179,11 @@ swh.docs
Low level management for read-only content-addressable object storage
indexed with a perfect hash table
:ref:`swh.provenance <swh-provenance>`
query service for questions like: “where does this given object come
from?” or “what is the oldest revision in which this object has been
found?”
:ref:`swh.scanner <swh-scanner>`
source code scanner to analyze code bases and compare them with source code
artifacts archived by Software Heritage
......@@ -234,4 +247,3 @@ Archive
* :ref:`routingtable`
* :ref:`search`
* :ref:`glossary`
......@@ -420,7 +420,7 @@ Message format:
- ``visit`` [int] number of the visit for this ``origin`` this status concerns
- ``date`` [timestamp] date of the visit status update
- ``status`` [string] status (can be "created", "ongoing", "full" or "partial"),
- ``snapshot`` [bytes] identifier of the :py:class:`swh.model.model.Snaphot` this
- ``snapshot`` [bytes] identifier of the :py:class:`swh.model.model.Snapshot` this
visit resulted in (if ``status`` is "full" or "partial")
- ``metadata``: deprecated
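For illustration, a decoded visit status message could look like this (all
values hypothetical)::

    origin: "https://example.org/user/project.git"
    visit: 2
    date: "2023-04-18 14:31:22.342195+00:00"
    status: "full"
    snapshot: "17d26be6c9e4c046f1d47e0b22828bcc32e88967"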
......