Compare revisions

Showing with 14747 additions and 642 deletions
......@@ -12,6 +12,81 @@ in this document for historical reasons.
2023
----
* **2023-04-18** Completed first archival of `annas-software.org gitea
forge <https://annas-software.org/>`_. Regular crawling of their
repositories enabled (tracking: `#4855
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4855>`_)
* **2023-04-18** Completed first archival of `Internet Systems Consortium's gitlab
forge <https://gitlab.isc.org/>`_. Regular crawling of their
repositories enabled (tracking: `#4854
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4854>`_)
* **2023-04-13** Completed first archival of `dev.sanctum.geek.nz cgit forge
<https://dev.sanctum.geek.nz/>`_. Regular crawling of their
repositories enabled (tracking: `#4852
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4852>`_)
* **2023-04-13** Completed first archival of `trueelena.org cgit forge
<https://git.trueelena.org/>`_. Regular crawling of their
repositories enabled (tracking: `#4851
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4851>`_)
* **2023-04-12** Completed first archival of `Epita infra gitlab forge
<https://gitlab.cri.epita.fr/cri>`_. Regular crawling of their
repositories enabled (tracking: `#4845
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4845>`_)
* **2023-04-11** Completed first archival of `INRAE MathNum department gitlab forge
<https://forgemia.inra.fr/>`_. Regular crawling of their
repositories enabled (tracking: `#4842
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4842>`_)
* **2023-04-11** Completed first archival of `Montpellier Bioinformatics Biodiversity
platform gitlab forge <https://gitlab.mbb.cnrs.fr/>`_. Regular crawling of their
repositories enabled (tracking: `#4843
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4843>`_)
* **2023-04-07** Completed first archival of `Garbaye gitea forge
<https://git.garbaye.fr/>`_. Regular crawling of their
repositories enabled (tracking: `#4841
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4841>`_)
* **2023-04-07** Completed first archival of `Alarsyo's personal projects gitea forge
<https://git.alarsyo.net/>`_. Regular crawling of their
repositories enabled (tracking: `#4833
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4833>`_)
* **2023-04-05** Completed first archival of `Replicant git repositories (split
in 7 forges) <https://git.replicant.us/>`_. Regular crawling of their
repositories enabled (tracking: `#4685
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4685>`_)
* **2023-04-03** Completed first archival of `Software Heritage gitlab forge
<https://gitlab.softwareheritage.org/>`_. Regular crawling of their
repositories enabled (tracking: `#4683
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4683>`_)
* **2023-03-30** Completed first archival of `CodeAurora, The Global Gathering for
Mobile Open Source <https://source.codeaurora.org>`_. Regular crawling of its
repositories enabled (tracking: `#4813
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4813>`_)
* **2023-02-13** Completed first archival of `AFPY (Association Francophone Python)
git repositories <https://git.afpy.org>`_. Regular crawling of its
repositories enabled (tracking: `#4674
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4674>`_)
* **2023-01-05** Completed first archival of `the University of Stuttgart gitlab forge
<https://git.iws.uni-stuttgart.de/>`_. Regular crawling of
their repositories enabled (tracking: `#4712
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4712>`_)
* **2023-01-05** Completed first archival of `FFDN (Fédération des Fournisseurs
d'Accès Internet Associatifs) <https://code.ffdn.org/>`_. Regular crawling of
their repositories enabled (tracking: `#4687
<https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4687>`_)
* **2023-01-03** Completed first archival of `DeuxFleurs's gitea forge
<https://git.deuxfleurs.fr>`_, as requested via `Add forge now`_.
Regular crawling of its repositories enabled
......@@ -25,6 +100,11 @@ in this document for historical reasons.
2022
----
* **2022-12-14** Completed first archival of `Université Gustave Eiffel git repositories
<https://gitlab.univ-eiffel.fr/>`_, as requested via `Add forge now`_.
Regular crawling of its repositories enabled
(tracking: `#4675 <https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4675>`_)
* **2022-11-13** Completed first archival of `Joey Hess's git repositories
<https://git.joeyh.name/>`_, as requested via `Add forge now`_.
Regular crawling of its repositories enabled
......
......@@ -4,18 +4,29 @@ set -e
create_links () {
mkdir -p sources
for pymodule in $(cd ../../../ && bin/ls-py-modules) ; do
if [ "$pymodule" = 'swh-docs' ] ; then
continue
fi
case "$pymodule" in
"swh-docs"|"swh-icinga-plugins") continue;;
esac
if [ ! -e "$pymodule" -a -d "../../../${pymodule}/docs" ] ; then
ln -s "../../../${pymodule}/docs" "$pymodule"
fi
if [ -f "$pymodule/images/Makefile" ] ; then
make -C $pymodule images
echo "Build images in $pymodule"
make -I $PWD/../.. -C $pymodule images
fi
if [ -d "../../../${pymodule}/swh" ] ; then
cp -r -f --symbolic-link $(realpath ../../../${pymodule}/swh/*) sources/swh/
elif [ -d "../../../${pymodule}/src/swh" ] ; then
cp -r -f --symbolic-link $(realpath ../../../${pymodule}/src/swh/*) sources/swh/
fi
pushd ../../../${pymodule}
for EXT in rst md; do
if [ -f README.$EXT -a ! -f docs/README.$EXT ] ; then
ln -s ../README.$EXT docs
break
fi
done
popd
done
}
......@@ -25,7 +36,7 @@ remove_links () {
continue
fi
if [ -L "$pymodule" ] ; then
make -C $pymodule clean
make -I $PWD/../.. -C $pymodule clean
rm "$pymodule"
fi
done
......
.. _cli-config:
Configuration reference
=======================
.. highlight:: yaml
|swh| components are all configured with a YAML file, made of multiple blocks,
most of which describe how to connect to other components/services.
Most services are composable, so they can be either instantiated locally or
accessed through |swh|'s HTTP-based RPC protocol (``cls: remote``).
For example, a possible configuration for swh-vault is::
graph:
url: http://graph.internal.softwareheritage.org:5009/
storage:
cls: pipeline
steps:
- cls: retry
- cls: remote
url: http://webapp.internal.staging.swh.network:5002/
objstorage:
cls: s3
compression: gzip
container_name: softwareheritage
path_prefix: content
All URLs in this document are examples, see :ref:`service-url` for actual values.
.. _cli-config-celery:
celery
------
The :ref:`scheduler <swh-scheduler>` uses Celery for running some tasks. This
configuration key is used for parameters passed directly to Celery, e.g. the URI
of the RabbitMQ broker used for distribution of tasks, for both scheduler
commands and Celery workers.
The contents of this configuration key follow the `"lowercase settings" schema from
Celery upstream
<https://docs.celeryq.dev/en/stable/userguide/configuration.html#new-lowercase-settings>`_.
Some default values can be found in :mod:`swh.scheduler.celery_backend.config`.
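For instance, a minimal block might look like this (a sketch; the broker URL is
illustrative, and the keys are standard Celery lowercase settings)::

    celery:
        broker_url: amqp://guest:guest@rabbitmq.example.org:5672//
        task_acks_late: true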
.. _cli-config-graph:
graph
-----
The :ref:`graph <swh-graph>` can only be accessed as a remote service, and
its configuration block is a single key: ``url``, which is the URL to its
HTTP endpoint; usually on port 5009 or at the path ``/graph/``.
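For example::

    graph:
        url: http://graph.internal.softwareheritage.org:5009/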
.. _cli-config-journal:
journal
-------
The :ref:`journal <swh-journal>` can only be locally instantiated to consume
directly from Kafka::
journal:
brokers:
- broker1.journal.softwareheritage.org:9093
- broker2.journal.softwareheritage.org:9093
- broker3.journal.softwareheritage.org:9093
- broker4.journal.softwareheritage.org:9093
prefix: swh.journal.objects
sasl.mechanism: "SCRAM-SHA-512"
security.protocol: "sasl_ssl"
sasl.username: "..."
sasl.password: "..."
privileged: false
group_id: "..."
.. _cli-config-metadata_fetcher_credentials:
metadata_fetcher_credentials
----------------------------
Nested dictionary of strings.
The first level identifies a :term:`metadata <extrinsic metadata>` fetcher's name
(e.g. ``gitea``, ``github``), the second level the lister instance (e.g. ``codeberg.org``
or ``github``). The final level is a list of dicts containing the expected API
credentials for the given instance of that fetcher. For example::
metadata_fetcher_credentials:
github:
github:
- username: ...
password: ...
- ...
.. _cli-config-scheduler:
scheduler
---------
The :ref:`scheduler <swh-scheduler>` can only be accessed as a remote service, and
its configuration block is a single key: ``url``, which is the URL to its
HTTP endpoint; usually on port 5008 or at the path ``/scheduler/``::
scheduler:
cls: remote
url: http://saatchi.internal.softwareheritage.org:5008
.. _cli-config-storage:
storage
-------
Backends
^^^^^^^^
The :ref:`storage <swh-storage>` has four possible classes:
* ``cassandra``, see :class:`swh.storage.cassandra.storage.CassandraStorage`::
storage:
cls: cassandra
hosts: [...]
keyspace: swh
port: 9042
journal_writer:
# ...
# ...
* ``postgresql``, which takes a `libpq connection string <https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING>`_::
storage:
cls: postgresql
db: service=swh
journal_writer:
# ...
For optional arguments, see :class:`swh.storage.postgresql.storage.Storage`
* ``memory``, which stores data in-memory instead of persisting it somewhere;
this should only be used for debugging::
storage:
cls: memory
journal_writer:
# ...
* ``remote``, which takes a URL to a remote service's HTTP endpoint;
usually on port 5002 or at the path ``/storage/``::
storage:
cls: remote
url: http://webapp.internal.staging.swh.network:5002/
The ``journal_writer`` key is optional. If provided, it will be used to write all
additions to some sort of log (usually Kafka) before any write to the main database.
:mod:`swh.journal.writer.kafka`::
cls: kafka
brokers:
- broker1.journal.softwareheritage.org:9093
- broker2.journal.softwareheritage.org:9093
- broker3.journal.softwareheritage.org:9093
- broker4.journal.softwareheritage.org:9093
prefix: swh.journal.objects
anonymize: true
client_id: ...
producer_config: ...
:mod:`swh.journal.writer.stream`, which writes directly to a file
(or stdout if set to ``-``)::
cls: stream
output_stream: /tmp/messages.msgpack
:mod:`swh.journal.writer.inmemory`, which does not actually persist anywhere,
and should only be used for tests::
cls: memory
anonymize: false
Proxies
^^^^^^^
In addition to these backends, "storage proxies" can be used and chained in order
to change the behavior of accesses to the storage. They usually do not change the semantics,
but perform optimizations such as batching calls, stripping redundant operations,
and retrying on error.
They are invoked through the special ``pipeline`` class, which takes as its parameter
a list of proxy configurations, ending with a backend configuration as seen above::
storage:
cls: pipeline
steps:
- cls: buffer
min_batch_size:
content: 10000
directory: 5000
- cls: filter
- cls: retry
- cls: remote
url: http://webapp1.internal.softwareheritage.org:5002/
which is equivalent to this nested configuration::
storage:
cls: buffer
min_batch_size:
content: 10000
directory: 5000
storage:
cls: filter
storage:
cls: retry
storage:
cls: remote
url: http://webapp1.internal.softwareheritage.org:5002/
See :mod:`swh.storage.proxies` for the list of proxies.
......@@ -12,7 +12,7 @@ Good commit messages are essentials in a project as large as Software Heritage.
They are crucial to those who will review your changes and important to anyone else
who will interact with the codebase at a later time. This includes your future self!
Make sure to follow the recommandations from `How to write a Git
Make sure to follow the recommendations from `How to write a Git
commit message <http://chris.beams.io/posts/git-commit/>`_
Closing or referencing issues
......
......@@ -96,6 +96,7 @@ Run the script by
$ cd swh-environment
$ bin/update # Used to update all the repos under the environment to their latest version
$ bin/fork-gitlab-repo -g swh swh-objstorage
$ bin/fork-gitlab-repo -g swh . # To contribute to swh-environment itself (eg. add a repository)
This will create a new fork of the SWH repository in your namespace and
add a jenkins user to perform automatic builds. You can view the forked
......@@ -152,3 +153,42 @@ If you plan to
you may also want to
`upload your GPG key <https://gitlab.softwareheritage.org/-/profile/gpg_keys>`__
as well.
Make a release
--------------
.. warning:: Only staff members are allowed to make new releases
Releases are made automatically by Jenkins when a tag is pushed to a module repository.
We are using the `semantic versioning <https://semver.org>`_ scheme to name our
releases, please ensure that the name of your tag correctly indicates its compatibility
with the previous version.
Tags themselves should be signed and provide a meaningful annotation with, for example,
an itemized summary of changes (rather than rehashing the whole git log), breaking
changes in a separate section, etc.
First, create the tag:
.. code-block::
# get the latest version number
git describe --tags # returns v1.2.3-x-yyy
# list changes between master and v1.2.3
git range-diff v1.2.3...master
# use the output to write your annotation and create a new signed tag, here for a
# minor version upgrade
git tag -a -s v1.3.0
# push it
git push origin tag v1.3.0
Then you'll see jobs on Jenkins (Incoming tag, GitLab builds, Upload to PyPI)
indicating that the release process is ongoing.
Next, deployment container images are updated.
And finally a new merge request will automatically be created in
`Helm charts for swh packages`_ so that the devops team can proceed with deployment.
.. _Helm charts for swh packages: https://gitlab.softwareheritage.org/swh/infra/sysadm-environment
:orphan:
.. highlight:: bash
.. admonition:: Intended audience
:class: important
Contributors
Important
=========
We have moved our development from Phabricator to a GitLab instance at
https://gitlab.softwareheritage.org/
The content below is no longer relevant and will be updated soon.
Submitting patches
==================
`Phabricator`_ is the tool that Software Heritage uses as its
coding/collaboration forge.
Software Heritage's Phabricator instance can be found at
https://forge.softwareheritage.org/
.. _Phabricator: http://phabricator.org/
Code Review in Phabricator
--------------------------
We use the Differential application of Phabricator to perform
:ref:`code reviews <code-review>` in the context of Software Heritage.
* we use Git and ``history.immutable=true``
(but beware as that is partly a Phabricator misnomer, read on)
* when code reviews are required, developers will be allowed to push
directly to master once an accepted Differential diff exists
Configuration
+++++++++++++
.. _arcanist-configuration:
Arcanist configuration
^^^^^^^^^^^^^^^^^^^^^^
Authentication
~~~~~~~~~~~~~~
First, you should install Arcanist and authenticate it to Phabricator::
sudo apt-get install arcanist
arc set-config default https://forge.softwareheritage.org/
arc install-certificate
arc will prompt you to log in to Phabricator via the web
(which will ask for your personal Phabricator credentials).
You will then have to copy-paste the API token from the web page into arc,
and hit Enter to complete the certificate installation.
Immutability
~~~~~~~~~~~~
When using git, Arcanist by default messes with the local history,
rewriting commits at the time of first submission.
To avoid that, we use so-called `history immutability`_.
.. _history immutability: https://secure.phabricator.com/book/phabricator/article/arcanist_new_project/#history-mutability-git
To that end, you shall configure your ``arc`` accordingly::
arc set-config history.immutable true
Note that this does **not** mean that you are forbidden to rewrite
your local branches (e.g., with ``git rebase``).
Quite the contrary: you are encouraged to locally rewrite branches
before pushing to ensure that commits are logically separated
and your commit history easy to bisect.
The above setting just means that *arc* will not rewrite commit
history under your nose.
Enabling ``git push`` to our forge
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The way we've configured our review setup for continuous integration
needs you to configure git to allow pushes to our forge.
There are two ways you can do this: setting up an SSH key to push over SSH,
or setting a specific password for git pushes over HTTPS.
SSH key for pushes
~~~~~~~~~~~~~~~~~~
In your forge User settings page (On the top right, click on your avatar,
then click *Settings*), you have access to an *Authentication* >
*SSH Public Keys* section (Direct link:
``https://forge.softwareheritage.org/settings/user/<your username>/page/ssh/``).
You then have the option to upload an SSH public key,
which will authenticate your pushes.
You then need to configure ssh/git to use that key pair,
for instance by editing the ``~/.ssh/config`` file.
Finally, you should configure git to push over ssh when pushing to
https://forge.softwareheritage.org, by running the following command::
git config --global url.git@forge.softwareheritage.org:.pushInsteadOf https://forge.softwareheritage.org
This lets git know that it should use ``git@forge.softwareheritage.org:``
as a base url when pushing repositories cloned from
forge.softwareheritage.org over https.
VCS password for pushes
~~~~~~~~~~~~~~~~~~~~~~~
.. warning:: Please only use this if you're completely unable to use SSH.
As a fallback to the ssh setup, you have the option of setting a VCS password. This
password, *separate from your account password*, allows Phabricator to authenticate your
uploads over HTTPS.
In your forge User settings page (On the top right, click on your avatar, then click
*Settings*), you need to use the *Authentication* > *VCS Password* section to set your
VCS password (Direct link: ``https://forge.softwareheritage.org/settings/user/<your
username>/page/vcspassword/``).
If you still get a 403 error on push, this means you need a forge administrator to
enable HTTPS pushes for the repository (which wasn't done by default in historical
repositories). Please drop by on IRC and let us know!
Workflow
++++++++
* work in a feature branch: ``git checkout -b my-feat``
* initial review request: hack/commit/hack/commit ;
``arc diff origin/master``
* react to change requests: hack/commit/hack/commit ;
``arc diff --update Dxx origin/master``
* landing change: ``git checkout master ; git merge my-feat ; git push``
Starting a new feature and submit it for review
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Use a **one branch per feature** workflow, with well-separated
**logical commits** (:ref:`following those conventions <git-style-guide>`).
Please open one diff per logical commit to keep the diff size to a minimum.
.. code-block::
git checkout -b my-shiny-feature
... hack hack hack ...
git commit -m 'architecture skeleton for my-shiny-feature'
... hack hack hack ...
git commit -m 'my-shiny-feature: implement module foo'
... etc ...
To **submit your code for review** the first time::
arc diff origin/master
arc will prompt for a **code review message**. Provide the following information:
* first line: *short description* of the overall work
(i.e., the feature you're working on).
This will become the title of the review
* *Summary* field (optional): *long description* of the overall work;
the field can continue in subsequent lines, up to the next field.
This will become the "Summary" section of the review
* *Test Plan* field (optional): write here if something special is needed
to test your change
* *Reviewers* field (optional): the (Phabricator) name(s) of
desired reviewers.
If you don't specify one (recommended) the default reviewers will be chosen
* *Subscribers* field (optional): the (Phabricator) name(s) of people that
will be notified about changes to this review request.
In most cases it should be left empty
For example::
mercurial loader
Summary: first stab at a mercurial loader (T329)
The implementation follows the plan detailed in F2F discussion with @foo.
Performances seem decent enough for a first trial (XXX seconds for YYY repository
that contains ZZZ patches).
Test plan:
Reviewers:
Subscribers: foo
After completing the message arc will submit the review request
and tell you its number and URL::
[...]
Created a new Differential revision:
Revision URI: https://forge.softwareheritage.org/Dxx
.. _arc-update:
Updating your branch to reflect requested changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Your feature might get accepted as is, YAY!
Or, reviewers might request changes; no big deal!
Use the Differential web UI to follow up on received comments, if needed.
To implement requested changes in the code, hack on your branch as usual by:
* adding new commits, and/or
* rewriting old commits with git rebase (to preserve a nice, easy to bisect history)
* pulling on master and rebasing your branch against it if meanwhile someone
landed commits on master:
.. code-block::
git checkout master
git pull
git checkout my-shiny-feature
git rebase master
When you're ready to **update your review request**::
arc diff --update Dxx HEAD~
Arc will prompt you for a message: **describe what you've changed
w.r.t. the previous review request**, free form.
This means you should not repeat the title of your diff (which is
often the default if you squashed/amended your commits)
Your message will become the changelog entry in Differential
for this new version of the diff, and will help reviewers
understand what changes you made since they last read your diff.
Differential only cares about the code diff, not about the commits
or their order.
Therefore each "update" can be a completely different series of commits,
possibly rewritten from the previous submission.
Dependencies between diffs
^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that you can manage diff dependencies within the same module
with the following keyword in the diff description::
Depends on Dxx
That allows you to keep a logical view in your diff.
It's not strictly necessary (because the tooling now deals with it properly),
but it might help reviewers, or yourself, to do so.
Landing your change onto master
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once your change has been approved in Differential,
you will be able to land it onto the master branch.
Before doing so, you're encouraged to **clean up your git commit history**,
reordering/splitting/merging commits as needed to have separate
logical commits and an easy to bisect history.
Update the diff :ref:`following the prior section <arc-update>`
(It'd be good to let the CI build finish to make sure everything is still green).
Once you're happy you can **push to origin/master** directly, e.g.::
git checkout master
git merge --ff-only my-shiny-feature
git push
``--ff-only`` is optional, and makes sure you don't unintentionally
create a merge commit.
Optionally you can then delete your local feature branch::
git branch -d my-shiny-feature
Reviewing locally / landing someone else's changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can do local reviews of code with arc patch::
arc patch Dxyz
This will create a branch **arcpatch-Dxyz** containing the changes
on your local checkout.
You can then merge those changes upstream with::
git checkout master
git merge --ff arcpatch-Dxyz
git push origin master
or, alternatively::
arc land --squash
See also
--------
* :ref:`code-review` for guidelines on how code is reviewed
when developing for Software Heritage
......@@ -3,7 +3,12 @@
Sphinx gotchas
==============
Here is a list of common gotchas when formatting Python docstrings for `Sphinx <https://www.sphinx-doc.org/en/stable/>`_ and the `Napoleon <https://www.sphinx-doc.org/en/stable/ext/napoleon.html>`_ style.
Here is a list of common gotchas when formatting Python docstrings for `Sphinx
<https://www.sphinx-doc.org/en/stable/>`_ and the `Napoleon
<https://www.sphinx-doc.org/en/stable/ext/napoleon.html>`_ style.
.. highlight:: rst
Sphinx
------
......@@ -11,12 +16,12 @@ Sphinx
Lists
+++++
All sorts of `lists <https://www.sphinx-doc.org/en/stable/rest.html#lists-and-quote-like-blocks>`_
require an empty line before the first bullet and after the last one,
to be properly interpreted as list.
No indentation is required for list elements w.r.t. surrounding text,
and line continuations should be indented like the first character
after the bullet.
All sorts of `lists
<https://www.sphinx-doc.org/en/stable/rest.html#lists-and-quote-like-blocks>`_
require an empty line before the first bullet and after the last one, to be
properly interpreted as a list. No indentation is required for list elements
w.r.t. surrounding text, and line continuations should be indented like the
first character after the bullet.
Bad::
......@@ -177,10 +182,13 @@ Good::
Args:
foo (int): first argument
bar: second argument, which happen to have a fairly
long description of what it does
long description of what it does
baz (bool): third argument
Returns
+++++++
......@@ -232,6 +240,7 @@ Good::
ValueError: if you botched it
RuntimeError: if we botched it
See also
--------
......
......@@ -99,7 +99,7 @@ and/or `Daniele Procida's presentation <https://www.youtube.com/watch?v=t4vKPhjc
.. note::
We propose using the following naming scheme depending on the type of document:
* Tutorial: Tutorial name]
* Tutorial: [Tutorial name]
* How to ...
* Reference: [Reference name]
* Explanation: [Explanation name]
......
......@@ -22,58 +22,152 @@ Install required dependencies
-----------------------------
Software Heritage requires some dependencies that are usually packaged by your
package manager. On Debian/Ubuntu-based distributions::
sudo apt install lsb-release wget apt-transport-https
sudo wget https://www.postgresql.org/media/keys/ACCC4CF8.asc -O /etc/apt/trusted.gpg.d/postgresql.asc
echo "deb https://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" | sudo tee -a /etc/apt/sources.list.d/pgdg.list
sudo wget https://downloads.apache.org/cassandra/KEYS -O /etc/apt/trusted.gpg.d/cassandra.asc
echo "deb https://downloads.apache.org/cassandra/debian 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.list
sudo apt update
sudo apt install \
build-essential pkg-config lzip rsync \
python3 python3-pip python3-venv virtualenvwrapper \
libpython3-dev libsystemd-dev libsvn-dev libffi-dev librdkafka-dev \
fuse3 libfuse3-dev libcmph-dev libleveldb-dev \
git myrepos \
graphviz plantuml inkscape \
postgresql libpq-dev cassandra
.. Note:: Python 3.7 or newer is required
package manager.
.. tab-set::
.. tab-item:: Debian/Ubuntu
.. code-block:: console
sudo apt install lsb-release wget apt-transport-https
sudo wget https://www.postgresql.org/media/keys/ACCC4CF8.asc -O /etc/apt/trusted.gpg.d/postgresql.asc
echo "deb https://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" | sudo tee -a /etc/apt/sources.list.d/pgdg.list
sudo wget https://downloads.apache.org/cassandra/KEYS -O /etc/apt/trusted.gpg.d/cassandra.asc
echo "deb https://debian.cassandra.apache.org 41x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.list
sudo apt update
sudo apt install \
build-essential pkg-config lzip rsync \
python3 python3-pip python3-venv virtualenvwrapper \
libpython3-dev libsystemd-dev libsvn-dev libffi-dev librdkafka-dev \
fuse3 libfuse3-dev libcmph-dev libleveldb-dev \
git myrepos \
graphviz plantuml inkscape \
postgresql libpq-dev cassandra redis-server
.. tab-item:: Fedora
.. code-block:: console
sudo dnf install java-17-openjdk-headless
# Make sure the path is correct. If not, choose the alternative corresponding to java-17
sudo update-alternatives --set java /usr/lib/jvm/java-17-openjdk-17.0.13.0.11-3.fc41.x86_64/bin/java
sudo rpm --import https://downloads.apache.org/cassandra/KEYS
echo "[cassandra]
name=Apache Cassandra
baseurl=https://redhat.cassandra.apache.org/50x/
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://downloads.apache.org/cassandra/KEYS" | sudo tee /etc/yum.repos.d/cassandra.repo
sudo dnf -y update
sudo dnf -y install cassandra
sudo dnf -y group install c-development
sudo dnf -y install \
pkgconf-pkg-config lzip rsync python3.11 python3-virtualenvwrapper \
python3.11-devel systemd-devel subversion-devel libffi-devel \
librdkafka fuse3 fuse3-devel leveldb-devel git myrepos graphviz \
plantuml inkscape postgresql-server postgresql-contrib libpq \
libpq-devel redis
# You will also need to install CMPH manually, as it is not (yet?) included in the Fedora repositories
wget https://sourceforge.net/projects/cmph/files/v2.0.2/cmph-2.0.2.tar.gz
tar -xvf cmph-2.0.2.tar.gz
cd cmph-2.0.2
./configure && make && sudo make install
cd ..
.. Note:: Python 3.10 or newer is required
This installs basic system utilities, Python library dependencies, development tools,
documentation tools and our main database management systems.
Cassandra and PostgreSQL will be started by tests when they need it, so you
don't need them started globally (this will save you some RAM)::
don't need them started globally (this will save you some RAM):
.. code-block:: console
sudo systemctl disable --now cassandra postgresql
If you intend to hack on the frontend part of |swh| Web Applications, you will also
need to have ``nodejs >= 14`` in your development environment. If the version in your
Debian-based distribution is lower, you can install node 14 using these commands::
You must also have ``nodejs >= 20`` in your development environment.
You can install node 20 using these commands:
.. tab-set::
.. tab-item:: Debian/Ubuntu
.. code-block:: console
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo bash -
sudo apt install -y nodejs
.. tab-item:: Fedora
.. code-block:: console
sudo wget https://deb.nodesource.com/gpgkey/nodesource.gpg.key -O /etc/apt/trusted.gpg.d/nodesource.asc
echo "deb https://deb.nodesource.com/node_14.x $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/nodesource.list
sudo apt update
sudo apt install nodejs
sudo dnf -y install nodejs
Also related to Web Applications development, |swh| uses the ``yarn`` package manager
to retrieve frontend dependencies and development tools. It is recommended to install its
latest classic version using these commands::
|swh| uses the ``yarn`` package manager to retrieve frontend dependencies and development tools.
You must install its latest classic version using this command:
sudo wget https://dl.yarnpkg.com/debian/pubkey.gpg -O /etc/apt/trusted.gpg.d/yarn.asc
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt update
sudo apt install yarn
.. tab-set::
.. tab-item:: Debian/Ubuntu
.. code-block:: console
sudo corepack enable
.. tab-item:: Fedora
.. code-block:: console
sudo dnf -y install yarnpkg
If you intend to work on |swh| archive search features, Elasticsearch must also be
present in your development environment. Proceed as follows to install it::
present in your development environment. Proceed as follows to install it:
.. tab-set::
.. tab-item:: Debian/Ubuntu
.. code-block:: console
sudo wget https://artifacts.elastic.co/GPG-KEY-elasticsearch -O /etc/apt/trusted.gpg.d/elasticsearch.asc
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
sudo apt update
sudo apt install elasticsearch
.. tab-item:: Fedora
sudo wget https://artifacts.elastic.co/GPG-KEY-elasticsearch -O /etc/apt/trusted.gpg.d/elasticsearch.asc
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch.list
sudo apt update
sudo apt install elasticsearch
.. code-block:: console
echo "[elasticsearch]
name=Elasticsearch repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
autorefresh=1
type=rpm-md" | sudo tee /etc/yum.repos.d/elasticsearch.repo
sudo dnf -y update
sudo dnf -y install elasticsearch
If you intend to build the full |swh| documentation, the ``postgresql-autodoc`` utility must
also be installed; follow these `instructions <https://github.com/cbbrowne/autodoc#installation>`_
......@@ -84,30 +178,34 @@ to do so.
Checkout the source code
------------------------
Clone the |swh| environment repository::
Clone the |swh| environment repository:
.. code-block:: console
~$ git clone https://gitlab.softwareheritage.org/swh/devel/swh-environment.git
[...]
~$ cd swh-environment
~/swh-environment$
Create a virtualenv::
Create a virtualenv:
.. code-block:: console
~/swh-environment$ source /usr/share/virtualenvwrapper/virtualenvwrapper.sh
~/swh-environment$ mkvirtualenv -p /usr/bin/python3 -a $PWD swh
[...]
(swh) ~/swh-environment$
Checkout all the swh packages source repositories::
Checkout all the swh packages source repositories:
.. code-block:: console
(swh) ~/swh-environment$ pip install pre-commit
(swh) ~/swh-environment$ ./bin/update
Use the same mypy version our tox containers use::
(swh) ~/swh-environment$ pip install mypy==1.0.1
In the future you can re-activate the created virtualenv with:
In the future you can re-activate the created virtualenv with::
.. code-block:: console
$ workon swh
(swh) ~/swh-environment$
......@@ -122,19 +220,11 @@ In the future you can re-activate the created virtualenv with::
.. _pipenv: https://pipenv.readthedocs.io/
Install all the swh packages (in development mode, with testing dependencies)::
Install all the swh packages (in development mode, with testing dependencies):
(swh) ~/swh-environment$ bin/install
.. note::
.. code-block:: console
If you experience issues with :program:`pip` dependency resolution, try with
``bin/install --use-deprecated=legacy-resolver`` (the flag will be passed on
to ``pip install``). The same flag can also be set globally in
:file:`~/.config/pip/pip.conf`::
[install]
use-deprecated=legacy-resolver
(swh) ~/swh-environment$ bin/install
Executing unit tests
......@@ -151,7 +241,9 @@ tox_. The main difference between these 2 test execution environments is:
current virtualenv, installed from the git repositories: you test your
modification against the HEAD of every swh package.
For example, running unit tests for the swh-loader-git_ package::
For example, running unit tests for the swh-loader-git_ package:
.. code-block:: console
(swh) ~/swh-environment$ cd swh-loader-git
(swh) ~/swh-environment/swh-loader-git$ pytest
......@@ -171,7 +263,9 @@ For example, running unit tests for the swh-loader-git_ package::
[...]
================== 25 passed, 12 warnings in 6.66 seconds ==================
Running the same test, plus code linting and static analysis, using tox::
Running the same test, plus code linting and static analysis, using tox:
.. code-block:: console
(swh) ~/swh-environment/swh-loader-git$ tox
GLOB sdist-make: ~/swh-environment/swh-loader-git/setup.py
......@@ -235,7 +329,9 @@ Running the same test, plus code linting and static analysis, using tox::
Beware that some swh packages require a postgresql server properly configured
to execute the tests. In this case, you will want to use pifpaf_, which will
spawn a temporary instance of postgresql, to encapsulate the call to pytest.
For example, running pytest in the swh-core package::
For example, running pytest in the swh-core package:
.. code-block:: console
(swh) ~/swh-environment$ cd swh-core
(swh) ~/swh-environment/swh-core$ pifpaf run postgresql -- pytest
......
......@@ -29,7 +29,7 @@ specific skills needed to work on any topic of your interest.
What are the minimum system requirements (hardware/software) to run SWH locally?
--------------------------------------------------------------------------------
Python 3.7 or newer is required. See the :ref:`developer setup documentation
Python 3.10 or newer is required. See the :ref:`developer setup documentation
<developer-setup>` for more details.
......@@ -126,8 +126,8 @@ Getting sample datasets
Is there a way to connect to the SWH archive (production) database from my local machine?
-------------------------------------------------------------------------------------------
We provide the archive as a dataset on public clouds, see the :ref:`swh-dataset
documentation <swh-dataset>`. We can
We provide the archive as a dataset on public clouds, see the :ref:`swh-export
documentation <swh-export>`. We can
also provide read access to one of the main databases on request, `contact us`_.
.. _faq_error_bugs:
......
......@@ -23,20 +23,30 @@ Dependencies
The easiest way to run a Software Heritage instance is to use Docker.
Please `ensure that you have a working recent installation first
<https://docs.docker.com/engine/install/>`_ (including the
`Compose <https://docs.docker.com/compose/>`_ plugin.
`Compose <https://docs.docker.com/compose/>`_ plugin).
Quick start
-----------
First, retrieve Software Heritage development environment to get the
Docker configuration::
Docker configuration:
~$ git clone https://gitlab.softwareheritage.org/swh/devel/swh-environment.git
~$ cd swh-environment/docker
.. code-block:: console
Then, start containers::
~$ git clone https://gitlab.softwareheritage.org/swh/devel/docker.git swh-docker
~$ cd swh-docker
~/swh-environment/docker$ docker compose up -d
.. note::
If you intend to hack on Software Heritage source code and test your changes with docker,
you should rather follow the instructions in section :ref:`checkout-source-code` to
install the full Software Heritage development environment that includes Docker configuration.
Then, start containers:
.. code-block:: console
~/swh-docker$ docker compose up -d
[...]
Creating docker_amqp_1 ... done
Creating docker_zookeeper_1 ... done
......@@ -46,9 +56,11 @@ Then, start containers::
[...]
This will build Docker images and run them. Check everything is running
fine with::
fine with:
.. code-block:: console
~/swh-environment/docker$ docker compose ps
~/swh-docker$ docker compose ps
Name Command State Ports
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
docker_amqp_1 docker-entrypoint.sh rabbi ... Up 15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp
......@@ -63,9 +75,11 @@ dependency-related problems. If some containers failed to start, just
run the ``docker compose up -d`` command again.
If a container really refuses to start properly, you can check why using
the ``docker compose logs`` command. For example::
the ``docker compose logs`` command. For example:
~/swh-environment/docker$ docker compose logs swh-lister
.. code-block:: console
~/swh-docker$ docker compose logs swh-lister
Attaching to docker_swh-lister_1
[...]
swh-lister_1 | Processing /src/swh-scheduler
......@@ -77,19 +91,37 @@ the ``docker compose logs`` command. For example::
For details on the various Docker images and how to work with them,
see the full :ref:`docker-environment` documentation.
Once all containers are running, you can use the web interface by
opening http://localhost:5080/ in your web browser.
Once all containers are running, you can use the web interface by opening
http://localhost:<nginx-port>/ in your web browser. ``<nginx-port>`` is the
port on which nginx is exposed to the host. By default, it is randomly
assigned by Docker. To find which port is actually used, run:

.. code-block:: console

   ~/swh-docker$ docker compose port nginx 80
.. note::
Please read the "Exposed Ports" section of the README file in the
`swh-docker`_ repository for more details and options on this topic.
.. _`swh-docker`: https://gitlab.softwareheritage.org/swh/devel/docker.git
At this point, the archive is empty and needs to be filled with some
content. The simplest way to start loading software is to use the
*Save Code Now* feature of the archive web interface:
http://localhost:5080/browse/origin/save/
http://localhost:<nginx-port>/browse/origin/save/
You can also use the command line interface to inject code. For
example to retrieve projects hossted on the https://0xacab.org GitLab forge::
example to retrieve projects hosted on the https://0xacab.org GitLab forge:
.. code-block:: console
~/swh-environment/docker$ docker compose exec swh-scheduler \
~/swh-docker$ docker compose exec swh-scheduler \
swh scheduler task add list-gitlab-full \
-p oneshot url=https://0xacab.org/api/v4
......@@ -108,17 +140,21 @@ This task will scrape the forge’s project list and register origins to the sch
This takes at most a couple of minutes.
Then, you must tell the scheduler to create loading tasks for these origins.
For example, to create tasks for 100 of these origins::
For example, to create tasks for 100 of these origins:
~/swh-environment/docker$ docker compose exec swh-scheduler \
.. code-block:: console
~/swh-docker$ docker compose exec swh-scheduler \
swh scheduler origin schedule-next git 100
This will take a bit of time to complete.
To increase the speed at which git repositories are imported, you can
spawn more ``swh-loader-git`` workers::
spawn more ``swh-loader-git`` workers:
.. code-block:: console
~/swh-environment/docker$ docker compose exec swh-scheduler \
~/swh-docker$ docker compose exec swh-scheduler \
celery status
listers@50ac2185c6c9: OK
loader@b164f9055637: OK
......@@ -126,18 +162,20 @@ spawn more ``swh-loader-git`` workers::
vault@c9fef1bbfdc1: OK
4 nodes online.
~/swh-environment/docker$ docker compose exec swh-scheduler \
~/swh-docker$ docker compose exec swh-scheduler \
celery control pool_grow 3 -d loader@b164f9055637
-> loader@b164f9055637: OK
pool will grow
~/swh-environment/docker$ docker compose exec swh-scheduler \
~/swh-docker$ docker compose exec swh-scheduler \
celery inspect -d loader@b164f9055637 stats | grep prefetch_count
"prefetch_count": 4
Now there are 4 workers ingesting git repositories. You can also
increase the number of ``swh-loader-git`` containers::
increase the number of ``swh-loader-git`` containers:
.. code-block:: console
~/swh-environment/docker$ docker compose up -d --scale swh-loader=4
~/swh-docker$ docker compose up -d --scale swh-loader=4
[...]
Creating docker_swh-loader_2 ... done
Creating docker_swh-loader_3 ... done
......@@ -153,24 +191,28 @@ Heritage. When new versions of these components are released, the docker
image will not be automatically updated. In order to update all Software
Heritage components to their latest version, the docker image needs to
be explicitly rebuilt by issuing the following command from within the
``docker`` directory::
``docker`` directory:
~/swh-environment/docker$ docker build --no-cache -t swh/stack .
.. code-block:: console
~/swh-docker$ docker build --no-cache -t swh/stack .
Monitor your local installation
-------------------------------
You can monitor your local installation by looking at:
- http://localhost:5080/rabbitmq to access the rabbitmq dashboard (guest/guest),
- http://localhost:5080/grafana to explore the platform's metrics (admin/admin),
- http://localhost:<nginx-port>/rabbitmq to access the rabbitmq dashboard (guest/guest),
- http://localhost:<nginx-port>/grafana to explore the platform's metrics (admin/admin),
Shut down your local installation
---------------------------------
To shut down your SoftWare Heritage, just run::
To shut down your Software Heritage instance, just run:
.. code-block:: console
~/swh-environment/docker$ docker compose down
~/swh-docker$ docker compose down
Hacking the archive
-------------------
......
......@@ -22,6 +22,14 @@ Glossary
An artifact is one of many kinds of tangible by-products produced during
the development of software.
bulk on-demand archival
A |swh| service allowing a partner to request the archival of a (possibly
large) number of origins. It consists of an authenticated API endpoint
allowing the user to upload a list of origins (as a CSV file) to be
ingested as soon as possible. The service allows the user to get feedback from the
|swh| archive about the ongoing ingestion process.
content
blob
......@@ -94,6 +102,13 @@ Glossary
add new file contents into :term:`object storage` and repository structure
in the :term:`storage database`).
loading task
A celery_ task performing the actual ingestion process; its implementation is
provided by a :term:`loader`, and it is executed by celery_ workers. Loading
tasks used to be backed by Scheduler Task instances in the :term:`scheduler`
database, but this is no longer the case (for performance reasons).
hash
cryptographic hash
checksum
......@@ -149,6 +164,25 @@ Glossary
of the corresponding change. A person is associated to a full name and/or
an email address.
raw extrinsic metadata
REMD
A piece of metadata concerning an object stored in the |swh| archive that
is not part of the source code from an :term:`origin`. It can come from a
software forge (information about a project that is not the source code
repository for this project), a deposited metadata file (for a
:term:`deposit`), etc. These pieces of information are kept in their
original raw format -- for archiving purposes -- but are also converted
into a minimal format (currently a subset of CodeMeta) allowing them to be
indexed and searched.
raw extrinsic metadata storage
REMD Storage
The |swh| storage dedicated to storing all the gathered extrinsic metadata
documents verbatim, in their original format. Currently, this service is
part of the main :term:`storage`.
release
tag
milestone
......@@ -165,11 +199,27 @@ Glossary
associated development metadata (e.g., author, timestamp, log message,
etc).
save code now
A publicly accessible service allowing users to ask for the immediate save of
a given source code origin. The request can be automatically accepted and
processed if the origin is from a well-known domain, or may require manual
validation. Note that a save code now request can only concern a supported
origin type.
scheduler
The component of the |swh| architecture dedicated to the management and
the prioritization of the many tasks.
Scheduler Task
:py:class:`The object <swh.scheduler.model.Task>` (stored in the
:term:`scheduler` database) representing a background (celery_) task to be
regularly scheduled for execution. Note that not all background tasks
are backed by a Scheduler Task instance; one-shot :term:`loading tasks
<loading task>` are most of the time not represented or modeled as Scheduler Tasks.
snapshot
the state of all visible branches during a specific visit of an origin
......@@ -211,3 +261,4 @@ Glossary
.. _`persistent identifier`: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers
.. _`Archival Resource Key`: http://n2t.net/e/ark_ids.html
.. _publish-subscribe: https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
.. _celery: https://docs.celeryq.dev
......@@ -2209,7 +2209,7 @@
id="g16490"
transform="translate(-210.0923,-72.740013)">
<title
id="title31653">The loader Celery worker execute loader Celery tasks, retrieveing software artifacts from the origin and inserting them in the Storage, including OriginVisit and OriginVisitStatus storage objects.</title>
id="title31653">The loader Celery worker execute loader Celery tasks, retrieving software artifacts from the origin and inserting them in the Storage, including OriginVisit and OriginVisitStatus storage objects.</title>
<g
transform="translate(4.8328727,82.923866)"
id="g25463">
......
......@@ -11,8 +11,9 @@ Development
contributing/index
tutorials/index
faq/index
roadmap/roadmap-2022
roadmap/roadmap-2024
roadmap/index
configuration
api-reference
archive-changelog
journal
......@@ -47,7 +48,7 @@ Architecture
Data Model and Specifications
-----------------------------
* :ref:`persistent-identifiers` Specifications of the SoftWare Heritage persistent IDentifiers (SWHID).
* :ref:`persistent-identifiers` Specifications of the SoftWare Hash IDentifiers (SWHID).
* :ref:`data-model` Documentation of the main |swh| archive data model.
* :ref:`journal-specs` Documentation of the Kafka journal of the |swh| archive.
......@@ -67,6 +68,7 @@ Roadmap
* Current roadmap: :ref:`roadmap-current`
* Previous roadmaps
* :ref:`roadmap-2022`
* :ref:`roadmap-2021`
System Administration
......@@ -86,11 +88,14 @@ Components
Here is brief overview of the most relevant software components in the Software
Heritage stack, in alphabetical order.
For a better introduction to the architecture, see the :ref:`architecture-overview`,
which presents each of them in a didactical order.
which presents each of them in a didactic order.
Each component name is linked to the development documentation
of the corresponding Python module.
:ref:`swh.alter <swh-alter>`
archive alteration facilities
:ref:`swh.auth <swh-auth>`
low-level library used by modules needing keycloak authentication
......@@ -102,9 +107,8 @@ of the corresponding Python module.
service providing efficient estimates of the number of objects in the SWH archive,
using Redis's Hyperloglog
:ref:`swh.dataset <swh-dataset>`
public datasets and periodic data dumps of the archive released by Software
Heritage
:ref:`swh.datasets <swh-datasets>`
datasets derived from periodic data dumps created by swh.export
:ref:`swh.deposit <swh-deposit>`
push-based deposit of software artifacts to the archive
......@@ -112,6 +116,10 @@ of the corresponding Python module.
swh.docs
developer documentation (used to generate this doc you are reading)
:ref:`swh.export <swh-export>`
public datasets and periodic data dumps of the archive released by Software
Heritage
:ref:`swh.fuse <swh-fuse>`
Virtual file system to browse the Software Heritage archive, based on
`FUSE <https://github.com/libfuse/libfuse>`_
......@@ -171,6 +179,11 @@ swh.docs
Low level management for read-only content-addressable object storage
indexed with a perfect hash table
:ref:`swh.provenance <swh-provenance>`
query service for questions like: “where does this given object come
from?” or “what is the oldest revision in which this object has been
found?”
:ref:`swh.scanner <swh-scanner>`
source code scanner to analyze code bases and compare them with source code
artifacts archived by Software Heritage
......@@ -234,4 +247,3 @@ Archive
* :ref:`routingtable`
* :ref:`search`
* :ref:`glossary`
......@@ -420,7 +420,7 @@ Message format:
- ``visit`` [int] number of the visit for this ``origin`` this status concerns
- ``date`` [timestamp] date of the visit status update
- ``status`` [string] status (can be "created", "ongoing", "full" or "partial"),
- ``snapshot`` [bytes] identifier of the :py:class:`swh.model.model.Snaphot` this
- ``snapshot`` [bytes] identifier of the :py:class:`swh.model.model.Snapshot` this
visit resulted in (if ``status`` is "full" or "partial")
- ``metadata``: deprecated
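For illustration, a decoded visit status message could look like this (all
values hypothetical)::

    origin: "https://example.org/user/project.git"
    visit: 2
    date: "2023-04-18 14:31:22.342195+00:00"
    status: "full"
    snapshot: "17d26be6c9e4c046f1d47e0b22828bcc32e88967"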
......