
Documentation overhaul

Large rework of the entire documentation of swh-graph.

  • New tutorial on how to use the Java API.
  • New page on disk/memory tradeoffs
  • Update the compression pipeline documentation with extensive details
  • Fix the compression steps diagram
  • Merge use-cases "blueprint" documentation in the HTTP API page as usage examples

Migrated from D7839 (view on Phabricator)

Activity
docs/memory.rst 0 → 100644
virtual address space. The Linux kernel will then be free to arbitrarily cache
the file, either partially or in its entirety, depending on the available
memory space.

In our experiments, memory-mapping a small graph from an SSD only incurs a
relatively small slowdown (about 15-20%). However, when the graph is too big to
fit in RAM, the kernel has to constantly invalidate pages to cache newly
accessed sections, which incurs a very large performance penalty. A full
traversal of a large graph that usually takes about 20 hours when loaded in
main memory could take more than a year when mapped from a hard drive!

When deciding what to direct-load and what to memory-map, here are a few rules
of thumb:

- If you don't need random access to the graph edges, you can consider using
  the "offline" loading mode. The offsets won't be loaded which will save
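The direct-load vs memory-map tradeoff described above can be sketched with Python's standard ``mmap`` module. This is an illustrative toy, not swh-graph code: a zero-filled temporary file stands in for a real ``graph.graph`` file.

```python
import mmap
import os
import tempfile

# Stand-in for a compressed graph file (1 MiB of zeros).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 1024 * 1024)
    path = f.name

# Direct loading: the whole file is read into process memory up front.
with open(path, "rb") as f:
    data = f.read()

# Memory mapping: the file is mapped into virtual address space; the kernel
# pages it in lazily and may evict pages again under memory pressure, which
# is what causes the slowdown discussed above when the graph exceeds RAM.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_byte = mm[0]  # touching a byte faults its page in
    mm.close()

os.remove(path)
print(len(data), first_byte)  # → 1048576 0
```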
  • vlorentz @vlorentz started a thread on the diff
  • docs/memory.rst 0 → 100644

    we load the graph using the memory-mapped loading mode, which makes it use the
    shared memory stored in the tmpfs under the hood.

    Here is a systemd service that can be used to perform this task automatically:

    .. code-block:: ini

       [Unit]
       Description=swh-graph memory sharing in tmpfs

       [Service]
       Type=oneshot
       RemainAfterExit=yes
       ExecStart=mkdir -p /dev/shm/swh-graph/default
       ExecStart=sh -c "ln -s /.../compressed/* /dev/shm/swh-graph/default"
       ExecStart=cp --remove-destination /.../compressed/graph.graph /dev/shm/swh-graph/default
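The three ExecStart steps above (make the target directory, symlink the dataset, replace the ``graph.graph`` symlink with a real copy so the big file lives in shared memory) can be mirrored in plain Python. This is a hypothetical sketch run against temporary directories standing in for ``/dev/shm/swh-graph/default`` and the real ``compressed/`` dataset:

```python
import os
import shutil
import tempfile

compressed = tempfile.mkdtemp()   # stand-in for .../compressed
shm = tempfile.mkdtemp()          # stand-in for /dev/shm/swh-graph
target = os.path.join(shm, "default")

# Fake dataset files for the demo.
for name in ("graph.graph", "graph.properties", "graph.offsets"):
    with open(os.path.join(compressed, name), "w") as f:
        f.write(name)

os.makedirs(target, exist_ok=True)            # mkdir -p
for name in os.listdir(compressed):           # ln -s .../compressed/*
    os.symlink(os.path.join(compressed, name), os.path.join(target, name))

# cp --remove-destination: swap the graph.graph symlink for a real copy,
# so the largest file actually resides in the tmpfs.
dest = os.path.join(target, "graph.graph")
os.remove(dest)
shutil.copy(os.path.join(compressed, "graph.graph"), dest)

print(os.path.islink(dest), os.path.islink(os.path.join(target, "graph.properties")))
# → False True
```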
  • vlorentz @vlorentz started a thread on the diff
  •    .. code:: console

    +  (venv) $ pip install awscli
    +  [...]
    +  (venv) $ mkdir -p 2021-03-23-popular-3k-python/compressed
    +  (venv) $ cd 2021-03-23-popular-3k-python/
    +  (venv) $ aws s3 cp --recursive s3://softwareheritage/graph/2021-03-23-popular-3k-python/compressed/ compressed

    -  (swhenv) ~/t/swh-graph-tests$ swh graph compress --graph swh/graph/tests/dataset/example --outdir output/
    -  [...]

    +  You can also retrieve larger graphs, but note that these graphs are generally
    +  intended to be loaded fully in RAM, and do not fit on ordinary desktop
    +  machines. The server we use in production to run the graph service has more
    +  than 700 GiB of RAM. These memory considerations are discussed in more detail
    +  in :ref:`swh-graph-memory`.
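The dataset layout used by the ``aws s3 cp`` invocation above follows a regular pattern. A hypothetical helper (the function name ``dataset_paths`` is invented for illustration) that builds the S3 source URI and local target directory for any dataset name could look like this:

```python
def dataset_paths(name: str) -> tuple[str, str]:
    """Build the S3 source URI and local target dir for a graph dataset,
    following the s3://softwareheritage/graph/<name>/compressed/ layout
    shown in the quickstart."""
    s3_uri = f"s3://softwareheritage/graph/{name}/compressed/"
    local_dir = f"{name}/compressed"
    return s3_uri, local_dir

s3_uri, local_dir = dataset_paths("2021-03-23-popular-3k-python")
print(s3_uri)    # → s3://softwareheritage/graph/2021-03-23-popular-3k-python/compressed/
print(local_dir) # → 2021-03-23-popular-3k-python/compressed
```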
  • very nice!

  • Review fixes

  • Build is green

    Patch application report for D7839 (id=28333)

    Rebasing onto 579f5a9e...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 2141188fb34414bb58d42231b5cc5f0d16cb9d6a
    Author: Antoine Pietri <antoine.pietri1@gmail.com>
    Date:   Tue May 17 01:49:41 2022 +0200
    
        Documentation overhaul
        
        Large rework of the entire documentation of swh-graph.
        
        - New tutorial on how to use the Java API.
        - New page on disk/memory tradeoffs
        - Update the compression pipeline documentation with extensive details
        - Fix the compression steps diagram
        - Merge use-cases "blueprint" documentation in the HTTP API page as
          usage examples

    See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/189/ for more details.

  • Build is green

    Patch application report for D7839 (id=28334)

    Rebasing onto 579f5a9e...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 7a1901476f0dc662629ff05878f39430efc6837a
    Author: Antoine Pietri <antoine.pietri1@gmail.com>
    Date:   Tue May 17 01:49:41 2022 +0200
    
        Documentation overhaul

    See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/190/ for more details.

  • Build is green

    Patch application report for D7839 (id=28335)

    Rebasing onto 579f5a9e...

    Current branch diff-target is up to date.
    Changes applied before test
    commit a0c974f5215e273ab04a29554456548f4b9b8316
    Author: Antoine Pietri <antoine.pietri1@gmail.com>
    Date:   Tue May 17 01:49:41 2022 +0200
    
        Documentation overhaul

    See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/191/ for more details.

  • Jared @JaredR26 started a thread on the diff
  • -  to install it if you want to hack the code or install it from this git
    -  repository. To compress a graph, you will need zstd_ compression tools.
    -
    -  It is highly recommended to install this package in a virtualenv.
    -
    -  On a Debian stable (buster) system:
    +  JRE. On a Debian system:

    -  .. code:: bash
    +  .. code:: console

    -  $ sudo apt install python3-virtualenv default-jre zstd
    +  $ sudo apt install python3 python3-venv default-jre

    -  .. _zstd: https://facebook.github.io/zstd/
  • Jared @JaredR26 started a thread on the diff
  • 29
    30 Install
    31 -------
    19 Installing swh.graph
    20 --------------------
    32 21
    33 22 Create a virtualenv and activate it:
    34 23
    35 .. code:: bash
    24 .. code:: console
    36 25
    37 ~/tmp$ mkdir swh-graph-tests
    38 ~/tmp$ cd swh-graph-tests
    39 ~/t/swh-graph-tests$ virtualenv swhenv
    40 ~/t/swh-graph-tests$ . swhenv/bin/activate
    26 $ python3 -m venv .venv
    • This command didn't work for me: "/usr/bin/python3: Relative module names not supported".

      python3 -m venv workingDir did work. The next line then needed to be 'source workingDir/bin/activate'.

      In addition, venv did something weird on my first try (created an extra venv dir?), but on my second try, after running a full apt update; apt upgrade, it did not. I'd highlight that package upgrades need to be run to ensure the commands work as written.
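The fixed commands in the comment above can also be reproduced from Python's standard library ``venv`` module, which is what ``python3 -m venv`` invokes under the hood. A minimal sketch, assuming a POSIX system with the Debian/Ubuntu ``python3-venv`` package installed:

```python
import os
import tempfile
import venv

# Equivalent of: python3 -m venv workingDir
# (with_pip=False skips ensurepip, keeping the demo fast and offline)
workdir = tempfile.mkdtemp()
env_dir = os.path.join(workdir, "workingDir")
venv.create(env_dir, with_pip=False)

# The script that `source workingDir/bin/activate` would load:
activate = os.path.join(env_dir, "bin", "activate")
print(os.path.exists(activate))  # → True
```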

  • Jared @JaredR26 started a thread on the diff
  • +  .. code:: console

    -  ~/tmp$ mkdir swh-graph-tests
    -  ~/tmp$ cd swh-graph-tests
    -  ~/t/swh-graph-tests$ virtualenv swhenv
    -  ~/t/swh-graph-tests$ . swhenv/bin/activate
    +  $ python3 -m venv .venv
    +  $ source .venv/bin/activate

       Install the ``swh.graph`` python package:

    -  .. code:: bash
    +  .. code:: console

    -  (swhenv) ~/t/swh-graph-tests$ pip install swh.graph
    +  (venv) $ pip install swh.graph
  • Jared @JaredR26 started a thread on the diff
  • -  .. code:: bash
    +  In our example:
    +
    +  .. code:: console

    -  (swhenv) ~/t/swh-graph-tests$ swh graph rpc-serve -g output/example
    -  Loading graph output/example ...
    +  (venv) $ swh graph rpc-serve -g compressed/graph
    +  Loading graph compressed/graph ...
       Graph loaded.
       ======== Running on http://0.0.0.0:5009 ========
       (Press CTRL+C to quit)

       From there you can use this endpoint to query the compressed graph, for example
    -  with httpie_ (``sudo apt install``) from another terminal:
    +  with httpie_ (``sudo apt install httpie``):
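The httpie command in the diff above boils down to a plain GET against the local rpc-serve instance. A small sketch that builds the same request URL from Python (the helper name ``leaves_url`` is invented for illustration; actually issuing the request of course requires the server to be running):

```python
from urllib.parse import quote

def leaves_url(swhid: str, host: str = "localhost", port: int = 5009) -> str:
    """URL of the /graph/leaves endpoint for a given SWHID, matching
    what `http :5009/graph/leaves/<swhid>` sends."""
    # safe=':' keeps the colons of the SWHID unescaped in the path.
    return f"http://{host}:{port}/graph/leaves/{quote(swhid, safe=':')}"

url = leaves_url("swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323")
print(url)
# → http://localhost:5009/graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323
```

From there, `urllib.request.urlopen(url)` (or `requests.get(url)`) would fetch the leaves, in a second process or terminal while rpc-serve keeps running.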
    • Two things: 1. I think the "from another terminal" addition is helpful. I would EXPECT people to understand that, but they might not notice the prompt below and think they can pass these commands directly into the graph terminal after they run rpc-serve.

      And 2. Following these instructions was giving me 400 errors. I'm not sure why it isn't able to read the SWHID correctly?

      http :5009/graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323
      HTTP/1.1 400 Bad Request
      Content-Length: 18
      Content-Type: text/plain; charset=utf-8
      Date: Tue, 17 May 2022 18:03:55 GMT
      Server: Python/3.8 aiohttp/3.8.1

      Unknown SWHID: swh

      This is on a brand new ubuntu t3a.small instance on AWS, after doing apt update;upgrade.

    • FYI, this is what the other terminal showed as received. It all looks right; I'm not sure why it didn't work.

      ~/2021-03-23-popular-3k-python$ swh graph rpc-serve -g compressed/graph
      INFO:root:using swh-graph JAR: /home/ubuntu/workingDir/share/swh-graph/swh-graph-0.5.2.jar
      Loading graph compressed/graph ...
      Graph loaded.
      ======== Running on http://0.0.0.0:5009 ========
      (Press CTRL+C to quit)
      INFO:aiohttp.access:127.0.0.1 [17/May/2022:18:02:10 +0000] "GET /graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323 HTTP/1.1" 400 178 "-" "HTTPie/1.0.3"
      INFO:aiohttp.access:127.0.0.1 [17/May/2022:18:02:27 +0000] "GET /graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323 HTTP/1.1" 400 178 "-" "HTTPie/1.0.3"
      INFO:aiohttp.access:127.0.0.1 [17/May/2022:18:02:38 +0000] "GET /graph/leaves/dir:432d1b21c1256f7408a07c577b6974bbdbcc1323 HTTP/1.1" 400 180 "-" "HTTPie/1.0.3"
      INFO:aiohttp.access:127.0.0.1 [17/May/2022:18:03:11 +0000] "GET /graph/visit/nodes/swh:1:rel:0000000000000000000000000000000000000010 HTTP/1.1" 400 178 "-" "HTTPie/1.0.3"
      INFO:aiohttp.access:127.0.0.1 [17/May/2022:18:03:55 +0000] "GET /graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323 HTTP/1.1" 400 178 "-" "HTTPie/1.0.3"
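A quick client-side sanity check can at least rule out malformed identifiers before blaming the server. The pattern below follows the ``swh:1:<type>:<40 hex digits>`` shape seen in the requests above; including ``ori`` is an assumption based on this thread (it is a swh-graph node type rather than a core SWHID type). Note that most of the logged requests already match this shape, while the ``dir:432d...`` one (missing the ``swh:1:`` prefix) does not:

```python
import re

# swh:1:<type>:<40 lowercase hex digits>; "ori" included per the swh-graph
# discussion above, hedged: it is not one of the core SWHID object types.
SWHID_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp|ori):[0-9a-f]{40}$")

ok = bool(SWHID_RE.match("swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323"))
bad = bool(SWHID_RE.match("dir:432d1b21c1256f7408a07c577b6974bbdbcc1323"))
print(ok, bad)  # → True False
```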
      
      
  • Jared @JaredR26 started a thread on the diff
  •    queried by the ``swh-graph`` library.

    -  Own datasets
    -  ^^^^^^^^^^^^
    +  All the publicly available datasets are documented on this page:
    +  https://docs.softwareheritage.org/devel/swh-dataset/graph/dataset.html

    -  A graph is described as both its adjacency list and the set of nodes
    -  identifiers in plain text format. Such graph example can be found in the
    -  ``swh/graph/tests/dataset/`` folder.
    +  A good way of retrieving these datasets is to use the `AWS S3 CLI
    +  <https://docs.aws.amazon.com/cli/latest/reference/s3/>`_.

    -  You can compress the example graph on the command line like this:
    +  Here is an example with the dataset ``2021-03-23-popular-3k-python``, which has
    +  a relatively reasonable size (~15 GiB including property data, with
    • May be worth noting: this graph will NOT load on an AWS nano instance (lack of RAM). I'm not sure whether a micro can run it or not, but I succeeded with a t3a.small instance. It might save some frustration for someone trying this on the AWS free tier if they knew the requirements in advance.

  • Still reviewing the rest but went through the quickstart. I wanted to submit these comments before it got too late over there.

  • Jared @JaredR26 started a thread on the diff
  • 405 417 "avg": 0.6107127825377487
    406 418 }
    407 419 }
    420
    421
    422 Use-case examples
    423 -----------------
    424
    425 This section showcases how to leverage the endpoints of the HTTP API described
    • For at least one of the below, I'd give a full example command with full output, like the httpie example in the quickstart. It only needs to be shown once, but it can help prompt someone who is speeding through.

      Also, something somewhere needs to mention the weirdness/problem of the compressed graph wanting the swh:1:ori:HASH identifier. Or does the rpc-api convert URIs given into proper IDs? What does it spit back out if you traverse to an origin?

      I can test it on my system, but I don't know why it is not liking SWHIDs given in the HTTP request.

    • FYI, the ORC graph dataset now has an "id" column in the origin table, specifically to convert back from these sha1s to the URLs. It's now very similar to the other nodes, and it's already documented in the documentation of the dataset (which is the correct place to put this, imo)

    • Regarding the output, the above page has a ton of examples already. I just put this here to remove the outdated use-cases page, but it still feels a bit clumsy. Not sure what a better way to present this would be.

    • Please register or sign in to reply
  • Jared @JaredR26 started a thread on the diff
  • docs/java-api.rst 0 → 100644

    `ImmutableGraph
    <https://webgraph.di.unimi.it/docs/it/unimi/dsi/webgraph/ImmutableGraph.html>`_,
    the abstract class providing the core API to manipulate and iterate on graphs.
    Under the hood, compressed graphs are stored as the `BVGraph
    <https://webgraph.di.unimi.it/docs/it/unimi/dsi/webgraph/BVGraph.html>`_
    class, which contains the actual codec used to compress and decompress
    adjacency lists.

    Graph **nodes** are mapped to a contiguous set of integers :math:`[0, n - 1]`
    where *n* is the total number of nodes in the graph.
    Each node has an associated *adjacency list*, i.e., a list of the nodes it
    points to. These lists represent the **edges** (or **arcs**) of the graph.

    **Note**: edges are always directed. Undirected graphs are internally stored as
    a pair of directed edges (src → dst, dst → src), and are called "symmetric"
    • Does this ever apply to SWH, anywhere? It is interesting to know but might confuse people who haven't yet read that everything in SWH is unidirectional.

    • For any algorithm where you don't care about the direction, yes. For instance if you want to compute connected components, for LLP, or even for a BFS if you want.
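The point in the reply above can be made concrete with a toy sketch (plain Python dictionaries, not the WebGraph API): to run a direction-agnostic algorithm such as connected components on a directed graph, add the reverse of every edge first, i.e. "symmetrize" the graph, then traverse normally.

```python
from collections import defaultdict

def connected_components(n, edges):
    """Connected components of a directed graph, computed by symmetrizing
    it (storing both src→dst and dst→src) and running a plain traversal."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj[dst].append(src)  # reverse edge: ignore direction
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(comp))
    return components

# 0→1, 2→1, 3→4: ignoring direction, two components remain.
print(connected_components(5, [(0, 1), (2, 1), (3, 4)]))
# → [[0, 1, 2], [3, 4]]
```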

  • Jared @JaredR26 started a thread on the diff
  • docs/java-api.rst 0 → 100644

    **Note**: edges are always directed. Undirected graphs are internally stored as
    a pair of directed edges (src → dst, dst → src), and are called "symmetric"
    graphs.

    On disk, a simple BVGraph with the basename ``graph`` would be represented as
    the following set of files:

    - ``graph.graph``: contains the compressed adjacency lists of each node, which
      can be decompressed by the BVGraph codec.
    - ``graph.properties``: contains metadata on the graph, such as the number of
      nodes and arcs, as well as additional loading information needed by the
      BVGraph codec.
    - ``graph.offsets``: a list of offsets of where the adjacency list of each node
      is stored in the main graph file.
    - ``graph.obl``: optionally, an "offset big-list file" which can be used to
      load graphs faster.
    • I think this section needs to be lower down, or maybe in an advanced-details page. It's an implementation detail that is useful to know when someone is down to the nuts and bolts, but not information they need to know or understand when doing more basic or early work. And when someone gets that far down, they probably want to know about more of the files than just these.

    • I disagree, it's useful to know in advance because it tells you which files you need to download for your particular use case.
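Following that point, a hypothetical helper (the function ``missing_files`` is invented for illustration) can check which of the files listed in the diff above are present for a given basename, treating ``graph.obl`` as optional since the docs describe it as such:

```python
import os
import tempfile

REQUIRED_EXTS = ("graph", "properties", "offsets")  # graph.obl is optional

def missing_files(basename: str):
    """Return the required BVGraph file extensions missing for `basename`."""
    return [ext for ext in REQUIRED_EXTS
            if not os.path.exists(f"{basename}.{ext}")]

# Demo against a temp dir holding only two of the three required files.
d = tempfile.mkdtemp()
base = os.path.join(d, "graph")
for ext in ("graph", "properties"):
    open(f"{base}.{ext}", "w").close()

print(missing_files(base))  # → ['offsets']
```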
