Showing 2818 additions and 78 deletions
@@ -15,6 +15,7 @@ swh-search: 5010
swh-counters: 5011
swh-winery: 5012
swh-graphql: 5013
swh-provenance: 5014
# Edit this line with your nick to get a merge conflict if there's an overlap
# Next available ID (ardumont) : 5015
@@ -15,65 +15,29 @@ project to the documentation.
Create a project
----------------
Creating the project should be done using the ``gitlab`` command-line tool
provided by the `python-gitlab <https://python-gitlab.readthedocs.io/>`_ module.
Make sure its configuration is working and that your access token has the
``api`` scope.
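For reference, a minimal python-gitlab configuration could look like the
following (a sketch: the profile name and the token value are placeholders, not
the actual values to use):

.. code-block:: console

    $ cat ~/.python-gitlab.cfg
    [global]
    default = swh
    ssl_verify = true

    [swh]
    url = https://gitlab.softwareheritage.org
    private_token = <personal access token with the api scope>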
To create the project:
.. code:: bash
PROJECT_NAME=swh-foo
DESCRIPTION="Software Heritage Foo management library"
NAMESPACE_ID="$(gitlab --output json namespace get --id 'swh/devel' | jq .id)"
gitlab project create \
--name "$PROJECT_NAME" \
--path "$PROJECT_NAME" \
--namespace "$NAMESPACE_ID" \
--description "$DESCRIPTION" \
--issues-access-level enabled \
--auto-devops-enabled false \
--wiki-access-level disabled \
--requirements-access-level disabled \
--pages-access-level disabled \
--operations-access-level disabled \
--container-registry-access-level disabled \
--visibility public
Initialize the repository with our template
-------------------------------------------
Creating the project from swh-py-template_ can be done using the
``bin/init-py-repo`` tool. This script uses the ``gitlab`` command-line tool
provided by the `python-gitlab <https://python-gitlab.readthedocs.io/>`_
module. Before running ``init-py-repo``, please make sure that the ``gitlab``
command is working and configured with an access token of scope ``api``.
The following commands need to be run from the base directory
``swh-environment``.
1. Clone the new repository:
.. code:: bash
git clone https://gitlab.softwareheritage.org/swh/devel/swh-foo.git
2. Use ``bin/init-py-repo`` to initialize the repository with a project template:
.. code-block:: console
pip install -r requirements.txt
bin/init-py-repo swh-foo
3. Install the pre-commit hook:
.. code-block:: console
pre-commit install
Customize the template
----------------------
Now look for the string ``foo`` in all files and file names and replace it with
the name of the new package. Push these commits directly to the repository as
initial content.
For an example, you can see `what was done for swh-counters <https://gitlab.softwareheritage.org/swh/devel/swh-counters/-/commit/142fff84305b>`__.
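One possible way to spot and replace those occurrences (a sketch, assuming the
new package is named ``bar``; adapt the commands to the actual package name):

.. code-block:: console

    $ git grep -il foo                                # list files mentioning the template name
    $ git grep -l foo | xargs sed -i 's/foo/bar/g'    # replace it in file contents
    $ git mv swh/foo swh/bar                          # also rename files and directories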
Add the repo on the swh-environment project
-------------------------------------------
@@ -99,13 +63,25 @@ Install CI jobs
``swh-jenkins-jobs`` repository. See `Jenkins documentation <ci_jenkins>`_
for details.
Hack hack hack
--------------
The generated project should have everything needed to start hacking. You
should typically start with:
- fill the README file
- write some code in ``swh/foo``
- write tests in ``swh/foo/tests``
- add yourself in ``CONTRIBUTORS`` if needed
- add some sphinx documentation in ``docs``
Make an initial release
-----------------------
Releases are made automatically by Jenkins when a tag is pushed to a module
repository. Making an initial release thus amounts to:
.. code-block:: console
git tag v0.0.0
git push origin --tags v0.0.0
@@ -132,7 +108,7 @@ To add a new module to the documentation:
- Add the package with a concise description to the index of the development part, located in
``docs/devel/index.rst``.
.. code-block:: rst
:ref:`swh.foo <swh-foo>`
short description of the repository
@@ -143,5 +119,6 @@ To add a new module to the documentation:
.. _`Continuous Integration (CI)`: https://jenkins.softwareheritage.org
.. _swh-py-template: https://gitlab.softwareheritage.org/swh/devel/swh-py-template
.. _swh-jenkins-jobs: https://gitlab.softwareheritage.org/swh/infra/ci-cd/swh-jenkins-jobs
.. _swh-docs: https://gitlab.softwareheritage.org/swh/devel/swh-docs
@@ -280,7 +280,7 @@ Search, browse and reference
code <https://www.softwareheritage.org/howto-archive-and-reference-your-code/>`__
- `Make your code identifiable : get a PID for your source
code <https://annex.softwareheritage.org/public/tutorials/getswhid_dir.gif>`__
- `Choosing what type of Software Hash Identifier (SWHID) to
use <devel/swh-model/persistent-identifiers.html#choosing-what-type-of-swhid-to-use>`__
- `Navigating through Software Heritage: behind the
scenes <https://www.softwareheritage.org/2019/05/28/mining-software-metadata-for-80-m-projects-and-even-more/>`__
@@ -321,6 +321,7 @@ Ambassador program
- `Ambassadors mailing list <https://sympa.inria.fr/sympa/info/swh-ambassadors>`__
- `Outreach material (only available to ambassadors) <https://www.softwareheritage.org/ambassador-material/>`__
- `Outreach material on a Git repository <https://github.com/moranegg/swh-ambassadors/tree/main/Materials>`__
- `Questions Frequently Asked to ambassadors <https://gitlab.softwareheritage.org/outreach/swh-academy/swh-faq>`__
Presentations
-------------
@@ -365,7 +366,7 @@ Data model and identifiers
- `Our data
model <devel/swh-model/data-model.html#data-model>`__
- :ref:`Software Hash IDentifiers
(SWHID) <persistent-identifiers>` specifications
- Compute a SWHID locally using the `swh identify <devel/swh-model/cli.html>`__ command-line tool.
@@ -437,3 +438,4 @@ Table of contents
devel/api-reference
user/index
sysadm/index
About this documentation project <README>
.. _upgrade-debian-cassandra-cluster:
Upgrade Procedure for Debian Nodes in a Cassandra Cluster
=========================================================
.. admonition:: Intended audience
:class: important
sysadm staff members
Purpose
--------
This page documents the steps to upgrade Debian nodes running in a Cassandra
cluster. The upgrade process involves various commands and checks before and
after rebooting the node.
Prerequisites
-------------
+ Familiarity with SSH and CLI-based command execution
+ Out-of-band access to the node (iDRAC/iLO) for reboots
+ SSH access to the node (requires the VPN)
Step 0: Initial Steps
---------------------
Ensure that the out-of-band access to the machine works. This definitely helps
when something goes wrong during a reboot (disk order or name changes, network
issues, ...).
Step 1: Migrate to the next Debian suite
----------------------------------------
Update the Debian version of the node (e.g. bullseye to bookworm) using the
following command:
.. code::
root@node:~# /usr/local/bin/migrate-to-${NEXT_CODENAME}.sh
Note: The script should be present on the machine (installed through puppet).
Step 2: Run Puppet Agent
-------------------------
Once the upgrade has completed, run the puppet agent to apply any necessary
configuration changes (e.g. updating ``/etc/apt/sources.list``):
.. code::
root@node:~# puppet agent -t
Step 3: Stop Puppet Agent
-------------------------
Since we will stop the cassandra service, we don't want the agent to start it
again.
.. code::
root@node:~# puppet agent --disable "Ongoing debian upgrade"
Step 4: Autoremove and Purge
-----------------------------
Perform an autoremove to remove unnecessary packages left over from the migration:
.. code::
root@node:~# apt autoremove
Step 5: Stop the cassandra service
----------------------------------
The cluster can tolerate one non-responding node, so it is safe to stop the
service. First, drain the node so the commitlog is flushed to on-disk sstables:
.. code-block:: shell
$ nodetool drain
Look for the '- DRAINED' pattern in the service log to know the drain is done.
.. code-block:: shell
$ journalctl -e -u cassandra@instance1 | grep DRAINED
Nov 27 14:09:06 cassandra01 cassandra[769383]: INFO [RMI TCP Connection(20949)-192.168.100.181] 2024-11-27 14:09:06,084 StorageService.java:1635 - DRAINED
Then stop the cassandra service.
.. code-block:: shell
$ systemctl stop cassandra@instance1
In the output of ``nodetool status``, the node whose service is stopped
should be marked as DN (Down and Normal):

.. code-block:: shell

    $ nodetool -h cassandra02 status -r | grep DN
    DN cassandra01.internal.softwareheritage.org 8.63 TiB 16 22.7% cb0695ee-b7f1-4b31-ba5e-9ed7a068d993 rack1
Step 6: Reboot the Node
------------------------
We are finally ready to reboot the node, so just do it:
.. code::
root@node:~# reboot
You can connect to the serial console of the machine to follow the reboot.
Step 7: Clean up some more
--------------------------
Once the machine is restarted, some cleanup might be necessary.
.. code::
root@node:~# apt autopurge
Step 8: Activate puppet agent
-----------------------------
Re-enable the puppet agent and make it run. This will start the cassandra
service again.
.. code::
root@node:~# puppet agent --enable && puppet agent --test
Post cluster migration
----------------------
Once all the nodes of the cluster have been migrated:
- Remove the argocd sync window so the cluster is back to nominal state.
- Enable back the Rancher etcd snapshots.
- Check the ``holderIdentity`` value in the ``rke2`` and ``rke2-lease`` leases and configmaps (see the sketch below).
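A possible way to check those ``holderIdentity`` values (a sketch: it assumes
``kubectl`` access to the cluster and that the objects live in the
``kube-system`` namespace; adjust names and namespace to the actual setup):

.. code-block:: shell

    $ kubectl -n kube-system get lease rke2 rke2-lease \
        -o custom-columns=NAME:.metadata.name,HOLDER:.spec.holderIdentity
    $ kubectl -n kube-system get configmap rke2 -o yaml | grep holderIdentity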
@@ -8,3 +8,5 @@ Cassandra
.. toctree::
installation
upgrade
debian-upgrade
.. _cassandra_upgrade_cluster:
How to upgrade a cassandra cluster
==================================
.. admonition:: Intended audience
:class: important
sysadm staff members
This page documents the actions needed to `upgrade an online cassandra
cluster <https://docs.datastax.com/en/luna-cassandra/guides/upgrade/overview.html>`_. The
overall plan is to upgrade each node of the cluster one at a time, in a rolling
fashion.
There are two ways to manage this upgrade procedure, either
:ref:`manually <manual_cassandra_upgrade>` or :ref:`automatically <automatic_cassandra_upgrade>`.
Our (static) cassandra clusters are managed through puppet, which implies some
adaptations in the swh-site repository. Since our puppet manifest does not
manage restarting the service, it is fine to let puppet apply the changes in
advance.
Then identify the desired new version and retrieve its sha512 hash.
https://archive.apache.org/dist/cassandra/4.0.15/apache-cassandra-4.0.15-bin.tar.gz
https://archive.apache.org/dist/cassandra/4.0.15/apache-cassandra-4.0.15-bin.tar.gz.sha512
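For instance, the tarball checksum for the chosen version can be fetched and
compared like this (a sketch; adjust the version number as needed):

.. code-block:: shell

    $ VERSION=4.0.15
    $ curl -sSLO "https://archive.apache.org/dist/cassandra/${VERSION}/apache-cassandra-${VERSION}-bin.tar.gz"
    $ curl -sSL "https://archive.apache.org/dist/cassandra/${VERSION}/apache-cassandra-${VERSION}-bin.tar.gz.sha512"
    $ sha512sum "apache-cassandra-${VERSION}-bin.tar.gz"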
Read the changelog just in case some extra actions are required for the upgrade.
In the swh-site repository, adapt the environment's common.yaml file with
those values:
.. code-block:: console
$ echo $environment
staging
$ grep "cassandra::" .../swh-site/data/deployments/$environment/common.yaml
cassandra::version: 4.0.15
cassandra::version_checksum: 9368639fe07613995fec2d50de13ba5b4a2d02e3da628daa1a3165aa009e356295d7f7aefde0dedaab385e9752755af8385679dd5f919902454df29114a3fcc0
Commit and push the changes.
Connect to pergamon and deploy those changes.
.. admonition:: Stop all repair jobs before upgrading
:class: warning
| All scheduled jobs must be paused and all running jobs must be stopped and aborted.
| You can perform these actions from the web UI `reaper <https://cassandra-reaper.io/docs/>`_.
| - `Reaper production <https://reaper.internal.softwareheritage.org/webui/login.html>`_
| - `Reaper staging <https://reaper.internal.staging.swh.network/webui/login.html>`_
.. admonition:: Grafana tag
:class: note
Set a Grafana tag to mark the start of the upgrade.
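One way to do this is through the Grafana annotations HTTP API (a sketch: the
Grafana URL, the token variable and the tag names are assumptions to adapt to
the actual setup):

.. code-block:: shell

    $ curl -sS -X POST "https://grafana.softwareheritage.org/api/annotations" \
        -H "Authorization: Bearer ${GRAFANA_API_TOKEN}" \
        -H "Content-Type: application/json" \
        -d '{"tags": ["cassandra", "upgrade"], "text": "Start of the cassandra cluster upgrade"}'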
.. _manual_cassandra_upgrade:
Manual procedure
----------------
Then connect to each machine of the cluster in any order (lexicographic order
is fine).
We'll need nodetool access, so here is a simple alias to simplify the
commands (used in the remaining part of this doc).
.. code-block:: shell
$ USER=$(awk '{print $1}' /etc/cassandra/jmxremote.password)
$ PASS=$(awk '{print $2}' /etc/cassandra/jmxremote.password)
$ alias nodetool="/opt/cassandra/bin/nodetool --username $USER --password $PASS"
From another node in the cluster, connect and check that the cluster status
remains fine during the migration.
.. code-block:: shell
$ period=10; while true; do \
date; nodetool status -r; echo; nodetool netstats; sleep $period; \
done
Let's do a drain call first so the commitlog is flushed to on-disk sstables. This
is recommended before an upgrade to avoid leaving pending data in the commitlog.
.. code-block:: shell
$ nodetool drain
Look for the '- DRAINED' pattern in the service log to know the drain is done.
.. code-block:: shell
$ journalctl -e -u cassandra@instance1 | grep DRAINED
Nov 27 14:09:06 cassandra01 cassandra[769383]: INFO [RMI TCP Connection(20949)-192.168.100.181] 2024-11-27 14:09:06,084 StorageService.java:1635 - DRAINED
We stop the cassandra service.
.. code-block:: shell
$ systemctl stop cassandra@instance1
In the output of ``nodetool status``, the node whose service is stopped
should be marked as DN (Down and Normal):

.. code-block:: shell

    $ nodetool -h cassandra02 status -r | grep DN
    DN cassandra01.internal.softwareheritage.org 8.63 TiB 16 22.7% cb0695ee-b7f1-4b31-ba5e-9ed7a068d993 rack1
Finally we upgrade cassandra version in the node (through puppet):
.. code-block:: shell
$ puppet agent --enable && puppet agent --test
Let's check that the correct version is installed in ``/opt``:
.. code-block:: shell
$ ls -lah /opt/ | grep cassandra-$version
lrwxrwxrwx 1 root root 21 Nov 27 14:13 cassandra -> /opt/cassandra-$version
drwxr-xr-x 8 root root 4.0K Nov 27 14:13 cassandra-$version
Now start the cassandra service again.
.. code-block:: shell
$ systemctl start cassandra@instance1
Once the service is started again, ``nodetool status`` should display a ``UN``
(Up and Normal) status again for the upgraded node:

.. code-block:: shell

    $ nodetool status -r
    ...
    UN cassandra01.internal.softwareheritage.org 8.63 TiB 16 22.7% cb0695ee-b7f1-4b31-ba5e-9ed7a068d993 rack1
.. _automatic_cassandra_upgrade:
Automatic procedure
-------------------
It is the same procedure as described above, but only a single call to a script
on pergamon is required.
With environment in {staging, production}:
.. code-block:: shell
root@pergamon:~# /usr/local/bin/cassandra-restart-cluster.sh $environment
Note that you can also use the previously described checks from a cluster node
to follow the upgrade.
.. _cassandra_upgrade_checks:
Final Checks
------------
Finally, check that the version is the expected one.
.. code-block:: shell
$ nodetool version
ReleaseVersion: $version
$ nodetool describecluster
Cluster Information:
Name: archive_staging
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
583470c4-6dae-372d-bdab-f0bcbd679c74: [192.168.130.181, 192.168.130.182, 192.168.130.183]
Stats for all nodes:
Live: 3
Joining: 0
Moving: 0
Leaving: 0
Unreachable: 0
Data Centers:
sesi_rocquencourt_staging #Nodes: 3 #Down: 0
Database versions:
5.0.2: [192.168.130.181:7000, 192.168.130.182:7000, 192.168.130.183:7000]
Keyspaces:
swh -> Replication class: NetworkTopologyStrategy {sesi_rocquencourt_staging=3}
system_distributed -> Replication class: NetworkTopologyStrategy {replication_factor=3}
provenance_test -> Replication class: NetworkTopologyStrategy {sesi_rocquencourt_staging=3}
reaper_db -> Replication class: NetworkTopologyStrategy {sesi_rocquencourt_staging=3}
system_traces -> Replication class: SimpleStrategy {replication_factor=2}
system_auth -> Replication class: NetworkTopologyStrategy {sesi_rocquencourt_staging=3}
system_schema -> Replication class: LocalStrategy {}
system -> Replication class: LocalStrategy {}
.. admonition:: Upgrading to a major version
:class: warning
| When updating to a major version, you need to run ``nodetool upgradesstables``.
| You can perform this command manually on each node or use a script from `pergamon`.
| With environment in {staging, production}:
.. code-block:: shell
root@pergamon:~# /usr/local/bin/cassandra-upgradesstables.sh $environment
@@ -3,5 +3,10 @@
ElasticSearch
=============
.. toctree::
:titlesonly:
debian-upgrade
.. todo::
This page is a work in progress.
@@ -8,3 +8,5 @@ Data silos
cassandra/index
kafka/index
elasticsearch/index
winery/index
rancher/index
@@ -7,6 +7,7 @@ Kafka
:titlesonly:
manage-topics
debian-upgrade
.. todo::
This page is a work in progress.
.. _cluster-rancher:
Cluster Rancher
===============
.. toctree::
:titlesonly:
debian-upgrade
.. todo::
This page is a work in progress.
.. _winery-ceph:
Winery Ceph Architecture
========================
.. admonition:: Intended audience
:class: important
sysadm staff members
.. todo::
This page is a work in progress.
.. _winery-frontends:
Winery Frontends Documentation
==============================
.. admonition:: Intended audience
:class: important
sysadm staff members
.. todo::
This page is a work in progress.
.. _winery:
Winery Deployment
=================
.. admonition:: Intended audience
:class: important
sysadm staff members
This page documents the deployment of winery and the ceph cluster in the **production**
environment.
.. toctree::
network
frontends
ceph
procedures/index