.. _upgrade-debian-elasticsearch-cluster:
Upgrade Procedure for Debian Nodes in an Elasticsearch Cluster
==============================================================
.. admonition:: Intended audience
   :class: important

   sysadm staff members
Purpose
--------
This page documents the steps to upgrade Debian nodes running in an Elasticsearch
cluster. The upgrade process involves various commands and checks before and after
rebooting the node.
Prerequisites
-------------
+ Familiarity with SSH and CLI-based command execution
+ Out-of-band access to the node (iDRAC/iLO) for reboot
+ Access to the node through SSH (requires the VPN)
Step 0: Initial Steps
---------------------
The elasticsearch nodes only run on bare metal machines, so make sure that out-of-band
access to the machine works before starting. This definitely helps when something goes
wrong during a reboot (disk order or name changes, network issues, ...).
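For instance, the BMC can be checked with an IPMI chassis status query. This is a
minimal sketch, assuming a hypothetical BMC hostname; adapt it and the credentials to
the actual out-of-band naming scheme:

.. code::

   # Hypothetical BMC hostname; -a prompts for the IPMI password
   admin@laptop:~$ ipmitool -I lanplus -H esnode1-bmc.example.org -U root -a chassis status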
Step 1: Migrate to the next Debian suite
----------------------------------------
Update the Debian version of the node (e.g. bullseye to bookworm) using the following
command:
.. code::

   root@node:~# /usr/local/bin/migrate-to-${NEXT_CODENAME}.sh
Note: The script should already be present on the machine (installed through puppet).
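Once the script completes, you can check that the node now reports the expected suite
(standard Debian ``/etc/os-release`` file):

.. code::

   root@node:~# grep VERSION_CODENAME /etc/os-release
   VERSION_CODENAME=bookworm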
Step 2: Run Puppet Agent
-------------------------
Once the upgrade script has completed, run the puppet agent to apply any pending
configuration changes (e.g. updates to /etc/apt/sources.list).
.. code::

   root@node:~# puppet agent -t
Step 3: Stop Puppet Agent
-------------------------
As we are about to stop the elasticsearch service, we do not want the puppet agent to
start it again.
.. code::

   root@node:~# puppet agent --disable "Ongoing debian upgrade"
Step 4: Autoremove
------------------
Run autoremove to remove unnecessary packages left over from the migration.
.. code::

   root@node:~# apt autoremove
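If in doubt about what would be removed, a dry run first is cheap (``-s`` is the
standard apt-get simulate flag):

.. code::

   root@node:~# apt-get -s autoremove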
Step 5: Stop the elasticsearch service
--------------------------------------
The cluster can tolerate one unresponsive node, so it is safe to stop the service.
We can then check the cluster status, which should remain green after the elasticsearch
service is stopped.
.. code::

   root@node:~# systemctl stop elasticsearch
   root@node:~# curl -s $server/_cluster/health | jq .status
   "green"
Note: ``$server`` is of the form ``hostname:9200`` (with ``hostname`` a cluster node
other than the one currently being upgraded).
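For example, assuming the node naming visible in the cluster output further below:

.. code::

   root@node:~# server=esnode2.internal.softwareheritage.org:9200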
Step 6: Reboot the node
-----------------------
We are ready to reboot the node:
.. code::

   root@node:~# reboot
You can connect to the serial console of the machine to follow the reboot.
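One way to reach the serial console is IPMI Serial-over-LAN; a sketch with the same
hypothetical BMC hostname as in step 0:

.. code::

   admin@laptop:~$ ipmitool -I lanplus -H esnode1-bmc.example.org -U root -a sol activate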
Step 7: Clean up some more
--------------------------
Once the machine is restarted, some cleanup might be necessary.
.. code::

   root@node:~# apt autopurge
Step 8: Activate puppet agent
-----------------------------
Re-enable the puppet agent and trigger a run. This will start the elasticsearch service
again.
.. code::

   root@node:~# puppet agent --enable && puppet agent --test
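After the run, it does not hurt to confirm the service is up:

.. code::

   root@node:~# systemctl is-active elasticsearch
   active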
Step 9: Join back the cluster
-----------------------------
After the service has restarted, check that the node has rejoined the cluster.
.. code::

   root@node:~# curl -s $server/_cat/allocation?v\&s=node
   root@node:~# curl -s $server/_cluster/health | jq .number_of_nodes
For example:
.. code::

   root@esnode1:~# server=http://esnode1.internal.softwareheritage.org:9200; date; \
   curl -s $server/_cat/allocation?v\&s=node; echo; \
   curl -s $server/_cluster/health | jq
   Wed Jan 29 09:57:01 UTC 2025
   shards shards.undesired write_load.forecast disk.indices.forecast disk.indices disk.used disk.avail disk.total disk.percent host           ip             node    node.role
      638                0                 0.0                 5.6tb        5.6tb     5.6tb      1.1tb      6.8tb           82 192.168.100.61 192.168.100.61 esnode1 cdfhilmrstw
      634                0                 0.0                 5.7tb        5.7tb     5.7tb        1tb      6.8tb           84 192.168.100.62 192.168.100.62 esnode2 cdfhilmrstw
      639                0                 0.0                 5.6tb        5.6tb     5.6tb      1.1tb      6.8tb           82 192.168.100.63 192.168.100.63 esnode3 cdfhilmrstw
      644                0                 0.0                 5.6tb        5.6tb     5.6tb      8.2tb     13.8tb           40 192.168.100.64 192.168.100.64 esnode7 cdfhilmrstw
      645                0                 0.0                 5.5tb        5.5tb     5.5tb      5.9tb     11.4tb           48 192.168.100.65 192.168.100.65 esnode8 cdfhilmrstw
      666                0                 0.0                 5.1tb        5.1tb     5.1tb      6.3tb     11.4tb           44 192.168.100.66 192.168.100.66 esnode9 cdfhilmrstw
   {
     "cluster_name": "swh-logging-prod",
     "status": "green",
     "timed_out": false,
     "number_of_nodes": 6,
     "number_of_data_nodes": 6,
     "active_primary_shards": 1933,
     "active_shards": 3866,
     "relocating_shards": 0,
     "initializing_shards": 0,
     "unassigned_shards": 0,
     "delayed_unassigned_shards": 0,
     "number_of_pending_tasks": 0,
     "number_of_in_flight_fetch": 0,
     "task_max_waiting_in_queue_millis": 0,
     "active_shards_percent_as_number": 100
   }
Post cluster migration
----------------------
As the cluster should stay green throughout the migration, there is nothing more to
check (we already verified this after each node).
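If you prefer to wait for green automatically between two nodes, a small polling loop is
enough (a sketch, reusing the ``$server`` variable defined above):

.. code::

   root@node:~# until [ "$(curl -s $server/_cluster/health | jq -r .status)" = "green" ]; do sleep 10; done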