Verified commit 1e0a50e4 authored by Vincent Sellier

sysadm/cassandra: Document how to install a new server

Related to swh/infra/sysadm-environment#5386
System installation
-------------------

- Install the node with a Debian bullseye distribution
- Configure the ipxe configuration for the new server (follow :ref:`server_architecture_install_physical`)
  without running puppet, to avoid applying the zfs configuration if it is already declared
- Perform a low-level format of the NVMe disks to use an LBA format of 4096 bytes.
  For each NVMe disk, execute:

  .. code-block:: shell

     # apt update
     # apt install nvme-cli
     # # for each disk
     # nvme id-ns -H /dev/nvme0n1 | grep LBA
     [3:0] : 0x1 Current LBA Format Selected
     [0:0] : 0x1 Metadata as Part of Extended Data LBA Supported
     LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
     LBA Format 1 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best <-- we want to use this one
     LBA Format 2 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
     LBA Format 3 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
     LBA Format 4 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
     # nvme format -f --lbaf=1 /dev/nvme0n1
     Success formatting namespace:1

- Install zfs and configure the pools according to the instances that will run on the node.
  Based on the usual cassandra server, swh uses:

  - one pool for the commitlogs, using a fast write-intensive disk
  - one or several pools on the mixeduse disks

- If the server name starts with `cassandra[0-9]+`, puppet will install all the necessary
  packages and the configured instances.

  .. warning:: The services are only enabled, i.e. puppet doesn't force the service start. This is
     done on purpose, to let the system administrator control the restarts of the instances.
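The right `--lbaf` index can differ between disk models, so rather than hard-coding `1`, it can be derived from the `nvme id-ns -H` output. A minimal sketch, assuming the output format shown above (`pick_lbaf` is a hypothetical helper, not part of nvme-cli):

```shell
# pick_lbaf: read `nvme id-ns -H` output on stdin and print the index of the
# first LBA format with a 4096-byte data size and no metadata.
# Hypothetical helper, not part of nvme-cli.
pick_lbaf() {
  awk '/^LBA Format/ && /Data Size: *4096 bytes/ && /Metadata Size: *0 *bytes/ { print $3; exit }'
}

# Usage sketch (run as root on the target host):
# lbaf=$(nvme id-ns -H /dev/nvme0n1 | pick_lbaf)
# nvme format -f --lbaf="$lbaf" /dev/nvme0n1
```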
- Launch puppet:

  .. code-block:: shell

     # puppet agent --vardir /var/lib/puppet --server pergamon.internal.softwareheritage.org -t
  .. warning::
     Do not restart the server without disabling the `cassandra@instance1` service, or cassandra
     will start after the reboot without zfs configured.

- Disable cassandra to avoid any issue in case of a restart:

  .. code-block:: shell

     # systemctl disable cassandra@instance1
- Create the zfs pool and datasets

  .. note::
     Always use the WWN (World Wide Name) of the device, to be sure it will never change.

  .. code-block:: shell

     # # get the WWN names
     # ls -al /dev/disk/by-id/nvme-*
     #
     # # Load the zfs module (only if the server was not restarted after the initial puppet run)
     # modprobe zfs
     #
     # # Create the zfs pool(s)
     # zpool create -o ashift=12 -O atime=off -O relatime=on -O mountpoint=none -O compression=off \
         mixeduse \
         nvme-XX nvme-XY nvme-XZ nvme-YX
     # # Only if the server has a write-intensive disk for the commit log
     # zpool create -o ashift=12 -O atime=off -O relatime=on -O mountpoint=none -O compression=off \
         writeintensive \
         nvme-XX
     #
     # # Create the zfs datasets
     # zfs create -o mountpoint=/srv/cassandra/instance1/data mixeduse/cassandra-instance1-data
     # # Change the pool to writeintensive if the server has a dedicated disk for the commit logs
     # zfs create -o mountpoint=/srv/cassandra/instance1/commitlog mixeduse/cassandra-instance1-commitlog
     #
     # # Reboot the server to ensure everything is correct
     # reboot
     #
     # # Check the zfs configuration after the reboot
     # zpool status
     # zfs list
- Ensure the zfs dataset permissions are correct:

  .. code-block:: shell

     # chown cassandra: /srv/cassandra/instance1/{data,commitlog}
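Before starting the service, the ownership can be double-checked; a small sketch (`check_owner` is a hypothetical helper):

```shell
# check_owner USER PATH...: return non-zero if any path is not owned by USER.
# Hypothetical helper for a pre-start sanity check.
check_owner() {
  user=$1; shift
  for p in "$@"; do
    [ "$(stat -c '%U' "$p")" = "$user" ] || { echo "unexpected owner on $p" >&2; return 1; }
  done
}

# Usage sketch:
# check_owner cassandra /srv/cassandra/instance1/data /srv/cassandra/instance1/commitlog
```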
- Check the configuration looks correct, then enable and start cassandra:

  .. code-block:: shell

     # systemctl enable cassandra@instance1
     # systemctl start cassandra@instance1

  .. note::
     During the first start, cassandra will bootstrap the new node with the data it must manage.
     It usually takes around 12 hours to finish.
- Check everything is Ok

  - On any node of the cluster:

    .. code-block:: shell

       $ /opt/cassandra/bin/nodetool -u cassandra --password [REDACTED] status -r
       Datacenter: sesi_rocquencourt
       =============================
       Status=Up/Down
       |/ State=Normal/Leaving/Joining/Moving
       --  Address                                     Load       Tokens  Owns (effective)  Host ID                               Rack
       UN  cassandra04.internal.softwareheritage.org  9.91 TiB   16      27.4%             9c618479-7898-4d89-a8e0-dc1a23fce04e  rack1
       UN  cassandra01.internal.softwareheritage.org  10 TiB     16      27.5%             cb0695ee-b7f1-4b31-ba5e-9ed7a068d993  rack1
       UN  cassandra06.internal.softwareheritage.org  10.12 TiB  16      27.4%             557341c9-dc0c-4a37-99b3-bc71fb46b29c  rack1
       UN  cassandra08.internal.softwareheritage.org  10.02 TiB  16      27.2%             247cd9e3-a70c-465c-bca1-ea9d3af9609a  rack1
       UN  cassandra03.internal.softwareheritage.org  10.01 TiB  16      27.0%             4cc44367-67dc-41ea-accf-4ef8335eabad  rack1
       UN  cassandra11.internal.softwareheritage.org  8.94 TiB   16      27.2%             1199974f-9f03-4cc8-8d63-36676d00d53f  rack1
       UN  cassandra10.internal.softwareheritage.org  10.03 TiB  16      27.4%             f39713c4-d78e-4306-91dd-25a8b276b868  rack1
       UN  cassandra05.internal.softwareheritage.org  9.99 TiB   16      26.8%             ac5e4446-9b26-43e4-8203-b05cb34f2c35  rack1
       UN  cassandra09.internal.softwareheritage.org  9.92 TiB   16      27.4%             e635af9a-3707-4084-b310-8cde61647a6e  rack1
       UJ  cassandra12.internal.softwareheritage.org  22.01 GiB  16      ?                 563d9f83-7ab4-41a2-95ff-d6f2bfb3d8ba  rack1
       UN  cassandra02.internal.softwareheritage.org  9.75 TiB   16      27.6%             a3c89490-ee69-449a-acb1-c2aa6b3d6c71  rack1
       UN  cassandra07.internal.softwareheritage.org  9.94 TiB   16      27.3%             0b7b2a1f-1403-48a8-abe1-65734cc02622  rack1

    The new node appears with the status `UJ` (Up and Joining).
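Waiting for the join to finish can be scripted by extracting the status column from the `nodetool status -r` output. A hedged sketch (`node_state` is a hypothetical helper; cassandra12 is this example's new node):

```shell
# node_state HOST: read `nodetool status -r` output on stdin and print the
# status column (UN, UJ, ...) of the line whose address matches HOST.
# Hypothetical helper.
node_state() {
  awk -v host="$1" '$2 ~ host { print $1; exit }'
}

# Usage sketch (credentials as in the command above):
# while [ "$(/opt/cassandra/bin/nodetool -u cassandra --password [REDACTED] status -r \
#     | node_state cassandra12)" = "UJ" ]; do
#   sleep 600
# done
```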
  - On the new node, the progress of the bootstrap can be checked with:

    .. code-block:: shell

       $ /opt/cassandra/bin/nodetool -u cassandra --password [REDACTED] netstats -H | grep -v 100%
       Mode: JOINING
       Bootstrap 9af73f50-5f97-11ef-88d7-57efd8d208be
           /192.168.100.191
               Receiving 1206 files, 566.42 GiB total. Already received 37 files (3.07%), 80.61 GiB total (14.23%)
           /192.168.100.189
               Receiving 756 files, 647.48 GiB total. Already received 65 files (8.60%), 90.85 GiB total (14.03%)
           /192.168.100.186
               Receiving 731 files, 811.57 GiB total. Already received 35 files (4.79%), 76.18 GiB total (9.39%)
               swh/directory_entry-7 253477270/8750624313 bytes (2%) received from idx:0/192.168.100.186
           /192.168.100.183
               Receiving 730 files, 658.71 GiB total. Already received 43 files (5.89%), 83.18 GiB total (12.63%)
               swh/directory_entry-7 17988974073/19482031143 bytes (92%) received from idx:0/192.168.100.183
           /192.168.100.185
               Receiving 622 files, 477.56 GiB total. Already received 36 files (5.79%), 81.96 GiB total (17.16%)
               swh/directory_entry-8 2812190730/12861515323 bytes (21%) received from idx:0/192.168.100.185
           /192.168.100.181
               Receiving 640 files, 679.54 GiB total. Already received 38 files (5.94%), 84.17 GiB total (12.39%)
           /192.168.100.184
               Receiving 743 files, 813.96 GiB total. Already received 42 files (5.65%), 93.4 GiB total (11.47%)
               swh/directory_entry-5 13940867674/15691104673 bytes (88%) received from idx:0/192.168.100.184
           /192.168.100.190
               Receiving 804 files, 792.49 GiB total. Already received 69 files (8.58%), 95.88 GiB total (12.10%)
               swh/directory_entry-11 2315131981/3494406702 bytes (66%) received from idx:0/192.168.100.190
           /192.168.100.188
               Receiving 741 files, 706.3 GiB total. Already received 43 files (5.80%), 82.24 GiB total (11.64%)
               swh/directory_entry-6 6478486533/17721982774 bytes (36%) received from idx:0/192.168.100.188
           /192.168.100.182
               Receiving 685 files, 623.98 GiB total. Already received 38 files (5.55%), 77.86 GiB total (12.48%)
               swh/directory_entry-6 9007635102/12045552338 bytes (74%) received from idx:0/192.168.100.182
           /192.168.100.187
               Receiving 638 files, 706.2 GiB total. Already received 41 files (6.43%), 83.17 GiB total (11.78%)
               swh/directory_entry-6 1508815317/6276710418 bytes (24%) received from idx:0/192.168.100.187
       Read Repair Statistics:
       Attempted: 0
       Mismatch (Blocking): 0
       Mismatch (Background): 0
       Pool Name        Active  Pending  Completed  Dropped
       Large messages   n/a     0        0          0
       Small messages   n/a     0        5134236    0
- New node declaration

  - To activate the monitoring, declare the node in the monitoring endpoints in
    `swh-charts/cluster-components/values/archive-production-rke2.yaml` for production:
    in the section `scrapeExternalMetrics.cassandra.ips`, add the IP of the new server.
  - Add the node to the list of seeds in `swh-charts/swh/values/production/default.yaml`
    for a production node: add it to the `cassandraSeeds` list.
- Cleanup of the old nodes

  After the new node is bootstrapped, the old nodes are not automatically cleaned and continue
  to host the data that was migrated to the new host. To free the space, the cleanup operation
  must be launched manually on all the pre-existing nodes.

  .. note::
     If several new nodes must be added in the same batch, the cleanup can be done once, after
     all the new nodes have been added and bootstrapped. This avoids cleaning each old node after
     each new node bootstrap.

  .. note::
     The cleanup can be started on several nodes in parallel without any problem. Just check
     carefully in the monitoring that the load of the cluster does not become too high.

  .. code-block:: shell

     $ # Run this on each node except the last one added
     $ /opt/cassandra/bin/nodetool -u cassandra --password [REDACTED] cleanup -j 0
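When many pre-existing nodes must be cleaned, the loop can be scripted; a dry-run sketch (the node list, the numbering, and the ssh invocation are assumptions for this example, where cassandra12 is the newly added node):

```shell
# Dry run: print the cleanup target for each pre-existing node (cassandra01..11
# in this example); the actual ssh invocation is left commented out.
for i in $(seq -w 1 11); do
  host="cassandra${i}.internal.softwareheritage.org"
  echo "cleanup target: ${host}"
  # ssh "$host" /opt/cassandra/bin/nodetool -u cassandra --password [REDACTED] cleanup -j 0
done
```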
Cassandra configuration
-----------------------