The gros cluster at Nancy [1] has many nodes (124), each with a small reservable SSD of 960 GB. This makes it a good candidate for hosting the second cluster. It will also allow checking the performance with the data (and commit logs) on SSDs.
Based on the main cluster, a minimum of 8 nodes is necessary to handle the volume of data (7.3 TB and growing). Starting with 10 nodes leaves some spare space.
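The node count above can be checked with a quick calculation. This is only a rough sketch: it assumes the 7.3 TB figure is the total on-disk volume to host, uses decimal units (1 TB = 1000 GB), and ignores the free space Cassandra needs for compaction.

```python
import math

# Figures taken from the text above; decimal units are an assumption.
DATA_VOLUME_GB = 7300      # 7.3 TB of data to host
SSD_PER_NODE_GB = 960      # reservable SSD on each gros node


def min_nodes(volume_gb, disk_gb):
    """Smallest number of nodes whose combined disks can hold the volume."""
    return math.ceil(volume_gb / disk_gb)


def headroom_gb(nodes, volume_gb, disk_gb):
    """Raw space left once the current volume is stored on `nodes` nodes."""
    return nodes * disk_gb - volume_gb


print(min_nodes(DATA_VOLUME_GB, SSD_PER_NODE_GB))        # 8
print(headroom_gb(10, DATA_VOLUME_GB, SSD_PER_NODE_GB))  # 2300
```

With 10 nodes the raw spare capacity is about 2.3 TB, which gives some room for growth but not a large margin.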
change the value of the endpoint_snitch property from SimpleSnitch to GossipingPropertyFileSnitch [2].
The recommended value for production is GossipingPropertyFileSnitch, so it should have been used from the beginning
configure the disk_optimization_strategy to ssd on the new datacenter
update the seed_provider to have one node in each datacenter
restart the datacenter1 nodes to apply the new configuration
start the datacenter2 nodes one by one, waiting until the status of the node is UN (Up and Normal) before starting the next one (a node can stay in the UJ (Up and Joining) state for a couple of minutes)
when done, update the swh keyspace to declare the replication strategy of the second DC
ALTER KEYSPACE swh WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3, 'datacenter2': 3};
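The "wait until the node is UN" check in the steps above could be scripted by parsing the output of nodetool status. The sketch below is a minimal illustration, not the procedure actually used: the sample output, addresses and host IDs are made up, and only the leading two-letter status/state code and the address column are read.

```python
import re

# A node line in `nodetool status` starts with a two-letter code:
# U/D (Up/Down) followed by N/L/J/M (Normal/Leaving/Joining/Moving).
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)", re.MULTILINE)


def node_states(status_output):
    """Map each node address to its two-letter status code."""
    return {addr: state for state, addr in NODE_LINE.findall(status_output)}


def all_up_normal(status_output):
    """True only if at least one node is listed and every node is UN."""
    states = node_states(status_output)
    return bool(states) and all(s == "UN" for s in states.values())


# Illustrative output only (hypothetical addresses and host IDs).
SAMPLE_STATUS = """\
Datacenter: datacenter2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load     Tokens  Owns  Host ID  Rack
UN  10.0.0.1  100 GiB  256     ?     aaaa     rack1
UJ  10.0.0.2  10 GiB   256     ?     bbbb     rack1
"""

print(node_states(SAMPLE_STATUS))
print(all_up_normal(SAMPLE_STATUS))
```

In a real script the text would come from running nodetool status on one of the nodes, and the next node would only be started once all_up_normal returns True.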
Replication of new writes to the second datacenter starts as soon as the keyspace is altered, but the existing table contents still need to be copied to the new nodes.
The progress can be monitored with the nodetool netstats command:
gros-50:~$ nodetool netstats
Mode: NORMAL
Rebuild e5e64920-0644-11ec-92a6-31a241f39914
    /172.16.97.4
        Receiving 199 files, 147926499702 bytes total. Already received 125 files (62.81%), 57339885570 bytes total (38.76%)
            swh/release-4 1082347/1082347 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-2 3729362955/3729362955 bytes (100%) received from idx:0/172.16.97.4
            swh/release-3 224510803/224510803 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-1 240283216/240283216 bytes (100%) received from idx:0/172.16.97.4
            swh/content_by_blake2s256-4 29491504/29491504 bytes (100%) received from idx:0/172.16.97.4
            swh/release-2 6409474/6409474 bytes (100%) received from idx:0/172.16.97.4
...
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name        Active  Pending  Completed  Dropped
Large messages   n/a     0        23         0
Small messages   n/a     3        132753939  0
Gossip messages  n/a     0        43915      0
or to filter only running transfers:
gros-50:~$ nodetool netstats | grep -v 100%
Mode: NORMAL
Rebuild e5e64920-0644-11ec-92a6-31a241f39914
    /172.16.97.4
        Receiving 199 files, 147926499702 bytes total. Already received 125 files (62.81%), 57557961160 bytes total (38.91%)
            swh/directory_entry-7 4819168032/4925484261 bytes (97%) received from idx:0/172.16.97.4
    /172.16.97.2
        Receiving 202 files, 111435975646 bytes total. Already received 139 files (68.81%), 60583670773 bytes total (54.37%)
            swh/directory_entry-12 1631210003/2906113367 bytes (56%) received from idx:0/172.16.97.2
    /172.16.97.6
        Receiving 236 files, 186694443984 bytes total. Already received 142 files (60.17%), 58869656747 bytes total (31.53%)
            swh/snapshot_branch-10 4449235102/7845572885 bytes (56%) received from idx:0/172.16.97.6
    /172.16.97.5
        Receiving 221 files, 143384473640 bytes total. Already received 132 files (59.73%), 58300913015 bytes total (40.66%)
            swh/directory_entry-4 982247023/3492851311 bytes (28%) received from idx:0/172.16.97.5
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name        Active  Pending  Completed  Dropped
Large messages   n/a     0        23         0
Small messages   n/a     2        135087921  0
Gossip messages  n/a     0        44176      0
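To get a single overall figure instead of one percentage per source node, the per-peer summary lines of nodetool netstats can be summed. The following is a small sketch that relies only on the line format visible in the output above; the function name is hypothetical and the sample reuses the four summary lines from that output.

```python
import re

# Per-peer summary line as printed in the netstats output above.
SUMMARY = re.compile(
    r"Receiving (\d+) files, (\d+) bytes total\. "
    r"Already received (\d+) files \([\d.]+%\), (\d+) bytes total"
)


def rebuild_progress(netstats_output):
    """Return (received_bytes, total_bytes) summed over all source nodes."""
    total = received = 0
    for match in SUMMARY.finditer(netstats_output):
        total += int(match.group(2))
        received += int(match.group(4))
    return received, total


# The four summary lines copied from the filtered output above.
SAMPLE_NETSTATS = """\
Receiving 199 files, 147926499702 bytes total. Already received 125 files (62.81%), 57557961160 bytes total (38.91%)
Receiving 202 files, 111435975646 bytes total. Already received 139 files (68.81%), 60583670773 bytes total (54.37%)
Receiving 236 files, 186694443984 bytes total. Already received 142 files (60.17%), 58869656747 bytes total (31.53%)
Receiving 221 files, 143384473640 bytes total. Already received 132 files (59.73%), 58300913015 bytes total (40.66%)
"""

received, total = rebuild_progress(SAMPLE_NETSTATS)
if total:
    print(f"{received}/{total} bytes ({100 * received / total:.2f}%)")
```

For the sample above this adds up to a little under 40% of the bytes expected by gros-50 at the time the output was captured.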
Under the hood, a Prometheus node was also added in the second datacenter. The datacenter1 Prometheus node federates its data. This allows retrieving [3] all the monitoring data by querying only the datacenter1 Prometheus.
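For illustration, Prometheus federation works by having one server scrape the /federate endpoint of another, with match[] parameters selecting the series to pull. The sketch below only builds such a URL. The host name and the job label are hypothetical and do not come from the actual deployment.

```python
from urllib.parse import urlencode


def federate_url(base_url, matchers):
    """Build a /federate URL selecting the series that match `matchers`."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical datacenter2 Prometheus address and job selector.
url = federate_url("http://prometheus-dc2.example:9090", ['{job="cassandra"}'])
print(url)
```

In a real setup this request is not issued by hand: the datacenter1 Prometheus would have a scrape job pointing at the datacenter2 server's /federate path with the same match[] parameters.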