With the integration of granet in the production k8s cluster, we've decided to
distinguish the "high memory" / "swh-graph" nodes with a separate naming scheme
(-highmemXX instead of -metalXX).
Following this naming scheme, it would be appropriate to rename rancher-node-metal05
to rancher-node-highmem02.
I believe the following steps should work once the puppet manifests have been updated
to support the new naming scheme (a rough sketch of the main commands follows the list):
- Ensure the current version of the graph on rancher-node-metal05 is backed up
- Disable puppet
- swh-charts: disable the previous graph instances running on rancher-node-metal05
- Drain the node (some other pods were running on it, e.g. deposit-rpc)
- Deprovision the node from the rancher cluster
- (optional) Disable/uninstall the rke2-agent service
- Update the inventory reference
- (blocked) swh-ipxe: reinstall the OS with the fqdn rancher-node-highmem02
- Reboot
- Check the machine has the new hostname (`hostname` and `hostname -f` report the new values)
- Run puppet (should provision a new certificate for the new hostname)
- Decommission the old node in puppet
- Reprovision the node in the rancher cluster
- (canary test) Run the same graph version as rancher-node-highmem01
- Disable the canary instance afterwards
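For reference, a minimal sketch of the commands the drain/hostname/puppet steps map to. This is an assumption-laden outline, not a verified transcript: the drain flags, where each command runs, and the certname suffix would need checking against our setup.

```
# on the node: keep puppet from re-converging mid-operation
root@rancher-node-metal05:~# puppet agent --disable "rename to rancher-node-highmem02"

# from a machine with cluster-admin access: evict the remaining pods (e.g. deposit-rpc)
$ kubectl drain rancher-node-metal05 --ignore-daemonsets --delete-emptydir-data

# (optional) stop the rke2 agent so the node does not rejoin the cluster
root@rancher-node-metal05:~# systemctl disable --now rke2-agent

# after the reinstallation: verify the new identity...
root@rancher-node-highmem02:~# hostname && hostname -f

# ...and run puppet so it requests a certificate for the new name
root@rancher-node-highmem02:~# puppet agent --test

# on the puppet server: clean up the certificate of the old name
# (<internal-domain> is a placeholder for our actual fqdn suffix)
$ puppetserver ca clean --certname rancher-node-metal05.<internal-domain>
```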
Note:
This node was already renamed once without reinstallation, so it may have plenty of dangling files from its previous life.
We deemed it better to go through a full reinstallation instead.
The current graph data on that node is 2024-03-31 [1], which is already present on rancher-node-highmem01 [2],
so there is no need to back it up first.
[1]
```
root@rancher-node-metal05:~# zfs list | grep graph
data/datasets/2024-03-31/compressed  4.86T  26.0T  4.79T  /srv/kubernetes/volumes/pvc-1d95c55e-1fe7-4482-9ea0-c044c4dc4ff3_swh-cassandra_graph-20240331-persistent-pvc/2024-03-31/compressed
```
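Before actually dropping the copy on rancher-node-metal05, a quick cross-check that the same dataset really is on the other node costs nothing; a sketch (the grep pattern is just illustrative):

```
root@rancher-node-highmem01:~# zfs list | grep 2024-03-31
```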
On the 5th tryout, I fetched the files it failed to retrieve and tried to serve them through pergamon, but that hung as well.
As a shot in the dark, since it worked with the previous distribution, I tried with bullseye, but that hangs the same way (expectedly, but you never know ;).
So I'm out of ideas besides renaming that machine again.
On the plus side, ipxe-wise, we can now declare whichever distribution we want to use... (not that I see any reason to install an oldstable one, but heh).
Nope, multiple tryouts (with variations: net1, new ipxe code, ...) and only failures...
In the end, I just used the virtual keyboard to enter the one-time boot menu entry instead of punctually changing the setup (which took forever and a half prior to reboot).
[x] 2. Prepare the data/datasets zfs volume with proper default options (compression, noatime, ...)
Initially:
```
root@rancher-node-highmem02:~# zfs get all data/datasets | grep "time\|xattr\|compression"
data/datasets  compression  off  default
data/datasets  atime        on   default
data/datasets  xattr        on   default
data/datasets  relatime     off  default
root@rancher-node-highmem02:~# zfs set compression=zstd data/datasets
root@rancher-node-highmem02:~# zfs set atime=off data/datasets
root@rancher-node-highmem02:~# zfs set relatime=on data/datasets
root@rancher-node-highmem02:~# zfs set xattr=sa data/datasets
root@rancher-node-highmem02:~# zfs get all data/datasets | grep "time\|xattr\|compression"
data/datasets  compression  zstd  local
data/datasets  atime        off   local
data/datasets  xattr        sa    local
data/datasets  relatime     on    local
root@rancher-node-highmem02:~# zfs create data/datasets/2024-12-06
root@rancher-node-highmem02:~# zfs get all data/datasets/2024-12-06 | grep "time\|xattr\|compression"
data/datasets/2024-12-06  compression  zstd  inherited from data/datasets
data/datasets/2024-12-06  atime        off   inherited from data/datasets
data/datasets/2024-12-06  xattr        sa    inherited from data/datasets
data/datasets/2024-12-06  relatime     on    inherited from data/datasets
```
[x] 3. Prepare ssh connection from one machine to the other (and vice-versa)
```
root@rancher-node-highmem02:~# ssh root@rancher-node-highmem01 date
Mon Jan 27 02:03:41 PM UTC 2025
```
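Only the verification is shown above; for the setup itself, one plausible way to wire it (assuming console or password access for the bootstrap; the key type and paths are illustrative):

```
# generate a dedicated key and authorize it on the peer
root@rancher-node-highmem02:~# ssh-keygen -t ed25519 -N '' -f /root/.ssh/id_ed25519
root@rancher-node-highmem02:~# cat /root/.ssh/id_ed25519.pub | \
    ssh root@rancher-node-highmem01 'cat >> /root/.ssh/authorized_keys'
# and the same in the other direction for the vice-versa part
```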
[x] 4. Transfer the zfs dataset from highmem01 to highmem02 (latest graph)
```
root@rancher-node-highmem02:~# ssh root@rancher-node-highmem01 zfs send -cvL data/datasets/2024-12-06/compressed@20250127T132857 | zfs receive data/datasets/2024-12-06/compressed
full send of data/datasets/2024-12-06/compressed@20250127T132857 estimated size is 4.96T
total estimated size is 4.96T
TIME        SENT   SNAPSHOT data/datasets/2024-12-06/compressed@20250127T132857
14:10:05    372M   data/datasets/2024-12-06/compressed@20250127T132857
...
14:34:51    529G   data/datasets/2024-12-06/compressed@20250127T132857
...
16:25:39   3.12T   data/datasets/2024-12-06/compressed@20250127T132857
...
17:29:32   4.50T   data/datasets/2024-12-06/compressed@20250127T132857
...
18:59:44   6.49T   data/datasets/2024-12-06/compressed@20250127T132857
root@rancher-node-highmem02:~# zfs list -t snapshot | grep -v data/rancher
NAME                                                   USED  AVAIL  REFER  MOUNTPOINT
data/datasets/2023-08-07@test                           56K      -   368K  -
data/datasets/2024-03-31/compressed@graph-2024-03-31  63.5G      -  4.79T  -
data/datasets/2024-12-06/compressed@20250127T132857      0B      -  4.95T  -
```
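Should this dataset ever need a refresh, an incremental send from the snapshot above would avoid re-transferring the ~5T; a sketch, with the new snapshot name as a placeholder:

```
# only the delta since 20250127T132857 is sent (<new-snap> is a placeholder)
root@rancher-node-highmem01:~# zfs snapshot data/datasets/2024-12-06/compressed@<new-snap>
root@rancher-node-highmem02:~# ssh root@rancher-node-highmem01 zfs send -cvL -i @20250127T132857 \
    data/datasets/2024-12-06/compressed@<new-snap> | zfs receive data/datasets/2024-12-06/compressed
```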
[x] 5. Install the zfs graph dataset to the proper path
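No transcript is shown for this step; presumably it amounts to pointing the received dataset's mountpoint at the path the graph pod's volume expects, along the lines of the PVC path in [1]. A sketch, with the pvc directory as a placeholder:

```
# zfs remounts the dataset automatically when the mountpoint property changes
root@rancher-node-highmem02:~# zfs set mountpoint=/srv/kubernetes/volumes/<pvc>/2024-12-06/compressed \
    data/datasets/2024-12-06/compressed
root@rancher-node-highmem02:~# ls /srv/kubernetes/volumes/<pvc>/2024-12-06/compressed | head
```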