Follow-up to the staging deployment, which went fine [1].
A lot more work is still needed: we have to recycle the mam machine into a new highmem03
machine in the production kubernetes cluster.
Plan:

- [ ] Backup disk and network information (in case there are issues during the reinstall)
- [ ] Update the inventory entry for this machine
- [ ] Backup any interesting data from mam (the previous prototype machine), if any
- [ ] mam: disable the puppet agent
- [ ] pergamon: decommission mam from puppet & disable the alerts for services running on mam
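The last two steps could be sketched as a dry run first (the FQDN and the extra `puppetserver ca clean` step are assumptions on my part; commands are echoed for review rather than executed):

```shell
# Hypothetical dry-run of the puppet decommissioning steps; swap `echo`
# for real execution once the sequence looks right.
node="mam.internal.softwareheritage.org"   # assumed FQDN
run() { echo "+ $*"; }

run puppet agent --disable "recycling mam into highmem03"   # on mam
run puppet node deactivate "$node"                          # on pergamon
run puppetserver ca clean --certname "$node"                # on pergamon
```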
[x] Backup any interesting data from mam (the previous prototype machine)

I asked the relevant people (the ones using it) and they have nothing left there.
So nothing to do.
```
00:00:00 <*> Date changed to March 27, 2025
10:08:55 <ardumont> david: vlorentz: mam will get integrated into the kubernetes production cluster (as in it will get reinstalled), any data you want to backup? please let me know soon or write in https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/5620, tia
10:09:15 <ardumont> because it will then run the actual provenance grpc server ^
10:10:01 <ardumont> (i'm planning on doing this after lunch because before we've got the tech meeting)
15:36:28 <ardumont> david: vlorentz: I gather since there was no response back that all is fine and there is nothing to backup
15:40:49 <vlorentz> I got nothing left
15:42:49 <david> I did some retrieval yesterday (nothing critical, just in case), I'm ok now I believe (was about to add a comment in the issue but got caught by things)
```
The disk installation was done manually to avoid losing too much of the data disk.
The system used to take a whole 4 TB (raid1) disk even though it needs less than 100 GB of space.
That space is now sized at 100 GB.
The remaining space will get injected into the future zfs data pool [1] \o/
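As a back-of-the-envelope on that sizing (rounded figures from this note, not exact partition sizes):

```shell
# Rounded figures from the note above: ~4 TB disk, system needs < 100 GB.
disk_gb=4000
system_gb=100
echo "system slice: ${system_gb} GB"
echo "reclaimed for the zfs data pool: $((disk_gb - system_gb)) GB"
```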
```
root@rancher-node-highmem03:~# puppet agent --test
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Warning: The current total number of facts: 2417 exceeds the number of facts limit: 2048
Info: Caching catalog for rancher-node-highmem03.internal.softwareheritage.org
Info: Applying configuration version '1743165079'
Error: Execution of '/usr/sbin/zpool create data mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ23 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ28 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ24 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ30 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ22 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ33 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ25 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ34 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2V /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2D mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2E /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2X mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2W /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2C mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2Y /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2B mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ32 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2A mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ31 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ27 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ29 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ35 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ21-part3 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ26-part3' returned 1: invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ23-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ28-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ24-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ30-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ22-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ33-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ25-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ34-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2V-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2D-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2E-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2X-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2W-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2C-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2Y-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2B-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ32-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2A-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ31-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ27-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ29-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ35-part1 is part of potentially active pool 'data'
Error: /Stage[main]/Profile::Zfs::Common/Zpool[data]/ensure: change from 'absent' to 'present' failed: Execution of '/usr/sbin/zpool create data mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ23 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ28 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ24 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ30 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ22 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ33 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ25 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ34 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2V /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2D mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2E /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2X mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2W /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2C mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2Y /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2B mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ32 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2A mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ31 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ27 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ29 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ35 mirror /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ21-part3 /dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ26-part3' returned 1: invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ23-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ28-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ24-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ30-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ22-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ33-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ25-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ34-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2V-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2D-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2E-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2X-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2W-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2C-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ2Y-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2B-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ32-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ2A-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ31-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ27-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I130BQ29-part1 is part of potentially active pool 'data'
/dev/disk/by-id/nvme-VO003840KXAVQ_SNABN5890I070BQ35-part1 is part of potentially active pool 'data'
Notice: /Stage[main]/Profile::Zfs::Kubelet/Zfs[data/kubelet]: Dependency Zpool[data] has failures: true
Warning: /Stage[main]/Profile::Zfs::Kubelet/Zfs[data/kubelet]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Zfs::Rancher/Zfs[data/rancher]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Zfs::Rancher/Zfs[data/volumes]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Rancher/File[/var/lib/rancher]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Rancher/File[/var/lib/rancher/rke2]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Rancher/File[/var/lib/rancher/rke2/agent]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Rancher/File[/var/lib/rancher/rke2/agent/containerd]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Rancher/File[/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.zfs]: Skipping because of failed dependencies
Notice: Applied catalog in 7.77 seconds
```
So let's import the potentially active `data` pool:
```
root@rancher-node-highmem03:~# zpool import
   pool: data
     id: 3057112486231962622
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        data          ONLINE
          raidz1-0    ONLINE
            nvme2n1   ONLINE
            nvme3n1   ONLINE
            nvme4n1   ONLINE
            nvme5n1   ONLINE
            nvme6n1   ONLINE
            nvme7n1   ONLINE
            nvme8n1   ONLINE
            nvme9n1   ONLINE
            nvme10n1  ONLINE
            nvme11n1  ONLINE
            nvme12n1  ONLINE
            nvme13n1  ONLINE
            nvme14n1  ONLINE
            nvme15n1  ONLINE
            nvme16n1  ONLINE
            nvme17n1  ONLINE
            nvme18n1  ONLINE
            nvme19n1  ONLINE
            nvme20n1  ONLINE
            nvme21n1  ONLINE
            nvme22n1  ONLINE
            nvme23n1  ONLINE
root@rancher-node-highmem03:~# zpool status
no pools available
root@rancher-node-highmem03:~# zpool import data
cannot import 'data': pool was previously in use from another system.
Last accessed by mam (hostid=11e9a99e) at Fri Mar 28 09:26:33 2025
The pool can be imported, use 'zpool import -f' to import the pool.
root@rancher-node-highmem03:~# zpool import -f data
root@rancher-node-highmem03:~# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 1 days 11:10:10 with 0 errors on Mon Mar 10 11:34:11 2025
config:

        NAME          STATE     READ WRITE CKSUM
        data          ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            nvme2n1   ONLINE       0     0     0
            nvme3n1   ONLINE       0     0     0
            nvme4n1   ONLINE       0     0     0
            nvme5n1   ONLINE       0     0     0
            nvme6n1   ONLINE       0     0     0
            nvme7n1   ONLINE       0     0     0
            nvme8n1   ONLINE       0     0     0
            nvme9n1   ONLINE       0     0     0
            nvme10n1  ONLINE       0     0     0
            nvme11n1  ONLINE       0     0     0
            nvme12n1  ONLINE       0     0     0
            nvme13n1  ONLINE       0     0     0
            nvme14n1  ONLINE       0     0     0
            nvme15n1  ONLINE       0     0     0
            nvme16n1  ONLINE       0     0     0
            nvme17n1  ONLINE       0     0     0
            nvme18n1  ONLINE       0     0     0
            nvme19n1  ONLINE       0     0     0
            nvme20n1  ONLINE       0     0     0
            nvme21n1  ONLINE       0     0     0
            nvme22n1  ONLINE       0     0     0
            nvme23n1  ONLINE       0     0     0

errors: No known data errors
```
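The refusal is ZFS's hostid protection: the pool records the hostid of the last system that imported it (mam's `11e9a99e` here), which no longer matches this host. A small hypothetical helper to pull that hostid out of such output, for comparison with the local `hostid`:

```shell
# Hypothetical helper: extract the previous owner's hostid from
# "zpool import"-style output, to compare with this host's own `hostid`.
extract_pool_hostid() {
  sed -n 's/.*hostid=\([0-9a-f]*\).*/\1/p'
}

# e.g.: zpool import data 2>&1 | extract_pool_hostid
```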
And destroy it:
```
root@rancher-node-highmem03:~# zpool destroy data
root@rancher-node-highmem03:~# zpool status
no pools available
```
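Destroying is irreversible (here it was fine, since nothing was left to back up). For future recycles, a hypothetical guard could gate the destroy on the pool's reported usage (the threshold is an arbitrary illustration value):

```shell
# Hypothetical guard: only OK the destroy when the pool's "used" figure
# (in bytes, e.g. from `zfs get -Hpo value used data`) is negligible.
check_pool_empty() {
  used_bytes=$1
  threshold=$((1024 * 1024 * 1024))  # 1 GiB, arbitrary illustration value
  if [ "$used_bytes" -lt "$threshold" ]; then
    echo "ok to destroy"
  else
    echo "pool still holds data"
  fi
}
```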
The puppet agent run now goes through:
```
root@rancher-node-highmem03:~# puppet agent --test
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Warning: The current total number of facts: 2417 exceeds the number of facts limit: 2048
Info: Caching catalog for rancher-node-highmem03.internal.softwareheritage.org
Info: Applying configuration version '1743165359'
Notice: /Stage[main]/Profile::Zfs::Common/Zpool[data]/ensure: created
Notice: /Stage[main]/Profile::Zfs::Kubelet/Zfs[data/kubelet]/ensure: created
Notice: /Stage[main]/Profile::Zfs::Rancher/Zfs[data/rancher]/ensure: created
Notice: /Stage[main]/Profile::Zfs::Rancher/Zfs[data/volumes]/ensure: created
Notice: /Stage[main]/Profile::Rancher/File[/var/lib/rancher/rke2]/ensure: created
Notice: /Stage[main]/Profile::Rancher/File[/var/lib/rancher/rke2/agent]/ensure: created
Notice: /Stage[main]/Profile::Rancher/File[/var/lib/rancher/rke2/agent/containerd]/ensure: created
Notice: /Stage[main]/Profile::Rancher/File[/var/lib/rancher/rke2/agent/containerd/io.containerd.snapshotter.v1.zfs]/ensure: created
Notice: Applied catalog in 12.39 seconds
```
Let's set some default options on the `data` dataset:
```
root@rancher-node-highmem03:~# zfs set compression=zstd data; zfs set atime=off data
root@rancher-node-highmem03:~# zfs get all data | grep "time\|xattr\|compression"
data  compression  off   default
data  atime        on    default
data  xattr        on    default
data  relatime     off   default
root@rancher-node-highmem03:~# zfs get all data | grep "time\|xattr\|compression"
data  compression  zstd  local
data  atime        off   local
data  xattr        on    default
data  relatime     off   default
```
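Eyeballing the grep output works, but the check can also be scripted. A tiny hypothetical helper (not part of the original procedure) that reads `zfs get all <dataset>` lines and verifies a property has the expected value:

```shell
# Hypothetical check: read "zfs get all <dataset>" lines on stdin and
# exit 0 only if the given property ($1) has the expected value ($2).
prop_is() {
  awk -v p="$1" -v v="$2" \
    'BEGIN { found = 1 } $2 == p { found = ($3 == v) ? 0 : 1 } END { exit found }'
}

# usage: zfs get all data | prop_is compression zstd && echo "compression ok"
```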