[Hardware] Install the metal03 production compute node
Orders: https://mybox.inria.fr/smart-link/13004275-5cee-433b-ba88-3c73141757f0/
Inventory: https://inventory.internal.admin.swh.network/dcim/devices/261/
Summary:
- Rack position: C12/U12
- Management address (DNS): 128.93.134.46 (swh-kube3-adm.inria.fr)
- VLAN configuration: VLAN440
- Management Port: C12-eth-management/eth5 (-> swdc-30c11-02-dc1 (mgmt)/eth10)
- Access Ports:
- C12-1-optical/10 (-> ?)
- C12-1-optical/10 (-> ?)
- Internal IP(s): 192.168.100.133
- Internal DNS name(s): rancher-node-metal03.internal.softwareheritage.org
Tasks:
- Declare the servers in the inventory
- Declare rancher-node-metal03
- Add the management info in the credential store
- Install the OS [1]
- Add the root user password in the credential store
- Add puppet configuration
- Run puppet on the node
- Configure firewall rules
- As a production rancher node:
  - zfs needs to be configured with a couple of datasets (-> puppet does most of it)
    - data/rancher | /var/lib/rancher
    - data/docker | /var/lib/docker
  - swap file in data/swap (256 GB)
  - Ensure /tmp is a tmpfs of >= 256 GB (/etc/fstab)
[1] https://docs.softwareheritage.org/sysadm/server-architecture/howto-install-new-physical-server.html
Activity
- Vincent Sellier changed milestone to %Dynamic infrastructure [Roadmap - Tooling and infrastructure]
- Vincent Sellier added hardware kubernetes labels
- Vincent Sellier assigned to @vsellier
- Author Owner
Taking the task of defining the installation information.
- Vincent Sellier changed the description
- Vincent Sellier marked the checklist item Declare the servers in the inventory as completed
- Vincent Sellier changed the description
- Vincent Sellier made the issue confidential
- Vincent Sellier made the issue visible to everyone
- Vincent Sellier changed the description
- Author Owner
The server should be installed on 2023-09-06.
- Author Owner
The server is racked and up.
The iLO is responding. The credentials are in the credential store.
- Owner
great thx, we'll install it tomorrow then (w/ @guillaume).
Edited by Antoine R. Dumont
- Vincent Sellier marked the checklist item Add the management info in the credential store as completed
- Vincent Sellier unassigned @vsellier
- Antoine R. Dumont changed the description
- Guillaume Samson mentioned in commit ipxe@9c92ba70
- Antoine R. Dumont changed the description
- Owner
(w/ gsamson)
Build the configuration files and push them to pergamon:
```
$ cat configs/rancher-node-metal03.yaml
---
VLAN_ID: 440
IPADDRESS: 192.168.100.133
NETMASK: 255.255.255.0
GATEWAY: 192.168.100.1
NAMESERVER: 192.168.100.29
HOSTNAME: rancher-node-metal03
DOMAINNAME: internal.softwareheritage.org
DEPLOYMENT: production
SUBNET: sesi_rocquencourt
BOOT_DISK_ID_PATTERN: "*_Boot_Controller_*"
$ HOSTNAME=rancher-node-metal03 ./configs/build_iso.sh $HOSTNAME
make: Entering directory '/home/tony/work/inria/repo/swh/sysadm-environment/swh-ipxe/src'
[DEPS] image/embedded.c
[BUILD] bin-x86_64-efi/embedded.o
[VERSION] bin-x86_64-efi/version.ipxe.efi.o
[AR] bin-x86_64-efi/blib.a
ar: creating bin-x86_64-efi/blib.a
[LD] bin-x86_64-efi/ipxe.efi.tmp
[FINISH] bin-x86_64-efi/ipxe.efi
[GENFSIMG] bin-x86_64-efi/ipxe.iso
xorriso 1.5.4 : RockRidge filesystem manipulator, libburnia project.
Makefile.efi:54: warning: pattern recipe did not update peer target 'bin-x86_64-efi/ipxe.usb'.
rm bin-x86_64-efi/version.ipxe.efi.o bin-x86_64-efi/ipxe.efi
make: Leaving directory '/home/tony/work/inria/repo/swh/sysadm-environment/swh-ipxe/src'
built iso image /home/tony/work/inria/repo/swh/sysadm-environment/swh-ipxe/configs/rancher-node-metal03.iso
Generated preseeding config in /home/tony/work/inria/repo/swh/sysadm-environment/swh-ipxe/configs/preseeding/rancher-node-metal03.txt and /home/tony/work/inria/repo/swh/sysadm-environment/swh-ipxe/configs/preseeding/finish_install/rancher-node-metal03.sh.
```
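Before building the ISO, the static network values of such a config can be sanity-checked. The Python sketch below is hypothetical (`check_network_config` is not part of swh-ipxe) and only assumes the keys shown in the YAML above: it checks that GATEWAY lies in the subnet given by IPADDRESS/NETMASK, that IPADDRESS is a usable host address, and that VLAN_ID is within the 802.1Q range 1-4094. It deliberately does not require NAMESERVER to be on the same subnet, since a resolver can be reached through the gateway.

```python
import ipaddress
from typing import Dict, List, Union

# Network values copied from configs/rancher-node-metal03.yaml.
CONFIG: Dict[str, Union[int, str]] = {
    "VLAN_ID": 440,
    "IPADDRESS": "192.168.100.133",
    "NETMASK": "255.255.255.0",
    "GATEWAY": "192.168.100.1",
}


def check_network_config(cfg) -> List[str]:
    """Hypothetical helper: return readable problems with the settings (empty list if none)."""
    problems = []
    # ip_interface accepts a dotted netmask after the slash.
    iface = ipaddress.ip_interface(f"{cfg['IPADDRESS']}/{cfg['NETMASK']}")
    net = iface.network
    if iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"IPADDRESS {iface.ip} is not a usable host address in {net}")
    gateway = ipaddress.ip_address(cfg["GATEWAY"])
    if gateway not in net:
        problems.append(f"GATEWAY {gateway} is outside {net}")
    if gateway == iface.ip:
        problems.append("GATEWAY must differ from IPADDRESS")
    # 802.1Q reserves VLAN IDs 0 and 4095.
    if not 1 <= int(cfg["VLAN_ID"]) <= 4094:
        problems.append(f"VLAN_ID {cfg['VLAN_ID']} is out of range 1-4094")
    return problems


if __name__ == "__main__":
    print(check_network_config(CONFIG) or "network settings look consistent")
```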
```
$ rsync -av --include="*/" --include="${HOSTNAME}.sh" --include="${HOSTNAME}.txt" \
    --exclude="*" configs/preseeding/ \
    pergamon.internal.softwareheritage.org:/srv/softwareheritage/preseeding/
sending incremental file list
rsync: failed to set times on "/srv/softwareheritage/preseeding/.": Operation not permitted (1)
./
rancher-node-metal03.txt
rsync: failed to set times on "/srv/softwareheritage/preseeding/finish_install": Operation not permitted (1)
finish_install/
finish_install/rancher-node-metal03.sh

sent 20,693 bytes  received 282 bytes  41,950.00 bytes/sec
total size is 20,412  speedup is 0.97
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1338) [sender=3.2.7]
$ ssh pergamon.internal.softwareheritage.org ls -lah /srv/softwareheritage/preseeding/rancher-node-metal03.txt /srv/softwareheritage/preseeding/finish_install/rancher-node-metal03.sh
-rw-r--r-- 1 ardumont ardumont 1.8K Sep  8 08:29 /srv/softwareheritage/preseeding/finish_install/rancher-node-metal03.sh
-rw-r--r-- 1 ardumont sudo      19K Sep  8 08:29 /srv/softwareheritage/preseeding/rancher-node-metal03.txt
```
The code 23 error only comes from rsync failing to set the timestamps of the two existing directories (presumably not owned by the connecting user); both files were transferred, as the `ls` on pergamon confirms.
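The include/exclude list above relies on rsync's first-match-wins filter rules. The following is a deliberately simplified Python model of those rules (the `verdict` and `rsync_selected` helpers and the `other-node` file names are hypothetical), enough to show why only the two rancher-node-metal03 files and the finish_install/ directory are sent. It only models patterns matched against the last path component, with a trailing "/" meaning "directories only"; real rsync supports much more (anchored patterns, "**", merge files).

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, str]  # ("include" or "exclude", pattern)

HOSTNAME = "rancher-node-metal03"
RULES: List[Rule] = [
    ("include", "*/"),
    ("include", f"{HOSTNAME}.sh"),
    ("include", f"{HOSTNAME}.txt"),
    ("exclude", "*"),
]

# Hypothetical source tree: the other-node files stand in for other hosts' configs.
EXAMPLE_TREE = [
    "rancher-node-metal03.txt",
    "other-node.txt",
    "finish_install/",
    "finish_install/rancher-node-metal03.sh",
    "finish_install/other-node.sh",
]


def verdict(path: str, rules: List[Rule]) -> str:
    """First matching rule wins; a path matching no rule is included (rsync's default)."""
    is_dir = path.endswith("/")
    name = path.rstrip("/").rsplit("/", 1)[-1]  # patterns without "/" match the last component
    for action, pattern in rules:
        if pattern.endswith("/") and not is_dir:
            continue  # a trailing "/" restricts the rule to directories
        if fnmatchcase(name, pattern.rstrip("/")):
            return action
    return "include"


def rsync_selected(paths: List[str], rules: List[Rule]) -> List[str]:
    """Paths (relative, directories ending in "/") that this simplified model would transfer."""
    selected = []
    for path in paths:
        parts = path.rstrip("/").split("/")
        ancestors = ["/".join(parts[:i]) + "/" for i in range(1, len(parts))]
        # rsync never descends into a directory that was excluded.
        if all(verdict(p, rules) == "include" for p in ancestors + [path]):
            selected.append(path)
    return selected


if __name__ == "__main__":
    print(rsync_selected(EXAMPLE_TREE, RULES))
```

The leading `--include="*/"` matters: without it, the final `--exclude="*"` also matches finish_install/ itself, and since rsync does not descend into excluded directories, finish_install/rancher-node-metal03.sh would no longer be sent (in this model, `rsync_selected(EXAMPLE_TREE, RULES[1:])` keeps only rancher-node-metal03.txt).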
- Antoine R. Dumont changed the description
- Antoine R. Dumont mentioned in commit swh/infra/puppet/puppet-swh-site@39c6630b
- Antoine R. Dumont marked the checklist item Add puppet configuration as completed
- Owner
The first puppet run failed [1], but it got the node's record registered on pergamon (after a `puppet agent -t` there). That in turn allowed the proper sources.list to be installed, so that zfs-dkms could be installed. After installing zfs-dkms and rebooting, a second `puppet agent -t` went through, and so did the zfs configuration:
```
root@rancher-node-metal03:~# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
data          1.14M  3.36T    96K  /data
data/docker    172K  3.36T   172K  /var/lib/docker
data/kubelet    96K  3.36T    96K  /var/lib/kubelet
data/rancher    96K  3.36T    96K  /var/lib/rancher
```
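To check this result against the dataset list in the task description, the `zfs list` output can be parsed. A minimal Python sketch, assuming the default column layout shown above and mountpoints without spaces (`parse_zfs_list` and `wrong_mountpoints` are hypothetical helpers); for scripting, `zfs list -H -o name,mountpoint` gives header-less, tab-separated output that is easier to parse reliably.

```python
from typing import Dict

# Datasets and mountpoints listed in the task description.
EXPECTED_MOUNTPOINTS = {
    "data/rancher": "/var/lib/rancher",
    "data/docker": "/var/lib/docker",
}

# The `zfs list` output from the transcript above.
SAMPLE = """\
NAME           USED  AVAIL  REFER  MOUNTPOINT
data          1.14M  3.36T    96K  /data
data/docker    172K  3.36T   172K  /var/lib/docker
data/kubelet    96K  3.36T    96K  /var/lib/kubelet
data/rancher    96K  3.36T    96K  /var/lib/rancher
"""


def parse_zfs_list(output: str) -> Dict[str, str]:
    """Map dataset NAME -> MOUNTPOINT from default, whitespace-aligned `zfs list` output."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_col, mnt_col = header.index("NAME"), header.index("MOUNTPOINT")
    mounts = {}
    for line in lines[1:]:
        cols = line.split()
        mounts[cols[name_col]] = cols[mnt_col]
    return mounts


def wrong_mountpoints(output: str, expected: Dict[str, str] = EXPECTED_MOUNTPOINTS) -> Dict[str, str]:
    """Return the expected datasets that are missing or mounted somewhere else."""
    mounts = parse_zfs_list(output)
    return {name: mnt for name, mnt in expected.items() if mounts.get(name) != mnt}


if __name__ == "__main__":
    print(wrong_mountpoints(SAMPLE) or "all expected datasets are mounted")
```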
Remaining:
- Create and configure the data/swap and data/volumes datasets.
- Register rancher-node-metal03 in the production cluster.
- Add labels to the node so pods can be scheduled there.
[1] Unexpectedly expected ;). The puppet manifests expect zfs-dkms to be installed, which is true for most of our machines (especially VMs) but not for bare-metal machines.
- Antoine R. Dumont marked the checklist item Run puppet on the node as completed
- Antoine R. Dumont changed the description
- Owner
Swap:
```
root@rancher-node-metal03:~# mkswap -L swap /dev/zvol/data/swap
Setting up swapspace version 1, size = 256 GiB (274877902848 bytes)
LABEL=swap, UUID=20fffd00-7477-4053-a2d6-bb2b709197a7
root@rancher-node-metal03:~# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
data           264G  3.10T    96K  /data
data/docker    172K  3.10T   172K  /var/lib/docker
data/kubelet    96K  3.10T    96K  /var/lib/kubelet
data/rancher    96K  3.10T    96K  /var/lib/rancher
data/swap      264G  3.36T    64K  -
root@rancher-node-metal03:~# grep swap /etc/fstab
#/dev/mapper/rancher--node--metal03--vg-swap_1 none swap sw 0 0
LABEL="swap" swap swap sw 0 0
root@rancher-node-metal03:~# swapon /dev/zvol/data/swap
```
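The byte count printed by mkswap is slightly below 256 GiB. A short Python check, assuming a 4 KiB page size and that mkswap reports the device size minus the one page it reserves for the swap header (which is how util-linux's mkswap computes it, as far as we know), shows the figure is consistent with a zvol of exactly 256 GiB. The zvol creation command is not shown above, so the 256 GiB volsize here is itself an assumption, and `mkswap_reported_bytes` is a hypothetical helper.

```python
GIB = 1024 ** 3
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


def mkswap_reported_bytes(device_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Swap space as reported by mkswap: whole pages on the device minus the header page."""
    pages = device_bytes // page_size
    return (pages - 1) * page_size


if __name__ == "__main__":
    zvol_bytes = 256 * GIB  # assumed volsize of data/swap
    print(mkswap_reported_bytes(zvol_bytes))  # 274877902848, the figure in the transcript
```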
- Antoine R. Dumont marked the checklist item swap file in data/swap (256 GB) as completed
- Antoine R. Dumont marked the checklist item zfs needs to be configured with a couple of datasets (-> puppet does most of it) as completed
- Owner
Ensure /tmp is a tmpfs of >= 256 GB (/etc/fstab):
Another `puppet agent -t` run set up /tmp as wanted, growing the tmpfs size from 1023406080 to 275901308928 bytes (about 257 GiB):
```
root@rancher-node-metal03:~# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for rancher-node-metal03.internal.softwareheritage.org
Info: Applying configuration version '1694175712'
Notice: /Stage[main]/Profile::Mountpoints/Mount[/tmp]/options: options changed 'size=1023406080,nr_inodes=200m,noexec,nosuid,nodev,relatime,rw' to 'size=275901308928,nr_inodes=200m,noexec,nosuid,nodev,relatime,rw'
Info: Computing checksum on file /etc/fstab
Info: /Stage[main]/Profile::Mountpoints/Mount[/tmp]: Scheduling refresh of Mount[/tmp]
Info: Mount[/tmp](provider=parsed): Remounting
Notice: /Stage[main]/Profile::Mountpoints/Mount[/tmp]: Triggered 'refresh' from 1 event
Info: /Stage[main]/Profile::Mountpoints/Mount[/tmp]: Scheduling refresh of Mount[/tmp]
Notice: Applied catalog in 17.24 seconds
```
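This checklist item can be verified from the tmpfs mount options. A minimal Python sketch (the helper names are hypothetical) that extracts `size=` from an option string such as the two in the puppet log above, or the fourth field of the /tmp line in /proc/mounts, which typically shows the size with a `k` suffix. It reads ">= 256 GB" as 256 GiB, which is an assumption, and treats percentage sizes (e.g. `size=50%`) as unknown since they depend on the amount of RAM.

```python
from typing import Optional

GIB = 1024 ** 3
# Assumption: the checklist's ">= 256 GB" is read as 256 GiB.
MIN_TMP_BYTES = 256 * GIB
# tmpfs accepts k, m and g suffixes (binary multiples) on size=.
SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def tmpfs_size_bytes(options: str) -> Optional[int]:
    """Return the size= value of a tmpfs option string in bytes, or None if absent or relative."""
    for opt in options.split(","):
        key, _, value = opt.partition("=")
        if key != "size" or not value:
            continue
        if value.endswith("%"):
            return None  # relative to RAM size, cannot be checked from the string alone
        multiplier = SUFFIXES.get(value[-1].lower())
        if multiplier:
            return int(value[:-1]) * multiplier
        return int(value)
    return None


def tmp_is_big_enough(options: str) -> bool:
    size = tmpfs_size_bytes(options)
    return size is not None and size >= MIN_TMP_BYTES


if __name__ == "__main__":
    before = "size=1023406080,nr_inodes=200m,noexec,nosuid,nodev,relatime,rw"
    after = "size=275901308928,nr_inodes=200m,noexec,nosuid,nodev,relatime,rw"
    print(tmp_is_big_enough(before), tmp_is_big_enough(after))  # False True
```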
- Antoine R. Dumont mentioned in commit swh/infra/puppet/puppet-swh-site@6f2cc3a0
- Antoine R. Dumont mentioned in merge request swh/infra/puppet/puppet-swh-site!641 (merged)
- Antoine R. Dumont mentioned in commit swh/infra/puppet/puppet-swh-site@27d23aa0
- Antoine R. Dumont marked the checklist item Configure firewall rules as completed
- Antoine R. Dumont marked the checklist item Ensure /tmp is a tmpfs of >= 256 GB (/etc/fstab) as completed
- Antoine R. Dumont marked the checklist item As a production rancher node: as completed
- Owner
The node is registered in the production cluster; new pods are scheduled on it and run without issues.
Logs show up in Kibana, and metrics show up in Grafana too [1].
[1] https://grafana.softwareheritage.org/goto/jYZZNjkSz?orgId=1
- Owner
Done.
- Antoine R. Dumont closed
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@80565212
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@5f5b06fd
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@5b58e5e5
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@8c28f550
- Antoine R. Dumont mentioned in commit swh/infra/puppet/puppet-swh-site@6140468a