I had to use a system shell to manually mount the lvmetad socket inside the alternate root, because the vg configuration backup was hanging. Gotta love switches running Debian, I guess.
The bifur upgrade hung in the same place, but I noticed that the lvm2 postinst runs vgcfgbackup || :, so I just killed the hung vgcfgbackup process.
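For reference, the workaround boiled down to roughly the following (a sketch from memory; /run/lvm is the usual Debian location of the lvmetad socket, and /mnt/root stands in for wherever the alternate root is mounted):

    # expose the host's lvmetad socket inside the alternate root
    mount --bind /run/lvm /mnt/root/run/lvm
    # since the lvm2 postinst guards the call with "vgcfgbackup || :",
    # killing a still-hung backup lets the upgrade carry on
    pkill -f vgcfgbackup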
The basic configuration of the dori and nori switches is done; admin connections are possible via their IPMI addresses.
The user swh is configured on both switches.
A first MLAG configuration attempt was made but is not yet working.
The C06 nodes' iLOs are configured with the IPs declared in the inventory.
There is an additional server in the rack that was apparently ordered as a bastion. It's currently named gloin003, matching its label in the rack, but it doesn't have a frontend configuration, so we'll see how to rename it later.
For the record, the iLO IPs were already configured via a DHCP server running on angrenost, so we were just missing the password to be able to access the servers right after their installation.
bifur and bofur have been configured via ansible (a rough sketch of the resulting switch config follows the list):
VLT setup with 1 x 100G cross-connect on vlt domain 1
port-channels have been configured for link aggregation, each with a matching vlt-port-channel id
1-2 for gloin001/002
11-12 for dwalin001/002
21-40 for balin001-020
port-channel 999 for interconnection with nori & dori (manually configured)
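For the record, that config should look roughly like this on each Dell switch (a from-memory sketch assuming Dell OS10, which the VLT terminology and the eth 1/1/x port naming suggest; the discovery-interface and the server port-channel shown are illustrative, the ansible roles hold the real values):

    ! VLT domain, using the 100G cross-connect as discovery interface
    vlt-domain 1
     discovery-interface ethernet1/1/25
    ! one port-channel per server, with a matching vlt-port-channel id
    interface port-channel1
     description gloin001
     vlt-port-channel 1
     no shutdown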
nori and dori have been configured manually (a rough sketch of the Onyx config follows the list):
mlag setup with 2 x 100G cross connect on port channel 1000 (peer ip addresses 10.25.254.1/30 + 10.25.254.2/30; mlag-vip on the management network 10.25.3.246/24)
mlag port channel 999 configured for interconnect with Dell switches
1 x 100G link between bifur (eth 1/1/26) and nori (eth 1/20)
2 x 25G backup links between bofur (eth 1/1/21-1/1/22) and dori (eth 1/16-1/17), to be upgraded to 1x100G when a 3m-long QSFP28 DAC is acquired
(the dori eth 1/16 link is reporting "bad signal integrity"; I might have kinked the DAC when pulling it through the floor tiles. It's only a temporary fallback link, so it's probably not a problem, but we should monitor it once we start installing the OSDs)
mlag port channel 13 has been configured for dwalin003 (using port 8 on both switches)
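The nori/dori side boils down to roughly this (a from-memory Onyx sketch; the IPL member ports, the IPL VLAN id and the mlag-vip name are illustrative, while the addresses and port-channel numbers are the ones listed above; nori shown, dori mirrors it with the peer addresses swapped):

    ## MLAG + IPL over the 2 x 100G cross-connect
    protocol mlag
    lacp
    interface port-channel 1000
    interface ethernet 1/31 channel-group 1000 mode active
    interface ethernet 1/32 channel-group 1000 mode active
    vlan 4000
    interface vlan 4000 ip address 10.25.254.1 255.255.255.252
    interface port-channel 1000 ipl 1
    interface vlan 4000 ipl 1 peer-address 10.25.254.2
    mlag-vip swh-mlag-vip ip 10.25.3.246 /24 force
    no mlag shutdown
    ## mlag port-channel 999 towards the Dell switches (nori: eth 1/20)
    interface mlag-port-channel 999
    interface ethernet 1/20 mlag-channel-group 999 mode active
    interface mlag-port-channel 999 no shutdown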
The gloin003 server has been connected to port 1 on both nori & dori with a pair of leftover 10 Gbps DACs.
Network equipment shopping list:
2 x 3m QSFP28 (100 Gbps) DAC (interconnect bifur/bofur + bofur/dori) - we need 3m as we have to skip over one rack
2 x 50cm or 1m SFP28 (25 Gbps) DAC (gloin003 - {dori,nori})
2 x 3m SFP28 (25 Gbps) DAC (extra for angrenost (?) + 1 spare + 1 spare recovered from the bofur-dori links)
1 x additional 50cm QSFP28 DAC (extra cross-connect bifur - bofur if we have a spare QSFP28 port on both switches, or as a spare otherwise)
Is 50cm long enough to connect the switches in 2 different racks?
The switches should have enough free ports to move the 4 servers off one of the QSFP28 -> 4x SFP28 breakout adapters (freeing a QSFP28 port).
We can order a couple of additional SFP28 DACs to have some spares, just in case ;)
Actually, we need to jump below one rack, so 2 x 3m it is (we can move the 2m bifur/nori link back between bifur and bofur, and use the 3m cables to go between the Dell and HP switches with more slack).
The ports of the C06 OSDs on nori & dori are configured with untagged vlan 1 and no aggregation, to facilitate the install process (it's not clear that mlag supports a fallback when no lacp negotiation happens, so it's much easier that way). They should be moved to mlag port-channels once the servers are installed (a rough sketch of that move is below).
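When the time comes, the move per OSD is roughly this (an Onyx sketch with illustrative port and mlag-port-channel numbers, to be applied on both switches):

    ## during install: plain access port, untagged vlan 1, no aggregation
    interface ethernet 1/2 switchport access vlan 1
    ## once installed: move the port into an mlag port-channel instead
    interface mlag-port-channel 50
    interface ethernet 1/2 mlag-channel-group 50 mode active
    interface mlag-port-channel 50 no shutdown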
Upgrading nori & dori successively (both switches upgraded twice in a row, because of the onyx upgrade process) didn't affect connectivity from gloin003 to the rest of the cluster.