```
$ IPADDRESS=$(swhpass show infra/$HOSTNAME/idrac | awk -F/ '/^Url/{print $NF}')
$ LOGIN=$(swhpass show infra/$HOSTNAME/idrac | awk '/^User/{print $2}')
$ PASSWORD=$(swhpass show infra/$HOSTNAME/idrac | head -1)
$ ipmitool -I lanplus -H "$IPADDRESS" -U "$LOGIN" -P "$PASSWORD" sol activate
gpg: WARNING: server 'gpg-agent' is older than us (2.3.7 < 2.4.0)
gpg: WARNING: server 'gpg-agent' is older than us (2.3.7 < 2.4.0)
gpg: WARNING: server 'gpg-agent' is older than us (2.3.7 < 2.4.0)
[SOL Session operational. Use ~? for help]
```
Note: `swhpass` is a wrapper on my machine for the swh credentials store (I already have my own store configured with `pass`, which predates the swh one; others can simply use `pass` directly).
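For reference, such a wrapper can be as small as pointing `pass` at a dedicated store through the `PASSWORD_STORE_DIR` environment variable. A minimal sketch; the store path below is a hypothetical example, not the actual location:

```sh
#!/bin/sh
# swhpass: run "pass" against a separate password store
# (the store location is an assumption; adjust to wherever the swh store is cloned)
PASSWORD_STORE_DIR="$HOME/.swh-password-store" exec pass "$@"
```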
So far, it complains about the network interface setup [1] [2]:
[1]

```
Failed to run preseeded command

Execution of preseeded command "anna-install net-modules-`uname -r`
&& modprobe bonding mode=802.3ad miimon=100 lacp_rate=slow
xmit_hash_policy=layer3+4 && ip l set ens10f0np0 master bond0 && ip l
set ens10f1np1 master bond0 && ip l add link bond0 name vlan440 type
vlan id 440 && for iface in ens10f0np0 ens10f1np1 bond0 vlan440; do
ip l set $iface up; done && ip a add dev vlan440
192.168.100.64/255.255.255.0 && ip r add default via 192.168.100.1 &&
echo esnode7 > /etc/hostname && echo nameserver 192.168.100.29 >
/etc/resolv.conf && sed -i -e '/ip link set/d'
/bin/check-missing-firmware && (echo '#!/bin/sh'; echo 'exit 0') >
/bin/netcfg && chmod +x /bin/netcfg && sleep 10" failed with exit
code 2.
```
[2]
```
┌───────────────────┤ [!] Detect network hardware ├────────────────────┐

Some of your hardware needs non-free firmware files to operate. The
firmware can be loaded from removable media, such as a USB stick or
floppy.

The missing firmware files are:
qed/qed_init_values_zipped-8.42.2.0.bin
qed/qed_init_values_zipped-8.42.2.0.bin

If you have such media available now, insert it, and continue.

Load missing firmware from removable media?

    <Yes>    <No>
```
I fell back to a shell within the installer (from the serial console) to check the network interfaces.
They do not match what we have in the preseed files.
I've adapted the preseeding to match those interface names and re-triggered the install to check whether that unsticks it:
```
root@pergamon:~# grep ens2f0np0 /srv/softwareheritage/preseeding/esnode7.txt
      ip l set ens2f0np0 master bond0 && \
      for iface in ens2f0np0 ens2f1np1 bond0 vlan440; do ip l set $iface up; done && \
root@pergamon:~# grep ens2f1np1 /srv/softwareheritage/preseeding/esnode7.txt
      ip l set ens2f1np1 master bond0 && \
      for iface in ens2f0np0 ens2f1np1 bond0 vlan440; do ip l set $iface up; done && \
```
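Pieced together from the failure dialog in [1], with the corrected interface names substituted in, the full preseeded network command now reads roughly as follows (a reconstruction for readability, not a verbatim copy of the preseed file):

```sh
anna-install net-modules-`uname -r` && \
modprobe bonding mode=802.3ad miimon=100 lacp_rate=slow xmit_hash_policy=layer3+4 && \
ip l set ens2f0np0 master bond0 && \
ip l set ens2f1np1 master bond0 && \
ip l add link bond0 name vlan440 type vlan id 440 && \
for iface in ens2f0np0 ens2f1np1 bond0 vlan440; do ip l set $iface up; done && \
ip a add dev vlan440 192.168.100.64/255.255.255.0 && \
ip r add default via 192.168.100.1 && \
echo esnode7 > /etc/hostname && \
echo nameserver 192.168.100.29 > /etc/resolv.conf && \
sed -i -e '/ip link set/d' /bin/check-missing-firmware && \
(echo '#!/bin/sh'; echo 'exit 0') > /bin/netcfg && \
chmod +x /bin/netcfg && \
sleep 10
```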
Well, the first issue about the network interface setup no longer appears, so that helped.
That still fails on the hardware detection though [2].
Those files are apparently shipped by the non-free Debian package firmware-qlogic.
[2]
```
┌───────────────────┤ [!] Detect network hardware ├────────────────────┐

Some of your hardware needs non-free firmware files to operate. The
firmware can be loaded from removable media, such as a USB stick or
floppy.

The missing firmware files are:
qed/qed_init_values_zipped-8.42.2.0.bin
qed/qed_init_values_zipped-8.42.2.0.bin

If you have such media available now, insert it, and continue.

Load missing firmware from removable media?

    <Yes>    <No>
```
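Presumably the fix is to get those firmware files onto the system. One way to do that on the installed system, sketched here under the assumption that non-free is enabled in the APT sources:

```sh
# Enable non-free in /etc/apt/sources.list first, then:
apt update
apt install firmware-qlogic   # ships /lib/firmware/qed/qed_init_values_zipped-*.bin
update-initramfs -u           # so the qed module can find the firmware at early boot
reboot
```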
That does not seem to be enough, though: there is no IP address associated with those interfaces [1].
(I also tried `networkctl reconfigure` and a reboot.)
[1]
```
root@esnode7:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether e2:12:55:04:50:b5 brd ff:ff:ff:ff:ff:ff permaddr 88:e9:a4:67:81:c0
    altname enp3s0f0np0
3: ens2f1np1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether e2:12:55:04:50:b5 brd ff:ff:ff:ff:ff:ff permaddr 88:e9:a4:67:81:c1
    altname enp3s0f1np1
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e2:12:55:04:50:b5 brd ff:ff:ff:ff:ff:ff
5: vlan440@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether e2:12:55:04:50:b5 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::e012:55ff:fe04:50b5/64 scope link
       valid_lft forever preferred_lft forever

root@esnode7:~# cat /etc/systemd/network/bond0-interfaces.network
[Match]
Name=ens2f0np0
Name=ens2f1np1

[Network]
Bond=bond0

root@esnode7:~# dmesg
...
[    5.299224] 8021q: 802.1Q VLAN Support v1.8
[    5.360830] 8021q: adding VLAN 0 to HW filter on device ens2f0np0
[    5.372608] 8021q: adding VLAN 0 to HW filter on device ens2f1np1
[    5.405553] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[    5.405598] 8021q: adding VLAN 0 to HW filter on device bond0
[    5.405684] IPv6: ADDRCONF(NETDEV_CHANGE): vlan440: link becomes ready
[    5.405765] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[    5.428853] bond0: (slave ens2f1np1): link status definitely up, 25000 Mbps full duplex
[    5.428860] bond0: active interface up!
...
```
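Only the link-local IPv6 address is present, and nothing claims an IPv4 address on vlan440. For chasing this kind of problem, `networkctl` also gives a quick view of which .network file each link matched (a diagnostic aid, not output from the original session):

```sh
networkctl list            # per-link operational/setup state as systemd-networkd sees it
networkctl status vlan440  # shows the matched .network file, addresses, and recent log lines
```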
Looks like I messed up the systemd-networkd syntax: I put the expanded netmask in the Address= statement, but Address= expects CIDR prefix notation (192.168.100.64/24 rather than 192.168.100.64/255.255.255.0), oops.
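For reference, a minimal sketch of a corrected unit; the file name is a guess, and the address, gateway, and DNS values come from the preseed command above:

```ini
# /etc/systemd/network/vlan440.network (hypothetical file name)
[Match]
Name=vlan440

[Network]
# Address= wants CIDR prefix notation; a dotted-decimal netmask is rejected
Address=192.168.100.64/24
Gateway=192.168.100.1
DNS=192.168.100.29
```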
Puppet applied cleanly, and the elasticsearch cluster is now rebalancing data between the nodes.
```
root@esnode7:~# while true; do date; df -h /srv/elasticsearch/nodes/; sleep 10; done
Thu 23 Mar 2023 03:19:59 PM UTC
Filesystem          Size  Used Avail Use% Mounted on
elasticsearch/data   14T   25G   14T   1% /srv/elasticsearch/nodes
Thu 23 Mar 2023 03:20:09 PM UTC
Filesystem          Size  Used Avail Use% Mounted on
elasticsearch/data   14T   26G   14T   1% /srv/elasticsearch/nodes
Thu 23 Mar 2023 03:20:19 PM UTC
Filesystem          Size  Used Avail Use% Mounted on
elasticsearch/data   14T   28G   14T   1% /srv/elasticsearch/nodes
```
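Beyond watching disk usage grow, the rebalancing can be followed through the Elasticsearch API itself (assuming the node listens on the default localhost:9200):

```sh
# Overall cluster health (status, relocating/initializing shard counts)
curl -s http://localhost:9200/_cluster/health?pretty

# Only the shard recoveries currently in flight
curl -s 'http://localhost:9200/_cat/recovery?v&active_only=true'
```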
Antoine R. Dumont changed title from "Install the new bare metal server(s) for ELK cluster" to "Install the new bare metal server esnode + configure it so it's part of the production ELK cluster"
Next time: deactivate puppet and apply the changes (adding the new node) gradually, waiting in between for the cluster to get back from yellow to green before executing the next puppet run (on another cluster node).
Here, what happened is that the puppet change got applied to the new node first, which went fine (it started receiving shards).
But then puppet ran on the other nodes within roughly the same time window (automatically), which made the cluster go red, as elasticsearch restarted on all nodes but one. It took some time to get back to green after that.
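A rough sketch of what that gradual procedure could look like next time (hypothetical commands run per node; the health check assumes elasticsearch answers on localhost:9200):

```sh
# On every esnode first, so no run happens behind our back:
puppet agent --disable "rolling elasticsearch change"

# Then, one node at a time:
puppet agent --enable
puppet agent --test   # apply the change (restarts elasticsearch on this node)
# wait for the cluster to recover before moving on to the next node
until curl -s http://localhost:9200/_cluster/health | grep -q '"status":"green"'; do
  sleep 30
done
```

That keeps at most one elasticsearch node down at any time, so the cluster should never go past yellow.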