rancher-highmem02 has a flappy network connection
The flappy network connection made it a pain to kickstart the snapshot copy from highmem01 to highmem02, but once started, the link stayed stable enough for the copy to go through.
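For the record, a transfer over a link this unreliable can be made restartable with OpenZFS resumable receive. A minimal sketch, assuming hypothetical dataset/snapshot names (data/volumes@migration):

# receive with -s so an interrupted stream leaves a resume token instead of being discarded
zfs send data/volumes@migration | ssh root@rancher-node-highmem02 zfs receive -s data/volumes
# after a drop, read the token on the receiver...
ssh root@rancher-node-highmem02 zfs get -H -o value receive_resume_token data/volumes
# ...and resume the send from where it stopped
zfs send -t <token> | ssh root@rancher-node-highmem02 zfs receive -s data/volumes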
[1] flappy network interface up/down
root@rancher-node-highmem01:~# journalctl -kf
Jan 23 14:13:06 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali165b4fab219: link becomes ready
Jan 23 14:20:34 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 14:20:34 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali9c85a6a28fe: link becomes ready
Jan 23 14:40:07 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 14:40:07 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali15bb5183802: link becomes ready
Jan 23 14:56:08 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): calieb9117716fa: link becomes ready
Jan 23 15:13:05 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 15:13:05 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali70f4319825a: link becomes ready
Jan 23 15:14:04 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 15:14:04 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali47df0296e41: link becomes ready
[2] might be related to a disk issue: smartd times out while scanning devices
root@rancher-node-highmem01:~# systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● smartmontools.service loaded failed failed Self Monitoring and Reporting Technology (SMART) Daemon
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
root@rancher-node-highmem01:~# systemctl status smartmontools.service
× smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon
Loaded: loaded (/lib/systemd/system/smartmontools.service; enabled; preset: enabled)
Active: failed (Result: timeout) since Mon 2025-01-27 14:07:37 UTC; 2h 14min ago
Docs: man:smartd(8)
man:smartd.conf(5)
Process: 2258907 ExecStart=/usr/sbin/smartd -n $smartd_opts (code=killed, signal=TERM)
Main PID: 2258907 (code=killed, signal=TERM)
Status: "Initializing ..."
CPU: 28ms
Jan 27 14:06:03 rancher-node-highmem01 smartd[2258907]: smartd 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-30-amd64] (local build)
Jan 27 14:06:03 rancher-node-highmem01 smartd[2258907]: Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
Jan 27 14:06:03 rancher-node-highmem01 smartd[2258907]: Opened configuration file /etc/smartd.conf
Jan 27 14:06:03 rancher-node-highmem01 smartd[2258907]: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
Jan 27 14:06:03 rancher-node-highmem01 smartd[2258907]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Jan 27 14:06:03 rancher-node-highmem01 smartd[2258907]: Device: /dev/sda, opened
Jan 27 14:06:34 rancher-node-highmem01 smartd[2258907]: Device: /dev/sda, [SEAGATE ST16000NM010G ESL5], lu id: 0x5000c500cb4677e7, S/N: ZL2CFHNC, 16.0 TB
Jan 27 14:07:33 rancher-node-highmem01 systemd[1]: smartmontools.service: start operation timed out. Terminating.
Jan 27 14:07:37 rancher-node-highmem01 systemd[1]: smartmontools.service: Failed with result 'timeout'.
Jan 27 14:07:37 rancher-node-highmem01 systemd[1]: Failed to start smartmontools.service - Self Monitoring and Reporting Technology (SMART) Daemon.
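If the timeout keeps recurring, two things might be worth trying (sketches, not applied here): probe the drive directly to see whether the SMART query itself stalls, and give smartd more startup slack with a drop-in:

root@rancher-node-highmem01:~# smartctl -a /dev/sda    # does a direct query also hang?
root@rancher-node-highmem01:~# systemctl edit smartmontools.service
# in the drop-in:
[Service]
TimeoutStartSec=5min
root@rancher-node-highmem01:~# systemctl restart smartmontools.service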
Activity
- Author Owner
At the end of the zfs transfer, the network connection is still flappy... [1] (but the transfer went fine nonetheless)
Nothing more shows up in the journalctl or dmesg output though. [2]
[1]
root@rancher-node-highmem02:~# date
Tue Jan 28 08:10:19 AM UTC 2025
root@rancher-node-highmem02:~# ssh root@rancher-node-highmem01 date
ssh: connect to host rancher-node-highmem01 port 22: No route to host
root@rancher-node-highmem02:~# ping rancher-node-highmem01
PING rancher-node-highmem01.internal.softwareheritage.org (192.168.100.136) 56(84) bytes of data.
64 bytes from rancher-node-highmem01.internal.softwareheritage.org (192.168.100.136): icmp_seq=1 ttl=64 time=0.232 ms
64 bytes from rancher-node-highmem01.internal.softwareheritage.org (192.168.100.136): icmp_seq=2 ttl=64 time=0.415 ms
^C
--- rancher-node-highmem01.internal.softwareheritage.org ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.232/0.323/0.415/0.091 ms
root@rancher-node-highmem02:~# ssh root@rancher-node-highmem01 date
Tue Jan 28 08:10:34 AM UTC 2025
...
root@rancher-node-highmem02:~# ssh root@rancher-node-highmem01 date
ssh: connect to host rancher-node-highmem01 port 22: No route to host
root@rancher-node-highmem02:~# ping rancher-node-highmem01
PING rancher-node-highmem01.internal.softwareheritage.org (192.168.100.136) 56(84) bytes of data.
From rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135) icmp_seq=1 Destination Host Unreachable
From rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135) icmp_seq=2 Destination Host Unreachable
^C
--- rancher-node-highmem01.internal.softwareheritage.org ping statistics ---
4 packets transmitted, 0 received, +2 errors, 100% packet loss, time 3041ms
pipe 2
root@rancher-node-highmem02:~# date
Tue Jan 28 08:11:50 AM UTC 2025
[2] logs
root@rancher-node-highmem01:~# dmesg --human | tail -20
[Jan18 18:49] sd 0:0:3:0: Power-on or device reset occurred
[Jan19 08:02] perf: interrupt took too long (4962 > 4941), lowering kernel.perf_event_max_sample_rate to 40250
[Jan20 00:04] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.000144] IPv6: ADDRCONF(NETDEV_CHANGE): calicd719c6a34b: link becomes ready
[Jan22 04:21] perf: interrupt took too long (6211 > 6202), lowering kernel.perf_event_max_sample_rate to 32000
[Jan23 01:07] systemd-journald[26921]: Data hash table of /var/log/journal/8c55c8823ccc4c21a70646feece1d55a/system.journal has a fill level at 75.0 (174764 of 233016 items, 75497472 file size, 431 bytes per hash table item), suggesting rotation.
[ +0.000013] systemd-journald[26921]: /var/log/journal/8c55c8823ccc4c21a70646feece1d55a/system.journal: Journal header limits reached or header out-of-date, rotating.
[Jan23 14:12] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.000134] IPv6: ADDRCONF(NETDEV_CHANGE): cali61a6e6f90ff: link becomes ready
[ +7.025258] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.000119] IPv6: ADDRCONF(NETDEV_CHANGE): cali165b4fab219: link becomes ready
[Jan23 14:20] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.000116] IPv6: ADDRCONF(NETDEV_CHANGE): cali9c85a6a28fe: link becomes ready
[Jan23 14:39] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.000132] IPv6: ADDRCONF(NETDEV_CHANGE): cali15bb5183802: link becomes ready
[Jan23 14:55] IPv6: ADDRCONF(NETDEV_CHANGE): calieb9117716fa: link becomes ready
[Jan23 15:12] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.000133] IPv6: ADDRCONF(NETDEV_CHANGE): cali70f4319825a: link becomes ready
[Jan23 15:13] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ +0.000112] IPv6: ADDRCONF(NETDEV_CHANGE): cali47df0296e41: link becomes ready
root@rancher-node-highmem01:~# journalctl -xek | tail -20
Jan 18 18:49:22 rancher-node-highmem01 kernel: sd 0:0:3:0: Power-on or device reset occurred
Jan 19 08:02:37 rancher-node-highmem01 kernel: perf: interrupt took too long (4962 > 4941), lowering kernel.perf_event_max_sample_rate to 40250
Jan 20 00:05:00 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 20 00:05:00 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): calicd719c6a34b: link becomes ready
Jan 22 04:21:24 rancher-node-highmem01 kernel: perf: interrupt took too long (6211 > 6202), lowering kernel.perf_event_max_sample_rate to 32000
Jan 23 01:08:20 rancher-node-highmem01 systemd-journald[26921]: Data hash table of /var/log/journal/8c55c8823ccc4c21a70646feece1d55a/system.journal has a fill level at 75.0 (174764 of 233016 items, 75497472 file size, 431 bytes per hash table item), suggesting rotation.
Jan 23 01:08:20 rancher-node-highmem01 systemd-journald[26921]: /var/log/journal/8c55c8823ccc4c21a70646feece1d55a/system.journal: Journal header limits reached or header out-of-date, rotating.
Jan 23 14:12:59 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 14:12:59 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali61a6e6f90ff: link becomes ready
Jan 23 14:13:06 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 14:13:06 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali165b4fab219: link becomes ready
Jan 23 14:20:34 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 14:20:34 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali9c85a6a28fe: link becomes ready
Jan 23 14:40:07 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 14:40:07 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali15bb5183802: link becomes ready
Jan 23 14:56:08 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): calieb9117716fa: link becomes ready
Jan 23 15:13:05 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 15:13:05 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali70f4319825a: link becomes ready
Jan 23 15:14:04 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 23 15:14:04 rancher-node-highmem01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali47df0296e41: link becomes ready
- Antoine R. Dumont changed the description
- Author Owner
It seems rancher-node-highmem01 is fine in the end, and it's highmem02 whose network connection is flappy.
Here is a journalctl -xek output covering its uptime, filtered to network-related entries only.
...
Jan 28 10:51:57 rancher-node-highmem02 kernel: bnxt_en 0000:63:00.0 eth0: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet found at mem df210000, node addr e4:3d:1a:6e:3b:b0
Jan 28 10:51:57 rancher-node-highmem02 kernel: bnxt_en 0000:63:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
Jan 28 10:51:57 rancher-node-highmem02 kernel: bnxt_en 0000:63:00.1 (unnamed net_device) (uninitialized): Device requests max timeout of 100 seconds, may trigger hung task watchdog
...
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.0 eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address b0:7b:25:d4:b7:a4
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.0 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.0 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
...
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.1 eth3: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address b0:7b:25:d4:b7:a5
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.1 eth3: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.1 eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.1 eth3: dma_rwctrl[00000001] dma_mask[64-bit]
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.1 eno8403: renamed from eth3
...
Jan 28 10:51:57 rancher-node-highmem02 kernel: tg3 0000:e1:00.0 eno8303: renamed from eth1
...
Jan 28 10:51:57 rancher-node-highmem02 kernel: bnxt_en 0000:63:00.1 eno12409np1: renamed from eth2
Jan 28 10:51:57 rancher-node-highmem02 kernel: bnxt_en 0000:63:00.0 eno12399np0: renamed from eth0
...
Jan 28 10:51:57 rancher-node-highmem02 kernel: bnxt_en 0000:63:00.1 eno12409np1: NIC Link is Up, 10000 Mbps (NRZ) full duplex, Flow control: none
Jan 28 10:51:57 rancher-node-highmem02 kernel: bnxt_en 0000:63:00.1 eno12409np1: FEC autoneg off encoding: None
Jan 28 10:51:57 rancher-node-highmem02 kernel: Adding 999420k swap on /dev/mapper/rancher--node--highmem02--vg-swap_1. Priority:-2 extents:1 across:999420k FS
Jan 28 10:51:57 rancher-node-highmem02 kernel: bond0: (slave eno12409np1): Enslaving as a backup interface with an up link
Jan 28 10:51:57 rancher-node-highmem02 kernel: bnxt_en 0000:63:00.0 eno12399np0: NIC Link is Up, 10000 Mbps (NRZ) full duplex, Flow control: none
Jan 28 10:51:57 rancher-node-highmem02 kernel: bnxt_en 0000:63:00.0 eno12399np0: FEC autoneg off encoding: None
Jan 28 10:51:57 rancher-node-highmem02 kernel: bond0: (slave eno12399np0): Enslaving as a backup interface with an up link
Jan 28 10:51:57 rancher-node-highmem02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
...
Jan 28 10:52:01 rancher-node-highmem02 kernel: audit: type=1400 audit(1738061521.150:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=3885 comm="apparmor_parser"
Jan 28 10:52:01 rancher-node-highmem02 kernel: audit: type=1400 audit(1738061521.150:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=3885 comm="apparmor_parser"
...
Jan 28 10:52:01 rancher-node-highmem02 kernel: RPC: Registered named UNIX socket transport module.
Jan 28 10:52:01 rancher-node-highmem02 kernel: RPC: Registered udp transport module.
Jan 28 10:52:01 rancher-node-highmem02 kernel: RPC: Registered tcp transport module.
Jan 28 10:52:01 rancher-node-highmem02 kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Jan 28 10:52:10 rancher-node-highmem02 kernel: bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
Jan 28 10:52:10 rancher-node-highmem02 kernel: Bridge firewalling registered
Jan 28 10:52:17 rancher-node-highmem02 kernel: kauditd_printk_skb: 2 callbacks suppressed
Jan 28 10:52:17 rancher-node-highmem02 kernel: audit: type=1400 audit(1738061537.232:14): apparmor="STATUS" operation="profile_load" profile="unconfined" name="cri-containerd.apparmor.d" pid=15766 comm="apparmor_parser"
Jan 28 10:52:20 rancher-node-highmem02 kernel: wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
Jan 28 10:52:20 rancher-node-highmem02 kernel: wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
Jan 28 10:52:21 rancher-node-highmem02 kernel: Initializing XFRM netlink socket
Jan 28 10:52:21 rancher-node-highmem02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 28 10:52:21 rancher-node-highmem02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali9c149628661: link becomes ready
Jan 28 10:52:36 rancher-node-highmem02 kernel: Key type ceph registered
Jan 28 10:52:36 rancher-node-highmem02 kernel: libceph: loaded (mon/osd proto 15/24)
Jan 28 10:52:36 rancher-node-highmem02 kernel: rbd: loaded (major 253)
Jan 28 10:53:51 rancher-node-highmem02 kernel: usb 3-1.2: USB disconnect, device number 5
Jan 28 10:55:57 rancher-node-highmem02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 28 10:55:57 rancher-node-highmem02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): caliaa283cb1cca: link becomes ready
Jan 28 10:55:57 rancher-node-highmem02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): cali4a3f3be9b2c: link becomes ready
...
Jan 28 11:23:43 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:23:43 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:23:43 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:23:43 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:23:48 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:23:48 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:23:48 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:23:48 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:24:10 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:24:10 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:24:10 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:24:10 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:24:17 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:24:17 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:24:17 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:24:17 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:24:34 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:24:34 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:24:34 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:24:34 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:25:50 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:25:50 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:25:50 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:25:50 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:35:48 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:35:48 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:35:48 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:35:48 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:36:13 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:36:13 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:36:13 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:36:13 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:40:08 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:40:08 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:40:08 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:40:08 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:40:47 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:40:47 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:40:47 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:40:47 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:44:53 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:44:53 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:44:53 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:44:53 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:44:58 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:44:58 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:44:58 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:44:58 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:45:17 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:45:17 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:45:17 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:45:17 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:45:35 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:45:35 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:45:35 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:45:35 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:45:40 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:45:40 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:45:40 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:45:40 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:45:49 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:45:49 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:45:49 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:45:49 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:46:04 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:46:04 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:46:04 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:46:04 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:46:46 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:46:46 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:46:46 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:46:46 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:52:10 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:52:10 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:52:10 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:52:10 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:52:19 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:52:19 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:52:19 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:52:19 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:52:23 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:52:23 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:52:23 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:52:23 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:52:34 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:52:34 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:52:34 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:52:34 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:53:18 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:53:18 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:53:18 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:53:18 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:53:32 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:53:32 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:53:32 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:53:32 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 11:53:47 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 11:53:47 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 11:53:47 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 11:53:47 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 11:54:02 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 11:54:02 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 11:54:02 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 11:54:02 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 13:17:00 rancher-node-highmem02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jan 28 13:17:00 rancher-node-highmem02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): calif6fbbf950bf: link becomes ready
Jan 28 15:23:51 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 15:23:51 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 15:23:51 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 15:23:51 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 15:25:23 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 15:25:23 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 15:25:23 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 15:25:23 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 15:29:49 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 15:29:49 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 15:29:49 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 15:29:49 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 15:29:55 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 15:29:55 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 15:29:55 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 15:29:55 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 15:30:00 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 15:30:00 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 15:30:00 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 15:30:00 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 15:30:46 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 15:30:46 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 15:30:46 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 15:30:46 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 15:30:51 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 15:30:51 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 15:30:51 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 15:30:51 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 15:32:05 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 15:32:05 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 15:32:05 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 15:32:05 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 15:34:45 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 15:34:45 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 15:34:45 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 15:34:45 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 15:34:52 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 15:34:52 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 15:34:52 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 15:34:52 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 15:34:56 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 15:34:56 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 15:34:56 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 15:34:56 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 15:41:28 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 15:41:28 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 15:41:28 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 15:41:28 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
Jan 28 15:52:25 rancher-node-highmem02 kernel: device vlan440 entered promiscuous mode
Jan 28 15:52:25 rancher-node-highmem02 kernel: device bond0 entered promiscuous mode
Jan 28 15:52:25 rancher-node-highmem02 kernel: device eno12409np1 entered promiscuous mode
Jan 28 15:52:25 rancher-node-highmem02 kernel: device eno12399np0 entered promiscuous mode
Jan 28 15:52:47 rancher-node-highmem02 kernel: device vlan440 left promiscuous mode
Jan 28 15:52:47 rancher-node-highmem02 kernel: device bond0 left promiscuous mode
Jan 28 15:52:47 rancher-node-highmem02 kernel: device eno12409np1 left promiscuous mode
Jan 28 15:52:47 rancher-node-highmem02 kernel: device eno12399np0 left promiscuous mode
- Author Owner
fwiw, here is its configuration prior to its reinstallation as highmem02 (it was metal05 then); it's the same now. The thing is, I did not notice the issue at the time, so I can't really tell whether it showed the same behavior before the reinstallation.
ip a
root@rancher-node-metal05:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno12399np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 9a:d6:9b:7f:d9:73 brd ff:ff:ff:ff:ff:ff permaddr e4:3d:1a:6e:3b:b0
    altname enp99s0f0np0
3: eno12409np1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 9a:d6:9b:7f:d9:73 brd ff:ff:ff:ff:ff:ff permaddr e4:3d:1a:6e:3b:b1
    altname enp99s0f1np1
4: eno8303: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b0:7b:25:d4:b7:a4 brd ff:ff:ff:ff:ff:ff
    altname enp225s0f0
5: eno8403: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b0:7b:25:d4:b7:a5 brd ff:ff:ff:ff:ff:ff
    altname enp225s0f1
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 9a:d6:9b:7f:d9:73 brd ff:ff:ff:ff:ff:ff
7: vlan440@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 9a:d6:9b:7f:d9:73 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.135/24 brd 192.168.100.255 scope global vlan440
       valid_lft forever preferred_lft forever
    inet6 fe80::98d6:9bff:fe7f:d973/64 scope link
       valid_lft forever preferred_lft forever
11: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 66:54:06:41:40:e1 brd ff:ff:ff:ff:ff:ff
    inet 10.42.240.128/32 scope global vxlan.calico
       valid_lft forever preferred_lft forever
    inet6 fe80::6454:6ff:fe41:40e1/64 scope link
       valid_lft forever preferred_lft forever
185421: calia53445d70ce@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-acb7143b-eb0e-c645-05b3-aa5802801830
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
182112: cali0fd21937cf2@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-98f908e0-1ad7-1545-31a9-0a247ef993ec
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
185443: cali007e36d3045@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-edeedb33-2bb9-e5d2-c7da-92c9fd27f7a6
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
185506: cali021f7ae2366@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-f4a70f41-101f-7084-6d4c-2ad4c2b7601d
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
185520: cali39128f82881@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-425bae4c-3f49-6981-0da7-3816785c611f
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
185524: cali180fc7f846e@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-94426ada-10d8-0376-6764-39b3f92188f3
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
185525: cali2adb08d82ae@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-09d2248f-d066-75ed-d983-3f4032324d84
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
185559: cali4ae89a10b71@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-8d7ad7f1-2ac0-674d-b755-58dcc49a9f90
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
network configuration
root@rancher-node-metal05:~# cat /etc/systemd/network/00-bond0-interfaces.network
[Match]
Name=eno12399np0
Name=eno12409np1

[Network]
Bond=bond0

root@rancher-node-metal05:~# cat /etc/systemd/network/10-bond0.netdev
[NetDev]
Name=bond0
Description=Public bonded interface
Kind=bond

[Bond]
Mode=802.3ad
TransmitHashPolicy=layer2+3
LACPTransmitRate=fast
MIIMonitorSec=100ms
DownDelaySec=200ms

root@rancher-node-metal05:~# cat /etc/systemd/network/10-bond0.network
[Match]
Name=bond0
Type=bond

[Network]
VLAN=vlan440
LinkLocalAddressing=no
LLDP=no
EmitLLDP=no
IPv6AcceptRA=no
IPv6SendRA=no

root@rancher-node-metal05:~# cat /etc/systemd/network/20-vlan440.netdev
[NetDev]
Name=vlan440
Description=Internal network vlan
Kind=vlan

[VLAN]
Id=440

root@rancher-node-metal05:~# cat /etc/systemd/network/20-vlan440.network
[Match]
Name=vlan440
Type=vlan

[Network]
Description=Internal network
Address=192.168.100.135/24
Gateway=192.168.100.1
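Since bond0 runs 802.3ad, one thing possibly worth ruling out (a sketch, not something checked so far in this thread) is an LACP negotiation mismatch with the switch, which can produce exactly this kind of intermittent loss:

root@rancher-node-highmem02:~# cat /proc/net/bonding/bond0   # per-slave LACP state, aggregator IDs, link failure counts
root@rancher-node-highmem02:~# networkctl status bond0       # systemd-networkd's view of the bond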
- Antoine R. Dumont changed title from rancher-highmem01 has a flappy network connection to rancher-highmem02 has a flappy network connection
- Antoine R. Dumont changed the description
- Author Owner
It's steadily failing to reach the esnode{1-3} nodes, which explains why the opentelemetry pod is just logging failures. Once in a while this output also shows other nodes (e.g. .136, .134).
Every 2.0s: ip neigh | awk '/FAILED/'    rancher-node-highmem02: Tue Jan 28 16:33:42 2025

192.168.100.63 dev vlan440 FAILED  -> esnode1
192.168.100.62 dev vlan440 FAILED  -> esnode2
192.168.100.61 dev vlan440 FAILED  -> esnode3
192.168.100.18 dev vlan440 FAILED  -> banco
...
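To correlate these episodes with other events, the failing entries can also be logged with timestamps instead of eyeballed via watch; a minimal sketch:

root@rancher-node-highmem02:~# while sleep 2; do
    ip neigh show dev vlan440 | awk -v d="$(date -Is)" '/FAILED|INCOMPLETE/ {print d, $0}'
  done >> /var/tmp/arp-failed.log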
- Guillaume Samson assigned to @guillaume
- Owner
Some ARP entries on the vlan440 interface are unstable:
root@rancher-node-highmem02:~# ip neigh s dev vlan440
192.168.100.136 INCOMPLETE
192.168.100.143 INCOMPLETE
192.168.100.63 FAILED
192.168.100.64 lladdr e2:12:55:04:50:b5 STALE
192.168.100.19 lladdr 06:3c:05:36:99:c3 REACHABLE
192.168.100.132 lladdr ba:77:6c:ce:a2:f2 REACHABLE
192.168.100.109 INCOMPLETE
192.168.100.29 lladdr be:83:fd:d7:61:94 REACHABLE
192.168.100.131 lladdr fe:a1:39:27:7c:ad REACHABLE
192.168.100.134 lladdr e4:43:4b:69:59:dc DELAY
192.168.100.133 lladdr 6a:82:3a:03:f7:0b REACHABLE
192.168.100.142 lladdr f2:dc:d5:76:12:4f REACHABLE
192.168.100.1 lladdr 00:00:5e:00:01:09 REACHABLE
192.168.100.141 lladdr ee:45:e8:41:45:31 REACHABLE
192.168.100.61 FAILED
192.168.100.18 FAILED
The entries with INCOMPLETE or FAILED status crashloop through the neighbor states:
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440
192.168.100.136 dev vlan440 INCOMPLETE
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440
192.168.100.136 dev vlan440 lladdr 6a:ee:37:c3:55:3d DELAY
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440
192.168.100.136 dev vlan440 lladdr 6a:ee:37:c3:55:3d PROBE
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440
192.168.100.136 dev vlan440 FAILED
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440
192.168.100.136 dev vlan440 INCOMPLETE
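That INCOMPLETE → DELAY → PROBE → FAILED cycle is the kernel's neighbor state machine: in PROBE it sends a handful of unicast ARP requests to the cached lladdr and only falls back to broadcast resolution once those go unanswered. The probe counts and timers are per-interface sysctls, and the transitions can be watched live with timestamps (a sketch):

root@rancher-node-highmem02:~# sysctl net.ipv4.neigh.vlan440.ucast_solicit \
    net.ipv4.neigh.vlan440.mcast_solicit net.ipv4.neigh.vlan440.delay_first_probe_time
root@rancher-node-highmem02:~# ip -ts monitor neigh   # timestamped neighbor state changes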
During the DELAY and PROBE states, the node is reachable. Here are three ping sequences from rancher-node-highmem01:
root@rancher-node-highmem01:~# ping rancher-node-highmem02
PING rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135) 56(84) bytes of data.
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=26 ttl=64 time=1024 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=27 ttl=64 time=0.353 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=28 ttl=64 time=0.364 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=29 ttl=64 time=0.341 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=30 ttl=64 time=0.164 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=31 ttl=64 time=0.178 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=32 ttl=64 time=0.324 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=33 ttl=64 time=0.174 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=34 ttl=64 time=0.189 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=35 ttl=64 time=0.189 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=183 ttl=64 time=468 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=184 ttl=64 time=0.188 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=185 ttl=64 time=0.184 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=186 ttl=64 time=0.329 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=187 ttl=64 time=0.187 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=188 ttl=64 time=0.140 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=189 ttl=64 time=0.321 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=190 ttl=64 time=0.330 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=191 ttl=64 time=0.209 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=300 ttl=64 time=0.531 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=301 ttl=64 time=0.369 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=302 ttl=64 time=0.235 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=303 ttl=64 time=0.207 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=304 ttl=64 time=0.207 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=305 ttl=64 time=0.208 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=306 ttl=64 time=0.222 ms
64 bytes from rancher-node-highmem02.internal.softwareheritage.org (192.168.100.135): icmp_seq=307 ttl=64 time=0.201 ms
Removing a single entry has no effect:
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440 && sed -n '/192.168.100.136/p' /proc/net/arp
192.168.100.136 dev vlan440 INCOMPLETE
192.168.100.136 0x1 0x0 6a:ee:37:c3:55:3d * vlan440
root@rancher-node-highmem02:~# ip neigh del 192.168.100.136 dev vlan440
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440 && sed -n '/192.168.100.136/p' /proc/net/arp
192.168.100.136 dev vlan440 INCOMPLETE
192.168.100.136 0x1 0x0 00:00:00:00:00:00 * vlan440
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440 && sed -n '/192.168.100.136/p' /proc/net/arp
192.168.100.136 dev vlan440 lladdr 6a:ee:37:c3:55:3d REACHABLE
192.168.100.136 0x1 0x2 6a:ee:37:c3:55:3d * vlan440
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440 && sed -n '/192.168.100.136/p' /proc/net/arp
192.168.100.136 dev vlan440 lladdr 6a:ee:37:c3:55:3d DELAY
192.168.100.136 0x1 0x2 6a:ee:37:c3:55:3d * vlan440
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440 && sed -n '/192.168.100.136/p' /proc/net/arp
192.168.100.136 dev vlan440 lladdr 6a:ee:37:c3:55:3d PROBE
192.168.100.136 0x1 0x2 6a:ee:37:c3:55:3d * vlan440
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440 && sed -n '/192.168.100.136/p' /proc/net/arp
192.168.100.136 dev vlan440 FAILED
192.168.100.136 0x1 0x0 6a:ee:37:c3:55:3d * vlan440
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440 && sed -n '/192.168.100.136/p' /proc/net/arp
192.168.100.136 dev vlan440 INCOMPLETE
192.168.100.136 0x1 0x0 6a:ee:37:c3:55:3d * vlan440
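A heavier-handed variant would be to flush the interface's whole neighbor table rather than one entry (a sketch; given the probing behavior above it would likely just rebuild into the same state):

root@rancher-node-highmem02:~# ip neigh flush dev vlan440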
Enabling/disabling ARP on the vlan440 interface from an iDRAC console (to clear /proc/net/arp) has no effect either:

root@rancher-node-highmem02:~# ip link set arp off dev vlan440
root@rancher-node-highmem02:~# ip neigh s dev vlan440
root@rancher-node-highmem02:~# ip link set arp on dev vlan440
For now the only way I found to ensure the connection between highmem01 and highmem02 is to declare the ARP entry manually:

root@rancher-node-highmem02:~# arp -s 192.168.100.136 -i vlan440 6a:ee:37:c3:55:3d
root@rancher-node-highmem02:~# ip neigh get 192.168.100.136 dev vlan440 && sed -n '/192.168.100.136/p' /proc/net/arp
192.168.100.136 dev vlan440 lladdr 6a:ee:37:c3:55:3d PERMANENT
192.168.100.136 0x1 0x6 6a:ee:37:c3:55:3d * vlan440
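Since the host is managed by systemd-networkd, such a static entry could also be made persistent across reboots with a [Neighbor] section in the vlan440 .network file; a sketch of the equivalent of the arp -s call above:

# /etc/systemd/network/20-vlan440.network (additional section)
[Neighbor]
Address=192.168.100.136
LinkLayerAddress=6a:ee:37:c3:55:3d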
- Owner
I created permanent ARP entries for the failed/incomplete ARP entries on highmem02:

root@rancher-node-highmem02:~# ip neigh s dev vlan440
192.168.100.136 lladdr 6a:ee:37:c3:55:3d PERMANENT
192.168.100.143 lladdr 66:87:ee:c3:6a:ba REACHABLE
192.168.100.63 lladdr b4:96:91:1c:6f:8c PERMANENT
192.168.100.64 lladdr e2:12:55:04:50:b5 REACHABLE
192.168.100.19 lladdr 06:3c:05:36:99:c3 REACHABLE
192.168.100.132 lladdr ba:77:6c:ce:a2:f2 REACHABLE
192.168.100.109 lladdr e4:43:4b:f0:ca:b0 REACHABLE
192.168.100.29 lladdr be:83:fd:d7:61:94 REACHABLE
192.168.100.131 lladdr fe:a1:39:27:7c:ad REACHABLE
192.168.100.134 lladdr e4:43:4b:69:59:dc PERMANENT
192.168.100.2 lladdr 88:e9:a4:67:81:64 STALE
192.168.100.133 lladdr 6a:82:3a:03:f7:0b REACHABLE
192.168.100.142 lladdr f2:dc:d5:76:12:4f REACHABLE
192.168.100.1 lladdr 00:00:5e:00:01:09 REACHABLE
192.168.100.62 lladdr b4:96:91:1c:6f:da PERMANENT
192.168.100.141 lladdr ee:45:e8:41:45:31 REACHABLE
192.168.100.18 FAILED
192.168.100.61 lladdr b4:96:91:1c:6f:dc PERMANENT
except one entry, kept for testing:

root@rancher-node-highmem02:~# dig -x 192.168.100.18 +short
banco.internal.softwareheritage.org.
Tracing ARP traffic on banco:

gsamson@banco ~ % sudo tcpdump -ni vlan440 arp and host 192.168.100.135 -c 10
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vlan440, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:53:58.698165 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 42
09:53:58.698175 ARP, Reply 192.168.100.18 is-at 14:18:77:3c:00:55, length 28
09:53:59.722029 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 42
09:53:59.722039 ARP, Reply 192.168.100.18 is-at 14:18:77:3c:00:55, length 28
09:54:00.764251 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 42
09:54:00.764262 ARP, Reply 192.168.100.18 is-at 14:18:77:3c:00:55, length 28
09:54:01.770173 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 42
09:54:01.770182 ARP, Reply 192.168.100.18 is-at 14:18:77:3c:00:55, length 28
09:54:02.794197 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 42
09:54:02.794206 ARP, Reply 192.168.100.18 is-at 14:18:77:3c:00:55, length 28
10 packets captured
18 packets received by filter
0 packets dropped by kernel
Tracing ARP traffic on rancher-node-highmem02:

root@rancher-node-highmem02:~# tcpdump -ni vlan440 arp -c 10
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vlan440, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:53:51.753853 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 28
09:53:52.777932 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 28
09:53:54.624073 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 28
09:53:55.625852 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 28
09:53:56.534033 ARP, Request who-has 192.168.100.109 tell 192.168.100.2, length 42
09:53:56.650013 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 28
09:53:57.692263 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 28
09:53:58.697978 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 28
09:53:59.721852 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 28
09:54:00.764071 ARP, Request who-has 192.168.100.18 tell 192.168.100.135, length 28
10 packets captured
10 packets received by filter
0 packets dropped by kernel
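One way to narrow down where the reply disappears (a sketch, not captured here) would be to listen on the bond and on each physical member: if banco's reply shows up on eno12399np0/eno12409np1 but never on bond0/vlan440, the frame is lost in the bond/VLAN stack; if it never reaches the NICs at all, it is dropped on the switch side:

root@rancher-node-highmem02:~# tcpdump -eni eno12399np0 arp and host 192.168.100.18 -c 4
root@rancher-node-highmem02:~# tcpdump -eni eno12409np1 arp and host 192.168.100.18 -c 4
root@rancher-node-highmem02:~# tcpdump -eni bond0 arp and host 192.168.100.18 -c 4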
rancher-node-highmem02 never receives the reply, even though the reply itself looks correct. It's as if some ARP replies to highmem02 were filtered out.
- Owner
After removing all permanent entries in the ARP table on rancher-node-highmem02, the resolution works fine except for these three addresses (I redeclared the ARP resolution manually):

root@rancher-node-highmem02:~# ip neigh s dev vlan440 | grep PERMANENT
192.168.100.63 lladdr b4:96:91:1c:6f:8c PERMANENT
192.168.100.61 lladdr b4:96:91:1c:6f:dc PERMANENT
192.168.100.62 lladdr b4:96:91:1c:6f:da PERMANENT
root@rancher-node-highmem02:~# ip neigh s dev vlan440 | grep -v PERMANENT
192.168.100.142 lladdr f2:dc:d5:76:12:4f REACHABLE
192.168.100.131 lladdr fe:a1:39:27:7c:ad REACHABLE
192.168.100.132 lladdr ba:77:6c:ce:a2:f2 REACHABLE
192.168.100.187 lladdr 92:98:97:51:71:e4 REACHABLE
192.168.100.188 lladdr 5a:28:4a:e1:ec:66 REACHABLE
192.168.100.109 lladdr e4:43:4b:f0:ca:b0 REACHABLE
192.168.100.18 lladdr 14:18:77:3c:00:55 REACHABLE
192.168.100.185 lladdr ba:c5:9a:d0:cc:d5 REACHABLE
192.168.100.64 lladdr e2:12:55:04:50:b5 REACHABLE
192.168.100.183 lladdr 5a:54:54:ce:5f:46 REACHABLE
192.168.100.193 lladdr ea:5f:8d:7e:e3:f4 REACHABLE
192.168.100.186 lladdr 12:eb:5b:e4:91:92 REACHABLE
192.168.100.191 lladdr ee:81:b7:14:75:7f REACHABLE
192.168.100.181 lladdr 5e:a2:f2:01:0b:6b REACHABLE
192.168.100.143 lladdr 66:87:ee:c3:6a:ba REACHABLE
192.168.100.29 lladdr be:83:fd:d7:61:94 REACHABLE
192.168.100.201 lladdr 4c:d9:8f:c3:3f:72 STALE
192.168.100.133 lladdr 6a:82:3a:03:f7:0b REACHABLE
192.168.100.184 lladdr 36:75:44:d3:32:d4 REACHABLE
192.168.100.1 lladdr 00:00:5e:00:01:09 REACHABLE
192.168.100.189 lladdr 6a:a6:13:62:9c:6f REACHABLE
192.168.100.182 lladdr 02:45:58:30:e1:97 REACHABLE
192.168.100.136 lladdr 6a:ee:37:c3:55:3d REACHABLE
192.168.100.141 lladdr ee:45:e8:41:45:31 REACHABLE
192.168.100.134 lladdr e4:43:4b:69:59:dc REACHABLE
192.168.100.19 lladdr 06:3c:05:36:99:c3 REACHABLE
192.168.100.192 lladdr aa:e7:99:ae:35:b5 REACHABLE
192.168.100.2 lladdr 88:e9:a4:67:81:64 STALE
192.168.100.190 lladdr fa:af:09:52:b3:88 REACHABLE
- Owner
I retried deleting the permanent ARP entries... still the same weird behavior (rancher-node-highmem02 on the left, esnode2 on the right):

Screencast_from_2025-02-21_10-20-28
So I redeclared these ARP entries manually (arp -s <ip-address> -i vlan440 <mac-address>):

root@rancher-node-highmem02:~# ip neigh s dev vlan440 | grep PERMANENT
192.168.100.63 lladdr b4:96:91:1c:6f:8c PERMANENT
192.168.100.61 lladdr b4:96:91:1c:6f:dc PERMANENT
192.168.100.62 lladdr b4:96:91:1c:6f:da PERMANENT
- Owner
I don't know if it's linked to the network intervention in the early afternoon, but all the ARP table entries on rancher-node-highmem02 are now "normal" and stable:

root@rancher-node-highmem02:~# ip n s dev vlan440
192.168.100.142 lladdr f2:dc:d5:76:12:4f REACHABLE
192.168.100.131 lladdr fe:a1:39:27:7c:ad REACHABLE
192.168.100.63 lladdr b4:96:91:1c:6f:8c REACHABLE
192.168.100.132 lladdr ba:77:6c:ce:a2:f2 REACHABLE
192.168.100.187 lladdr 92:98:97:51:71:e4 REACHABLE
192.168.100.188 lladdr 5a:28:4a:e1:ec:66 REACHABLE
192.168.100.35 lladdr 6c:92:cf:b9:31:10 REACHABLE
192.168.100.104 lladdr 02:9b:11:3f:81:21 REACHABLE
192.168.100.109 lladdr e4:43:4b:f0:ca:b0 REACHABLE
192.168.100.18 lladdr 14:18:77:3c:00:55 REACHABLE
192.168.100.61 lladdr b4:96:91:1c:6f:dc REACHABLE
192.168.100.185 lladdr ba:c5:9a:d0:cc:d5 REACHABLE
192.168.100.33 lladdr 88:e9:a4:50:95:30 REACHABLE
192.168.100.64 lladdr e2:12:55:04:50:b5 REACHABLE
192.168.100.183 lladdr 5a:54:54:ce:5f:46 REACHABLE
192.168.100.62 lladdr b4:96:91:1c:6f:da REACHABLE
192.168.100.203 lladdr 4c:d9:8f:c3:3f:a2 REACHABLE
192.168.100.204 lladdr 4c:d9:8f:c3:3d:92 REACHABLE
192.168.100.193 lladdr ea:5f:8d:7e:e3:f4 REACHABLE
192.168.100.186 lladdr 12:eb:5b:e4:91:92 REACHABLE
192.168.100.3 lladdr 88:e9:a4:67:81:68 STALE
192.168.100.191 lladdr ee:81:b7:14:75:7f REACHABLE
192.168.100.108 lladdr 68:05:ca:ac:d8:24 REACHABLE
192.168.100.181 lladdr 5e:a2:f2:01:0b:6b REACHABLE
192.168.100.143 lladdr 66:87:ee:c3:6a:ba REACHABLE
192.168.100.29 lladdr be:83:fd:d7:61:94 REACHABLE
192.168.100.201 lladdr 4c:d9:8f:c3:3f:72 REACHABLE
192.168.100.133 lladdr 6a:82:3a:03:f7:0b REACHABLE
192.168.100.184 lladdr 36:75:44:d3:32:d4 REACHABLE
192.168.100.1 lladdr 00:00:5e:00:01:09 REACHABLE
192.168.100.189 lladdr 6a:a6:13:62:9c:6f REACHABLE
192.168.100.182 lladdr 02:45:58:30:e1:97 REACHABLE
192.168.100.136 lladdr 6a:ee:37:c3:55:3d REACHABLE
192.168.100.141 lladdr ee:45:e8:41:45:31 REACHABLE
192.168.100.202 lladdr 4c:d9:8f:c3:3f:32 REACHABLE
192.168.100.134 lladdr e4:43:4b:69:59:dc REACHABLE
192.168.100.19 lladdr 06:3c:05:36:99:c3 REACHABLE
192.168.100.192 lladdr aa:e7:99:ae:35:b5 REACHABLE
192.168.100.2 lladdr 88:e9:a4:67:81:64 STALE
192.168.100.190 lladdr fa:af:09:52:b3:88 REACHABLE
- Owner
No more intermittent connections.
Closing.
- Guillaume Samson closed