Evaluate MetalLB as inbound load balancer
MetalLB[1] is a load-balancer implementation that makes it possible to use LoadBalancer Service objects in bare-metal Kubernetes deployments. It could allow us to expose services with high availability without deploying and managing a separate load-balancing stack.
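In layer-2 mode, which is the mode exercised by the arping tests in this issue, one node at a time answers ARP requests for the service IP. MetalLB 0.13 and later configure this with two custom resources. The sketch below assumes that CRD-based configuration. It builds an IPAddressPool containing only the IP used in this issue (192.168.100.119) and a matching L2Advertisement, both in the `metallb` namespace seen in the logs. It then prints them as a JSON `List`, which `kubectl apply -f -` accepts. The pool name `ingress-pool` and the helper name are hypothetical.

```python
import json


def metallb_l2_manifests(pool_name: str, addresses: list[str], namespace: str = "metallb") -> dict:
    """Return a v1 List with an IPAddressPool and a matching L2Advertisement.

    Assumes MetalLB >= 0.13 (configuration through metallb.io/v1beta1 CRDs).
    """
    pool = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "IPAddressPool",
        "metadata": {"name": pool_name, "namespace": namespace},
        # Addresses MetalLB may assign to LoadBalancer Services (CIDRs or "a-b" ranges).
        "spec": {"addresses": addresses},
    }
    advertisement = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "L2Advertisement",
        "metadata": {"name": f"{pool_name}-l2", "namespace": namespace},
        # Announce IPs from this pool in layer-2 mode (one node answers ARP at a time).
        "spec": {"ipAddressPools": [pool_name]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pool, advertisement]}


if __name__ == "__main__":
    # Hypothetical pool name; pipe the output into `kubectl apply -f -`.
    print(json.dumps(metallb_l2_manifests("ingress-pool", ["192.168.100.119/32"]), indent=2))
```

A single /32 pool pins the ingress controller's LoadBalancer Service to the address used in the tests. A wider range would let several Services get their own IPs.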
Migrated from T4534 (view on Phabricator)
- #4523 Dynamic infrastructure [Roadmap - Tooling and infrastructure]
Activity
- Phabricator Migration user marked this issue as related to #4523 (closed)
- Vincent Sellier added System administration priority:Normal labels
- Vincent Sellier added state:wip label
- Phabricator Migration user mentioned in commit swh-sysadmin-provisioning@acbc844f
- Author Owner
With the ingress controller correctly configured and an Ingress declared, requests sent to the MetalLB IP are served correctly:
vsellier@pergamon ~ % cat test-ingress.txt
GET /graphql/ HTTP/1.0
Host: archive.softwareheritage.org
vsellier@pergamon ~ % cat test-ingress.txt| nc 192.168.100.119 80 | head -n 20
HTTP/1.1 200 OK
Server: nginx/1.23.1
Date: Tue, 27 Sep 2022 15:36:35 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 1527
Connection: close
<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8/>
<meta name="viewport" content="user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, minimal-ui">
<title>GraphQL Playground</title>
<link rel="stylesheet" href="//cdn.jsdelivr.net/npm/graphql-playground-react/build/static/css/index.css" />
<link rel="shortcut icon" href="//cdn.jsdelivr.net/npm/graphql-playground-react/build/favicon.png" />
<script src="//cdn.jsdelivr.net/npm/graphql-playground-react/build/static/js/middleware.js"></script>
</head>
<body>
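The same check can be scripted without nc. The ingress controller routes on the Host header, which is why the request file sets it explicitly, so the script must do the same. The sketch below assumes plain HTTP on port 80, as in the test above. The function name `probe_ingress` is hypothetical.

```python
import socket


def probe_ingress(ip: str, host: str, path: str = "/", port: int = 80, timeout: float = 5.0) -> str:
    """Send a raw HTTP/1.0 GET to `ip` with an explicit Host header; return the status line.

    Equivalent in spirit to `cat test-ingress.txt | nc <ip> 80`: the load-balancer IP
    only reaches the ingress controller, which then picks a backend from the Host header.
    """
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(request)
        response = b""
        # Read until the status line is complete (or the server closes the connection).
        while b"\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    return response.split(b"\r\n", 1)[0].decode("iso-8859-1")
```

Against the setup above, `probe_ingress("192.168.100.119", "archive.softwareheritage.org", "/graphql/")` would be expected to return the `HTTP/1.1 200 OK` line seen in the transcript. Calling it in a loop gives a simple HTTP-level availability probe to run alongside the ARP checks below.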
The first recovery test, with a node failing abruptly, was not very conclusive:
vsellier@pergamon ~ % (while true; do date; sleep 2; done) &
[1] 1305858
vsellier@pergamon ~ % sudo arping 192.168.100.119
Tue Sep 27 15:40:44 UTC 2022
Tue Sep 27 15:41:26 UTC 2022
60 bytes from 2e:81:20:19:02:4a (192.168.100.119): index=42 time=604.637 usec
60 bytes from 2e:81:20:19:02:4a (192.168.100.119): index=43 time=744.078 usec
Tue Sep 27 15:41:28 UTC 2022
60 bytes from 2e:81:20:19:02:4a (192.168.100.119): index=44 time=1.562 msec <--- rancher-node-production-worker03 stopped abruptly via the proxmox ui
Tue Sep 27 15:41:30 UTC 2022
Timeout
Timeout
...
Tue Sep 27 15:47:55 UTC 2022
Timeout
Timeout
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=45 time=408.477 msec <--- balanced to rancher-node-production-worker02
Tue Sep 27 15:47:57 UTC 2022
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=46 time=645.523 usec
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=47 time=507.705 msec
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=48 time=2.451 msec
The IP was not reassigned to another node until worker03 was restarted.
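Since the arping replies carry no timestamp, the outage length has to be read from the interleaved `date` output. The sketch below does that bookkeeping. It assumes the transcript format shown above, and the function name `failover_summary` is hypothetical. Each reply is dated with the most recent `date` line, so the result is only accurate to the 2-second period of the loop.

```python
import re
from datetime import datetime, timedelta

# Tokens printed by the `date` loop and by arping, in the format of the transcripts above.
EVENT = re.compile(
    r"(?P<date>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} UTC \d{4})"
    r"|from (?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5}) "
    r"|(?P<timeout>Timeout)"
)


def failover_summary(transcript: str) -> tuple[str, str, timedelta]:
    """Return (MAC before, MAC after, approximate outage) for the first outage found."""
    stamp = None
    last_ok = None  # (stamp, mac) of the last reply seen before the outage
    in_outage = False
    for m in EVENT.finditer(transcript):
        if m.group("date"):
            stamp = datetime.strptime(m.group("date"), "%a %b %d %H:%M:%S UTC %Y")
        elif stamp is None:
            continue  # ignore output printed before the first timestamp
        elif m.group("timeout"):
            # An outage only counts once at least one reply has been seen.
            in_outage = last_ok is not None
        elif in_outage:
            # First reply after the timeouts: the IP is answered again.
            return last_ok[1], m.group("mac"), stamp - last_ok[0]
        else:
            last_ok = (stamp, m.group("mac"))
    raise ValueError("no outage followed by a recovery found")
```

On the transcript above, it reports the switch from `2e:81:20:19:02:4a` to `2e:84:a0:44:9e:c9` after 6 min 27 s (15:41:28 to 15:47:55).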
MetalLB logs during the outage:
metallb/metallb-speaker-jh92n[speaker]: {"caller":"speakerlist.go:256","error":"1 error occurred:\n\t* Failed to join 192.168.100.123:7946: dial tcp 192.168.100.123:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-09-27T15:44:14Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"speakerlist.go:256","error":"1 error occurred:\n\t* Failed to join 192.168.100.123:7946: dial tcp 192.168.100.123:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-09-27T15:44:14Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"config_controller.go:48","controller":"ConfigReconciler","level":"info","start reconcile":"/rancher-node-production-mgmt1","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"config_controller.go:136","controller":"ConfigReconciler","event":"force service reload","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"config_controller.go:147","controller":"ConfigReconciler","event":"config reloaded","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"config_controller.go:148","controller":"ConfigReconciler","end reconcile":"/rancher-node-production-mgmt1","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"service_controller_reload.go:103","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"config_controller.go:48","controller":"ConfigReconciler","level":"info","start reconcile":"/rancher-node-production-mgmt1","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"config_controller.go:136","controller":"ConfigReconciler","event":"force service reload","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"config_controller.go:48","controller":"ConfigReconciler","level":"info","start reconcile":"/rancher-node-production-mgmt1","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"config_controller.go:136","controller":"ConfigReconciler","event":"force service reload","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"config_controller.go:147","controller":"ConfigReconciler","event":"config reloaded","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"config_controller.go:148","controller":"ConfigReconciler","end reconcile":"/rancher-node-production-mgmt1","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"config_controller.go:147","controller":"ConfigReconciler","event":"config reloaded","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"service_controller_reload.go:103","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"config_controller.go:148","controller":"ConfigReconciler","end reconcile":"/rancher-node-production-mgmt1","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"service_controller_reload.go:103","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"speakerlist.go:256","error":"1 error occurred:\n\t* Failed to join 192.168.100.123:7946: dial tcp 192.168.100.123:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-09-27T15:44:44Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"speakerlist.go:256","error":"1 error occurred:\n\t* Failed to join 192.168.100.123:7946: dial tcp 192.168.100.123:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-09-27T15:45:14Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"speakerlist.go:256","error":"1 error occurred:\n\t* Failed to join 192.168.100.123:7946: dial tcp 192.168.100.123:7946: connect: no route to host\n\n","expected":1,"joined":0,"level":"error","msg":"partial join","op":"memberDiscovery","ts":"2022-09-27T15:45:14Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"node_controller.go:42","controller":"NodeReconciler","level":"info","start reconcile":"/rancher-node-production-worker04","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"config_controller.go:48","controller":"ConfigReconciler","level":"info","start reconcile":"/rancher-node-production-worker04","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"config_controller.go:136","controller":"ConfigReconciler","event":"force service reload","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"config_controller.go:147","controller":"ConfigReconciler","event":"config reloaded","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"config_controller.go:148","controller":"ConfigReconciler","end reconcile":"/rancher-node-production-worker04","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-llxgq[speaker]: {"caller":"service_controller_reload.go:103","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"speakerlist.go:271","level":"info","msg":"triggering discovery","op":"memberDiscovery","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"node_controller.go:64","controller":"NodeReconciler","end reconcile":"/rancher-node-production-worker04","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"config_controller.go:136","controller":"ConfigReconciler","event":"force service reload","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"config_controller.go:147","controller":"ConfigReconciler","event":"config reloaded","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"config_controller.go:148","controller":"ConfigReconciler","end reconcile":"/rancher-node-production-worker04","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-jh92n[speaker]: {"caller":"service_controller_reload.go:103","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"config_controller.go:48","controller":"ConfigReconciler","level":"info","start reconcile":"/rancher-node-production-worker04","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"config_controller.go:136","controller":"ConfigReconciler","event":"force service reload","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"config_controller.go:147","controller":"ConfigReconciler","event":"config reloaded","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"config_controller.go:148","controller":"ConfigReconciler","end reconcile":"/rancher-node-production-worker04","level":"info","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"service_controller_reload.go:61","controller":"ServiceReconciler - reprocessAll","level":"info","start reconcile":"metallbreload/reload","ts":"2022-09-27T15:45:37Z"}
metallb/metallb-speaker-8tkq7[speaker]: {"caller":"service_controller_reload.go:103","controller":"ServiceReconciler - reprocessAll","end reconcile":"metallbreload/reload","level":"info","ts":"2022-09-27T15:45:37Z"}
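The useful signal is buried in the info-level reconcile noise. The sketch below filters it out. It assumes one entry per line in the `<namespace>/<pod>[<container>]: <JSON>` form shown above, which is a log-tailing tool's format and may need adjusting for another collector. It keeps error-level entries and counts which memberlist peers (port 7946) the speakers fail to join. The function names are hypothetical.

```python
import json
import re
from collections import Counter

# "<namespace>/<pod>[<container>]: <JSON payload>", as in the speaker logs above.
LINE = re.compile(r"^(?P<pod>[\w.-]+/[\w.-]+)\[(?P<container>[\w.-]+)\]: (?P<payload>\{.*\})$")
# memberlist join failures, e.g. "Failed to join 192.168.100.123:7946: ..." (IPv4 only).
JOIN_FAILURE = re.compile(r"Failed to join ([\d.]+:\d+)")


def speaker_errors(lines):
    """Yield (pod, timestamp, error message) for every error-level log entry."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # not a speaker log entry, or an entry wrapped across lines
        entry = json.loads(m.group("payload"))
        if entry.get("level") == "error":
            yield m.group("pod"), entry.get("ts"), entry.get("error") or entry.get("msg", "")


def failed_join_targets(lines):
    """Count memberlist join failures per unreachable peer (host:port)."""
    targets = Counter()
    for _pod, _ts, error in speaker_errors(lines):
        targets.update(JOIN_FAILURE.findall(error))
    return targets
```

Applied to the entries above (one per line), it would find five errors, all "partial join" failures against 192.168.100.123:7946.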
- Author Owner
If a node is drained out of the cluster, the failover happens in ~10 s, which matches what the documentation announces:
Tue Sep 27 16:17:21 UTC 2022
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=1985 time=1.710 msec
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=1986 time=1.376 msec
Tue Sep 27 16:17:23 UTC 2022
Timeout
Tue Sep 27 16:17:25 UTC 2022
Timeout
Timeout
Tue Sep 27 16:17:27 UTC 2022
Timeout
Timeout
Tue Sep 27 16:17:29 UTC 2022
Timeout
Timeout
Tue Sep 27 16:17:31 UTC 2022
Timeout
Timeout
Tue Sep 27 16:17:33 UTC 2022
Timeout
Timeout
60 bytes from 2e:81:20:19:02:4a (192.168.100.119): index=1987 time=669.150 msec
Tue Sep 27 16:17:35 UTC 2022
60 bytes from 2e:81:20:19:02:4a (192.168.100.119): index=1988 time=959.452 usec
- Author Owner
A new test with a node completely down: the IP seems to recover only after ~6 min (no reply between 16:31:59 and 16:37:58), which looks related to a cache expiring somewhere:
Tue Sep 27 16:31:59 UTC 2022
60 bytes from 2e:81:20:19:02:4a (192.168.100.119): index=2926 time=1.679 msec
Tue Sep 27 16:32:01 UTC 2022
Timeout
Timeout
Tue Sep 27 16:32:03 UTC 2022
...
Tue Sep 27 16:37:56 UTC 2022
Timeout
Timeout
Tue Sep 27 16:37:58 UTC 2022
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=2927 time=814.409 msec
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=2928 time=864.574 usec
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=2929 time=973.083 msec
60 bytes from 2e:84:a0:44:9e:c9 (192.168.100.119): index=2930 time=32.151 msec
- Owner
FWIW, we didn't manage to reproduce the timeout issue when manually killing the node currently answering for the MetalLB IP address, or when bringing its network down. Every time, the failover happened within 10 seconds.