In order to validate the feasibility and the possible caveats of implementing an elastic
workers infrastructure, we will implement a PoC managing the workers for the gitlab
repositories.
First we need to refresh and land the kubernetes branch on the swh-environment to have a working example
Refresh the rancher VM on uffizi to test the solution in a pseudo-real environment (created from scratch, cf. terraform/staging)
Create workers and register them in the rancher cluster
POC image building / deployment process (manual push on docker hub)
POC worker autoscaling according to messages in queues
POC worker autoscaling according to available resources on the cluster
root@poc-rancher:~# curl -sfL https://get.k3s.io | sh -
root@poc-rancher:~# systemctl status k3s | grep Active
     Active: active (running) since Wed 2021-09-22 14:03:13 UTC; 30s ago
root@poc-rancher:~# curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
root@poc-rancher:~# chmod 700 get_helm.sh
root@poc-rancher:~# less get_helm.sh
root@poc-rancher:~# ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.7.0-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
root@poc-rancher:~# which helm
/usr/local/bin/helm
root@poc-rancher:~# helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
"rancher-stable" has been added to your repositories
root@poc-rancher:~# helm repo list
NAME            URL
rancher-stable  https://releases.rancher.com/server-charts/stable
root@poc-rancher:~# kubectl create namespace cattle-system
namespace/cattle-system created
root@poc-rancher:~# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "jetstack" chart repository
...Successfully got an update from the "rancher-stable" chart repository
Update Complete. ⎈Happy Helming!⎈
# Need some setup for the kube user (root here)
root@poc-rancher:~/.kube# cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
root@poc-rancher:~# kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.3/cert-manager.crds.yaml
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io configured
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io configured
root@poc-rancher:~# helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.5.3
NAME: cert-manager
LAST DEPLOYED: Wed Sep 22 14:19:59 2021
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.5.3 has been deployed successfully!
In order to begin issuing certificates, you will need to set up a ClusterIssuer
or Issuer resource (for example, by creating a 'letsencrypt-staging' issuer).
More information on the different types of issuers and how to configure them
can be found in our documentation:
https://cert-manager.io/docs/configuration/
For information on how to configure cert-manager to automatically provision
Certificates for Ingress resources, take a look at the `ingress-shim`
documentation:
https://cert-manager.io/docs/usage/ingress/
root@poc-rancher:~# kubectl get pods --namespace cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-cainjector-856d4df858-5xqcl   1/1     Running   0          6m57s
cert-manager-66b6d6bf59-59wqv              1/1     Running   0          6m57s
cert-manager-webhook-5fd7d458f7-kmqhq      1/1     Running   0          6m57s
root@poc-rancher:~# helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=poc-rancher.internal.staging.swh.network \
  --set bootstrapPassword=<redacted>
W0922 14:31:40.931662 48675 warnings.go:70] cert-manager.io/v1beta1 Issuer is deprecated in v1.4+, unavailable in v1.6+; use cert-manager.io/v1 Issuer
NAME: rancher
LAST DEPLOYED: Wed Sep 22 14:31:40 2021
NAMESPACE: cattle-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Rancher Server has been installed.
NOTE: Rancher may take several minutes to fully initialize. Please standby while
Certificates are being issued and Ingress comes up.
Check out our docs at https://rancher.com/docs/rancher/v2.x/en/
Browse to https://poc-rancher.internal.staging.swh.network
Happy Containering!
root@poc-rancher:~# kubectl -n cattle-system rollout status deploy/rancher
Waiting for deployment "rancher" rollout to finish: 0 of 3 updated replicas are available...
Waiting for deployment "rancher" rollout to finish: 1 of 3 updated replicas are available...
Waiting for deployment "rancher" rollout to finish: 2 of 3 updated replicas are available...
Waiting for deployment spec update to be observed...
Waiting for deployment "rancher" rollout to finish: 2 of 3 updated replicas are available...
deployment "rancher" successfully rolled out
After a hard time, we have solved several issues:
The rancher initialization problem was caused by using a version of k3s that does not match the rancher compatibility matrix.
We had installed rancher 2.5.9 on a recent version of k3s shipping kubernetes 1.22.2. According to the rancher compatibility matrix [1], using an older version of k3s solved the problem, and the clusters start correctly after that.
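For reference, the k3s install script allows pinning the kubernetes release; a minimal sketch, assuming the standard installer is used (the exact version tag below is illustrative, it just has to be one listed as supported for rancher 2.5.9):

# Pin k3s to a kubernetes version supported by rancher 2.5.9 (tag illustrative)
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.20.11+k3s1 sh -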
After solving the rancher issue, we faced another issue with inter-node communication.
Two nodes of the cluster were unable to talk to each other. It's not really a problem for the workers, as they don't need to communicate with other nodes in the cluster, but it often breaks dns resolution in the pods, because the dns resolvers are deployed with a daemonset and dispatched on several nodes [2]
A standalone k3s cluster also has the problem, so it's not a rancher issue.
With 2 ubuntu vms, everything is working well, so it's a compatibility issue with debian.
Unfortunately, the nodes were still not able to communicate with each other.
A lot of network issues seem to be related to the vxlan support on debian / flannel [3]
Changing the flannel backend to host-gw (on the k3s server only, not the agents) as indicated finally solved the problem, and the nodes are able to communicate (test output below):
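For reference, a minimal sketch of how the backend can be switched, assuming k3s is installed with the standard script (the flag is passed to the server only; the agents keep their default configuration):

# Reinstall the k3s server with the host-gw flannel backend instead of vxlan
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --flannel-backend=host-gw" sh -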
root@poc-rancher-sw0:~# ./test-network.sh
=> Start network overlay test
poc-rancher-sw1 can reach poc-rancher-sw1
poc-rancher-sw1 can reach poc-rancher-sw0
poc-rancher-sw0 can reach poc-rancher-sw1
poc-rancher-sw0 can reach poc-rancher-sw0
=> End network overlay test
We have successfully run loaders in staging using the helm chart we wrote [1] and a hardcoded number of workers. It also adds the possibility to perform rolling upgrades, for example.
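As an illustration, a minimal sketch of such a deployment, assuming the chart exposes a replica-count value (the chart path, value names and tag below are illustrative, not the actual chart interface):

# Deploy the loaders with a fixed number of workers (names illustrative)
helm install loaders ./swh-loader-chart --namespace swh --set replicaCount=8
# Rolling upgrade to a new image; pods are replaced progressively
helm upgrade loaders ./swh-loader-chart --namespace swh --set image.tag=v2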
We have tried the integrated horizontal pod autoscaler [2]; it works pretty well but it's not adapted to our worker scenario.
It relies on the cpu consumption of the pods (in our test [3], but it can be other metrics) to decide whether the number of running pods must be scaled up or down. It can be very useful to manage classical load, like for a gunicorn container, but not for the scenario of long-running tasks.
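For reference, this kind of CPU-based autoscaler can be declared with a one-liner; a minimal sketch, the deployment name and thresholds being illustrative:

# Scale the deployment between 1 and 10 replicas, targeting 75% CPU usage
kubectl autoscale deployment swh-loader --cpu-percent=75 --min=1 --max=10
# Watch the autoscaler decisions
kubectl get hpa --watch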
Kubernetes also has some functionalities to reduce the pressure on a node when some limits are reached, but these look more like emergency actions than proper scaling management. They are configured at the kubelet level and not dynamic at all [4]. It was rapidly tested, but we lost the node to oom before the node eviction started.
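For the record, a minimal sketch of how such thresholds are set, as static flags at kubelet startup (the values below are illustrative):

# Static eviction thresholds, configured per node (values illustrative)
kubelet --eviction-hard=memory.available<500Mi,nodefs.available<10% \
        --eviction-soft=memory.available<1Gi \
        --eviction-soft-grace-period=memory.available=90s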
There is a lot of other stuff we can also test, for example (not exhaustive):
trying to write an operator monitoring the overall cluster load and adapting the parallelism; the hard part is to find a way to identify the instances to stop
keda looks promising. migrated/migration$1185 is an example of configuration working for the docker environment. It's able to scale to 0 when no messages are present on the queue.
When messages are present, the loaders are launched progressively until the cpu/memory limit of the host is reached or the maximum number of allowed workers is reached (see the sketch after this list).
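To give an idea of the keda approach, a minimal ScaledObject sketch for queue-based scaling; the deployment name, queue name, thresholds and broker URL below are illustrative, not the configuration from migrated/migration$1185:

# Declare a keda ScaledObject scaling a loader deployment on queue length
kubectl apply -f - <<'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: swh-loader-scaler            # illustrative name
spec:
  scaleTargetRef:
    name: swh-loader                 # deployment to scale (illustrative)
  minReplicaCount: 0                 # scale to 0 when the queue is empty
  maxReplicaCount: 10                # cap on the number of workers
  triggers:
    - type: rabbitmq
      metadata:
        queueName: swh.loader.git.tasks        # illustrative queue name
        mode: QueueLength
        value: "10"                            # target messages per replica
        host: amqp://user:pass@rabbitmq:5672/  # illustrative broker URL
EOF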
Just a quick remark about the scheduling of the (sub)tasks of this task: IMHO the autoscaling should come last; all the supervision/monitoring/logging related tasks are much more important than the autoscaling.
It seems the rancher network issue is fixed in version 2.6.3, which is quite good news
swhworker@poc-rancher:~$ ./test-network.sh
=> Start network overlay test
poc-rancher-sw0 can reach poc-rancher-sw0
poc-rancher-sw0 can reach poc-rancher-sw1
poc-rancher-sw1 can reach poc-rancher-sw0
poc-rancher-sw1 can reach poc-rancher-sw1
=> End network overlay test
It works with debian 10 and debian 11 as soon as iptables-legacy is used as an alternative to the new iptables-nft command
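For reference, a minimal sketch of the switch on a debian node (to be run on each node, followed by a k3s restart; ip6tables can be switched the same way):

# Select the legacy iptables backend instead of the nft-based one
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy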
Good news, it looks like there are no more issues with the inter-node communication with rancher 2.6.4 and bullseye.
The test [1] ran OK on 2 different clusters (gitlab and the one for the elastic workers)
~/wip ❯ ./overlay.sh
=> Start network overlay test
elastic-worker1 can reach elastic-worker1
elastic-worker1 can reach elastic-worker0
elastic-worker1 can reach elastic-worker2
elastic-worker0 can reach elastic-worker1
elastic-worker0 can reach elastic-worker0
elastic-worker0 can reach elastic-worker2
elastic-worker2 can reach elastic-worker1
elastic-worker2 can reach elastic-worker0
elastic-worker2 can reach elastic-worker2
=> End network overlay test