Unfortunately, after several attempts, we were unable to restart the cluster because of a problem with the etcd leader election / data on the nodes (probably caused by a wrong manipulation on our side).
In the end we destroyed the cluster (we had to follow [1] because the cluster was in an unstable state and rancher refused to remove it).
Once the cluster was removed, it was recreated with terraform. The nodes were manually added with the docker command provided by rancher.
The fourth node was started with a worker-only configuration. The terraform configuration will be updated accordingly.
For the record, terraform does not cope well with creating the cluster without nodes, because the applications (monitoring / keda) can't be added until the cluster is in an active state.
We will probably have to move this initial configuration outside terraform later.
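If the initial configuration is moved outside terraform, one possible approach is to wait for the cluster to become active before adding the applications. Below is a minimal sketch of such a wait loop. Everything in it is an assumption for illustration: `wait_until_active` and `get_state` are hypothetical names, and how the state is actually fetched (rancher API, CLI, ...) is deliberately left to the caller. Only the "active" state name comes from the comment above.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_state() until it returns "active" or the timeout expires.

    get_state is a hypothetical callable supplied by the caller; it should
    return the current cluster state as a string.
    Returns True if the cluster became active, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        # Check the state first so an already-active cluster returns at once.
        if get_state() == "active":
            return True
        # Give up once the deadline has passed.
        if clock() >= deadline:
            return False
        sleep(interval)
```

The applications (monitoring / keda) would then only be installed when the function returns `True`. The `sleep` and `clock` parameters are injectable so the loop can be tested without real waiting.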
Regarding the cpu consumption on the nodes, it seems to be related to the cluster management load.
This seems to be confirmed by [1].
An interesting lead to investigate for reducing the cpu consumption on small clusters: [2]
I tried it on the test cluster on our infra for gitlab. It reduced the cpu consumption by ~10%, but I'm not sure it's worth it as it can impact the cluster stability.
Perhaps we should try to have 3 small nodes dedicated to cluster management, and bigger nodes for the workers.