Upgrade k8s on AKS clusters
Here are the current kubernetes versions of our AKS clusters:
ᐅ az aks list | jq -r '.[]| "\(.name) \(.kubernetesVersion) \(.resourceGroup)"' | \
awk 'BEGIN{format="%-25s %-20s %-15s\n";
printf format,"Cluster Name","Kubernetes Version", "Resource Group";
printf format,"---","---","---"}
{printf format,$1,$2,$3}'
Cluster Name              Kubernetes Version   Resource Group
---                       ---                  ---
euwest-gitlab-staging     1.26.10              euwest-gitlab-staging
euwest-rancher            1.26.6               euwest-rancher
euwest-gitlab-production  1.26.10              euwest-gitlab-production
Version 1.26 has reached its end of life; see the AKS Kubernetes release calendar.
I'm not sure of the target version.
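A possible way to narrow it down is to ask AKS which upgrade targets it currently offers for a given cluster; a sketch against the staging cluster (the output depends on the AKS release calendar for the region):
ᐅ az aks get-upgrades --resource-group euwest-gitlab-staging \
     --name euwest-gitlab-staging --output table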
Activity
- Guillaume Samson added kubernetes label
- Guillaume Samson assigned to @guillaume
- Guillaume Samson changed the description
- Owner (thread resolved by Guillaume Samson)
Rancher 2.8.x supports up to kubernetes 1.28.
Our GitLab operator version (1.1.1) claims support for up to kubernetes 1.29.
So the target is 1.28.x for the Rancher cluster and 1.29.x for the GitLab clusters.
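To double-check which of those releases AKS actually ships, the versions available per region can be listed; a quick sketch (the westeurope location is an assumption based on the euwest-* cluster naming):
ᐅ az aks get-versions --location westeurope --output table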
- Author Owner
Rancher cluster upgraded to 1.28.9:
ᐅ az aks nodepool get-upgrades --resource-group euwest-rancher \
     --nodepool-name default \
     --cluster-name euwest-rancher --output table
KubernetesVersion    LatestNodeImageVersion                      Name     OsType    ResourceGroup
-------------------  ----------------------------------------  -------  --------  ---------------
1.28.9               AKSUbuntu-2204gen2containerd-202406.07.0  default  Linux     euwest-rancher
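For the record, the upgrade command itself is not captured in this log; on AKS it presumably looked something like the following, which upgrades the control plane and the default node pool together:
ᐅ az aks upgrade --resource-group euwest-rancher \
     --name euwest-rancher --kubernetes-version 1.28.9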
- Author Owner
Staging GitLab cluster upgraded to 1.29.4:
ᐅ az aks nodepool get-upgrades --resource-group euwest-gitlab-staging \
     --nodepool-name default \
     --cluster-name euwest-gitlab-staging --output table
KubernetesVersion    LatestNodeImageVersion                      Name     OsType    ResourceGroup
-------------------  ----------------------------------------  -------  --------  ---------------------
1.29.4               AKSUbuntu-2204gen2containerd-202406.07.0  default  Linux     euwest-gitlab-staging
The kubernetes role is no longer shown by kubectl:
ᐅ kb --context euwest-gitlab-staging get nodes
NAME                              STATUS   ROLES    AGE     VERSION
aks-default-31796401-vmss0000as   Ready    <none>   17m     v1.29.4
aks-default-31796401-vmss0000fl   Ready    <none>   15m     v1.29.4
aks-default-31796401-vmss0000gk   Ready    <none>   12m     v1.29.4
aks-default-31796401-vmss0000gw   Ready    <none>   3m17s   v1.29.4
ᐅ kb --context euwest-gitlab-staging describe nodes aks-default-31796401-vmss0000as | \
  awk '/.*role=.*agent.*/'
kubernetes.azure.com/role=agent
ᐅ kb --context euwest-gitlab-production describe nodes aks-default-31036454-vmss00007m | \
  awk '/.*role=.*agent.*/'
kubernetes.azure.com/role=agent
kubernetes.io/role=agent
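kubectl derives the ROLES column from labels prefixed with node-role.kubernetes.io/ (plus the legacy kubernetes.io/role label); the not-yet-upgraded production node still carries kubernetes.io/role=agent, while the recreated staging nodes only have the AKS-specific kubernetes.azure.com/role=agent, which kubectl ignores. If the column matters, a role label can be re-added by hand; this is purely cosmetic, not managed by AKS, and will disappear whenever the node is recreated:
ᐅ kb --context euwest-gitlab-staging label node aks-default-31796401-vmss0000as \
     node-role.kubernetes.io/agent=agent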
There will be service interruptions when upgrading k8s on the production GitLab cluster; during the staging upgrade:
- every time the postgresql, redis, ... pods were drained and recreated on new nodes, pulling their images took some time;
- there were several scheduling errors due to insufficient memory, and the Azure node pool autoscaler took a long time to add nodes (a sketch for watching this during the production upgrade follows the event below).
Warning FailedScheduling 7s default-scheduler 0/3 nodes are available: 2 Insufficient memory, 3 Insufficient cpu. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.
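A minimal sketch for keeping an eye on this while the production upgrade runs (kb is assumed to be an alias for kubectl, as in the outputs above):
ᐅ kb --context euwest-gitlab-production get pods -A --field-selector status.phase=Pending
ᐅ kb --context euwest-gitlab-production get events -A \
     --field-selector reason=FailedScheduling --sort-by .lastTimestamp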
- Guillaume Samson added 2h of time spent
- Owner
Feel free to bump the size of the production node pool before launching the upgrade(s) to avoid the memory issue.
The disruptiveness is kind of unavoidable with statefulsets.
- Author Owner
This should avoid the memory issue on GitLab production during the k8s upgrades:
ᐅ az aks show --resource-group euwest-gitlab-production \
     --name euwest-gitlab-production --query agentPoolProfiles | \
  jq -r '.[] | "count: \(.count)\nenableAutoScaling: \(.enableAutoScaling)\nmaxCount: \(.maxCount)\nminCount: \(.minCount)"'
count: 5
enableAutoScaling: true
maxCount: 6
minCount: 5
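The bump itself is not shown above; with the AKS CLI it would presumably be done along these lines (pool name and bounds taken from the output above, the exact invocation is an assumption):
ᐅ az aks nodepool update --resource-group euwest-gitlab-production \
     --cluster-name euwest-gitlab-production --name default \
     --update-cluster-autoscaler --min-count 5 --max-count 6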
- Author Owner
Production GitLab cluster upgraded to 1.29.4:
ᐅ az aks nodepool get-upgrades --resource-group euwest-gitlab-production \
     --nodepool-name default \
     --cluster-name euwest-gitlab-production --output table
KubernetesVersion    LatestNodeImageVersion                      Name     OsType    ResourceGroup
-------------------  ----------------------------------------  -------  --------  ------------------------
1.29.4               AKSUbuntu-2204gen2containerd-202406.07.0  default  Linux     euwest-gitlab-production
ᐅ kb --context euwest-gitlab-production get nodes
NAME                              STATUS   ROLES    AGE   VERSION
aks-default-31036454-vmss00007m   Ready    <none>   46m   v1.29.4
aks-default-31036454-vmss0000av   Ready    <none>   39m   v1.29.4
aks-default-31036454-vmss0000b6   Ready    <none>   33m   v1.29.4
aks-default-31036454-vmss0000c5   Ready    <none>   30m   v1.29.4
aks-default-31036454-vmss0000c8   Ready    <none>   28m   v1.29.4
Autoscaling parameters:
ᐅ az aks show --resource-group euwest-gitlab-production \
     --name euwest-gitlab-production --query agentPoolProfiles | \
  jq -r '.[] | "count: \(.count)\nenableAutoScaling: \(.enableAutoScaling)\nmaxCount: \(.maxCount)\nminCount: \(.minCount)"'
count: 5
enableAutoScaling: true
maxCount: 6
minCount: 4
- Guillaume Samson added 2h of time spent
- Guillaume Samson mentioned in commit swh-sysadmin-provisioning@52cb1100
- Author Owner
There is a deprecation warning on terraform plan:
No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.
╷
│ Warning: Argument is deprecated
│
│   with module.gitlab-staging.module.gitlab_aks_cluster.azurerm_monitor_diagnostic_setting.diagnostic,
│   on modules/kubernetes/main.tf line 54, in resource "azurerm_monitor_diagnostic_setting" "diagnostic":
│   54: resource "azurerm_monitor_diagnostic_setting" "diagnostic" {
│
│ `retention_policy` has been deprecated in favor of `azurerm_storage_management_policy` resource - to learn more https://aka.ms/diagnostic_settings_log_retention
│
│ (and 17 more similar warnings elsewhere)
╵
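Before cleaning this up, the deprecated retention_policy blocks can be located across the terraform modules (the modules/ path is taken from the warning above):
ᐅ grep -rn retention_policy modules/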
- Guillaume Samson closed