Provision persistent ceph storage for the rke2 clusters
For workloads that require it, we need some persistent, distributed storage provisioned on the kubernetes clusters; for now, those workloads are pinned to specific nodes.
Our proxmox cluster now has plenty of free space on ceph, backed by fast disks, so we should be able to provision ceph storage for our kube clusters on it.
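For context, the available capacity can be confirmed from any Proxmox node with Ceph admin access before carving out new pools; a quick sketch (not part of the checklist below):

```shell
# Quick capacity check before adding new pools
# (run on any node with a Ceph admin keyring, e.g. one of the hypervisors):
ceph df            # overall and per-pool usage
ceph osd df tree   # per-OSD utilisation, grouped by host
```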
- [ ] firewall: open access to ceph cluster (upstream reference)
  - [x] test-staging-rke2
  - [x] admin-rke2
  - [ ] archive-staging-rke2
- [ ] test-staging-rke2 cluster
  - [x] provision ceph rbd pool (upstream reference)
  - [x] provision ceph user (upstream reference)
  - [x] install and configure rook on cluster (upstream reference)
- [ ] admin-rke2 cluster
  - [x] provision ceph rbd pool
  - [x] provision ceph user
  - [x] install and configure rook on cluster
- [ ] archive-staging-rke2 cluster
  - [ ] provision ceph rbd pool
  - [ ] provision ceph user
  - [ ] install and configure rook on cluster
- [ ] archive-production-rke2 cluster
  - [ ] provision ceph rbd pool
  - [ ] provision ceph user
  - [ ] install and configure rook on cluster
Activity
- Nicolas Dandrimont changed milestone to %Dynamic infrastructure [Roadmap - Tooling and infrastructure]
- Nicolas Dandrimont assigned to @olasd
- Nicolas Dandrimont changed the description
- Nicolas Dandrimont marked the checklist item test-staging-rke2 as completed
- Nicolas Dandrimont (Author, Owner)
To open the firewall I've set up or updated the following host aliases:
- `swh_admin_kube_all`: All nodes for admin kube cluster
- `swh_admin_kube_mgmt`: Management nodes for admin kube cluster
- `swh_admin_kube_workers`: Worker nodes for admin kube cluster
- `swh_production_kube_all`: All nodes for production kube cluster
- `swh_production_kube_mgmt`: Management nodes for production kube cluster
- `swh_production_kube_workers`: (already existing, added saam and banco) swh production kubernetes workers
- `swh_staging_kube_all`: All nodes for staging kube cluster
- `swh_staging_kube_mgmt`: Management nodes for staging kubernetes cluster
- `swh_staging_kube_workers`: (already existing) Staging Cluster rke2 kubernetes workers
- `swh_test_staging_kube_all`: All kube hosts for test-staging-rke2
- `swh_test_staging_kube_mgmt`: Management nodes for test rke2 cluster
- `swh_test_staging_kube_workers`: (already existing) Test Staging cluster rke2 kubernetes workers
- `swh_ceph_on_proxmox`: Ceph on Proxmox hosts
And the following port aliases:
- `ceph_dynamic_ports`: Dynamic ports for Ceph daemons
- `ceph_mon_port`: Ceph Monitor public port
- `ceph_ports`: Ports for all Ceph daemons
I've added a rule on VLAN440: `swh_test_staging_kube_all` -> `swh_ceph_on_proxmox`, TCP on ports `ceph_ports`.
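For illustration only, the equivalent of that rule expressed as nftables (the real firewall works on the aliases above; the port numbers assume `ceph_ports` covers the standard Ceph ports, i.e. 3300/6789 for the monitors and 6800-7300 for the other daemons):

```shell
# Illustrative nftables equivalent of the VLAN440 rule; assumes named address sets
# matching the firewall aliases and that ceph_ports maps to the standard Ceph ports.
nft add rule inet filter forward \
    ip saddr @swh_test_staging_kube_all \
    ip daddr @swh_ceph_on_proxmox \
    tcp dport { 3300, 6789, 6800-7300 } accept
```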
- Nicolas Dandrimont (Author, Owner), in reply:
Replaced `ceph_mon_port` with `ceph_mon_ports` (added port 3300). Added a rule from my VPN host and validated ceph cluster access.
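For the record, a sketch of the kind of check used for that validation (assumes a ceph.conf pointing at the Proxmox mons and a suitable keyring copied to the VPN host; paths and IDs are illustrative):

```shell
# Minimal reachability check from the VPN host; needs /etc/ceph/ceph.conf with the
# Proxmox mon addresses and a keyring with at least read access to the mons.
ceph -s          # should print the cluster status (HEALTH_OK on a healthy cluster)
ceph mon dump    # lists the mon addresses the client will try (v2 on 3300, v1 on 6789)
```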
- Nicolas Dandrimont marked the checklist item provision ceph rbd pool (upstream reference) as completed
- Nicolas Dandrimont marked the checklist item provision ceph user (upstream reference) as completed
- Nicolas Dandrimont (Author, Owner)
Provisioning of the ceph rbd pool and user:

```console
root@hypervisor3:~# ceph osd pool create k8s.test-staging-rke2.rbd
pool 'k8s.test-staging-rke2.rbd' created
root@hypervisor3:~# rbd pool init k8s.test-staging-rke2.rbd
root@hypervisor3:~# ceph auth get-or-create client.k8s-test-staging-rke2 mon 'profile rbd' osd 'profile rbd pool=k8s.test-staging-rke2.rbd' mgr 'profile rbd pool=k8s.test-staging-rke2.rbd'
[client.k8s-test-staging-rke2]
        key = [redacted]
```
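A couple of optional sanity checks after the above, from the same admin shell (illustrative, not part of the original procedure):

```shell
# Verify the pool is tagged for RBD and the client caps look as intended.
ceph osd pool application get k8s.test-staging-rke2.rbd   # should report "rbd"
ceph auth get client.k8s-test-staging-rke2                # shows the mon/osd/mgr caps set above
rbd ls --pool k8s.test-staging-rke2.rbd                   # empty for now, but exercises the pool
```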
- Nicolas Dandrimont (Author, Owner)
Retrieve user key:

```shell
ceph auth print-key client.k8s-test-staging-rke2
```
- Nicolas Dandrimont (Author, Owner)
Turns out rook needs a bunch more access to the ceph cluster than that single client, so a script has been written to provision it: swh/devel/snippets@11e70eab
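The snippet is the authoritative list; purely as an illustration of the shape of that extra access, Rook's external-cluster setup revolves around dedicated CSI users along these lines (names and caps are illustrative and vary by Rook release):

```shell
# Illustration only -- the linked snippet creates the real users and caps, which differ
# in detail between Rook releases. External CephCluster setups need CSI-specific users
# in addition to the plain client created earlier.
ceph auth get-or-create client.csi-rbd-node \
    mon 'profile rbd' \
    osd 'profile rbd pool=k8s.test-staging-rke2.rbd'
ceph auth get-or-create client.csi-rbd-provisioner \
    mon 'profile rbd' \
    mgr 'allow rw' \
    osd 'profile rbd pool=k8s.test-staging-rke2.rbd'
```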
`rook-ceph-values.yaml`:

```yaml
---
priorityClassName: cluster-components-system

# Lower resource requirements on puny minikube cluster
resources:
  requests:
    cpu: 100m

csi:
  csiRBDProvisionerResource: |
    - name : csi-provisioner
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-resizer
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-attacher
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-snapshotter
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-rbdplugin
      resource:
        requests:
          memory: 512Mi
        limits:
          memory: 1Gi
    - name : csi-omap-generator
      resource:
        requests:
          memory: 512Mi
          cpu: 2m
        limits:
          memory: 1Gi
    - name : liveness-prometheus
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
  # -- CEPH CSI RBD plugin resource requirement list
  # @default -- see values.yaml
  csiRBDPluginResource: |
    - name : driver-registrar
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-rbdplugin
      resource:
        requests:
          memory: 512Mi
          cpu: 2m
        limits:
          memory: 1Gi
    - name : liveness-prometheus
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
  # -- CEPH CSI CephFS provisioner resource requirement list
  # @default -- see values.yaml
  csiCephFSProvisionerResource: |
    - name : csi-provisioner
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-resizer
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-attacher
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-snapshotter
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-cephfsplugin
      resource:
        requests:
          memory: 512Mi
          cpu: 2m
        limits:
          memory: 1Gi
    - name : liveness-prometheus
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
  # -- CEPH CSI CephFS plugin resource requirement list
  # @default -- see values.yaml
  csiCephFSPluginResource: |
    - name : driver-registrar
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
    - name : csi-cephfsplugin
      resource:
        requests:
          memory: 512Mi
          cpu: 1m
        limits:
          memory: 1Gi
    - name : liveness-prometheus
      resource:
        requests:
          memory: 128Mi
          cpu: 1m
        limits:
          memory: 256Mi
```
`rook-ceph-cluster-values.yaml`:

```yaml
---
operatorNamespace: rook-ceph

cephClusterSpec:
  external:
    enable: true
  crashCollector:
    disable: true
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s

cephBlockPools: []
cephFileSystems: []
cephObjectStores: []
```
Populating the k8s cluster resources with the output of the ceph provisioning script:

```console
$ kubectl apply -f ceph-provisioning-output.yaml
secret/admin-secret unchanged
secret/rook-csi-rbd-provisioner unchanged
secret/rook-csi-rbd-node unchanged
secret/rook-csi-cephfs-provisioner unchanged
secret/rook-csi-cephfs-node unchanged
configmap/rook-ceph-mon-endpoints unchanged
secret/rook-ceph-mon unchanged
storageclass.storage.k8s.io/ceph-rbd unchanged
storageclass.storage.k8s.io/cephfs unchanged
```
Then installing the rook components:

```shell
helm repo add rook-release https://charts.rook.io/release
helm install --replace --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph --version '~1.13' -f rook-ceph-values.yaml
helm install --replace --namespace rook-ceph rook-ceph-cluster rook-release/rook-ceph-cluster --version '~1.13' -f rook-ceph-cluster-values.yaml
```
Should eventually yield:

```console
$ kubectl get -n rook-ceph cephclusters
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE       MESSAGE                          HEALTH      EXTERNAL   FSID
rook-ceph   /var/lib/rook     3          66m   Connected   Cluster connected successfully   HEALTH_OK   true       82307cfd-401a-4b58-8d29-7cbcd98f26fd
```
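If the cluster lingers in a Connecting or Error phase instead, the operator log is usually the quickest pointer; a minimal troubleshooting sketch (deployment and pod names as created by the upstream charts):

```shell
# Watch the operator while it connects to the external cluster, and check that
# the CSI provisioner/plugin pods come up.
kubectl -n rook-ceph logs deploy/rook-ceph-operator -f
kubectl -n rook-ceph get pods
```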
After generating a few workloads with PVCs of the right storage class, the volumes get provisioned in ceph:

```console
$ kubectl get -A pvc
NAMESPACE      NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
docker-cache   image-store-dockercache-docker-io-0   Bound    pvc-41d57f80-086c-4a8c-b884-cd503db75dbb   30Gi       RWO            ceph-rbd       32m
docker-cache   image-store-dockercache-swh-0         Bound    pvc-3692bcef-0b37-483e-b1c8-20b9123ce666   10Gi       RWO            ceph-rbd       32m
```
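For a quick end-to-end check on a freshly configured cluster, a throwaway PVC along these lines (hypothetical name and namespace) should bind through the same storage class:

```shell
# Smoke test: request a small RBD-backed volume and wait for it to bind
# (assumes the ceph-rbd storage class uses Immediate volume binding; with
# WaitForFirstConsumer, attach the PVC to a pod first).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-rbd-smoke-test
  namespace: default
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-rbd
  resources:
    requests:
      storage: 1Gi
EOF
kubectl -n default get pvc ceph-rbd-smoke-test -w   # STATUS should become Bound
```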
- Nicolas Dandrimont marked the checklist item install and configure rook on cluster (upstream reference) as completed
- Nicolas Dandrimont (Author, Owner)
I've manually installed the rook config on the test-staging-rke2 cluster by:
- manually creating argocd applications for the rook-ceph and rook-ceph-cluster helm charts
- manually applying the output of the provisioning script (the few secrets and the storageclasses) to the cluster

I've started to write an ApplicationSet to apply the config to all the clusters, but to keep it DRY I'd like to use a newer version of the ApplicationSet CRD, which supports `templatePatch`. Looks like it's argocd upgrade time?
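For the record, a rough sketch of what that could look like once `templatePatch` is available (hypothetical, not the actual k8s-clusters-conf layout):

```shell
# Hypothetical ApplicationSet sketch; requires an Argo CD release whose ApplicationSet
# CRD supports spec.templatePatch. Cluster names and the autoSync knob are illustrative.
kubectl apply -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: rook-ceph
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - list:
        elements:
          - cluster: test-staging-rke2
            autoSync: true
          - cluster: admin-rke2
            autoSync: false
  template:
    metadata:
      name: 'rook-ceph-{{ .cluster }}'
    spec:
      project: default
      source:
        repoURL: https://charts.rook.io/release
        chart: rook-ceph
        targetRevision: 1.13.*
      destination:
        name: '{{ .cluster }}'   # assumes clusters are registered under these names
        namespace: rook-ceph
  # templatePatch is rendered per generator element, so per-cluster differences
  # (here: whether to auto-sync) can stay in a single ApplicationSet:
  templatePatch: |
    {{- if .autoSync }}
    spec:
      syncPolicy:
        automated:
          prune: true
    {{- end }}
EOF
```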
- Nicolas Dandrimont mentioned in issue #5249 (closed)
- Nicolas Dandrimont mentioned in commit swh/infra/ci-cd/k8s-clusters-conf@959176a8
- Nicolas Dandrimont mentioned in merge request swh/infra/ci-cd/k8s-clusters-conf!36 (merged)
- Nicolas Dandrimont mentioned in commit swh/infra/ci-cd/k8s-clusters-conf@67e40925
- Nicolas Dandrimont marked the checklist item provision ceph rbd pool as completed
- Nicolas Dandrimont marked the checklist item provision ceph user as completed
- Nicolas Dandrimont marked the checklist item install and configure rook on cluster as completed
- Nicolas Dandrimont marked the checklist item admin-rke2 as completed