Configure a docker pull-through cache in Rocquencourt
To reduce traffic and improve the latency of docker image downloads, we should install a docker pull-through cache on the Rocquencourt infra.
Requirements:
- support for multiple upstreams (at least docker.io, container-registry.softwareheritage.org, others?)
- can be configured in rke2, and is transparently bypassed if the cache is down
- accessible from all tenants (so that the storage space is shared, and can be used for all k8s clusters, as well as jenkins nodes)
Plausible approaches:
- kuik: https://enix.io/en/blog/cache-image-docker-kubernetes/
  - pros:
    - off-the-shelf product
    - used in production for large infras
    - explicit control over which images are cached and over the expiry process
    - the per-node proxy bypasses the local registry if the image is not available there
    - fine control over which pods go through the per-node proxy
  - cons:
    - storage not shared between tenants (the cache registry is hardcoded to the one installed within the cluster where the chart is pushed)
    - republishes images instead of proxying them, so all pods are mutated by a webhook (confusing for argocd?)
- Rolling our own cache based on the upstream docker registry
  - pros:
    - fine control over where the registry is available from
    - simpler architecture: only one component and one ingress, can be shared between tenants
    - rke2 supports using a mirror URL, and falls back to the main one if the mirror is unavailable (https://docs.rke2.io/install/containerd_registry_configuration)
  - cons:
    - DIY
    - needs a separate instance of the registry for each upstream. Supposedly this is simple to do with the upstream registry (https://www.reddit.com/r/kubernetes/comments/14o06w6/docker_pull_through_cache_to_multiple_upstreams/)
- Harbor (can be configured as a pull-through cache for multiple upstreams)
  - https://www.talos.dev/v1.6/talos-guides/configuration/pull-through-cache/
  - probably overkill if we don't use other features
- https://github.com/rpardini/docker-registry-proxy
  - pretty heavy-handed (HTTPS proxy intercepts all requests)
  - doesn't fall back gracefully
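
For the "rolling our own" option, the per-upstream wiring could look roughly like the sketch below. It is only an illustration under assumptions: the cache hostnames (`docker-cache.example.org`, `swh-cache.example.org`) are placeholders, the helper names are made up for this example, and the `mirrors` / `endpoint` layout follows the rke2 containerd registry configuration page linked above, which expects the file at `/etc/rancher/rke2/registries.yaml`. `REGISTRY_PROXY_REMOTEURL` is the environment override for the `proxy.remoteurl` setting of the upstream registry.

```python
"""Sketch: render an rke2 registries.yaml and per-upstream registry settings."""

# Upstream registry name -> URL that the matching cache instance proxies.
# One registry instance per upstream, as noted in the cons above.
UPSTREAM_URLS = {
    "docker.io": "https://registry-1.docker.io",
    "container-registry.softwareheritage.org":
        "https://container-registry.softwareheritage.org",
}

# Upstream registry name -> mirror endpoint exposed by our cache.
# ASSUMPTION: these hostnames are placeholders, not existing services.
MIRRORS = {
    "docker.io": "https://docker-cache.example.org",
    "container-registry.softwareheritage.org": "https://swh-cache.example.org",
}


def render_registries_yaml(mirrors: dict[str, str]) -> str:
    """Return the text of the `mirrors` section of a registries.yaml file."""
    lines = ["mirrors:"]
    for upstream, endpoint in sorted(mirrors.items()):
        lines.append(f"  {upstream}:")
        lines.append("    endpoint:")
        lines.append(f'      - "{endpoint}"')
    return "\n".join(lines) + "\n"


def registry_env(upstream: str) -> dict[str, str]:
    """Return the environment for the cache instance serving one upstream."""
    return {"REGISTRY_PROXY_REMOTEURL": UPSTREAM_URLS[upstream]}


if __name__ == "__main__":
    print(render_registries_yaml(MIRRORS), end="")
    for name in sorted(UPSTREAM_URLS):
        print(name, registry_env(name))
```

Generating the file from a single mapping keeps the list of upstreams in one place, so adding another upstream means adding one entry and one more registry instance.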
Deployment plan:
- k8s test-staging-rke2
- k8s archive-staging-rke2
- k8s cluster-admin-rke2
- k8s archive-production-rke2
- jenkins nodes?
Edited by Nicolas Dandrimont