Migrate staging's elasticsearch inside k8s using the ECK operator
This is a POC to evaluate how well the eck-operator can manage elasticsearch instances. The POC can run without unplugging esnode0 until the new deployment is fully validated and stress-tested.
- Deployed an elasticsearch cluster with 3 nodes
- Trigger a search journal client to reindex the data
- Monitor the elasticsearch cluster [1]
Chaos monkey tests
- Increase replica to 3 (from 1) ~> operator deals with the replicas and sharding
- Upgrade operator (current version 2.13, latest 2.15)
- Restart deployment (rollout)
- Kill a pod (out of a cluster with 3 replicas)
- Kill 2 pods (out of a cluster with 3 replicas)
- ...
- swh/infra/ci-cd/swh-charts!530 (merged): Migrate the staging search instance using the new elasticsearch instance running in kube
Note: It's the ELK stack but the official operator is named eck-operator
[1] https://grafana.softwareheritage.org/goto/Yvb3a2VHz?orgId=1
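The "kill a pod" chaos tests listed above could be scripted roughly as follows. This is only a sketch: the function names, the `POLL_INTERVAL` variable and the default timeout are hypothetical, and the context, namespace and cluster name are assumptions taken from the commands quoted in the comments of this issue. It relies on the `health` field that ECK reports in the status of the `Elasticsearch` resource.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the pod-kill chaos tests (names are illustrative).
# Defaults below are assumptions based on the commands quoted in this issue.
CONTEXT="${CONTEXT:-archive-staging-rke2}"
NAMESPACE="${NAMESPACE:-swh-cassandra}"
CLUSTER="${CLUSTER:-search}"

# Delete the given pods without waiting for their termination,
# so that several pods can be killed at (almost) the same time.
kill_pods() {
  kubectl --context "$CONTEXT" -n "$NAMESPACE" delete pod "$@" --wait=false
}

# Poll the health reported by ECK on the Elasticsearch resource until it
# matches the expected value (default: green) or the timeout (seconds) expires.
wait_for_health() {
  local expected="${1:-green}" timeout="${2:-600}"
  local interval="${POLL_INTERVAL:-10}" waited=0 health
  while [ "$waited" -lt "$timeout" ]; do
    health=$(kubectl --context "$CONTEXT" -n "$NAMESPACE" \
      get elasticsearch "$CLUSTER" -o jsonpath='{.status.health}')
    if [ "$health" = "$expected" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

A run of the "kill 2 pods" case would then look like `kill_pods search-es-default-0 search-es-default-1 && wait_for_health green 900`, with the Grafana boards open to watch the recovery.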
Activity
- Antoine R. Dumont changed title from Migrate staging's elasticsearch inside kubertes using the ELK operator to Migrate staging's elasticsearch inside k8s using the ELK operator
- Antoine R. Dumont assigned to @ardumont
- Antoine R. Dumont changed the description
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@f3d7899e
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@599ac7d8
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@a65d8378
- Owner
[x] Deployed an elasticsearch cluster with 3 nodes
It's deployed with 3 nodes and accessible through service search-es-http [1] [2]
[1] 3 nodes
    $ kubectl --context archive-staging-rke2 get pods -A -l common.k8s.elastic.co/type=elasticsearch -l elasticsearch.k8s.elastic.co/cluster-name=search
    NAMESPACE       NAME                  READY   STATUS    RESTARTS   AGE
    swh-cassandra   search-es-default-0   1/1     Running   0          7m8s
    swh-cassandra   search-es-default-1   1/1     Running   0          7m8s
    swh-cassandra   search-es-default-2   1/1     Running   0          7m8s
[2]
    bash-4.4# curl http://search-es-http:9200
    {
      "name" : "search-es-default-0",
      "cluster_name" : "search",
      "cluster_uuid" : "5stw4hoSSUeQ7Wxd0S9eUA",
      "version" : {
        "number" : "7.15.2",
        "build_flavor" : "default",
        "build_type" : "docker",
        "build_hash" : "93d5a7f6192e8a1a12e154a2b81bf6fa7309da0c",
        "build_date" : "2021-11-04T14:04:42.515624022Z",
        "build_snapshot" : false,
        "lucene_version" : "8.9.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }
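For reference, a cluster of this shape can be described to the eck-operator with a very small `Elasticsearch` resource. The sketch below is not the actual swh-charts template: the `render_search_cluster` helper is hypothetical, the name, version, nodeSet name and node count are inferred from the output above, and disabling the self-signed TLS certificate is an assumption based on the plain `http://` URL used in the curl call. Storage, resources and authentication settings are left out.

```shell
#!/usr/bin/env bash
# Print a minimal ECK Elasticsearch manifest (illustrative sketch only).
# Arguments: cluster name, elasticsearch version, number of nodes.
render_search_cluster() {
  local name="${1:-search}" version="${2:-7.15.2}" count="${3:-3}"
  cat <<EOF
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: ${name}
spec:
  version: ${version}
  http:
    tls:
      selfSignedCertificate:
        disabled: true   # assumption: the service is reached over plain http
  nodeSets:
    - name: default
      count: ${count}
EOF
}
```

The manifest could then be applied with something like `render_search_cluster | kubectl --context archive-staging-rke2 -n swh-cassandra apply -f -`. The operator derives the pod names (`search-es-default-N`) and the `search-es-http` service from the cluster and nodeSet names.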
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@815d3b66
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@06adca1f
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@0132bf97
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@1f6d5ad3
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@f6d996ed
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@f8b8ce1b
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@52f2e5fc
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@74a088cf
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@7865649d
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@c99847fe
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@d0e83764
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@2da7c31f
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@937d2970
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@e5d7e060
- Antoine R. Dumont marked the checklist item Trigger a search journal client to reindex the data as completed
- Owner
[x] Trigger a search journal client to reindex the data
Along the way, several things got done:
- Adapt the search rpc deployment template to allow multiple instances to run (this aligns it with most of our other swh templates)
- Adapt the existing deployments so they are compatible with the change above
- Adapt the journal client template so we can override values from the default
- Add a new rpc instance which allows communication with the new elasticsearch instance running in kube
- Add a new journal client deployment to use the new rpc instance
Monitoring boards:
- journal clients hitting the rpc [1]
- rpc hitting the elasticsearch instance [2]
[1] https://grafana.softwareheritage.org/goto/sJ1lSO7Hk?orgId=1
[2] https://grafana.softwareheritage.org/goto/rbo2HO7Hk?orgId=1
Edited by Antoine R. Dumont
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@64c7fd07
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@60afc040
- Owner
Bumped the number of replicas to 4 (to try and speed things up).
Grafana is being tagged along the way.
Edited by Antoine R. Dumont
- Owner
Elasticsearch instance is getting filled:
    bash-4.4# period=30; while true; do date; echo; curl http://search-es-http:9200/_cat/indices; sleep $period; done
    Thu Nov 21 13:59:59 UTC 2024
    green open .geoip_databases GxLZNKk-TZaqbBF0Mz5QeQ 1 1     12      0   22.5mb  11.2mb
    green open origin-v0.11     3DU2O5uoTDC7HgK1G0x-mQ 1 1 359096 372904 1000.5mb   379mb
    Thu Nov 21 14:00:29 UTC 2024
    green open .geoip_databases GxLZNKk-TZaqbBF0Mz5QeQ 1 1     12      0   22.5mb  11.2mb
    green open origin-v0.11     3DU2O5uoTDC7HgK1G0x-mQ 1 1 385729 399403  776.6mb 403.8mb
    Thu Nov 21 14:01:00 UTC 2024
    green open .geoip_databases GxLZNKk-TZaqbBF0Mz5QeQ 1 1     12      0   22.5mb  11.2mb
    green open origin-v0.11     3DU2O5uoTDC7HgK1G0x-mQ 1 1 385729 399403  779.9mb 405.4mb
    Thu Nov 21 14:01:30 UTC 2024
    green open .geoip_databases GxLZNKk-TZaqbBF0Mz5QeQ 1 1     12      0   22.5mb  11.2mb
    green open origin-v0.11     3DU2O5uoTDC7HgK1G0x-mQ 1 1 385729 399403  782.6mb 406.8mb
    Thu Nov 21 14:02:00 UTC 2024
    green open .geoip_databases GxLZNKk-TZaqbBF0Mz5QeQ 1 1     12      0   22.5mb  11.2mb
    green open origin-v0.11     3DU2O5uoTDC7HgK1G0x-mQ 1 1 414560 416466  816.9mb 431.5mb
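To turn such samples into a rough indexing rate, the document count can be pulled out of the `_cat/indices` output and compared between two timestamps. The helpers below are a sketch: `doc_count` and `indexing_rate` are hypothetical names, and the parsing assumes the default column layout shown above (index name in column 3, docs.count in column 7).

```shell
#!/usr/bin/env bash
# Read `_cat/indices` output on stdin and print docs.count for one index.
# Assumes the default columns: health status index uuid pri rep docs.count ...
doc_count() {
  local index="$1"
  awk -v idx="$index" '$3 == idx { print $7 }'
}

# Average number of documents indexed per second between two samples
# taken `seconds` apart (integer division, so the result is rounded down).
indexing_rate() {
  local before="$1" after="$2" seconds="$3"
  echo $(( (after - before) / seconds ))
}
```

With the first and last samples above (359096 docs at 13:59:59, 414560 docs at 14:02:00, so 121 seconds apart), `indexing_rate 359096 414560 121` gives about 458 documents per second. Since the counts do not move on every sample, a longer window gives a more meaningful figure than two consecutive readings. A live count could be obtained with `curl -s http://search-es-http:9200/_cat/indices | doc_count origin-v0.11`.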
- Owner
Indexing is almost done [1]. The ETA is 4 days for the origin-visit-status topic [2] (the origin topic is done).
[1]
    Mon Nov 25 12:11:15 UTC 2024
    green open .geoip_databases GxLZNKk-TZaqbBF0Mz5QeQ 1 1      37      31 70.5mb 35.2mb
    green open origin-v0.11     3DU2O5uoTDC7HgK1G0x-mQ 1 1 8855926 1026651 27.1gb 13.5gb
[2] https://grafana.softwareheritage.org/goto/yIFkcB7Nk?orgId=1
- Owner
Before actually doing any chaos monkey testing, I wanted to enable metrics on the elasticsearch cluster, so as to eventually have a trace of what's happening in the cluster (other than the plain logs).
For now, I've failed. The current operator (the official one, eck-operator) used to install the elasticsearch cluster only allows enabling metrics for the operator itself, not for the cluster(s) it manages.
It seems we can only enable metrics with specific tooling around elasticsearch (metricbeat, ...).
- This seems a bit brittle to me: if we lose elasticsearch, we'll also lose the monitoring on that front.
- We already have our monitoring stack (prometheus, thanos), so we'll want to reuse it.
All in all, some investigation needs to happen on that front.
For now, I'll put this aside (some rest is in order ;).
- Owner
This article should help in addressing the monitoring of elasticsearch with prometheus [1].
[1] https://www.searchhub.io/monitor-elasticsearch-in-kubernetes-using-prometheus
- Antoine R. Dumont changed title from Migrate staging's elasticsearch inside k8s using the ELK operator to Migrate staging's elasticsearch inside k8s using the ECK operator
- Antoine R. Dumont changed the description
- Owner
Given the time I took to get back to this, the journal clients are done indexing [1] [2]
I'll still spend some time on figuring out how to provide metrics on the kube es instance though.
[2]
    bash-4.4# period=30; while true; do date; echo; curl http://search-es-http:9200/_cat/indices; sleep $period; done
    Fri Dec 6 10:45:58 UTC 2024
    green open .geoip_databases GxLZNKk-TZaqbBF0Mz5QeQ 1 1      36      37 69.7mb 34.8mb
    green open origin-v0.11     3DU2O5uoTDC7HgK1G0x-mQ 1 1 8882618 3065410 17.8gb  8.9gb
Edited by Antoine R. Dumont
- Owner
Here we go.
The elasticsearch instance in kube can be scraped with the following change [1] (I've tested it live in staging).
I've opened another dashboard for the elasticsearch instance [2]
[1] swh/infra/ci-cd/swh-charts!519 (closed)
[2] https://grafana.softwareheritage.org/goto/0xOl1h4Nz?orgId=1
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@6da1f47a
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@475e5592
- Antoine R. Dumont mentioned in merge request swh/infra/ci-cd/swh-charts!519 (closed)
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@cd7f2449
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@cfcdc7b2
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@44fd43d2
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@befcacae
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@1668ae13
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@7f1439a5
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@61a74323
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@8573d38d
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@987adf77
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@1a47192e
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@0b976e83
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@a70c7e17
- Antoine R. Dumont mentioned in commit swh/infra/ci-cd/swh-charts@3f95c21c
- Antoine R. Dumont mentioned in commit swh/infra/puppet/puppet-swh-site@475021d9
- Antoine R. Dumont changed the description