staging: Migrate swh.counters rpc to dynamic infra
List of services to migrate [1]:
- gunicorn-swh-counters
- swh-counters-journal-client
gunicorn-swh-counters needs access to:
- counters0.staging: redis backend. Not to be migrated; it stays on that node.
- counters0.staging: /srv/softwareheritage/counters, which holds the *.json historic dataset [2]
swh-counters-journal-client needs access to:
- counters0.staging: redis backend
Plan:
- swh-apps: Create docker image
- swh-apps: Build and push image to gitlab registry
- swh-charts: Reference image
- swh/infra/ci-cd/swh-charts!257 (merged): swh-charts: Add template for counters (including journal client & rpc)
- swh-charts: Deploy journal client to dynamic infra (its only dependency is access to redis)
  - Deploy
  - Deactivate journal client in counters0.staging
  - Checks
    - grafana board: https://grafana.softwareheritage.org/goto/RhNvBmvSz?orgId=1
    - no lag [3]
- Historic data served by gitlab as a snippet in the swh-counters repository [4]
- swh/infra/ci-cd/swh-charts!262 (merged): cronjob: Add counter cronjob to periodically refresh the cache (conditional). This will replace a cron currently deployed in production.
- swh/infra/ci-cd/swh-charts!262 (merged): counters/rpc: Add init-container to retrieve the historical dataset (currently installed by puppet on the counters nodes)
- swh/infra/ci-cd/swh-charts!262 (merged): swh-charts: Add staging configuration
- swh/infra/puppet/puppet-swh-site!671 (merged): counters0.staging: Decommission journal client
- swh/infra/puppet/puppet-swh-site!672 (closed): Adapt redis backend so it can be remotely accessed by the pods in the cluster (if needed)
- swh/infra/ci-cd/swh-charts!262 (merged): Deploy counters rpc service to dynamic infra
- Adapt the firewall rule so staging workers can access thanos.internal.admin.swh.network
- swh/infra/ci-cd/swh-charts!266 (merged): swh-charts: Adapt dependent services to use the new counter in dynamic infra
- Stop gunicorn-swh-counters (manually)
- Checks
Post-migration:
- swh/infra/puppet/puppet-swh-site!673 (merged): Decommission counters0.staging's swh services
[1]
root@counters0:~# systemctl list-units | grep swh
gunicorn-swh-counters.service loaded active running Gunicorn instance swh-counters
swh-counters-journal-client.service loaded active running Software Heritage Counters Journal Client
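The service inventory in [1] comes from `systemctl list-units`. A minimal sketch for re-running that check after the migration, assuming the same column layout (UNIT LOAD ACTIVE SUB DESCRIPTION); the helper name `active_swh_units` is hypothetical. It narrows the listing down to swh units that are still active and running, so on counters0.staging it should print nothing once the post-migration decommissioning is done.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print swh units still "active running",
# reading `systemctl list-units` output on stdin.
set -euo pipefail

active_swh_units() {
  # Columns assumed: UNIT LOAD ACTIVE SUB DESCRIPTION...
  awk '$1 ~ /swh/ && $3 == "active" && $4 == "running" { print $1 }'
}
```

Usage on the node: `systemctl list-units --type=service --plain --no-legend | active_swh_units`. `--plain` omits the status bullet systemd prints in front of failed units, which would otherwise shift the columns.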
[2]
root@counters0:~# ls -lah /srv/softwareheritage/counters/
total 1.0M
drwxrwxr-x 2 swhstorage swhstorage 4.0K Dec 5 08:00 .
drwxr-xr-x 3 root root 4.0K Apr 9 2021 ..
-rw-r--r-- 1 root root 68K Apr 9 2021 history-counters.munin.json
-rw-r--r-- 1 swhstorage swhstorage 152K Dec 5 08:00 history.json
-rw-r--r-- 1 root root 350K Apr 9 2021 static-12.json
-rw-r--r-- 1 root root 175K Apr 9 2021 static-24.json
-rw-r--r-- 1 root root 267K Apr 9 2021 static.json
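The rpc service reads the files listed in [2], and the plan moves their retrieval from puppet into an init container. A minimal sketch of a sanity check such a container could run once the download is done, assuming all five files listed above are required and must be non-empty. Both the required file list and the helper name `check_counters_dataset` are assumptions, not taken from the chart.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): fail if any expected dataset file is
# missing or empty. The file list is assumed from the listing in [2].
set -euo pipefail

REQUIRED_FILES=(history-counters.munin.json history.json static-12.json static-24.json static.json)

check_counters_dataset() {
  local dir=$1 missing=0 f
  for f in "${REQUIRED_FILES[@]}"; do
    # -s: file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_counters_dataset /srv/softwareheritage/counters` exits non-zero and names the offending files, so a failed download would stop the pod before gunicorn starts.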
[3] Lag is 0 for all topics/partitions:
root@kafka1:~# $KAFKA_CONSUMER_GROUPS --bootstrap-server $SERVER --describe --group $group_id
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
swh.counters.journal_client swh.journal.objects.snapshot 58 123623 123623 0 rdkafka-63ab8785-cb29-49af-a6a0-36bcb44709b3 /192.168.130.145 rdkafka
...
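The "no lag" check can be scripted instead of read by eye. A minimal sketch, assuming the column layout shown above (LAG is the sixth column) and that the describe output is piped in; `check_no_lag` is a hypothetical helper name. Lines whose PARTITION column is not a number (the header, informational messages) are skipped, and any partition whose lag is not 0, including "-" for a partition without a committed offset, is reported.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `kafka-consumer-groups --describe`
# output on stdin, print lagging topic/partitions, exit 1 if any.
set -euo pipefail

check_no_lag() {
  awk '
    # Skip header and informational lines: PARTITION must be a number.
    $3 !~ /^[0-9]+$/ { next }
    # Column 6 is LAG; "-" means no committed offset yet.
    $6 != "0" { print $2 "/" $3 " lag=" $6; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}
```

Usage with the variables from the session above: `$KAFKA_CONSUMER_GROUPS --bootstrap-server $SERVER --describe --group $group_id | check_no_lag && echo "no lag"`.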
[4] swh/devel/swh-counters$1617
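The init container retrieves the historical dataset from this snippet instead of relying on puppet. A minimal sketch of that download step with curl, under several assumptions to verify against the actual snippet: the GitLab host (`https://gitlab.softwareheritage.org`), the snippet branch name (`main`), the raw-file URL layout `<project>/-/snippets/<id>/raw/<ref>/<file>`, and that the snippet is readable without a token. `snippet_raw_url` and `fetch_counters_file` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: download one dataset file from the project snippet.
# ASSUMPTIONS: host, ref and raw URL layout; anonymous read access.
set -euo pipefail

GITLAB_URL=${GITLAB_URL:-https://gitlab.softwareheritage.org}
PROJECT_PATH=swh/devel/swh-counters
SNIPPET_ID=1617
SNIPPET_REF=${SNIPPET_REF:-main}

snippet_raw_url() {
  # Assumed route: <gitlab>/<project>/-/snippets/<id>/raw/<ref>/<file>
  printf '%s/%s/-/snippets/%s/raw/%s/%s\n' \
    "$GITLAB_URL" "$PROJECT_PATH" "$SNIPPET_ID" "$SNIPPET_REF" "$1"
}

fetch_counters_file() {
  local file=$1 dest_dir=$2
  # --fail: non-zero exit on HTTP errors, so the init container fails loudly
  curl --fail --silent --show-error --location \
    -o "$dest_dir/$file" "$(snippet_raw_url "$file")"
}
```

For example, `fetch_counters_file static.json /srv/softwareheritage/counters` writes the file next to the others; which files the snippet actually holds is not stated here.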
Refs. #4780 (closed)
Edited by Antoine R. Dumont