swh/production: Add objstorage checker
Related to product-management/core-platform#23
These changes deploy two objstorage checkers in production:
- one for banco backend (4 replicas);
- one for saam backend (4 replicas).
I am not sure whether checkers for other objstorage backends should be added. Azure does not need one, but what about AWS and winery?
The scrubber configuration is initialized in the database by an init container.
The content IDs are read from a Kafka topic.
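For illustration only: in the diff below, the `prepare-configuration` init container receives the secrets as environment variables and mounts a `config.yml.template` containing `${...}` placeholders. The `prepare-configuration.sh` script itself is not part of this diff, so the following Python sketch is only an assumption about what the substitution step amounts to, not the actual implementation. The function name `render_config`, the host name, and the dummy values are hypothetical.

```python
import string

# Shortened, hypothetical excerpt of a config.yml.template; the host name
# and values are placeholders, not the production settings.
TEMPLATE = """\
scrubber:
  cls: postgresql
  db: host=db.example.org port=5432 user=swh-scrubber dbname=swh-scrubber password=${SCRUBBER_POSTGRESQL_PASSWORD}
journal:
  cls: kafka
  sasl.username: ${BROKER_USER}
  sasl.password: ${BROKER_USER_PASSWORD}
"""


def render_config(template: str, env: dict[str, str]) -> str:
    """Replace every ${VAR} placeholder with its value from env.

    Raises KeyError if a placeholder has no matching variable, so a missing
    secret fails early instead of producing a broken config file.
    """
    return string.Template(template).substitute(env)


# Dummy values standing in for what the Kubernetes secrets would provide.
example_env = {
    "SCRUBBER_POSTGRESQL_PASSWORD": "pg-secret",
    "BROKER_USER": "checker",
    "BROKER_USER_PASSWORD": "kafka-secret",
}

rendered = render_config(TEMPLATE, example_env)
print(rendered)
```

In a real pod the mapping would come from the process environment (for example `os.environ`) rather than a literal dictionary, and the rendered text would be written to `/etc/swh/config.yml` on the shared `configuration` volume.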
Helm diff
[swh] Comparing changes between branches production and objstorage_checker_production (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra-next-version...
Your branch is up to date with 'origin/objstorage_checker_production'.
[swh] Generate config in objstorage_checker_production branch for environment staging...
[swh] Generate config in objstorage_checker_production branch for environment staging...
[swh] Generate config in objstorage_checker_production branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace swh-cassandra-next-version...
Your branch is up to date with 'origin/objstorage_checker_production'.
[swh] Generate config in objstorage_checker_production branch for environment production...
[swh] Generate config in objstorage_checker_production branch for environment production...
[swh] Generate config in objstorage_checker_production branch for environment production...
------------- diff for environment staging namespace swh -------------
dyff between /tmp/swh-chart.swh.PP2qFNRE/staging-swh.before, 113 documents
and /tmp/swh-chart.swh.PP2qFNRE/staging-swh.after, 113 documents
returned no differences
------------- diff for environment staging namespace swh-cassandra -------------
dyff between /tmp/swh-chart.swh.PP2qFNRE/staging-swh-cassandra.before, 401 documents
and /tmp/swh-chart.swh.PP2qFNRE/staging-swh-cassandra.after, 401 documents
returned no differences
------------- diff for environment staging namespace swh-cassandra-next-version -------------
dyff between /tmp/swh-chart.swh.PP2qFNRE/staging-swh-cassandra-next-version.before, 285 documents
and /tmp/swh-chart.swh.PP2qFNRE/staging-swh-cassandra-next-version.after, 285 documents
returned no differences
------------- diff for environment production namespace swh -------------
dyff between /tmp/swh-chart.swh.PP2qFNRE/production-swh.before, 427 documents
and /tmp/swh-chart.swh.PP2qFNRE/production-swh.after, 427 documents
returned no differences
------------- diff for environment production namespace swh-cassandra -------------
dyff between /tmp/swh-chart.swh.PP2qFNRE/production-swh-cassandra.before, 96 documents
and /tmp/swh-chart.swh.PP2qFNRE/production-swh-cassandra.after, 100 documents
returned one difference
(file level)
---
# Source: swh/templates/scrubber/objstorage-checker-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: scrubber-objstoragechecker-banco-template
  namespace: swh-cassandra
data:
  config.yml.template: |
    scrubber:
      cls: postgresql
      db: host=postgresql-scrubber-rw.internal.softwareheritage.org port=5432 user=swh-scrubber
        dbname=swh-scrubber password=${SCRUBBER_POSTGRESQL_PASSWORD}
    storage:
      auth_provider:
        cls: cassandra.auth.PlainTextAuthProvider
        password: ${CASSANDRA_PASSWORD}
        username: swh-ro
      cls: cassandra
      consistency_level: LOCAL_QUORUM
      hosts:
      - cassandra01.internal.softwareheritage.org
      - cassandra02.internal.softwareheritage.org
      - cassandra03.internal.softwareheritage.org
      - cassandra04.internal.softwareheritage.org
      - cassandra05.internal.softwareheritage.org
      - cassandra06.internal.softwareheritage.org
      - cassandra07.internal.softwareheritage.org
      - cassandra08.internal.softwareheritage.org
      - cassandra09.internal.softwareheritage.org
      - cassandra10.internal.softwareheritage.org
      keyspace: swh
    journal:
      brokers:
      - kafka1.internal.softwareheritage.org:9094
      - kafka2.internal.softwareheritage.org:9094
      - kafka3.internal.softwareheritage.org:9094
      - kafka4.internal.softwareheritage.org:9094
      cls: kafka
      group_id: swh-archive-prod-objstoragechecker
      message.max.bytes: "524288000"
      prefix: swh.journal.objects
      sasl.mechanism: SCRAM-SHA-512
      sasl.password: ${BROKER_USER_PASSWORD}
      sasl.username: ${BROKER_USER}
      security.protocol: SASL_SSL
    objstorage:
      cls: remote
      name: banco
      url: http://objstorage-ro-banco-xfs-rpc-ingress
# Source: swh/templates/scrubber/objstorage-checker-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: swh-cassandra
  name: scrubber-objstoragechecker-saam-template
data:
  config.yml.template: |
    scrubber:
      cls: postgresql
      db: host=postgresql-scrubber-rw.internal.softwareheritage.org port=5432 user=swh-scrubber
        dbname=swh-scrubber password=${SCRUBBER_POSTGRESQL_PASSWORD}
    storage:
      auth_provider:
        cls: cassandra.auth.PlainTextAuthProvider
        password: ${CASSANDRA_PASSWORD}
        username: swh-ro
      cls: cassandra
      consistency_level: LOCAL_QUORUM
      hosts:
      - cassandra01.internal.softwareheritage.org
      - cassandra02.internal.softwareheritage.org
      - cassandra03.internal.softwareheritage.org
      - cassandra04.internal.softwareheritage.org
      - cassandra05.internal.softwareheritage.org
      - cassandra06.internal.softwareheritage.org
      - cassandra07.internal.softwareheritage.org
      - cassandra08.internal.softwareheritage.org
      - cassandra09.internal.softwareheritage.org
      - cassandra10.internal.softwareheritage.org
      keyspace: swh
    journal:
      brokers:
      - kafka1.internal.softwareheritage.org:9094
      - kafka2.internal.softwareheritage.org:9094
      - kafka3.internal.softwareheritage.org:9094
      - kafka4.internal.softwareheritage.org:9094
      cls: kafka
      group_id: swh-archive-prod-objstoragechecker
      message.max.bytes: "524288000"
      prefix: swh.journal.objects
      sasl.mechanism: SCRAM-SHA-512
      sasl.password: ${BROKER_USER_PASSWORD}
      sasl.username: ${BROKER_USER}
      security.protocol: SASL_SSL
    objstorage:
      cls: remote
      name: saam
      url: http://objstorage-ro-saam-zfs-rpc-ingress
# Source: swh/templates/scrubber/objstorage-checker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scrubber-objstoragechecker-banco
  namespace: swh-cassandra
  labels:
    app: scrubber-objstoragechecker-banco
spec:
  revisionHistoryLimit: 2
  replicas: 4
  selector:
    matchLabels:
      app: scrubber-objstoragechecker-banco
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
  template:
    metadata:
      labels:
        app: scrubber-objstoragechecker-banco
      annotations:
        # Force a rollout upgrade if the configuration changes
        checksum/config: 8cc7281c6aa35ad7a65b109c235fe6c749a64c480d0ead51a8b7bf99fb52d528
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: swh/scrubber
                operator: In
                values:
                - "true"
      priorityClassName: swh-cassandra-background-workload
      initContainers:
      - name: prepare-configuration
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20231211.1"
        imagePullPolicy: IfNotPresent
        env:
        - name: BROKER_USER
          valueFrom:
            secretKeyRef:
              key: BROKER_USER
              name: swh-archive-broker-secret
              optional: false
        - name: BROKER_USER_PASSWORD
          valueFrom:
            secretKeyRef:
              key: BROKER_USER_PASSWORD
              name: swh-archive-broker-secret
              optional: false
        - name: CASSANDRA_PASSWORD
          valueFrom:
            secretKeyRef:
              key: cassandra-swh-ro-password
              name: common-secrets
              optional: false
        - name: SCRUBBER_POSTGRESQL_PASSWORD
          valueFrom:
            secretKeyRef:
              key: postgres-swh-scrubber-password
              name: swh-scrubber-postgresql-common-secret
              optional: false
        command:
        - /entrypoints/prepare-configuration.sh
        volumeMounts:
        - name: config-utils
          mountPath: /entrypoints
          readOnly: true
        - name: configuration
          mountPath: /etc/swh
        - name: configuration-template
          mountPath: /etc/swh/configuration-template
      - name: initialize-backend
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/toolbox:20240618.1"
        command:
        - /entrypoints/initialize-backend.sh
        env:
        - name: MODULE
          value: scrubber
        - name: MODULE_CONFIG_KEY
          value:
        - name: SWH_CONFIG_FILENAME
          value: /etc/swh/config.yml
        - name: SWH_PGDATABASE
          value: swh-scrubber
        - name: SWH_PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: ${SCRUBBER_POSTGRESQL_PASSWORD}
              key: password
        - name: SWH_PGHOST
          valueFrom:
            secretKeyRef:
              name: ${SCRUBBER_POSTGRESQL_PASSWORD}
              key: host
        volumeMounts:
        - name: configuration
          mountPath: /etc/swh
        - name: database-utils
          mountPath: /entrypoints
      - name: check-scrubber-migration
        # TODO: Add the "datastore" registration
        # A workaround is needed as the registration is not idempotent
        # and can't be launched each time a scrubber is launched
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
        command:
        - /entrypoints/check-backend-version.sh
        env:
        - name: MODULE
          value: scrubber
        - name: MODULE_CONFIG_KEY
          value:
        - name: SWH_CONFIG_FILENAME
          value: /etc/swh/config.yml
        volumeMounts:
        - name: configuration
          mountPath: /etc/swh
        - name: database-utils
          mountPath: /entrypoints
      - name: check-storage-migration
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
        command:
        - /entrypoints/check-backend-version.sh
        env:
        - name: MODULE
          value: storage
        - name: MODULE_CONFIG_KEY
          value:
        - name: SWH_CONFIG_FILENAME
          value: /etc/swh/config.yml
        volumeMounts:
        - name: configuration
          mountPath: /etc/swh
        - name: database-utils
          mountPath: /entrypoints
      containers:
      - name: storage-checker
        resources:
          requests:
            memory: 200Mi
            cpu: 400m
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
        imagePullPolicy: IfNotPresent
        command:
        - /opt/swh/entrypoint.sh
        args:
        - swh
        - scrubber
        - check
        - run
        - objstorage-banco-content
        - "--use-journal"
        env:
        - name: STATSD_HOST
          value: prometheus-statsd-exporter
        - name: STATSD_PORT
          value: 9125
        - name: STATSD_TAGS
          value: "deployment:scrubber-objstoragechecker-banco"
        - name: MAX_TASKS_PER_CHILD
          value: 1
        - name: SWH_LOG_LEVEL
          value: INFO
        - name: SWH_CONFIG_FILENAME
          value: /etc/swh/config.yml
        - name: SWH_SENTRY_ENVIRONMENT
          value: production
        - name: SWH_MAIN_PACKAGE
          value: swh.scrubber
        - name: SWH_SENTRY_DSN
          valueFrom:
            secretKeyRef:
              name: common-secrets
              key: scrubber-sentry-dsn
              # 'name' secret must exist & include key "host"
              optional: false
        volumeMounts:
        - name: configuration
          mountPath: /etc/swh
      volumes:
      - name: configuration
        emptyDir: {}
      - name: configuration-template
        configMap:
          name: scrubber-objstoragechecker-banco-template
          defaultMode: 0777
          items:
          - key: config.yml.template
            path: config.yml.template
      - name: database-utils
        configMap:
          name: database-utils
          defaultMode: 0555
      - name: config-utils
        configMap:
          name: config-utils
          defaultMode: 0555
# Source: swh/templates/scrubber/objstorage-checker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scrubber-objstoragechecker-saam
  namespace: swh-cassandra
  labels:
    app: scrubber-objstoragechecker-saam
spec:
  revisionHistoryLimit: 2
  replicas: 4
  selector:
    matchLabels:
      app: scrubber-objstoragechecker-saam
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
  template:
    metadata:
      labels:
        app: scrubber-objstoragechecker-saam
      annotations:
        # Force a rollout upgrade if the configuration changes
        checksum/config: b38150145c57d806ee1b3a2d049fc112aa46d59036dcadaaf0c0437477067bb3
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: swh/scrubber
                operator: In
                values:
                - "true"
      priorityClassName: swh-cassandra-background-workload
      initContainers:
      - name: prepare-configuration
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20231211.1"
        imagePullPolicy: IfNotPresent
        env:
        - name: BROKER_USER
          valueFrom:
            secretKeyRef:
              key: BROKER_USER
              name: swh-archive-broker-secret
              optional: false
        - name: BROKER_USER_PASSWORD
          valueFrom:
            secretKeyRef:
              key: BROKER_USER_PASSWORD
              name: swh-archive-broker-secret
              optional: false
        - name: CASSANDRA_PASSWORD
          valueFrom:
            secretKeyRef:
              key: cassandra-swh-ro-password
              name: common-secrets
              optional: false
        - name: SCRUBBER_POSTGRESQL_PASSWORD
          valueFrom:
            secretKeyRef:
              key: postgres-swh-scrubber-password
              name: swh-scrubber-postgresql-common-secret
              optional: false
        command:
        - /entrypoints/prepare-configuration.sh
        volumeMounts:
        - name: config-utils
          mountPath: /entrypoints
          readOnly: true
        - name: configuration
          mountPath: /etc/swh
        - name: configuration-template
          mountPath: /etc/swh/configuration-template
      - name: initialize-backend
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/toolbox:20240618.1"
        command:
        - /entrypoints/initialize-backend.sh
        env:
        - name: MODULE
          value: scrubber
        - name: MODULE_CONFIG_KEY
          value:
        - name: SWH_CONFIG_FILENAME
          value: /etc/swh/config.yml
        - name: SWH_PGDATABASE
          value: swh-scrubber
        - name: SWH_PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: ${SCRUBBER_POSTGRESQL_PASSWORD}
              key: password
        - name: SWH_PGHOST
          valueFrom:
            secretKeyRef:
              name: ${SCRUBBER_POSTGRESQL_PASSWORD}
              key: host
        volumeMounts:
        - name: configuration
          mountPath: /etc/swh
        - name: database-utils
          mountPath: /entrypoints
      - name: check-scrubber-migration
        # TODO: Add the "datastore" registration
        # A workaround is needed as the registration is not idempotent
        # and can't be launched each time a scrubber is launched
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
        command:
        - /entrypoints/check-backend-version.sh
        env:
        - name: MODULE
          value: scrubber
        - name: MODULE_CONFIG_KEY
          value:
        - name: SWH_CONFIG_FILENAME
          value: /etc/swh/config.yml
        volumeMounts:
        - name: configuration
          mountPath: /etc/swh
        - name: database-utils
          mountPath: /entrypoints
      - name: check-storage-migration
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
        command:
        - /entrypoints/check-backend-version.sh
        env:
        - name: MODULE
          value: storage
        - name: MODULE_CONFIG_KEY
          value:
        - name: SWH_CONFIG_FILENAME
          value: /etc/swh/config.yml
        volumeMounts:
        - name: configuration
          mountPath: /etc/swh
        - name: database-utils
          mountPath: /entrypoints
      containers:
      - name: storage-checker
        resources:
          requests:
            memory: 200Mi
            cpu: 400m
        image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
        imagePullPolicy: IfNotPresent
        command:
        - /opt/swh/entrypoint.sh
        args:
        - swh
        - scrubber
        - check
        - run
        - objstorage-saam-content
        - "--use-journal"
        env:
        - name: STATSD_HOST
          value: prometheus-statsd-exporter
        - name: STATSD_PORT
          value: 9125
        - name: STATSD_TAGS
          value: "deployment:scrubber-objstoragechecker-saam"
        - name: MAX_TASKS_PER_CHILD
          value: 1
        - name: SWH_LOG_LEVEL
          value: INFO
        - name: SWH_CONFIG_FILENAME
          value: /etc/swh/config.yml
        - name: SWH_SENTRY_ENVIRONMENT
          value: production
        - name: SWH_MAIN_PACKAGE
          value: swh.scrubber
        - name: SWH_SENTRY_DSN
          valueFrom:
            secretKeyRef:
              name: common-secrets
              key: scrubber-sentry-dsn
              # 'name' secret must exist & include key "host"
              optional: false
        volumeMounts:
        - name: configuration
          mountPath: /etc/swh
      volumes:
      - name: configuration
        emptyDir: {}
      - name: configuration-template
        configMap:
          name: scrubber-objstoragechecker-saam-template
          defaultMode: 0777
          items:
          - key: config.yml.template
            path: config.yml.template
      - name: database-utils
        configMap:
          name: database-utils
          defaultMode: 0555
      - name: config-utils
        configMap:
          name: config-utils
          defaultMode: 0555