swh/production: Add objstorage checker

Guillaume Samson requested to merge objstorage_checker_production into production

Related to product-management/core-platform#23

These modifications deploy two objstorage checkers in production:

  • one for the banco backend (4 replicas);
  • one for the saam backend (4 replicas).

I'm not sure whether other objstorage backends should be covered as well. Azure doesn't need to be, but what about AWS and winery?
The scrubber configuration is initialized in the database by an init container.
The content ids are read from a Kafka topic.
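
For readers unfamiliar with the config.yml.template ConfigMaps shown in the diff below: the placeholders such as ${BROKER_USER} are filled in from the secret-backed environment variables by the prepare-configuration init container. The actual prepare-configuration.sh script is not part of this merge request, so the following is only a minimal Python sketch of that kind of substitution; the function name render_config and the demo values are assumptions made for illustration.

```python
from string import Template


def render_config(template_text: str, env: dict[str, str]) -> str:
    """Replace ${NAME} placeholders with values taken from env.

    Hypothetical helper: it only mimics what an envsubst-like init step
    could do, it is not the real prepare-configuration.sh.
    Raises KeyError when a placeholder has no value, so a missing secret
    fails the init step instead of producing a broken config.yml.
    """
    return Template(template_text).substitute(env)


if __name__ == "__main__":
    # Shortened excerpt of the journal section of config.yml.template.
    template = (
        "journal:\n"
        "  cls: kafka\n"
        "  sasl.username: ${BROKER_USER}\n"
        "  sasl.password: ${BROKER_USER_PASSWORD}\n"
    )
    # Demo values only; in the pod they would come from the Kubernetes secrets.
    demo_env = {"BROKER_USER": "demo-user", "BROKER_USER_PASSWORD": "demo-password"}
    print(render_config(template, demo_env))
```

Failing on a missing variable is a design choice of this sketch: it makes a misconfigured secret visible at init time rather than at the first Kafka connection.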

Helm diff
[swh] Comparing changes between branches production and objstorage_checker_production (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra-next-version...
Your branch is up to date with 'origin/objstorage_checker_production'.
[swh] Generate config in objstorage_checker_production branch for environment staging...
[swh] Generate config in objstorage_checker_production branch for environment staging...
[swh] Generate config in objstorage_checker_production branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace swh-cassandra-next-version...
Your branch is up to date with 'origin/objstorage_checker_production'.
[swh] Generate config in objstorage_checker_production branch for environment production...
[swh] Generate config in objstorage_checker_production branch for environment production...
[swh] Generate config in objstorage_checker_production branch for environment production...


------------- diff for environment staging namespace swh -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.PP2qFNRE/staging-swh.before, 113 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.PP2qFNRE/staging-swh.after, 113 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/



------------- diff for environment staging namespace swh-cassandra -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.PP2qFNRE/staging-swh-cassandra.before, 401 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.PP2qFNRE/staging-swh-cassandra.after, 401 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/



------------- diff for environment staging namespace swh-cassandra-next-version -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.PP2qFNRE/staging-swh-cassandra-next-version.before, 285 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.PP2qFNRE/staging-swh-cassandra-next-version.after, 285 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/



------------- diff for environment production namespace swh -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.PP2qFNRE/production-swh.before, 427 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.PP2qFNRE/production-swh.after, 427 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/



------------- diff for environment production namespace swh-cassandra -------------

     _        __  __
   _| |_   _ / _|/ _|  between /tmp/swh-chart.swh.PP2qFNRE/production-swh-cassandra.before, 96 documents
 / _' | | | | |_| |_       and /tmp/swh-chart.swh.PP2qFNRE/production-swh-cassandra.after, 100 documents
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned one difference
        |___/

(file level)
    ---
    # Source: swh/templates/scrubber/objstorage-checker-configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: scrubber-objstoragechecker-banco-template
      namespace: swh-cassandra
    data:
      config.yml.template: |
        scrubber:
          cls: postgresql
          db: host=postgresql-scrubber-rw.internal.softwareheritage.org port=5432 user=swh-scrubber
            dbname=swh-scrubber password=${SCRUBBER_POSTGRESQL_PASSWORD}
        storage:
          auth_provider:
            cls: cassandra.auth.PlainTextAuthProvider
            password: ${CASSANDRA_PASSWORD}
            username: swh-ro
          cls: cassandra
          consistency_level: LOCAL_QUORUM
          hosts:
          - cassandra01.internal.softwareheritage.org
          - cassandra02.internal.softwareheritage.org
          - cassandra03.internal.softwareheritage.org
          - cassandra04.internal.softwareheritage.org
          - cassandra05.internal.softwareheritage.org
          - cassandra06.internal.softwareheritage.org
          - cassandra07.internal.softwareheritage.org
          - cassandra08.internal.softwareheritage.org
          - cassandra09.internal.softwareheritage.org
          - cassandra10.internal.softwareheritage.org
          keyspace: swh
        journal:
          brokers:
            - kafka1.internal.softwareheritage.org:9094
            - kafka2.internal.softwareheritage.org:9094
            - kafka3.internal.softwareheritage.org:9094
            - kafka4.internal.softwareheritage.org:9094
          cls: kafka
          group_id: swh-archive-prod-objstoragechecker
          message.max.bytes: "524288000"
          prefix: swh.journal.objects
          sasl.mechanism: SCRAM-SHA-512
          sasl.password: ${BROKER_USER_PASSWORD}
          sasl.username: ${BROKER_USER}
          security.protocol: SASL_SSL
        objstorage:
          cls: remote
          name: banco
          url: http://objstorage-ro-banco-xfs-rpc-ingress
        
    # Source: swh/templates/scrubber/objstorage-checker-configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: swh-cassandra
      name: scrubber-objstoragechecker-saam-template
    data:
      config.yml.template: |
        scrubber:
          cls: postgresql
          db: host=postgresql-scrubber-rw.internal.softwareheritage.org port=5432 user=swh-scrubber
            dbname=swh-scrubber password=${SCRUBBER_POSTGRESQL_PASSWORD}
        storage:
          auth_provider:
            cls: cassandra.auth.PlainTextAuthProvider
            password: ${CASSANDRA_PASSWORD}
            username: swh-ro
          cls: cassandra
          consistency_level: LOCAL_QUORUM
          hosts:
          - cassandra01.internal.softwareheritage.org
          - cassandra02.internal.softwareheritage.org
          - cassandra03.internal.softwareheritage.org
          - cassandra04.internal.softwareheritage.org
          - cassandra05.internal.softwareheritage.org
          - cassandra06.internal.softwareheritage.org
          - cassandra07.internal.softwareheritage.org
          - cassandra08.internal.softwareheritage.org
          - cassandra09.internal.softwareheritage.org
          - cassandra10.internal.softwareheritage.org
          keyspace: swh
        journal:
          brokers:
            - kafka1.internal.softwareheritage.org:9094
            - kafka2.internal.softwareheritage.org:9094
            - kafka3.internal.softwareheritage.org:9094
            - kafka4.internal.softwareheritage.org:9094
          cls: kafka
          group_id: swh-archive-prod-objstoragechecker
          message.max.bytes: "524288000"
          prefix: swh.journal.objects
          sasl.mechanism: SCRAM-SHA-512
          sasl.password: ${BROKER_USER_PASSWORD}
          sasl.username: ${BROKER_USER}
          security.protocol: SASL_SSL
        objstorage:
          cls: remote
          name: saam
          url: http://objstorage-ro-saam-zfs-rpc-ingress
        
    # Source: swh/templates/scrubber/objstorage-checker-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: scrubber-objstoragechecker-banco
      namespace: swh-cassandra
      labels:
        app: scrubber-objstoragechecker-banco
    spec:
      revisionHistoryLimit: 2
      replicas: 4
      selector:
        matchLabels:
          app: scrubber-objstoragechecker-banco
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
      template:
        metadata:
          labels:
            app: scrubber-objstoragechecker-banco
          annotations:
            # Force a rollout upgrade if the configuration changes
            checksum/config: 8cc7281c6aa35ad7a65b109c235fe6c749a64c480d0ead51a8b7bf99fb52d528
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: swh/scrubber
                    operator: In
                    values:
                    - "true"
          priorityClassName: swh-cassandra-background-workload
          initContainers:
          - name: prepare-configuration
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20231211.1"
            imagePullPolicy: IfNotPresent
            env:
            - name: BROKER_USER
              valueFrom:
                secretKeyRef:
                  key: BROKER_USER
                  name: swh-archive-broker-secret
                  optional: false
            - name: BROKER_USER_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: BROKER_USER_PASSWORD
                  name: swh-archive-broker-secret
                  optional: false
            - name: CASSANDRA_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: cassandra-swh-ro-password
                  name: common-secrets
                  optional: false
            - name: SCRUBBER_POSTGRESQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: postgres-swh-scrubber-password
                  name: swh-scrubber-postgresql-common-secret
                  optional: false
            command:
            - /entrypoints/prepare-configuration.sh
            volumeMounts:
            - name: config-utils
              mountPath: /entrypoints
              readOnly: true
            - name: configuration
              mountPath: /etc/swh
            - name: configuration-template
              mountPath: /etc/swh/configuration-template
          - name: initialize-backend
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/toolbox:20240618.1"
            command:
            - /entrypoints/initialize-backend.sh
            env:
            - name: MODULE
              value: scrubber
            - name: MODULE_CONFIG_KEY
              value: 
            - name: SWH_CONFIG_FILENAME
              value: /etc/swh/config.yml
            - name: SWH_PGDATABASE
              value: swh-scrubber
            - name: SWH_PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: ${SCRUBBER_POSTGRESQL_PASSWORD}
                  key: password
            - name: SWH_PGHOST
              valueFrom:
                secretKeyRef:
                  name: ${SCRUBBER_POSTGRESQL_PASSWORD}
                  key: host
            volumeMounts:
            - name: configuration
              mountPath: /etc/swh
            - name: database-utils
              mountPath: /entrypoints
          - name: check-scrubber-migration
            # TODO: Add the "datastore" registration
            #       A workaround is needed as the registration is not idempotent
            #       and can't be launched each time a scrubber is launched
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
            command:
            - /entrypoints/check-backend-version.sh
            env:
            - name: MODULE
              value: scrubber
            - name: MODULE_CONFIG_KEY
              value: 
            - name: SWH_CONFIG_FILENAME
              value: /etc/swh/config.yml
            volumeMounts:
            - name: configuration
              mountPath: /etc/swh
            - name: database-utils
              mountPath: /entrypoints
          - name: check-storage-migration
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
            command:
            - /entrypoints/check-backend-version.sh
            env:
            - name: MODULE
              value: storage
            - name: MODULE_CONFIG_KEY
              value: 
            - name: SWH_CONFIG_FILENAME
              value: /etc/swh/config.yml
            volumeMounts:
            - name: configuration
              mountPath: /etc/swh
            - name: database-utils
              mountPath: /entrypoints
          containers:
          - name: storage-checker
            resources:
              requests:
                memory: 200Mi
                cpu: 400m
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
            imagePullPolicy: IfNotPresent
            command:
            - /opt/swh/entrypoint.sh
            args:
            - swh
            - scrubber
            - check
            - run
            - objstorage-banco-content
            - "--use-journal"
            env:
            - name: STATSD_HOST
              value: prometheus-statsd-exporter
            - name: STATSD_PORT
              value: 9125
            - name: STATSD_TAGS
              value: "deployment:scrubber-objstoragechecker-banco"
            - name: MAX_TASKS_PER_CHILD
              value: 1
            - name: SWH_LOG_LEVEL
              value: INFO
            - name: SWH_CONFIG_FILENAME
              value: /etc/swh/config.yml
            - name: SWH_SENTRY_ENVIRONMENT
              value: production
            - name: SWH_MAIN_PACKAGE
              value: swh.scrubber
            - name: SWH_SENTRY_DSN
              valueFrom:
                secretKeyRef:
                  name: common-secrets
                  key: scrubber-sentry-dsn
                  # 'name' secret must exist & include key "host"
                  optional: false
            volumeMounts:
            - name: configuration
              mountPath: /etc/swh
          volumes:
          - name: configuration
            emptyDir: {}
          - name: configuration-template
            configMap:
              name: scrubber-objstoragechecker-banco-template
              defaultMode: 0777
              items:
              - key: config.yml.template
                path: config.yml.template
          - name: database-utils
            configMap:
              name: database-utils
              defaultMode: 0555
          - name: config-utils
            configMap:
              name: config-utils
              defaultMode: 0555
    # Source: swh/templates/scrubber/objstorage-checker-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: scrubber-objstoragechecker-saam
      namespace: swh-cassandra
      labels:
        app: scrubber-objstoragechecker-saam
    spec:
      revisionHistoryLimit: 2
      replicas: 4
      selector:
        matchLabels:
          app: scrubber-objstoragechecker-saam
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
      template:
        metadata:
          labels:
            app: scrubber-objstoragechecker-saam
          annotations:
            # Force a rollout upgrade if the configuration changes
            checksum/config: b38150145c57d806ee1b3a2d049fc112aa46d59036dcadaaf0c0437477067bb3
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: swh/scrubber
                    operator: In
                    values:
                    - "true"
          priorityClassName: swh-cassandra-background-workload
          initContainers:
          - name: prepare-configuration
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20231211.1"
            imagePullPolicy: IfNotPresent
            env:
            - name: BROKER_USER
              valueFrom:
                secretKeyRef:
                  key: BROKER_USER
                  name: swh-archive-broker-secret
                  optional: false
            - name: BROKER_USER_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: BROKER_USER_PASSWORD
                  name: swh-archive-broker-secret
                  optional: false
            - name: CASSANDRA_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: cassandra-swh-ro-password
                  name: common-secrets
                  optional: false
            - name: SCRUBBER_POSTGRESQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: postgres-swh-scrubber-password
                  name: swh-scrubber-postgresql-common-secret
                  optional: false
            command:
            - /entrypoints/prepare-configuration.sh
            volumeMounts:
            - name: config-utils
              mountPath: /entrypoints
              readOnly: true
            - name: configuration
              mountPath: /etc/swh
            - name: configuration-template
              mountPath: /etc/swh/configuration-template
          - name: initialize-backend
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/toolbox:20240618.1"
            command:
            - /entrypoints/initialize-backend.sh
            env:
            - name: MODULE
              value: scrubber
            - name: MODULE_CONFIG_KEY
              value: 
            - name: SWH_CONFIG_FILENAME
              value: /etc/swh/config.yml
            - name: SWH_PGDATABASE
              value: swh-scrubber
            - name: SWH_PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: ${SCRUBBER_POSTGRESQL_PASSWORD}
                  key: password
            - name: SWH_PGHOST
              valueFrom:
                secretKeyRef:
                  name: ${SCRUBBER_POSTGRESQL_PASSWORD}
                  key: host
            volumeMounts:
            - name: configuration
              mountPath: /etc/swh
            - name: database-utils
              mountPath: /entrypoints
          - name: check-scrubber-migration
            # TODO: Add the "datastore" registration
            #       A workaround is needed as the registration is not idempotent
            #       and can't be launched each time a scrubber is launched
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
            command:
            - /entrypoints/check-backend-version.sh
            env:
            - name: MODULE
              value: scrubber
            - name: MODULE_CONFIG_KEY
              value: 
            - name: SWH_CONFIG_FILENAME
              value: /etc/swh/config.yml
            volumeMounts:
            - name: configuration
              mountPath: /etc/swh
            - name: database-utils
              mountPath: /entrypoints
          - name: check-storage-migration
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
            command:
            - /entrypoints/check-backend-version.sh
            env:
            - name: MODULE
              value: storage
            - name: MODULE_CONFIG_KEY
              value: 
            - name: SWH_CONFIG_FILENAME
              value: /etc/swh/config.yml
            volumeMounts:
            - name: configuration
              mountPath: /etc/swh
            - name: database-utils
              mountPath: /entrypoints
          containers:
          - name: storage-checker
            resources:
              requests:
                memory: 200Mi
                cpu: 400m
            image: "container-registry.softwareheritage.org/swh/infra/swh-apps/scrubber:20240618.2"
            imagePullPolicy: IfNotPresent
            command:
            - /opt/swh/entrypoint.sh
            args:
            - swh
            - scrubber
            - check
            - run
            - objstorage-saam-content
            - "--use-journal"
            env:
            - name: STATSD_HOST
              value: prometheus-statsd-exporter
            - name: STATSD_PORT
              value: 9125
            - name: STATSD_TAGS
              value: "deployment:scrubber-objstoragechecker-saam"
            - name: MAX_TASKS_PER_CHILD
              value: 1
            - name: SWH_LOG_LEVEL
              value: INFO
            - name: SWH_CONFIG_FILENAME
              value: /etc/swh/config.yml
            - name: SWH_SENTRY_ENVIRONMENT
              value: production
            - name: SWH_MAIN_PACKAGE
              value: swh.scrubber
            - name: SWH_SENTRY_DSN
              valueFrom:
                secretKeyRef:
                  name: common-secrets
                  key: scrubber-sentry-dsn
                  # 'name' secret must exist & include key "host"
                  optional: false
            volumeMounts:
            - name: configuration
              mountPath: /etc/swh
          volumes:
          - name: configuration
            emptyDir: {}
          - name: configuration-template
            configMap:
              name: scrubber-objstoragechecker-saam-template
              defaultMode: 0777
              items:
              - key: config.yml.template
                path: config.yml.template
          - name: database-utils
            configMap:
              name: database-utils
              defaultMode: 0555
          - name: config-utils
            configMap:
              name: config-utils
              defaultMode: 0555
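
A note on the checksum/config annotations in the two Deployments above: they differ between banco and saam because each one reflects its own rendered configuration, so any configuration change alters the pod template and triggers a rollout. The chart template that computes them is not shown in this diff; the sketch below assumes a SHA-256 digest of the rendered configuration (the 64-hex-character values are consistent with that), and the function name config_checksum is hypothetical.

```python
import hashlib


def config_checksum(rendered_config: str) -> str:
    """Return the SHA-256 hex digest of a rendered configuration.

    Assumption: the chart derives checksum/config this way; the exact
    input that the real template hashes is not visible in this diff.
    """
    return hashlib.sha256(rendered_config.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    banco = "objstorage:\n  cls: remote\n  name: banco\n"
    saam = "objstorage:\n  cls: remote\n  name: saam\n"
    # Different configurations give different annotations, so only the
    # Deployment whose configuration changed gets rolled out again.
    print(config_checksum(banco))
    print(config_checksum(saam))
```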
