Deploy coar-notify in staging
- Prepare a container image of coar-notify
- Deploy a new CNPG database and update the procedure (deployment/backup/...) in snippets or hedgedoc
- Update swh-chart to deploy the new coar-notify django application
- Create the public hostname in puppet
- Define monitoring and alerting points (maybe too early for that)
Documentation draft: https://hedgedoc.softwareheritage.org/NqzRS2sKRU-oEjpG68m9mA#
Activity
- Vincent Sellier assigned to @guillaume
- Guillaume Samson mentioned in commit swh-apps@ea746835
- Owner
[ ] Prepare a container image of coar-notify
FWIW, this requires the CI to be ready to push that new package to PyPI. Following that doc [1], I seem to recall it's already plugged in.
And indeed [2], it's there. So we're only a git tag away. Then at least one swh.coarnotify version will be pushed to pypi.org (and with that you can build the image, so you can develop the chart template against it).
[1] https://docs.softwareheritage.org/devel/tutorials/add-new-package.html
[2] https://jenkins.softwareheritage.org/job/swh-coarnotify/
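For reference, the release step would be something like this sketch (the version number is hypothetical):

git tag -a v0.1.0 -m "Release 0.1.0"   # hypothetical first version
git push origin v0.1.0
# Jenkins [2] then picks up the tag, builds the package and publishes swh.coarnotify to pypi.org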
Edited by Antoine R. Dumont
- Vincent Sellier changed the description
- Owner
Deploy the swh-coar-notify db on `test-staging-rke2` with CloudNativePG.

Install the ClusterImageCatalog:
curl -sLO https://raw.githubusercontent.com/cloudnative-pg/postgres-containers/main/Debian/ClusterImageCatalog-bookworm.yaml
kbt apply -f ClusterImageCatalog-bookworm.yaml
Create the `cnpg-coar-notify` namespace.
kbt create ns cnpg-coar-notify
Deploy secrets.
ᐅ kbt apply -f swh-coar-notify-secrets.yaml
secret/cnpg-swh-coar-notify-creds created
secret/cnpg-swh-coar-notify-guest-creds created
secret/cnpg-minio-creds created
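For the record, a minimal sketch of what swh-coar-notify-secrets.yaml may look like (values redacted, structure assumed): CNPG expects kubernetes.io/basic-auth secrets for role credentials, and the MinIO key names match those referenced in the cluster manifest below.

---
apiVersion: v1
kind: Secret
metadata:
  name: cnpg-swh-coar-notify-creds
  namespace: cnpg-coar-notify
type: kubernetes.io/basic-auth
stringData:
  username: swh-coar-notify
  password: "<redacted>"
---
apiVersion: v1
kind: Secret
metadata:
  name: cnpg-swh-coar-notify-guest-creds
  namespace: cnpg-coar-notify
type: kubernetes.io/basic-auth
stringData:
  username: guest
  password: "<redacted>"
---
apiVersion: v1
kind: Secret
metadata:
  name: cnpg-minio-creds
  namespace: cnpg-coar-notify
type: Opaque
stringData:
  MINIO_ACCESS_KEY: "<redacted>"
  MINIO_SECRET_KEY: "<redacted>"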
Deploy swh-coar-notify cnpg cluster.
ᐅ kbt apply -f swh-coar-notify-cluster.yaml
ᐅ kbt get clusters.postgresql.cnpg.io,po,svc,pvc -n cnpg-coar-notify
NAME                                         AGE     INSTANCES   READY   STATUS                     PRIMARY
cluster.postgresql.cnpg.io/swh-coar-notify   6m49s   3           3       Cluster in healthy state   swh-coar-notify-1

NAME                    READY   STATUS    RESTARTS   AGE
pod/swh-coar-notify-1   1/1     Running   0          5m45s
pod/swh-coar-notify-2   1/1     Running   0          3m32s
pod/swh-coar-notify-3   1/1     Running   0          89s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/swh-coar-notify-r    ClusterIP   10.43.208.185   <none>        5432/TCP   6m49s
service/swh-coar-notify-ro   ClusterIP   10.43.12.67     <none>        5432/TCP   6m49s
service/swh-coar-notify-rw   ClusterIP   10.43.223.47    <none>        5432/TCP   6m49s

NAME                                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
persistentvolumeclaim/swh-coar-notify-1         Bound    pvc-af9aace2-d1de-4829-99b7-f253bd5839f1   5Gi        RWO            local-persistent   6m49s
persistentvolumeclaim/swh-coar-notify-1-wal     Bound    pvc-ff2267b0-4466-47e2-b3fb-ea659f639af7   1Gi        RWO            local-persistent   6m49s
persistentvolumeclaim/swh-coar-notify-2         Bound    pvc-7eb9dcfc-e766-4933-b709-1c2731ff06f2   5Gi        RWO            local-persistent   5m13s
persistentvolumeclaim/swh-coar-notify-2-wal     Bound    pvc-4ee5918d-8103-4869-aa0e-f05494549be8   1Gi        RWO            local-persistent   5m13s
persistentvolumeclaim/swh-coar-notify-3         Bound    pvc-5bfaa386-3a9e-48af-b748-23e8a31b157e   5Gi        RWO            local-persistent   2m41s
persistentvolumeclaim/swh-coar-notify-3-wal     Bound    pvc-2acaee98-16b8-4c5d-a3be-0b065150be81   1Gi        RWO            local-persistent   2m41s
Check connections and accounts/services permissions:
- port-forward rw service
ᐅ kbt port-forward svc/swh-coar-notify-rw 5432:5432 -n cnpg-coar-notify
Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
- check rw user
ᐅ psql -h localhost -U swh-coar-notify -d swh-coar-notify
Password for user swh-coar-notify:
psql (17.4 (Debian 17.4-1.pgdg120+2))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql)
Type "help" for help.

swh-coar-notify=> create table test (id int primary key not null,nom varchar(100));
CREATE TABLE
- check ro user
ᐅ psql -h localhost -U guest -d swh-coar-notify
Password for user guest:
psql (17.4 (Debian 17.4-1.pgdg120+2))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql)
Type "help" for help.

swh-coar-notify=> create table test2 (id int primary key not null,nom varchar(100));
ERROR: permission denied for schema public
LINE 1: create table test2 (id int primary key not null,nom varchar(...
- check ro service
ᐅ kbt port-forward svc/swh-coar-notify-ro 5432:5432 -n cnpg-coar-notify
Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
ᐅ psql -h localhost -U swh-coar-notify -d swh-coar-notify
Password for user swh-coar-notify:
psql (17.4 (Debian 17.4-1.pgdg120+2))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql)
Type "help" for help.

swh-coar-notify=> create table test2 (id int primary key not null,nom varchar(100));
ERROR: cannot execute CREATE TABLE in a read-only transaction
Everything seems to work as expected.
Deploy the swh-coar-notify poolers (PgBouncer).
ᐅ kbt apply -f swh-coar-notify-poolers.yaml
pooler.postgresql.cnpg.io/pooler-swh-coar-notify-rw created
pooler.postgresql.cnpg.io/pooler-swh-coar-notify-ro created
ᐅ kbt get clusters.postgresql.cnpg.io,po,svc -n cnpg-coar-notify
NAME                                         AGE   INSTANCES   READY   STATUS                     PRIMARY
cluster.postgresql.cnpg.io/swh-coar-notify   67m   3           3       Cluster in healthy state   swh-coar-notify-3

NAME                                             READY   STATUS    RESTARTS      AGE
pod/pooler-swh-coar-notify-ro-95dc5c994-bddrg    1/1     Running   0             34m
pod/pooler-swh-coar-notify-rw-748df8d954-hxznp   1/1     Running   0             34m
pod/swh-coar-notify-1                            1/1     Running   1 (11m ago)   66m
pod/swh-coar-notify-2                            1/1     Running   0             64m
pod/swh-coar-notify-3                            1/1     Running   0             62m

NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/pooler-swh-coar-notify-ro   ClusterIP   10.43.201.207   <none>        5432/TCP   34m
service/pooler-swh-coar-notify-rw   ClusterIP   10.43.249.173   <none>        5432/TCP   34m
service/swh-coar-notify-r           ClusterIP   10.43.208.185   <none>        5432/TCP   67m
service/swh-coar-notify-ro          ClusterIP   10.43.12.67     <none>        5432/TCP   67m
service/swh-coar-notify-rw          ClusterIP   10.43.223.47    <none>        5432/TCP   67m
Check the pooler services' permissions:
- check ro service
ᐅ kbt port-forward svc/pooler-swh-coar-notify-ro 5432:5432 -n cnpg-coar-notify
Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
Handling connection for 5432
ᐅ psql -h localhost -U swh-coar-notify -d swh-coar-notify
Password for user swh-coar-notify:
psql (17.4 (Debian 17.4-1.pgdg120+2))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: none)
Type "help" for help.

swh-coar-notify=> create table test2 (id int primary key not null,nom varchar(100));
ERROR: cannot execute CREATE TABLE in a read-only transaction
- check rw service
ᐅ kbt port-forward svc/pooler-swh-coar-notify-rw 5432:5432 -n cnpg-coar-notify
Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
Handling connection for 5432
Handling connection for 5432
ᐅ psql -h localhost -U swh-coar-notify -d swh-coar-notify
Password for user swh-coar-notify:
psql (17.4 (Debian 17.4-1.pgdg120+2))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: none)
Type "help" for help.

swh-coar-notify=> create table test2 (id int primary key not null,nom varchar(100));
CREATE TABLE
swh-coar-notify=> \dt
 public | test  | table | swh-coar-notify
 public | test2 | table | swh-coar-notify
Use the kubectl cnpg plugin.

Get cluster info:
ᐅ kubectl cnpg status swh-coar-notify -n cnpg-coar-notify --context test-staging-rke2
Cluster Summary
Name                 cnpg-coar-notify/swh-coar-notify
System ID:           7485720651796250647
PostgreSQL Image:    ghcr.io/cloudnative-pg/postgresql:17.4-7-bookworm@sha256:e5e3d8bd81a71bb13c4cd7f0d9d2be43432ef37b06480cb8eff3baceceda8645
Primary instance:    swh-coar-notify-1
Primary start time:  2025-03-25 12:20:00 +0000 UTC (uptime 52m32s)
Status:              Cluster in healthy state
Instances:           3
Ready instances:     3
Size:                7.6M
Current Write LSN:   0/12000060 (Timeline: 1 - WAL File: 000000010000000000000009)

Continuous Backup status
First Point of Recoverability:  Not Available
Working WAL archiving:          OK
WALs waiting to be archived:    0
Last Archived WAL:              000000010000000000000008 @ 2025-03-25T12:53:43.733022Z
Last Failed WAL:                -

Streaming Replication status
Replication Slots Enabled
Name               Sent LSN    Write LSN   Flush LSN   Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority  Replication Slot
----               --------    ---------   ---------   ----------  ---------  ---------  ----------  -----      ----------  -------------  ----------------
swh-coar-notify-2  0/12000060  0/12000060  0/12000060  0/12000060  00:00:00   00:00:00   00:00:00    streaming  quorum      1              active
swh-coar-notify-3  0/12000060  0/12000060  0/12000060  0/12000060  00:00:00   00:00:00   00:00:00    streaming  quorum      1              active

Instances status
Name               Current LSN  Replication role  Status  QoS         Manager Version  Node
----               -----------  ----------------  ------  ---         ---------------  ----
swh-coar-notify-1  0/12000060   Primary           OK      Guaranteed  1.25.0           rancher-node-test-rke2-worker3
swh-coar-notify-2  0/12000060   Standby (sync)    OK      Guaranteed  1.25.0           rancher-node-test-rke2-worker2
swh-coar-notify-3  0/12000060   Standby (sync)    OK      Guaranteed  1.25.0           rancher-node-test-rke2-worker1
Promote another instance to primary:
ᐅ kubectl cnpg promote swh-coar-notify swh-coar-notify-3 -n cnpg-coar-notify --context test-staging-rke2
ᐅ kubectl cnpg status swh-coar-notify -n cnpg-coar-notify --context test-staging-rke2
[...]
Instances status
Name               Current LSN  Replication role  Status  QoS         Manager Version  Node
----               -----------  ----------------  ------  ---         ---------------  ----
swh-coar-notify-3  0/1400F4B0   Primary           OK      Guaranteed  1.25.0           rancher-node-test-rke2-worker1
swh-coar-notify-1  0/1400F4B0   Standby (sync)    OK      Guaranteed  1.25.0           rancher-node-test-rke2-worker3
swh-coar-notify-2  0/1400F4B0   Standby (sync)    OK      Guaranteed  1.25.0           rancher-node-test-rke2-worker2
Benchmarking.
Backups
- on demand
with a manifest:
ᐅ kbt apply -f swh-coar-notify-on-demand-bckp.yaml
ᐅ kbt get backups.postgresql.cnpg.io -n cnpg-coar-notify
NAME                                    AGE   CLUSTER           METHOD              PHASE       ERROR
cnpg-swh-coar-notify-on-demand-backup   82s   swh-coar-notify   barmanObjectStore   completed
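A minimal sketch of what swh-coar-notify-on-demand-bckp.yaml may contain (the name and cluster reference are taken from the output above, the rest is assumed):

---
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: cnpg-swh-coar-notify-on-demand-backup
  namespace: cnpg-coar-notify
spec:
  method: barmanObjectStore
  cluster:
    name: swh-coar-notify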
with the kubectl cnpg plugin:
ᐅ kubectl cnpg backup swh-coar-notify -n cnpg-coar-notify --context test-staging-rke2
backup/swh-coar-notify-20250325151826 created
ᐅ kbt get backups.postgresql.cnpg.io -n cnpg-coar-notify
NAME                                                   AGE    CLUSTER           METHOD              PHASE       ERROR
cnpg-swh-coar-notify-on-demand-backup                  9m9s   swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250325141433   4m4s   swh-coar-notify   barmanObjectStore   completed
swh-coar-notify-20250325151826                         11s    swh-coar-notify   barmanObjectStore   running
- scheduled
ᐅ kbt apply -f swh-coar-notify-scheduled-bckp.yaml
scheduledbackup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup created
ᐅ kbt get scheduledbackups.postgresql.cnpg.io -n cnpg-coar-notify
NAME                                    AGE   CLUSTER           LAST BACKUP
cnpg-swh-coar-notify-scheduled-backup   66s   swh-coar-notify   66s
ᐅ kbt get backups.postgresql.cnpg.io -n cnpg-coar-notify
NAME                                                   AGE     CLUSTER           METHOD              PHASE       ERROR
cnpg-swh-coar-notify-on-demand-backup                  6m21s   swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250325141433   76s     swh-coar-notify   barmanObjectStore   completed
Data on MinIO
- backups
ᐅ mc ls -r cpng/backup-cpng/test-staging-rke2/swh-coar-notify/base
[2025-03-25 15:09:55 CET] 1.4KiB STANDARD 20250325T140934/backup.info
[2025-03-25 15:09:55 CET] 3.9MiB STANDARD 20250325T140934/data.tar.gz
[2025-03-25 15:14:59 CET] 1.4KiB STANDARD 20250325T141438/backup.info
[2025-03-25 15:14:59 CET] 3.9MiB STANDARD 20250325T141438/data.tar.gz
[2025-03-25 15:18:53 CET] 1.4KiB STANDARD 20250325T141831/backup.info
[2025-03-25 15:18:53 CET] 3.9MiB STANDARD 20250325T141831/data.tar.gz
- WALs
ᐅ mc ls -r cpng/backup-cpng/test-staging-rke2/swh-coar-notify/wals
[2025-03-25 13:20:13 CET] 2.4MiB STANDARD 0000000100000000/000000010000000000000001.gz
[2025-03-25 13:21:10 CET] 472KiB STANDARD 0000000100000000/000000010000000000000002.gz
[2025-03-25 13:21:42 CET]   205B STANDARD 0000000100000000/000000010000000000000003.00000060.backup.gz
[2025-03-25 13:21:39 CET]  88KiB STANDARD 0000000100000000/000000010000000000000003.gz
[2025-03-25 13:23:35 CET]  32KiB STANDARD 0000000100000000/000000010000000000000004.gz
[2025-03-25 13:23:42 CET]   201B STANDARD 0000000100000000/000000010000000000000005.00000028.backup.gz
[2025-03-25 13:23:40 CET]  32KiB STANDARD 0000000100000000/000000010000000000000005.gz
[2025-03-25 13:28:42 CET]  32KiB STANDARD 0000000100000000/000000010000000000000006.gz
[2025-03-25 13:33:42 CET]  70KiB STANDARD 0000000100000000/000000010000000000000007.gz
[2025-03-25 13:53:43 CET]  76KiB STANDARD 0000000100000000/000000010000000000000008.gz
[2025-03-25 14:14:48 CET]  32KiB STANDARD 0000000100000000/000000010000000000000009.gz
[2025-03-25 14:15:11 CET]  32KiB STANDARD 0000000100000000/00000001000000000000000A.partial.gz
[2025-03-25 14:15:07 CET]    60B STANDARD 00000002.history.gz
[2025-03-25 14:20:08 CET]  44KiB STANDARD 0000000200000000/00000002000000000000000A.gz
[2025-03-25 14:25:08 CET]  32KiB STANDARD 0000000200000000/00000002000000000000000B.gz
[2025-03-25 14:35:09 CET]  71KiB STANDARD 0000000200000000/00000002000000000000000C.gz
[2025-03-25 14:40:09 CET]  32KiB STANDARD 0000000200000000/00000002000000000000000D.gz
[2025-03-25 14:59:12 CET]  41KiB STANDARD 0000000200000000/00000002000000000000000E.gz
[2025-03-25 15:04:23 CET]  32KiB STANDARD 0000000200000000/00000002000000000000000F.gz
swh-coar-notify-cluster.yaml
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: swh-coar-notify
  namespace: cnpg-coar-notify
  labels:
    cnpg.swh: coar-notify
spec:
  instances: 3
  startDelay: 300
  stopDelay: 300
  imageCatalogRef:
    apiGroup: postgresql.cnpg.io
    kind: ClusterImageCatalog
    name: postgresql
    major: 17
  primaryUpdateStrategy: unsupervised
  storage:
    size: 5Gi
    storageClass: local-persistent
  walStorage:
    size: 1Gi
    storageClass: local-persistent
  managed:
    roles:
      - name: guest
        ensure: present
        comment: read-only user
        login: true
        inherit: false
        connectionLimit: 10
        passwordSecret:
          name: cnpg-swh-coar-notify-guest-creds
        # validUntil: "2053-04-12T15:04:05Z"
  postgresql:
    parameters:
      # memory dedicated to the PostgreSQL server for caching data
      # 25% of total memory (best practice)
      shared_buffers: "128MB"
      pg_stat_statements.max: '10000'
      pg_stat_statements.track: all
      auto_explain.log_min_duration: '10s'
    pg_hba:
      - host all all all md5
    # quorum-based synchronous replication ensures that transaction commits wait
    # until their WAL records are replicated to a specified number of standbys
    synchronous:
      method: any
      number: 1
  # set limits and requests to the same value => "Guaranteed" QoS class
  # https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/
  resources:
    requests:
      memory: "512Mi"
      cpu: 0.5
    limits:
      memory: "512Mi"
      cpu: 0.5
  bootstrap:
    initdb:
      database: swh-coar-notify
      owner: swh-coar-notify
      encoding: 'UTF8'
      localeCollate: 'en_US.utf8'
      localeCType: 'en_US.utf8'
      dataChecksums: true
      walSegmentSize: 32
      secret:
        name: cnpg-swh-coar-notify-creds
  monitoring:
    enablePodMonitor: true
  backup:
    barmanObjectStore:
      destinationPath: s3://backup-cpng/test-staging-rke2
      endpointURL: https://minio.admin.swh.network
      s3Credentials:
        accessKeyId:
          name: cnpg-minio-creds
          key: MINIO_ACCESS_KEY
        secretAccessKey:
          name: cnpg-minio-creds
          key: MINIO_SECRET_KEY
      wal:
        compression: gzip
      data:
        compression: gzip
        additionalCommandArgs:
          - "--min-chunk-size=5MB"
          - "--read-timeout=60"
          - "-vv"
    retentionPolicy: "1d"
    # `primary` or `prefer-standby`
    target: prefer-standby
  #backup:
  #  # Volume snapshot backups
  #  volumeSnapshot:
  #    className: longhorn
  #    online: true
  #    onlineConfiguration:
  #      immediateCheckpoint: true
swh-coar-notify-poolers.yaml
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: pooler-swh-coar-notify-rw
  namespace: cnpg-coar-notify
spec:
  cluster:
    name: swh-coar-notify
  instances: 1
  type: rw
  pgbouncer:
    poolMode: session
    parameters:
      max_client_conn: "1000"
      default_pool_size: "10"
  template:
    metadata:
      labels:
        app: pooler-swh-coar-notify-rw
    spec:
      containers: []
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - pooler-swh-coar-notify-rw
              topologyKey: "kubernetes.io/hostname"
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: pooler-swh-coar-notify-ro
  namespace: cnpg-coar-notify
spec:
  cluster:
    name: swh-coar-notify
  instances: 1
  type: ro
  pgbouncer:
    poolMode: session
    parameters:
      max_client_conn: "1000"
      default_pool_size: "10"
  template:
    metadata:
      labels:
        app: pooler-swh-coar-notify-ro
    spec:
      containers: []
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - pooler-swh-coar-notify-ro
              topologyKey: "kubernetes.io/hostname"
- Owner
Backups run every two hours...
~ ᐅ kbt get backups.postgresql.cnpg.io -n cnpg-coar-notify
NAME                                                   AGE     CLUSTER           METHOD              PHASE       ERROR
cnpg-swh-coar-notify-on-demand-backup                  20h     swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250325141433   20h     swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250325142500   20h     swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250325162500   18h     swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250325182500   16h     swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250325202500   14h     swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250325222500   12h     swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250326002500   10h     swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250326022500   8h      swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250326042500   6h20m   swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250326062500   4h20m   swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250326082500   140m    swh-coar-notify   barmanObjectStore   completed
cnpg-swh-coar-notify-scheduled-backup-20250326102500   20m     swh-coar-notify   barmanObjectStore   completed
swh-coar-notify-20250325151826                         20h     swh-coar-notify   barmanObjectStore   completed

~ ᐅ mc ls cpng/backup-cpng/test-staging-rke2/swh-coar-notify/base
[2025-03-26 11:47:29 CET]     0B 20250325T140934/
[2025-03-26 11:47:29 CET]     0B 20250325T141438/
[2025-03-26 11:47:29 CET]     0B 20250325T141831/
[2025-03-26 11:47:29 CET]     0B 20250325T142504/
[2025-03-26 11:47:29 CET]     0B 20250325T162505/
[2025-03-26 11:47:29 CET]     0B 20250325T182505/
[2025-03-26 11:47:29 CET]     0B 20250325T202504/
[2025-03-26 11:47:29 CET]     0B 20250325T222504/
[2025-03-26 11:47:29 CET]     0B 20250326T002504/
[2025-03-26 11:47:29 CET]     0B 20250326T022504/
[2025-03-26 11:47:29 CET]     0B 20250326T042504/
[2025-03-26 11:47:29 CET]     0B 20250326T062504/
[2025-03-26 11:47:29 CET]     0B 20250326T082504/
[2025-03-26 11:47:29 CET]     0B 20250326T102504/
as scheduled:
~ ᐅ kbt get scheduledbackups.postgresql.cnpg.io \
    cnpg-swh-coar-notify-scheduled-backup -n cnpg-coar-notify \
    -o jsonpath='{.spec.schedule}'
0 25 0/2 * * *
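For reference, a minimal sketch of a matching swh-coar-notify-scheduled-bckp.yaml (the schedule is the one shown above; CNPG cron expressions have six fields, with seconds first; the rest is assumed):

---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: cnpg-swh-coar-notify-scheduled-backup
  namespace: cnpg-coar-notify
spec:
  # seconds minutes hours day-of-month month day-of-week
  # => at minute 25, every 2 hours
  schedule: "0 25 0/2 * * *"
  method: barmanObjectStore
  cluster:
    name: swh-coar-notify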
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@f7a24708
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@fe94faf2
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@f37e5807
- Owner
Finally, create as small a cluster as possible: no pooler, only 2 replicas and only 1 service (`rw`; for `ro` access we can use the `guest` user):
ᐅ kbt get clusters.postgresql.cnpg.io,po,svc,pvc -n swh-coar-notify
NAME                                         AGE   INSTANCES   READY   STATUS                     PRIMARY
cluster.postgresql.cnpg.io/swh-coar-notify   41h   2           2       Cluster in healthy state   swh-coar-notify-1

NAME                    READY   STATUS    RESTARTS   AGE
pod/swh-coar-notify-1   1/1     Running   0          41h
pod/swh-coar-notify-2   1/1     Running   0          41h

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/swh-coar-notify-rw   ClusterIP   10.43.210.210   <none>        5432/TCP   41h

NAME                                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
persistentvolumeclaim/swh-coar-notify-1         Bound    pvc-1c048c7f-8823-4fa3-8f94-9b40df890a88   5Gi        RWO            local-persistent   41h
persistentvolumeclaim/swh-coar-notify-1-wal     Bound    pvc-5c0560cf-cdf9-45a6-965b-bdab43fffdfd   1Gi        RWO            local-persistent   41h
persistentvolumeclaim/swh-coar-notify-2         Bound    pvc-73a3d191-bda5-407e-9e05-5303051dc943   5Gi        RWO            local-persistent   41h
persistentvolumeclaim/swh-coar-notify-2-wal     Bound    pvc-eb678a31-69a5-4f84-a955-fdad6b6289bf   1Gi        RWO            local-persistent   41h
with minimum resources [1]:
ᐅ kbt get clusters.postgresql.cnpg.io -n swh-coar-notify swh-coar-notify -o jsonpath='{.spec.resources}' | jq
{
  "limits": {
    "cpu": "400m",
    "memory": "256Mi"
  },
  "requests": {
    "cpu": "400m",
    "memory": "256Mi"
  }
}
Retention policy seems to work fine:
~ ᐅ kbt get clusters.postgresql.cnpg.io -n swh-coar-notify swh-coar-notify -o jsonpath='{.spec.backup.retentionPolicy}'
1d
~ ᐅ kbt get scheduledbackups.postgresql.cnpg.io,backups.postgresql.cnpg.io -n swh-coar-notify
NAME                                                                       AGE   CLUSTER           LAST BACKUP
scheduledbackup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup   41h   swh-coar-notify   70m

NAME                                                                             AGE     CLUSTER           METHOD              PHASE       ERROR
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250327122500   25h     swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250327142500   23h     swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250327162500   21h     swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250327182500   19h     swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250327202500   17h     swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250327222500   15h     swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250328002500   13h     swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250328022500   11h     swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250328042500   9h      swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250328062500   7h10m   swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250328082500   5h10m   swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250328102500   3h10m   swh-coar-notify   barmanObjectStore   completed
backup.postgresql.cnpg.io/cnpg-swh-coar-notify-scheduled-backup-20250328122500   70m     swh-coar-notify   barmanObjectStore   completed
I'm going to test the restoration.
[1] When trying lower CPU values (100m or 200m), the first pod crashed on cluster initialization. ↩
Edited by Guillaume Samson
- Owner
Restoring the cnpg cluster from MinIO S3 backups:
ᐅ diff -u swh-coar-notify-cluster.yaml swh-coar-notify-recovery-cluster.yaml
--- swh-coar-notify-cluster.yaml	2025-03-28 15:29:34.544548822 +0100
+++ swh-coar-notify-recovery-cluster.yaml	2025-03-28 15:33:12.043466936 +0100
@@ -2,7 +2,7 @@
 apiVersion: postgresql.cnpg.io/v1
 kind: Cluster
 metadata:
-  name: swh-coar-notify
+  name: swh-coar-notify-restored
   namespace: swh-coar-notify
   labels:
     cnpg.swh: coar-notify
@@ -68,16 +68,31 @@
       cpu: "400m"
 
   bootstrap:
-    initdb:
-      database: swh-coar-notify
-      owner: swh-coar-notify
-      encoding: 'UTF8'
-      localeCollate: 'en_US.utf8'
-      localeCType: 'en_US.utf8'
-      dataChecksums: true
-      walSegmentSize: 32
-      secret:
-        name: cnpg-swh-coar-notify-creds
+    recovery:
+      source: swh-coar-notify
+
+  # /!\ externalClusters.name must match the source backup name
+  externalClusters:
+    - name: swh-coar-notify
+      barmanObjectStore:
+        destinationPath: s3://backup-cpng/test-staging-rke2/
+        endpointURL: https://minio.admin.swh.network
+        s3Credentials:
+          accessKeyId:
+            name: cnpg-minio-creds
+            key: MINIO_ACCESS_KEY
+          secretAccessKey:
+            name: cnpg-minio-creds
+            key: MINIO_SECRET_KEY
+        wal:
+          compression: gzip
+          maxParallel: 8
+        data:
+          compression: gzip
+          additionalCommandArgs:
+            - "--min-chunk-size=5MB"
+            - "--read-timeout=60"
+            - "-vv"
 
   monitoring:
     enablePodMonitor: true
ᐅ kbt apply -f swh-coar-notify-recovery-cluster.yaml
cluster.postgresql.cnpg.io/swh-coar-notify-restored created
Restored cluster deployed [1]:
ᐅ kbt get clusters.postgresql.cnpg.io,po,svc,pvc -n swh-coar-notify
NAME                                                  AGE   INSTANCES   READY   STATUS                     PRIMARY
cluster.postgresql.cnpg.io/swh-coar-notify            43h   2           2       Cluster in healthy state   swh-coar-notify-1
cluster.postgresql.cnpg.io/swh-coar-notify-restored   15m   2           2       Cluster in healthy state   swh-coar-notify-restored-1

NAME                             READY   STATUS    RESTARTS   AGE
pod/swh-coar-notify-1            1/1     Running   0          42h
pod/swh-coar-notify-2            1/1     Running   0          42h
pod/swh-coar-notify-restored-1   1/1     Running   0          12m
pod/swh-coar-notify-restored-2   1/1     Running   0          10m

NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/swh-coar-notify-restored-rw   ClusterIP   10.43.28.69     <none>        5432/TCP   15m
service/swh-coar-notify-rw            ClusterIP   10.43.210.210   <none>        5432/TCP   43h

NAME                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
persistentvolumeclaim/swh-coar-notify-1                  Bound    pvc-1c048c7f-8823-4fa3-8f94-9b40df890a88   5Gi        RWO            local-persistent   43h
persistentvolumeclaim/swh-coar-notify-1-wal              Bound    pvc-5c0560cf-cdf9-45a6-965b-bdab43fffdfd   1Gi        RWO            local-persistent   43h
persistentvolumeclaim/swh-coar-notify-2                  Bound    pvc-73a3d191-bda5-407e-9e05-5303051dc943   5Gi        RWO            local-persistent   42h
persistentvolumeclaim/swh-coar-notify-2-wal              Bound    pvc-eb678a31-69a5-4f84-a955-fdad6b6289bf   1Gi        RWO            local-persistent   42h
persistentvolumeclaim/swh-coar-notify-restored-1         Bound    pvc-0f0bb19e-2416-4a37-8f50-47b5157f2540   5Gi        RWO            local-persistent   15m
persistentvolumeclaim/swh-coar-notify-restored-1-wal     Bound    pvc-f87fd674-dd0a-414c-b307-185699d0b039   1Gi        RWO            local-persistent   15m
persistentvolumeclaim/swh-coar-notify-restored-2         Bound    pvc-4bf7f7e3-834c-41ee-92c6-921fa6196abf   5Gi        RWO            local-persistent   12m
persistentvolumeclaim/swh-coar-notify-restored-2-wal     Bound    pvc-e8648e3d-6c7e-4ced-b099-44da60f63662   1Gi        RWO            local-persistent   12m
Check databases in source cluster:
ᐅ kbt exec -ti pod/swh-coar-notify-1 -n swh-coar-notify -c postgres -- psql -t -c '\l'
 postgres        | postgres        | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   |
 swh-coar-notify | swh-coar-notify | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   |
 template0       | postgres        | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   | =c/postgres          +
                 |                 |      |      |            |            |   |   | postgres=CTc/postgres
 template1       | postgres        | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   | =c/postgres          +
                 |                 |      |      |            |            |   |   | postgres=CTc/postgres
Check databases in restored cluster:
ᐅ kbt exec -ti pod/swh-coar-notify-restored-1 -n swh-coar-notify -c postgres -- psql -t -c '\l'
 app             | app             | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   |
 postgres        | postgres        | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   |
 swh-coar-notify | swh-coar-notify | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   |
 template0       | postgres        | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   | =c/postgres          +
                 |                 |      |      |            |            |   |   | postgres=CTc/postgres
 template1       | postgres        | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   | =c/postgres          +
                 |                 |      |      |            |            |   |   | postgres=CTc/postgres
There's an unwanted `app` db (the default db name in a cluster).

Check the `swh-coar-notify` database in the source cluster:
ᐅ kbt exec -ti pod/swh-coar-notify-1 -n swh-coar-notify -c postgres -- psql -t -d swh-coar-notify -c '\dt'
 public | compte      | table | swh-coar-notify
 public | utilisateur | table | swh-coar-notify
Check the `swh-coar-notify` database in the restored cluster:
ᐅ kbt exec -ti pod/swh-coar-notify-restored-1 -n swh-coar-notify -c postgres -- psql -t -d swh-coar-notify -c '\dt'
 public | compte      | table | swh-coar-notify
 public | utilisateur | table | swh-coar-notify
Everything seems to work fine except the creation of a default `app` db.

Deleting the restored cluster and its archived WALs:
ᐅ kbt delete -f swh-coar-notify-recovery-cluster.yaml
cluster.postgresql.cnpg.io "swh-coar-notify-restored" deleted
ᐅ mc rm -r --force cpng/backup-cpng/test-staging-rke2/swh-coar-notify-restored
Removed `cpng/backup-cpng/test-staging-rke2/swh-coar-notify-restored/wals/00000002.history.gz`.
Removed `cpng/backup-cpng/test-staging-rke2/swh-coar-notify-restored/wals/0000000200000000/000000020000000000000006.gz`.
Removed `cpng/backup-cpng/test-staging-rke2/swh-coar-notify-restored/wals/0000000200000000/000000020000000000000007.gz`.
Removed `cpng/backup-cpng/test-staging-rke2/swh-coar-notify-restored/wals/0000000200000000/000000020000000000000008.gz`.
Removed `cpng/backup-cpng/test-staging-rke2/swh-coar-notify-restored/wals/0000000200000000/000000020000000000000009.00000060.backup.gz`.
Removed `cpng/backup-cpng/test-staging-rke2/swh-coar-notify-restored/wals/0000000200000000/000000020000000000000009.gz`.
Removed `cpng/backup-cpng/test-staging-rke2/swh-coar-notify-restored/wals/0000000200000000/00000002000000000000000A.gz`.
[1] I forgot to change the labels in the restored cluster manifest, so the output cannot be filtered. ↩
Edited by Guillaume Samson
- Owner
Second cnpg cluster restoration from MinIO S3 backups:
ᐅ diff -u swh-coar-notify-{,recovery-}cluster.yaml
--- swh-coar-notify-cluster.yaml	2025-03-28 16:22:24.231272432 +0100
+++ swh-coar-notify-recovery-cluster.yaml	2025-03-28 16:23:26.952099484 +0100
@@ -2,10 +2,10 @@
 apiVersion: postgresql.cnpg.io/v1
 kind: Cluster
 metadata:
-  name: swh-coar-notify
+  name: swh-coar-notify-restored
   namespace: swh-coar-notify
   labels:
-    cnpg.swh: coar-notify
+    cnpg.swh: coar-notify-restored
 spec:
   instances: 2
   startDelay: 300
@@ -68,16 +68,32 @@
       cpu: "400m"
 
   bootstrap:
-    initdb:
+    recovery:
       database: swh-coar-notify
-      owner: swh-coar-notify
-      encoding: 'UTF8'
-      localeCollate: 'en_US.utf8'
-      localeCType: 'en_US.utf8'
-      dataChecksums: true
-      walSegmentSize: 32
-      secret:
-        name: cnpg-swh-coar-notify-creds
+      source: swh-coar-notify
+
+  # /!\ externalClusters.name must match the source backup name
+  externalClusters:
+    - name: swh-coar-notify
+      barmanObjectStore:
+        destinationPath: s3://backup-cpng/test-staging-rke2/
+        endpointURL: https://minio.admin.swh.network
+        s3Credentials:
+          accessKeyId:
+            name: cnpg-minio-creds
+            key: MINIO_ACCESS_KEY
+          secretAccessKey:
+            name: cnpg-minio-creds
+            key: MINIO_SECRET_KEY
+        wal:
+          compression: gzip
+          maxParallel: 8
+        data:
+          compression: gzip
+          additionalCommandArgs:
+            - "--min-chunk-size=5MB"
+            - "--read-timeout=60"
+            - "-vv"
 
   monitoring:
     enablePodMonitor: true
Check databases in restored cluster:
ᐅ kbt exec -ti pod/swh-coar-notify-restored-1 -n swh-coar-notify -c postgres -- psql -t -c '\l'
 postgres        | postgres        | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   |
 swh-coar-notify | swh-coar-notify | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   |
 template0       | postgres        | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   | =c/postgres          +
                 |                 |      |      |            |            |   |   | postgres=CTc/postgres
 template1      | postgres         | UTF8 | libc | en_US.utf8 | en_US.utf8 |   |   | =c/postgres          +
                 |                 |      |      |            |            |   |   | postgres=CTc/postgres
This time it seems ok.
Check `swh-coar-notify` database access in the restored cluster:
ᐅ kbt port-forward svc/swh-coar-notify-restored-rw 5432:5432 -n swh-coar-notify
Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
Handling connection for 5432
E0328 16:55:23.859864  602955 portforward.go:394] error copying from local connection to remote stream: read tcp6 [::1]:5432->[::1]:41610: read: connection reset by peer
Handling connection for 5432
Handling connection for 5432
Handling connection for 5432
[...]

ᐅ username=$(kbt get secrets -n swh-coar-notify cnpg-swh-coar-notify-creds -o jsonpath='{.data.username}' | base64 -d)
ᐅ password=$(kbt get secrets -n swh-coar-notify cnpg-swh-coar-notify-creds -o jsonpath='{.data.password}' | base64 -d)
ᐅ psql -d "dbname=swh-coar-notify user=$username host=localhost password=$password"
psql: error: connection to server at "localhost" (::1), port 5432 failed: FATAL: password authentication failed for user "swh-coar-notify"
connection to server at "localhost" (::1), port 5432 failed: FATAL: password authentication failed for user "swh-coar-notify"
ᐅ psql -d "dbname=swh-coar-notify user=guest host=localhost"
Password for user guest:
psql (17.4 (Debian 17.4-1.pgdg120+2))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql)
Type "help" for help.

swh-coar-notify=>
User `swh-coar-notify` cannot access the restored database, most likely because the recovery bootstrap did not reference the `cnpg-swh-coar-notify-creds` secret, so the operator never applied that password to the restored role.
- Owner
Third cnpg cluster restoration from MinIO S3 backups:
ᐅ diff -u swh-coar-notify-{,recovery-}cluster.yaml
--- swh-coar-notify-cluster.yaml	2025-03-28 16:22:24.231272432 +0100
+++ swh-coar-notify-recovery-cluster.yaml	2025-03-28 17:23:56.868689409 +0100
@@ -2,10 +2,10 @@
 apiVersion: postgresql.cnpg.io/v1
 kind: Cluster
 metadata:
-  name: swh-coar-notify
+  name: swh-coar-notify-restored
   namespace: swh-coar-notify
   labels:
-    cnpg.swh: coar-notify
+    cnpg.swh: coar-notify-restored
 spec:
   instances: 2
   startDelay: 300
@@ -68,16 +68,35 @@
       cpu: "400m"
 
   bootstrap:
-    initdb:
+    recovery:
       database: swh-coar-notify
       owner: swh-coar-notify
-      encoding: 'UTF8'
-      localeCollate: 'en_US.utf8'
-      localeCType: 'en_US.utf8'
-      dataChecksums: true
-      walSegmentSize: 32
       secret:
         name: cnpg-swh-coar-notify-creds
+      source: swh-coar-notify
+
+  # /!\ externalClusters.name must match the source backup name
+  externalClusters:
+    - name: swh-coar-notify
+      barmanObjectStore:
+        destinationPath: s3://backup-cpng/test-staging-rke2/
+        endpointURL: https://minio.admin.swh.network
+        s3Credentials:
+          accessKeyId:
+            name: cnpg-minio-creds
+            key: MINIO_ACCESS_KEY
+          secretAccessKey:
+            name: cnpg-minio-creds
+            key: MINIO_SECRET_KEY
+        wal:
+          compression: gzip
+          maxParallel: 8
+        data:
+          compression: gzip
+          additionalCommandArgs:
+            - "--min-chunk-size=5MB"
+            - "--read-timeout=60"
+            - "-vv"
 
   monitoring:
     enablePodMonitor: true
Check `swh-coar-notify` database access:
ᐅ kbt port-forward svc/swh-coar-notify-restored-rw 5432:5432 -n swh-coar-notify
Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
Handling connection for 5432
[...]
ᐅ password=$(kbt get secrets -n swh-coar-notify cnpg-swh-coar-notify-creds -o jsonpath='{.data.password}' | base64 -d)
ᐅ username=$(kbt get secrets -n swh-coar-notify cnpg-swh-coar-notify-creds -o jsonpath='{.data.username}' | base64 -d)
ᐅ psql -d "dbname=swh-coar-notify user=$username host=localhost password=$password"
psql (17.4 (Debian 17.4-1.pgdg120+2))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql)
Type "help" for help.

swh-coar-notify=> create table toto (id int primary key not null, name varchar(250));
CREATE TABLE
swh-coar-notify=> \dt
           List of relations
 Schema |    Name     | Type  |      Owner
--------+-------------+-------+-----------------
 public | compte      | table | swh-coar-notify
 public | toto        | table | swh-coar-notify
 public | utilisateur | table | swh-coar-notify
(3 rows)
ᐅ psql -d "dbname=swh-coar-notify user=guest host=localhost"
Password for user guest:
psql (17.4 (Debian 17.4-1.pgdg120+2))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql)
Type "help" for help.

swh-coar-notify=> create table titi (id int primary key not null, name varchar(250));
ERROR: permission denied for schema public
LINE 1: create table titi (id int primary key not null, name varchar...
                      ^
Finally, everything seems to work as expected.
- Guillaume Samson mentioned in commit swh/infra/ci-cd/k8s-clusters-conf@c38f8928
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@686ab70d
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@5bd24f83
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@c422a0fe
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@8d1c51fd