diff --git a/vsellier/swh-c4-model/docs/provenance-v0.3/.adr-dir b/vsellier/swh-c4-model/docs/provenance-v0.3/.adr-dir
new file mode 100644
index 0000000000000000000000000000000000000000..d0a7d896b535fc40acc9e00f5bf4952bbf29a6bd
--- /dev/null
+++ b/vsellier/swh-c4-model/docs/provenance-v0.3/.adr-dir
@@ -0,0 +1 @@
+adrs
diff --git a/vsellier/swh-c4-model/docs/provenance-v0.3/adrs/.adr-dir b/vsellier/swh-c4-model/docs/provenance-v0.3/adrs/.adr-dir
new file mode 100644
index 0000000000000000000000000000000000000000..9c558e357c41674e39880abb6c3209e539de42e2
--- /dev/null
+++ b/vsellier/swh-c4-model/docs/provenance-v0.3/adrs/.adr-dir
@@ -0,0 +1 @@
+.
diff --git a/vsellier/swh-c4-model/docs/provenance-v0.3/provenance.md b/vsellier/swh-c4-model/docs/provenance-v0.3/provenance.md
new file mode 100644
index 0000000000000000000000000000000000000000..b3e2d74d2568088eb9bc84e4e051afda58dd5077
--- /dev/null
+++ b/vsellier/swh-c4-model/docs/provenance-v0.3/provenance.md
@@ -0,0 +1,186 @@
# swh-provenance ops documentation

## References

- [Provenance v0.3.3 deployment spec](https://hedgedoc.softwareheritage.org/scsWvzQZRO2HW2gisANXBw?view)
- [Source code](https://gitlab.softwareheritage.org/swh/devel/snippets/-/merge_requests/27)

## Communication

*Put here the communication channels, or where to find the contacts, so as not to expose email addresses*

### Contacts

*who*

### Status.io

*Where/when*

## Global infra

*Coarse grained process*

The provenance application is a standalone grpc server. It does not depend on
other swh services.

This application is exposed on the vpn and used by the web api clients.

There are no writes, only read-only queries.

Its backend relies on parquet files.

## Authentication

Through the standard web api authentication mechanism. 
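As a minimal sketch of what this looks like from a client's point of view: the web api accepts a keycloak-issued bearer token on each request. The `provenance/whereis` endpoint path below is a hypothetical placeholder (not taken from this document); only the `Authorization: Bearer` header reflects the standard mechanism.

```python
# Minimal sketch of an authenticated web api call.
# Assumption: the endpoint path is hypothetical, for illustration only;
# the bearer-token header is the standard web api authentication mechanism.
from urllib.request import Request


def provenance_request(
    swhid: str,
    token: str,
    base_url: str = "https://archive.softwareheritage.org/api/1",
) -> Request:
    """Build (but do not send) a request carrying the bearer token."""
    return Request(
        f"{base_url}/provenance/whereis/{swhid}/",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {token}"},
    )


req = provenance_request("swh:1:cnt:" + "0" * 40, "<api-token>")
```

Actually sending the request (e.g. with `urllib.request.urlopen`) is left out on purpose; the point is only where the token goes.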
+

## Volume

### datasets

The provenance service needs two datasets:
- parquet files, which form the main database queried by the provenance server
- graph files:
  - graph.pthash
  - graph.pthash.order
  - graph.node2swhid.bin
  - graph.node2type.bin
  - graph.nodes.count.txt
  - graph.property.message.bin
  - graph.property.message.offset.bin
  - graph.property.tag_name.bin
  - graph.property.tag_name.offset.bin

Note: the .bin files are stored compressed (.zst) in s3.

#### production (with the 2024-12-06 graph)

- Memory consumption: at least 30GB; up to 70GB would be nice. More is better
  (kernel cache of mmapped files) but has diminishing returns; probably not
  worth dedicating more than 200GB
- Disk (17.5TB):
  - provenance database: 16TB [1]
  - needs some of the graph files locally (~2.7TB), from a matching graph
    version

[1] In the future, we could use remote files too [e.g. s3, minio] at the cost
of performance.

#### staging (with the 2024-08-23-popular-500-python graph)

- Memory consumption: TBD
- Disk 32GB:
  - provenance database: 30GB
    (on `/srv/softwareheritage/ssd/data/vlorentz/provenance-all/2024-08-23-popular-500-python`)
  - graph files: 1.5GB

## Internal Domains

As the provenance service will be used through the web api, there is no public
domain, only an internal one.

The hostnames will be:
- staging: `provenance.internal.staging.swh.network`
- production: `provenance.internal.softwareheritage.org`

### Sentry

The sentry configuration is already defined from the previous version.

### Rate limit

The standard web api rate limit mechanism will be used. The actual rate limit
for the provenance api remains to be defined.

As this rate limit mechanism is per user, though, it won't prevent bursts of
requests.

The rpc service (from the previous implementation) was de facto limited by the
number of (gunicorn) workers used. As the new implementation is a grpc rust
process, we cannot limit the number of connections this way. 
To compensate, we will add a maximum concurrent connection configuration at
the ingress level.

### Timeout chain

The top timeout should be greater than the sum of the dependent timeouts,
including the retries.

| Id  | Origin       | Target        | Dependency | Retry count/Timeout (s) | Value (s) |
| --- | ------------ | ------------- | :--------: | :---------------------: | :-------: |
| 1   | client       | web-ingress   |     2      |            ?            |    TBD    |
| 2   | web-ingress  | webapp        |     3      |            0            |    TBD    |
| 3   | webapp       | grpc-ingress  |     4      |            0            |    TBD    |
| 4   | grpc-ingress | grpc          |     5      |            0            |    TBD    |
| 5   | grpc         | parquet files |     X      |            0            |    TBD    |

### Side impacts

None

## Deployment

### Preliminary tasks

- [ ] https://gitlab.softwareheritage.org/swh/infra/swh-apps/-/merge_requests/53: Adapt the swh-provenance docker image with rust tools
- [x] ~~Create sentry project (if needed)~~
- [ ] Update the deployment charts to readapt the swh-provenance application
  - [ ] Allow declaring the type of server: rpc or grpc (as both exist)
  - [ ] Grpc: Prepare graph and provenance datasets in pv
  - [ ] Grpc: Allow running the service (behind an ingress)
- [ ] Prepare the `swh-next-version` and `staging` configurations
  - [ ] Share the pvc/pv with the same dataset
  - [ ] Expose the service

### Staging deployment

The staging instance will be deployed with the new grpc server.



### Production deployment

The project is currently at the MVP stage and is planned to be accessible only
in staging.

The production deployment will be adapted later, once the staging instance has
been tested. We need to determine the test scenarios that will give the
go-ahead to run an equivalent instance in production. 
+

In the meantime, production remains as-is (running the rpc service that hits
the graph grpc):



## Monitoring

*Add the monitoring points and information here*

TODO:
- Add a link to the ingresses here (staging/production)
- Any other metrics

## Procedures

### Backup/Restore

The service is read-only, and the dataset files are stored in s3.

As the service initializes its dataset from s3 (through an init-container), if
any data loss occurs, we just need to restart the dataset installation
process.

### User management

User management happens with keycloak.

For provenance, users allowed to query the provenance service have the role
`swh.web.api.provenance`.

### Red button

As the service is only exposed through the webapp, we can simply deactivate
the provenance configuration in the webapp deployment.

### Reaction to alerts

TBD
diff --git a/vsellier/swh-c4-model/workspace.dsl b/vsellier/swh-c4-model/workspace.dsl index 7b05964f9cee3b372fe54e59bf17f47bbba0532e..ac3131b079ad49b750d3da3098aef03d0f86a9cb 100644 --- a/vsellier/swh-c4-model/workspace.dsl +++ b/vsellier/swh-c4-model/workspace.dsl @@ -39,7 +39,7 @@ workspace { !decisions docs/coarnotify/adrs - keycloak = container "keycloak" "" """provenance" + keycloak = container "keycloak" "" "provenance,provenance_v2" gitlab = container "Gitlab" { tags external,add-forge-now @@ -55,6 +55,15 @@ workspace { technology PostgreSQL } } + + group swh-provenance { + provenance-grpc = container "provenance-grpc" { + !docs docs/provenance-v0.3 + !decisions docs/provenance-v0.3/adrs + technology rust,parquet + } + } + alter = container "swh-alter" { tags tdn,search @@ -75,9 +84,10 @@ workspace { technology python tags "tdn" } + graph_grpc = container "swh-graph grpc" { technology "java/rust" - tags provenance,tdn + tags "provenance_v2,tdn" } lister = container "listers" @@ -88,7 +98,17 @@ workspace { provenance_rpc = container 
"swh-provenance-rpc" { technology python - tags provenance + tags provenance_v2 + } + + provenance_grpc = container "swh-provenance-grpc" { + technology rust + tags "provenance_v2,provenance" + } + + provenance_parquet_files = container "swh-parquet-files" { + technology rust + tags provenance,parquet } rabbitmq = container "RabbitMQ" { @@ -99,8 +119,10 @@ workspace { tags scn scheduler_rpc = component "RPC API" - scheduler_runner = component "scheduler runner" - scheduler_listener = component "scheduler listerner" + scheduler_runner = component "scheduler runner (lister, addforgenow, cook, deposit)" + scheduler_runner_priority = component "scheduler runner priority (save-code-now)" + scheduler_schedule_recurrent = component "scheduler of recurrent origins to load" + scheduler_listener = component "scheduler listener" scheduler_journal_client = component "journal client" scheduler_db = component "Scheduler database" { @@ -117,7 +139,7 @@ workspace { masking_proxy = component "masking-proxy" { technology "python" - + description "Filters names and objects" } @@ -225,7 +247,7 @@ workspace { } webapp = container "Webapp" { - tags scn, vault, provenance, citation,search, add-forge-now + tags scn, vault, provenance, provenance_v2, citation,search, add-forge-now } group "winery" { @@ -288,7 +310,7 @@ workspace { graph_rpc -> graph_grpc "Starts" "grpc" "graph" // indexer - indexer_storage_rpc -> indexer_db "reads and writes graph" "sql" + indexer_storage_rpc -> indexer_db "reads and writes content metadata" "sql" // mirrors mirrors -> kafka "follows objects stream" "kafka" @@ -315,7 +337,11 @@ workspace { // scheduler scheduler_runner -> rabbitmq "posts tasks" "celery" - scheduler_runner -> scheduler_rpc "selects origins to schedule" "sql" + scheduler_runner -> scheduler_rpc "selects tasks to schedule" "sql" + scheduler_runner_priority -> rabbitmq "posts tasks" "celery" + scheduler_runner_priority -> scheduler_rpc "selects tasks to schedule" "sql" + 
scheduler_schedule_recurrent -> rabbitmq "posts tasks" "celery" + scheduler_schedule_recurrent -> scheduler_rpc "selects origins to schedule" "sql" scheduler_journal_client -> kafka "reads messages" "tcp" scheduler_rpc -> scheduler_db "reads and Writes" "sql" scheduler_listener -> scheduler_rpc "updates task status" @@ -344,8 +370,8 @@ workspace { vault_rpc -> storage_rpc "???" "rpc" "to_check,vault" vault_cookers -> rabbitmq "Gets a cooking task" "" "vault" vault_cookers -> graph_rpc "Asks the swhid to cook" "rpc" "to_check,vault" - vault_cookers -> storage_rpc "???" "rpc" "to_check,vault" - vault_cookers -> vault_rpc "Sends the bundle" "" vault" + vault_cookers -> storage_rpc "Retrieves data to cook" "rpc" "to_check,vault" + vault_cookers -> vault_rpc "Sends the bundle" "" "vault" vault_rpc -> vaultAzureBucket "Stores the bundle" "" "vault" // search @@ -365,11 +391,13 @@ workspace { role "XXXXX" } } - webapp -> provenance_rpc "Sends requests" "grpc" "provenance,,overlapped" { - } + webapp -> provenance_rpc "Sends requests" "rpc" "provenance,,overlapped" provenance_rpc -> graph_grpc "Sends requests" "grpc" "provenance,overlapped" + webapp -> provenance_grpc "Sends requests" "grpc" "provenance,,overlapped" + provenance_grpc -> provenance_parquet_files "Queries files" "grpc" "provenance,overlapped" + // alter/takedown codeOwner -> dpo "requests a takedown or a name change" "" "tdn" dpo -> systemAdministrator "notifies of a takedown to proceed" "" "tdn" @@ -457,7 +485,7 @@ workspace { containerInstance "webapp" "pg" { tags "Kubernetes - dep" description "archive webapp" - tags provenance + tags provenance_v2,provenance } } } @@ -488,7 +516,7 @@ workspace { tags "Kubernetes - ing" url "http://provenance-local" - containerInstance "provenance_rpc" "cassandra,pg" { + containerInstance "provenance_grpc" "cassandra,pg" { tags "Kubernetes - deploy" } } @@ -509,7 +537,7 @@ workspace { tags db } } - + deploymentNode "archive-webapp-ingress" { tags "Kubernetes - ing" url 
"http://webapp.staging.swh.network,http://webapp-cassandra.internal.staging.swh.network" @@ -579,7 +607,7 @@ workspace { } deploymentNode "kelvingrove" { - containerInstance "keycloak" "cassandra,pg" "provenance" + containerInstance "keycloak" "cassandra,pg" "provenance,provenance_v2" } deploymentNode "search-esnodeX" { @@ -588,8 +616,8 @@ workspace { } } - deploymentNode "granet" { - containerInstance "graph_grpc" "cassandra,pg" "provenance" + deploymentNode "rancher-node-highmem0[1-2]" { + containerInstance "graph_grpc" "cassandra,pg" "provenance_v2" } deploymentNode "kafkaX" { @@ -727,7 +755,7 @@ workspace { gloin001 = deploymentNode "gloin001" { gloin001_haproxy = infrastructureNode "HaProxy" { description "LoadBalancer" - } + } gloin001_patroni = infrastructureNode "Patroni" { description "HA PG" } @@ -748,7 +776,7 @@ workspace { gloin002 = deploymentNode "gloin002" { gloin002_haproxy = infrastructureNode "HaProxy" { description "LoadBalancer" - } + } gloin002_patroni = infrastructureNode "Patroni" { description "HA PG" } @@ -772,7 +800,7 @@ workspace { gloin002_patroni -> gloin002_postgresql "checks" gloin001_postgresql -> gloin002_postgresql "Replicates" gloin002_postgresql -> gloin001_postgresql "Replicates" - gloin001_haproxy -> gloin001_winery_reader "Reads contents" "http" + gloin001_haproxy -> gloin001_winery_reader "Reads contents" "http" gloin001_haproxy -> gloin002_winery_reader "Reads contents (backup)" "http" "backup,overlapped" gloin002_haproxy -> gloin002_winery_writer "Reads/Writes contents" "http" "overlapped" gloin002_haproxy -> gloin001_winery_writer "Reads/Writes contents (backup)" "http" "backup,overlapped" @@ -812,12 +840,13 @@ workspace { deployment * staging "staging_provenance" { title "swh-provenance Staging deployment" include "element.tag==provenance" - autolayout + autolayout } deployment * production "production_provenance" { - include "element.tag==provenance" + include "element.tag==provenance_v2" + autolayout } @@ -847,6 +876,21 
@@ workspace { autolayout } + container swh "provenance_pre_v3_infra" { + title "Provenance pre-v0.3 Infrastructure" + include provenance_rpc + include graph_grpc + autoLayout + } + + container swh "provenance_v3_infra" { + title "Provenance v0.3 Infrastructure" + include provenance_grpc + include provenance_parquet_files + + autoLayout + } + container swh "coarnotify_infra" { title "Coar Notify infrastructure" include coarnotify_rpc @@ -975,7 +1019,7 @@ workspace { dynamic swh "winery-shards-writing" { title "Winery shard preparation steps" - + winery_rpc -> winery_db "Creates a new shard (status WRITING)" winery_rpc -> winery_db "Adds contents to the shard" winery_rpc -> winery_db "Adds id -> shard reference" @@ -986,7 +1030,7 @@ workspace { dynamic swh "winery-shards-packing" { title "Winery shard preparation steps" - + winery_shard_packer -> winery_db "updates FULL shards to PACKING" winery_shard_packer -> winery_db "Read shard contents" winery_shard_packer -> ceph "Save the shard into the rbd image" @@ -997,7 +1041,7 @@ workspace { dynamic swh "winery-shards-mounting" { title "Winery shard preparation steps" - + winery_rbd -> winery_db "Waits for PACKED shards" winery_rbd -> winery_os "Mounts the rbd image" winery_rbd -> winery_db "Updates the shard mount status" @@ -1007,7 +1051,7 @@ workspace { dynamic swh "winery-shards-cleaning" { title "Winery shard preparation steps" - + winery_shard_cleaner -> winery_db "Waits for a PACKED and mounted shard" winery_shard_cleaner -> winery_db "Updates status to CLEANING" winery_shard_cleaner -> winery_db "Removes shard content" @@ -1059,7 +1103,7 @@ workspace { shape RoundedBox background lightblue } - + relationship overlapped { position 75 }