deployment/provenance: Adapt template to manage grpc or rpc service

Closed Antoine R. Dumont requested to merge mr/adapt-provenance-deployment into production

This evolves the provenance template so it can declare a new grpc provenance server, while keeping it possible to deploy the rpc provenance server.

For now, the goal is to leave the provenance rpc servers already deployed in staging and production untouched (some environment variables were moved, so the configuration checksum changes slightly, but there is no behavioral change).

It also adds 2 grpc server instances with the smallest dataset possible:

  • one in the local-cluster [1] (connection tested through toolbox [2])
  • another in the next-version [3]

In a follow-up MR, we'll deploy the provenance server in staging.

TODO:

  • local-cluster
    • Check response of grpc service with grpcurl (determine queries first) [2]
    • Check api provenance response with webapp connected to grpc [3]
    • Adapt webapp configuration to use the new provenance [4]
  • next-version: Deploy provenance-grpc & webapp accordingly ^
  • provenance: Add a download witness file (to avoid concurrent downloads)
  • graph: same ^
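The download-witness TODO above could look roughly like this minimal bash sketch (the `fetch_once` helper, the lock file, and the `flock` approach are assumptions for illustration, not the actual implementation):

```shell
#!/usr/bin/env bash
# Hypothetical guard against concurrent dataset downloads: the first caller
# takes an exclusive flock, downloads, then drops a witness file; later
# callers see the witness and skip the download entirely.

fetch_once() {
    local witness="$1" lock="$2"

    # Fast path: witness already present, nothing to do.
    [ -f "$witness" ] && echo "already downloaded, skipping" && return 0

    (
        flock -x 9
        # Re-check under the lock: another process may have finished meanwhile.
        if [ ! -f "$witness" ]; then
            echo "downloading dataset..."  # e.g. aws s3 cp --recursive ... (elided)
            touch "$witness"
        fi
    ) 9>"$lock"
}
```

The same pattern would apply to the graph dataset download mentioned in the next TODO item.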
[1] local-cluster: Provenance Grpc running ok
2025-03-20T15:04:06.823292798Z Starting the swh-provenance GRPC server
2025-03-20T15:04:06.828395082Z 2025-03-20T15:04:06.828326Z  INFO swh_provenance_grpc_serve: Loading graph properties and database
2025-03-20T15:04:06.843855546Z 2025-03-20T15:04:06.843767Z  INFO swh_provenance::utils: Graph loaded
2025-03-20T15:04:06.904579636Z 2025-03-20T15:04:06.904494Z  INFO swh_provenance::utils: Database loaded
2025-03-20T15:04:06.904596958Z 2025-03-20T15:04:06.904532Z  INFO swh_provenance_grpc_serve: Starting server
2025-03-21T09:15:11.549147527Z graph-grpc-popular-20240823 2025-03-21T09:15:11.549070Z  INFO swh_graph_grpc_serve: Loading graph
2025-03-21T09:15:11.556021379Z graph-grpc-popular-20240823 2025-03-21T09:15:11.555943Z  INFO swh_graph_grpc_serve: Starting server
2025-03-21T09:15:31.921485587Z graph-grpc-popular-20240823 2025-03-21T09:15:31.921332Z  INFO request{id=0}:stats: swh_graph_grpc_server: StatsRequest
2025-03-21T09:15:31.930900836Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930779Z ERROR request{id=0}:stats: swh_graph_grpc_server: Missing compratio in /srv/graph/2024-08-23_popular-4-shell/compressed/graph.properties
2025-03-21T09:15:31.930938605Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930796Z ERROR request{id=0}:stats: swh_graph_grpc_server: Missing bitspernode in /srv/graph/2024-08-23_popular-4-shell/compressed/graph.properties
2025-03-21T09:15:31.930951990Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930802Z ERROR request{id=0}:stats: swh_graph_grpc_server: Missing bitsperlink in /srv/graph/2024-08-23_popular-4-shell/compressed/graph.properties
2025-03-21T09:15:31.930963150Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930808Z ERROR request{id=0}:stats: swh_graph_grpc_server: Missing avglocality in /srv/graph/2024-08-23_popular-4-shell/compressed/graph.stats
2025-03-21T09:15:31.930990010Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930897Z  INFO request{id=0}: swh_graph_grpc_server::metrics: 200 OK - /swh.graph.TraversalService/Stats - response: 9.64976ms - streaming: 44.992µs
2025-03-21T09:17:55.699423515Z graph-grpc-popular-20240823 2025-03-21T09:17:55.699342Z  INFO request{id=1}:traverse: swh_graph_grpc_server: TraversalRequest { src: ["swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac"], direction: Backward, edges: None, max_edges: None, min_depth: None, max_depth: None, return_nodes: None, mask: None, max_matching_nodes: None }
2025-03-21T09:17:55.704288980Z graph-grpc-popular-20240823 2025-03-21T09:17:55.704221Z  INFO request{id=1}:traverse: swh_graph_grpc_server: error=status: NotFound, message: "Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac", details: [], metadata: MetadataMap { headers: {} }
2025-03-21T09:17:55.704312002Z graph-grpc-popular-20240823 2025-03-21T09:17:55.704277Z  INFO request{id=1}: swh_graph_grpc_server::metrics: 200 OK - /swh.graph.TraversalService/Traverse - response: 4.972231ms - streaming: 431ns
2025-03-21T09:27:40.574344623Z graph-grpc-popular-20240823 2025-03-21T09:27:40.574270Z  INFO request{id=2}:traverse: swh_graph_grpc_server: TraversalRequest { src: ["swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac"], direction: Backward, edges: None, max_edges: None, min_depth: None, max_depth: None, return_nodes: None, mask: None, max_matching_nodes: None }
2025-03-21T09:27:40.574390096Z graph-grpc-popular-20240823 2025-03-21T09:27:40.574319Z  INFO request{id=2}:traverse: swh_graph_grpc_server: error=status: NotFound, message: "Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac", details: [], metadata: MetadataMap { headers: {} }
2025-03-21T09:27:40.574395456Z graph-grpc-popular-20240823 2025-03-21T09:27:40.574339Z  INFO request{id=2}: swh_graph_grpc_server::metrics: 200 OK - /swh.graph.TraversalService/Traverse - response: 100.985µs - streaming: 230ns
[2] local-cluster: Connection to grpc provenance ok
swh@swh-toolbox-5548445f74-x8vmb:~$ server=provenance-grpc-popular-ingress:80
grpcurl --plaintext $server list swh.provenance.ProvenanceService
swh.provenance.ProvenanceService.WhereAreOne
swh.provenance.ProvenanceService.WhereIsOne
swh@swh-toolbox-5548445f74-x8vmb:~$ unknown_swhid="swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945"
grpcurl -d "{\"swhid\": \"${unknown_swhid}\"}"   --plaintext $server   swh.provenance.ProvenanceService.WhereIsOne
ERROR:
  Code: Internal
  Message: status: NotFound, message: "Unknown SWHID: swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945", details: [], metadata: MetadataMap { headers: {} }
swh@swh-toolbox-5548445f74-x8vmb:~$ swhid="swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892"
grpcurl -d "{\"swhid\": \"${swhid}\"}"   --plaintext $server   swh.provenance.ProvenanceService.WhereIsOne
{
  "swhid": "swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892",
  "anchor": "swh:1:rev:aef9e137acd823aa0097f195b613f96aae619923"
}
[4] local-cluster: Connection through webapp to the new grpc provenance
$ curl -s http://web-local-archive-ingress/api/1/provenance/whereis/swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892/ | jq .

"swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892;anchor=swh:1:rev:aef9e137acd823aa0097f195b613f96aae619923"

# Missing contents
$ curl -s http://web-local-archive-ingress/api/1/provenance/whereis/swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac/ | jq .

{
  "exception": "_InactiveRpcError",
  "reason": "<_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.INTERNAL\n\tdetails = \"status: NotFound, message: \"Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac\", details: [], metadata: MetadataMap { headers: {} }\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer  {grpc_message:\"status: NotFound, message: \\\"Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac\\\", details: [], metadata: MetadataMap { headers: {} }\", grpc_status:13, created_time:\"2025-03-21T15:42:58.066805842+00:00\"}\"\n>"
}
$ curl -s http://web-local-archive-ingress/api/1/provenance/whereis/swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac/ | jq .
{
  "exception": "_InactiveRpcError",
  "reason": "<_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.INTERNAL\n\tdetails = \"status: NotFound, message: \"Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac\", details: [], metadata: MetadataMap { headers: {} }\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer  {grpc_message:\"status: NotFound, message: \\\"Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac\\\", details: [], metadata: MetadataMap { headers: {} }\", grpc_status:13, created_time:\"2025-03-21T15:43:39.432082899+00:00\"}\"\n>"
}
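For reference, the whereis response above is the core SWHID followed by `;`-separated qualifiers; a quick shell sketch of splitting out the anchor (illustrative only, using the values shown above):

```shell
#!/usr/bin/env bash
# Split a qualified SWHID such as
#   swh:1:cnt:07d9...;anchor=swh:1:rev:aef9...
# into its core SWHID and its anchor qualifier, using bash parameter expansion.
resp="swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892;anchor=swh:1:rev:aef9e137acd823aa0097f195b613f96aae619923"

core="${resp%%;*}"         # everything before the first ';'
anchor="${resp#*anchor=}"  # everything after 'anchor='

echo "core:   $core"
echo "anchor: $anchor"
```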
[3] Deploy next-version provenance grpc instance
[swh] Comparing changes between branches production and mr/adapt-provenance-deployment (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace next-version...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment staging...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment staging...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace next-version...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment production...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment production...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment production...


------------- diff for environment staging namespace swh -------------

--- /tmp/swh-chart.swh.lQqjoZXu/staging-swh.before	2025-03-21 17:04:33.345117955 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/staging-swh.after	2025-03-21 17:04:33.997092876 +0100
@@ -1914,21 +1914,21 @@
     fi
 
     graph_transposed_name=${GRAPH_NAME}-transposed.graph
     if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
       cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
     fi
 
     # Finally, we make explicit the graph is ready
     touch ${WITNESS_FILE}
 
-  graph-wait-for-dataset.sh: |
+  wait-for-dataset.sh: |
     #!/usr/bin/env bash
     # Uses env variables WITNESS_FILE
     [ -z "${WITNESS_FILE}" ] && \
       echo "<WITNESS_FILE> env variable must be set" && exit 1
 
     while [ ! -f ${WITNESS_FILE} ]; do
         echo "${WITNESS_FILE} not present, wait for it to start the graph..."
         sleep $PERIOD
     done
 
@@ -2010,20 +2010,125 @@
         echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
         sleep $PERIOD
     done
 
     # For old datasets missing a .ef or in the wrong format, this fails with
     # `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
     # reindex the dataset
     swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
       touch $WITNESS_REINDEX_FILE
 
+  provenance-fetch-datasets.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_FETCH_FILE}" ] && \
+      echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+    [ -z "${DATASET_VERSION}" ] && \
+      echo "<DATASET_VERSION> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+    [ -z "${GRAPH_PATH}" ] && \
+      echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+    [ -f ${WITNESS_FETCH_FILE} ] && \
+        echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+        exit 0
+
+    set -e
+
+    # Create destination paths
+    mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+    echo "Fetching datasets..."
+
+    if [ ${PROVENANCE_DATASET_FULL} = true ]; then
+        # Retrieve all the provenance dataset
+        REFS=all
+    else
+        # This excludes revisions not targetted by a snapshot
+        # Ok to use for test purposes
+        REFS=heads
+    fi
+
+    URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+    CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+    echo "1. Fetching provenance dataset (parquet files)..."
+    ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+    echo "1. Provenance datasets installed!"
+
+    echo "2. Fetching extra graph files..."
+    URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+    mkdir -p "${GRAPH_PATH}"
+    for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+                    graph.property.message.bin.zst \
+                    graph.property.message.offset.bin.zst \
+                    graph.property.tag_name.bin.zst \
+                    graph.property.tag_name.offset.bin.zst \
+                    graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+        ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+    done
+    echo "2. Extra graph files installed!"
+
+    echo "3. Uncompressing graph files..."
+    set -x
+    # Uncompress the compressed graph *.zst files
+    for filepath in $(ls ${GRAPH_PATH}/*.zst); do
+        # Uncompress and delete the .zst file
+        [ -f "${filepath}" ] && unzstd --force --rm "${filepath}"
+    done
+    set +x
+    echo "3. Graph files uncompressed!"
+
+    # Make explicit the provenance datasets are fetched
+    touch ${WITNESS_FETCH_FILE}
+
+    echo "Provenance datasets installed!"
+
+  provenance-index-dataset.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_DATASETS_FILE}" ] && \
+      echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+    [ -z "${WITNESS_INDEX_FILE}" ] && \
+      echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+    [ -z "${PERIOD}" ] && \
+      echo "<PERIOD> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+    [ -f ${WITNESS_INDEX_FILE} ] && echo "Provenance already indexed, do nothing." && \
+      exit 0
+
+    set -eu
+
+    # Let's wait for the dataset installation
+    while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+        echo "${WITNESS_DATASETS_FILE} missing, waiting provenance dataset installation..."
+        sleep $PERIOD
+    done
+
+    echo "Datasets file installed, build provenance dataset indexes..."
+
+    echo "provenance path: $PROVENANCE_PATH"
+    set -x
+
+    # To make the query faster, the provenance needs to build index out of the
+    # current dataset files. We store the output indexes in the same path as
+    # the dataset.
+    swh-provenance-index \
+      --database file://${PROVENANCE_PATH} && \
+      touch "${WITNESS_INDEX_FILE}" && \
+      echo "Provenance indexes built!" || \
+
+    echo "Provenance indexes failed!"
+
   initialize-search-backend.sh: |
     #!/usr/bin/env bash
 
     set -eux
 
     # Uses internally the environment variable SWH_CONFIG_FILENAME
     swh search initialize
   register-task-types.sh: |
     #!/usr/bin/env bash
 
@@ -3974,21 +4079,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-content
       annotations:
         checksum/config: 6dab6ee0c1a4a8b88f25c2ae5ae03e8f9123247ab730bc299f03ad5e552fdd2e
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4114,21 +4219,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-directory
       annotations:
         checksum/config: 6e201e0b3d31c59906f2f6bb40eed69c44f72f5fc46aa224a9115980194888b5
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4254,21 +4359,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-extid
       annotations:
         checksum/config: fb5ef271dff488758f430d36ab153d3d71c2bcd5466401f8b4b8f56eecdbab09
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4394,21 +4499,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-metadata
       annotations:
         checksum/config: 59c200997e2837af6c79d357abf2b9d3887ddd9fa1da218e6867372137a6c12c
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4534,21 +4639,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin
       annotations:
         checksum/config: 8a444fbf8b876ca0413f40eced16d8f67f37e21f9d9af30dc0e9d230f99353f0
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4674,21 +4779,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin-visit
       annotations:
         checksum/config: 0232863ea6af9728905e510a2c7cda793b30e51282bb61e15951295b1b4ae5be
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4814,21 +4919,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin-visit-status
       annotations:
         checksum/config: 50ccfa4fef14a26d4e62b4a4a7e9548e96ecd33bff9ed38f4caee11ad4872f50
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4954,21 +5059,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-raw-extrinsic-metadata
       annotations:
         checksum/config: f7c02005d918fc96845fbe96ca5b80e02cbbe8db33c52304977fdf5775a7b39b
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5094,21 +5199,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-release
       annotations:
         checksum/config: 2080d9cc684b645df5f3408a379de43116e1cd92dda2b81e90d60dbeee8d4b7a
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5234,21 +5339,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-revision
       annotations:
         checksum/config: 576ef144c0f7a064b95e0584e23ed8e1bb49d8d54ca2314992484b79b549b116
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5374,21 +5479,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-skipped-content
       annotations:
         checksum/config: 7bb2dae23e587e50bb4d18ddbd484ea63585e1d728b9dfdcdd27fc62435cee7f
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5514,21 +5619,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-snapshot
       annotations:
         checksum/config: cddbe0bf9d91c24560eb3af9f843a00986a2d524dba0fdc5c111a6502caeed38
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5655,21 +5760,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-postgresql-read-only
       annotations:
         checksum/config: 557a29778d601193c743a35a7075315560447fc8740ba7922d5d52ce3f0c621e
         checksum/config-logging: cca6d0318bd776cd9bee0901e67e4db9fa401456f6f03f569c15468c5e62bea7
-        checksum/backend-utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/backend-utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
         checksum/config-utils: d75ca13b805bce6a8ab59c8e24c938f2283108f6a79134f6e71db86308651dc6
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:
@@ -5820,21 +5925,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-postgresql-read-write
       annotations:
         checksum/config: 7673b003223acd4a3a0d130516efd4f89740163d29c4800d53c9afe16b8d21a2
         checksum/config-logging: c9f05b677492d0f7443fc8193c82673ce3c550f351b82dcf616a247f7477fae0
-        checksum/backend-utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/backend-utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
         checksum/config-utils: d75ca13b805bce6a8ab59c8e24c938f2283108f6a79134f6e71db86308651dc6
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:


------------- diff for environment staging namespace swh-cassandra -------------

--- /tmp/swh-chart.swh.lQqjoZXu/staging-swh-cassandra.before	2025-03-21 17:04:33.705104109 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/staging-swh-cassandra.after	2025-03-21 17:04:34.321080414 +0100
@@ -8875,21 +8875,21 @@
     fi
 
     graph_transposed_name=${GRAPH_NAME}-transposed.graph
     if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
       cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
     fi
 
     # Finally, we make explicit the graph is ready
     touch ${WITNESS_FILE}
 
-  graph-wait-for-dataset.sh: |
+  wait-for-dataset.sh: |
     #!/usr/bin/env bash
     # Uses env variables WITNESS_FILE
     [ -z "${WITNESS_FILE}" ] && \
       echo "<WITNESS_FILE> env variable must be set" && exit 1
 
     while [ ! -f ${WITNESS_FILE} ]; do
         echo "${WITNESS_FILE} not present, wait for it to start the graph..."
         sleep $PERIOD
     done
 
@@ -8971,20 +8971,125 @@
         echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
         sleep $PERIOD
     done
 
     # For old datasets missing a .ef or in the wrong format, this fails with
     # `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
     # reindex the dataset
     swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
       touch $WITNESS_REINDEX_FILE
 
+  provenance-fetch-datasets.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_FETCH_FILE}" ] && \
+      echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+    [ -z "${DATASET_VERSION}" ] && \
+      echo "<DATASET_VERSION> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+    [ -z "${GRAPH_PATH}" ] && \
+      echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+    [ -f ${WITNESS_FETCH_FILE} ] && \
+        echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+        exit 0
+
+    set -e
+
+    # Create destination paths
+    mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+    echo "Fetching datasets..."
+
+    if [ ${PROVENANCE_DATASET_FULL} = true ]; then
+        # Retrieve all the provenance dataset
+        REFS=all
+    else
+        # This excludes revisions not targetted by a snapshot
+        # Ok to use for test purposes
+        REFS=heads
+    fi
+
+    URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+    CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+    echo "1. Fetching provenance dataset (parquet files)..."
+    ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+    echo "1. Provenance datasets installed!"
+
+    echo "2. Fetching extra graph files..."
+    URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+    mkdir -p "${GRAPH_PATH}"
+    for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+                    graph.property.message.bin.zst \
+                    graph.property.message.offset.bin.zst \
+                    graph.property.tag_name.bin.zst \
+                    graph.property.tag_name.offset.bin.zst \
+                    graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+        ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+    done
+    echo "2. Extra graph files installed!"
+
+    echo "3. Uncompressing graph files..."
+    set -x
+    # Uncompress the compressed graph *.zst files
+    for filepath in $(ls ${GRAPH_PATH}/*.zst); do
+        # Uncompress and delete the .zst file
+        [ -f "${filepath}" ] && unzstd --force --rm "${filepath}"
+    done
+    set +x
+    echo "3. Graph files uncompressed!"
+
+    # Make explicit the provenance datasets are fetched
+    touch ${WITNESS_FETCH_FILE}
+
+    echo "Provenance datasets installed!"
+
+  provenance-index-dataset.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_DATASETS_FILE}" ] && \
+      echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+    [ -z "${WITNESS_INDEX_FILE}" ] && \
+      echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+    [ -z "${PERIOD}" ] && \
+      echo "<PERIOD> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+    [ -f ${WITNESS_INDEX_FILE} ] && echo "Provenance already indexed, do nothing." && \
+      exit 0
+
+    set -eu
+
+    # Let's wait for the dataset installation
+    while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+        echo "${WITNESS_DATASETS_FILE} missing, waiting provenance dataset installation..."
+        sleep $PERIOD
+    done
+
+    echo "Datasets file installed, build provenance dataset indexes..."
+
+    echo "provenance path: $PROVENANCE_PATH"
+    set -x
+
+    # To make the query faster, the provenance needs to build index out of the
+    # current dataset files. We store the output indexes in the same path as
+    # the dataset.
+    swh-provenance-index \
+      --database file://${PROVENANCE_PATH} && \
+      touch "${WITNESS_INDEX_FILE}" && \
+      echo "Provenance indexes built!" || \
+
+    echo "Provenance indexes failed!"
+
   initialize-search-backend.sh: |
     #!/usr/bin/env bash
 
     set -eux
 
     # Uses internally the environment variable SWH_CONFIG_FILENAME
     swh search initialize
   register-task-types.sh: |
     #!/usr/bin/env bash
 
@@ -11668,21 +11773,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: graph-grpc-python3k
       annotations:
         checksum/config: b73b013412ed4679009823fcd1967f46c6c74ce0cac466277d96a7303b1ee5e9
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       nodeSelector:
         kubernetes.io/hostname: rancher-node-staging-rke2-metal01
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/graph
                 operator: In
@@ -11737,21 +11842,21 @@
         
           - name: graph-python3k-persistent
             mountPath: /srv/dataset
             readOnly: false
         
         
         - name: wait-for-dataset
           image: container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20250211.1
           imagePullPolicy: IfNotPresent
           command:
-          - /entrypoints/graph-wait-for-dataset.sh
+          - /entrypoints/wait-for-dataset.sh
           env:
             - name: WITNESS_FILE
               value: /srv/graph/2021-03-23-popular-3k-python/compressed/.graph-is-initialized
             - name: DATASET_LOCATION
               value: /srv/graph/2021-03-23-popular-3k-python/compressed
             - name: PERIOD
               value: "3"
           volumeMounts:
           - name: backend-utils
             mountPath: /entrypoints
@@ -11882,21 +11987,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: graph-rpc-python3k
       annotations:
         checksum/config: cd2257ef14a7e6adb8b613ab147d995cfc4b4f250daba77c5b5cdbd63dcb1a35
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/graph
                 operator: In
                 values:
                 - "true"
@@ -12129,21 +12234,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: indexer-storage-rpc
       annotations:
         checksum/config: aba89c9cffc506b56207b7cc0377f0183f9e75493e42ee5a8e199dbc7d573ade
         checksum/config-logging: 7d9616b680a77c6ec7ba4c1a1c0f3fbf6343c4fc132847f8f4e313d965014749
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -19984,20 +20089,21 @@
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: provenance-graph-granet
       annotations:
         checksum/config: e466b9f13c5124eedcaa557de44e64caf7c7ff4aa9ab5dab35b7ceede6a09568
         checksum/config-logging: ddcd27d991938c46f4fc0ad7ee028cb3005f186b3db022596c9ae94363881e4f
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -20017,21 +20123,21 @@
             mountPath: /etc/swh/configuration-template
           - name: config-utils
             mountPath: /entrypoints
             readOnly: true
       containers:
         - name: provenance-graph-granet
           resources:
             requests:
               memory: 512Mi
               cpu: 500m
-          image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250319.1
+          image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
           imagePullPolicy: IfNotPresent
           ports:
             - containerPort: 5014
               name: rpc
           readinessProbe:
             httpGet:
               path: /
               port: rpc
             initialDelaySeconds: 15
             failureThreshold: 30
@@ -20040,76 +20146,88 @@
             tcpSocket:
               port: rpc
             initialDelaySeconds: 10
             periodSeconds: 5
           command:
           - /bin/bash
           args:
           - -c
           - /opt/swh/entrypoint.sh
           env:
+            - name: PROVENANCE_TYPE
+              value: rpc
+            - name: PORT
+              value: "5014"
             - name: WORKERS
               value: "4"
             - name: THREADS
               value: "1"
             - name: TIMEOUT
               value: "60"
+            - name: SWH_CONFIG_FILENAME
+              value: /etc/swh/config.yml
+            - name: SWH_LOG_CONFIG_JSON
+              value: /etc/swh/logging/logging-gunicorn.json
+            - name: STATSD_SERVICE_TYPE
+              value: provenance-graph-granet
             - name: STATSD_HOST
               value: prometheus-statsd-exporter
             - name: STATSD_PORT
               value: "9125"
             - name: STATSD_TAGS
               value: deployment:provenance-graph-granet
-            - name: STATSD_SERVICE_TYPE
-              value: provenance-graph-granet
             - name: SWH_LOG_LEVEL
-              value: "INFO"
-            - name: SWH_LOG_CONFIG_JSON
-              value: /etc/swh/logging/logging-gunicorn.json
+              value: INFO
             - name: SWH_SENTRY_ENVIRONMENT
               value: staging
             - name: SWH_MAIN_PACKAGE
               value: swh.provenance
             - name: SWH_SENTRY_DSN
               valueFrom:
                 secretKeyRef:
                   name: common-secrets
                   key: provenance-sentry-dsn
                   # 'name' secret should exist & include key
                   # if the setting doesn't exist, sentry pushes will be disabled
                   optional: true
             - name: SWH_SENTRY_DISABLE_LOGGING_EVENTS
               value: "true"
           volumeMounts:
           - name: configuration
             mountPath: /etc/swh
           - name: configuration-logging
             mountPath: /etc/swh/logging
+          
       volumes:
       - name: configuration
         emptyDir: {}
       - name: configuration-template
         configMap:
           name: provenance-graph-granet-configuration-template
           items:
           - key: "config.yml.template"
             path: "config.yml.template"
       - name: configuration-logging
         configMap:
           name: provenance-graph-granet-configuration-logging
           items:
           - key: "logging-gunicorn.json"
             path: "logging-gunicorn.json"
+      
       - name: config-utils
         configMap:
           name: config-utils
           defaultMode: 0555
+      - name: backend-utils
+        configMap:
+          name: backend-utils
+          defaultMode: 0555
 ---
 # Source: swh/templates/scheduler/extra-services-deployment.yaml
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   namespace: swh-cassandra
   name: scheduler-listener
   labels:
     app: scheduler-listener
 spec:
@@ -21523,21 +21641,21 @@
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: search-rpc
       annotations:
         checksum/config: cb3458d5372d0f78a475da3ac1b4f474cebe95188ad4f0931eba0a7c9657122e
         checksum/config-logging: 7bffbc6ce2cb11d88208ef0c5f1d8e6822659c361717afb51dcf0f4da02fe1f7
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -21679,21 +21797,21 @@
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: search-static-rpc
       annotations:
         checksum/config: 7a5b75d976f3579a1c831aa35c012f5f15ff2e2b488fe1ae7f1da9ee4bb8ca3e
         checksum/config-logging: bc2025f41b3eb8aa28b66033b96fdc1cb963f5c01fe33b2417c2378f715dbc32
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -21879,21 +21997,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-content
       annotations:
         checksum/config: 524d18d676bdcddf14d63ac8397a0e82dfc86601e69a4684fef07cfaa6953fd8
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -22016,21 +22134,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-directory
       annotations:
         checksum/config: 0cacef7c155ca3e3df82cb77ae2b6bb8bb9ac49b19209f2e065c96bc4ba76ef8
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -22153,21 +22271,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-extid
       annotations:
         checksum/config: a05be94c147c42ea8e51b234f50f2049496d777e0a00887817305ca00e31b924
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -22290,21 +22408,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-metadata
       annotations:
         checksum/config: d237c40e4c2887e8c5268cd357ec129e5ef9d1e9d0fc6ef9603878a0a4f18acf
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -22427,21 +22545,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin
       annotations:
         checksum/config: 6fb97eb4a6d4f4fa34cb922c7c7a2d54079ade1818d0146e5c7bb9d299b4fd34
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -22564,21 +22682,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin-visit
       annotations:
         checksum/config: 8feb5282dc24309e969ad447d420801a0cd826820817bcef0862df9805aea973
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -22701,21 +22819,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin-visit-status
       annotations:
         checksum/config: a345553096d29938201239f63e9d271d80fb1e516a63745da5c165060b2de2b8
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -22838,21 +22956,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-raw-extrinsic-metadata
       annotations:
         checksum/config: 43306ba69c4c8557fbd4bfca185bd5edfb8f70fc84bee6667019bd333f602054
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -22975,21 +23093,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-release
       annotations:
         checksum/config: eb1a796fdffbeba1776f6709ddd4855d41e82fd600284d8095e95fe1419c651d
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -23112,21 +23230,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-revision
       annotations:
         checksum/config: cf1bf19391e5330e49b63bd2a1baf795b6fc08a8229fa4e774ca960fb05eaaf9
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -23249,21 +23367,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-skipped-content
       annotations:
         checksum/config: 4213e5e6ff3ae51225dc1ec6f9f48a8ee3f206e6b8e69fd61dfe6335437d9bf9
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -23386,21 +23504,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-snapshot
       annotations:
         checksum/config: d10ea86ff29eebbdd9fbf159d297fed599fe1833a8140149f0229df7eaba1b34
-        checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -23524,21 +23642,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-cassandra
       annotations:
         checksum/config: b840eaf8faafacbf7f7f08c78e69c3b7028b992f5681acb5de76b177f0a2b3a9
         checksum/config-logging: 2f7a56936b194188f70175c52dc180320fcc071e5c110562a9f031116fadefd2
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:
@@ -23695,21 +23813,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-cassandra-read-only
       annotations:
         checksum/config: 5fe1511b81d6079cf97c2918094816926a283f83778bec40e117a7100441a4c9
         checksum/config-logging: 7403d71b4a2e4da28cc9c1af0b9d022e85bf0b5ffcb738dc9f2b6dcfa3e14456
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:


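The fetch, index, and wait scripts above all rely on the same witness-file pattern: an empty marker file records that an expensive step already completed, so a restarted container skips it instead of re-downloading or re-indexing. A minimal standalone sketch of that guard (the `run_step` function and temp-dir path are illustrative names, not part of the chart):

```shell
#!/usr/bin/env bash
# Witness-file guard: a marker file records that an expensive step
# already ran, so subsequent invocations (e.g. after a pod restart)
# become no-ops instead of redoing the work.
WITNESS_FILE="$(mktemp -d)/.step-done"

run_step() {
    if [ -f "${WITNESS_FILE}" ]; then
        echo "already done, skipping"
        return 0
    fi
    echo "doing expensive work"
    # ... fetch datasets, build indexes, etc. ...
    touch "${WITNESS_FILE}"
}

run_step   # first run performs the work and creates the marker
run_step   # second run sees the marker and skips
```

A companion container can then poll for the same marker with a `while [ ! -f "${WITNESS_FILE}" ]; do sleep "${PERIOD}"; done` loop, which is exactly how `wait-for-dataset.sh` sequences the grpc server behind the dataset installation.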
------------- diff for environment staging namespace next-version -------------

--- /tmp/swh-chart.swh.lQqjoZXu/staging-next-version.before	2025-03-21 17:04:33.857098261 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/staging-next-version.after	2025-03-21 17:04:34.469074721 +0100
@@ -4005,21 +4005,21 @@
     fi
 
     graph_transposed_name=${GRAPH_NAME}-transposed.graph
     if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
       cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
     fi
 
     # Finally, we make explicit the graph is ready
     touch ${WITNESS_FILE}
 
-  graph-wait-for-dataset.sh: |
+  wait-for-dataset.sh: |
     #!/usr/bin/env bash
     # Uses env variables WITNESS_FILE
     [ -z "${WITNESS_FILE}" ] && \
       echo "<WITNESS_FILE> env variable must be set" && exit 1
 
     while [ ! -f ${WITNESS_FILE} ]; do
         echo "${WITNESS_FILE} not present, wait for it to start the graph..."
         sleep $PERIOD
     done
 
@@ -4101,20 +4101,125 @@
         echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
         sleep $PERIOD
     done
 
     # For old datasets missing a .ef or in the wrong format, this fails with
     # `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
     # reindex the dataset
     swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
       touch $WITNESS_REINDEX_FILE
 
+  provenance-fetch-datasets.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_FETCH_FILE}" ] && \
+      echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+    [ -z "${DATASET_VERSION}" ] && \
+      echo "<DATASET_VERSION> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+    [ -z "${GRAPH_PATH}" ] && \
+      echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+    [ -f ${WITNESS_FETCH_FILE} ] && \
+        echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+        exit 0
+
+    set -e
+
+    # Create destination paths
+    mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+    echo "Fetching datasets..."
+
+    if [ "${PROVENANCE_DATASET_FULL}" = true ]; then
+        # Retrieve the full provenance dataset
+        REFS=all
+    else
+        # This excludes revisions not targeted by a snapshot,
+        # which is fine for test purposes
+        REFS=heads
+    fi
+
+    URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+    CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+    echo "1. Fetching provenance dataset (parquet files)..."
+    ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+    echo "1. Provenance datasets installed!"
+
+    echo "2. Fetching extra graph files..."
+    URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+    mkdir -p "${GRAPH_PATH}"
+    for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+                    graph.property.message.bin.zst \
+                    graph.property.message.offset.bin.zst \
+                    graph.property.tag_name.bin.zst \
+                    graph.property.tag_name.offset.bin.zst \
+                    graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+        ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+    done
+    echo "2. Extra graph files installed!"
+
+    echo "3. Uncompressing graph files..."
+    set -x
+    # Uncompress the compressed graph *.zst files
+    for filepath in $(ls ${GRAPH_PATH}/*.zst); do
+        # Uncompress and delete the .zst file
+        [ -f "${filepath}" ] && unzstd --force --rm "${filepath}"
+    done
+    set +x
+    echo "3. Graph files uncompressed!"
+
+    # Make explicit the provenance datasets are fetched
+    touch ${WITNESS_FETCH_FILE}
+
+    echo "Provenance & graph datasets installed!"
+
+  provenance-index-dataset.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_DATASETS_FILE}" ] && \
+      echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+    [ -z "${WITNESS_INDEX_FILE}" ] && \
+      echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+    [ -z "${PERIOD}" ] && \
+      echo "<PERIOD> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+    [ -f "${WITNESS_INDEX_FILE}" ] && echo "Provenance already indexed, nothing to do." && \
+      exit 0
+
+    set -eu
+
+    # Let's wait for the dataset installation
+    while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+        echo "${WITNESS_DATASETS_FILE} missing, waiting for provenance dataset installation..."
+        sleep $PERIOD
+    done
+
+    echo "Datasets file installed, build provenance dataset indexes..."
+
+    echo "provenance path: $PROVENANCE_PATH"
+    set -x
+
+    # To make queries faster, the provenance service needs to build indexes
+    # out of the current dataset files. The output indexes are stored in the
+    # same path as the dataset.
+    swh-provenance-index \
+      --database file://${PROVENANCE_PATH} && \
+      touch "${WITNESS_INDEX_FILE}" && \
+      echo "Provenance indexes built!" || \
+      echo "Provenance indexes failed!"
+
+
   initialize-search-backend.sh: |
     #!/usr/bin/env bash
 
     set -eux
 
     # Uses internally the environment variable SWH_CONFIG_FILENAME
     swh search initialize
   register-task-types.sh: |
     #!/usr/bin/env bash
 
@@ -4669,23 +4774,22 @@
       enable_requests_retry: true
       url: http://storage-ro-cassandra:5002
   
     corner_ribbon_text: StagingNextVersion
     show_corner_ribbon: "true"
     search:
       cls: remote
       enable_requests_retry: true
       url: http://search-rpc:5010
     provenance:
-      cls: remote
-      enable_requests_retry: true
-      url: http://webapp-provenance-ingress-next-version
+      cls: grpc
+      url: provenance-grpc-next-version-ingress:80
     scheduler:
       cls: remote
       url: http://scheduler-rpc:5008
     vault:
       cls: remote
       enable_requests_retry: true
       url: http://vault-rpc:5005
     graph:
       max_edges:
         anonymous: 1000
@@ -4803,20 +4907,37 @@
         - kafka-cluster-kafka-brokers:9092
       auto_offset_reset: latest
       group_id: staging-next-version-archive-webhooks
       object_types:
       - origin_visit_status
 ---
 # Source: swh/templates/volumes/persistent-volume-claims.yaml
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
+  name: provenance-popular-persistent-pvc
+  namespace: swh-cassandra-next-version
+  labels:
+    app: provenance-grpc
+spec:
+  accessModes:
+  - ReadWriteOnce
+  resources:
+    requests:
+      storage: 1Gi
+  storageClassName: local-persistent
+  volumeMode: Filesystem
+---
+# Source: swh/templates/volumes/persistent-volume-claims.yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
   name: swh-graph-grpc-dataset-example-pvc
   namespace: swh-cassandra-next-version
   labels:
     app: graph-grpc-example
 spec:
   accessModes:
   - ReadWriteOnce
   resources:
     requests:
       storage: 1Gi
@@ -5015,20 +5136,30 @@
   name: webapp-provenance-ingress-next-version
   namespace: swh-cassandra-next-version
 spec:
   type: ExternalName
   externalName: archive-staging-rke2-ingress-nginx-controller.ingress-nginx.svc.cluster.local
 ---
 # Source: swh/templates/external-services/cname.yaml
 apiVersion: v1
 kind: Service
 metadata:
+  name: provenance-grpc-next-version-ingress
+  namespace: swh-cassandra-next-version
+spec:
+  type: ExternalName
+  externalName: archive-staging-rke2-ingress-nginx-controller.ingress-nginx.svc.cluster.local
+---
+# Source: swh/templates/external-services/cname.yaml
+apiVersion: v1
+kind: Service
+metadata:
   name: graph-rpc-ingress
   namespace: swh-cassandra-next-version
 spec:
   type: ExternalName
   externalName: archive-staging-rke2-ingress-nginx-controller.ingress-nginx.svc.cluster.local
 ---
 # Source: swh/templates/external-services/cname.yaml
 apiVersion: v1
 kind: Service
 metadata:
@@ -5231,20 +5362,37 @@
     app: provenance-graph-granet
 spec:
   type: ClusterIP
   selector:
     app: provenance-graph-granet
   ports:
     - port: 5014
       targetPort: 5014
       name: rpc
 ---
+# Source: swh/templates/provenance/service.yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: provenance-grpc
+  namespace: swh-cassandra-next-version
+  labels:
+    app: provenance-grpc
+spec:
+  type: ClusterIP
+  selector:
+    app: provenance-grpc
+  ports:
+    - port: 50141
+      targetPort: 50141
+      name: grpc
+---
 # Source: swh/templates/scheduler/rpc-service.yaml
 apiVersion: v1
 kind: Service
 metadata:
   name: scheduler-rpc
   namespace: swh-cassandra-next-version
   labels:
     app: scheduler-rpc
 spec:
   type: ClusterIP
@@ -6052,21 +6200,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: graph-grpc-example
       annotations:
         checksum/config: e29fc7dba6ab2f3d519be71fcac3361e63dfcbb4b3655a0a78439168961022bb
         checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
-        checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
     spec:
       nodeSelector:
         kubernetes.io/hostname: rancher-node-staging-rke2-metal01
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/graph
                 operator: In
@@ -6121,21 +6269,21 @@
         
           - name: swh-graph-grpc-inmemory
             mountPath: /srv/graph
             readOnly: false
         
         
         - name: wait-for-dataset
           image: container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20250211.1
           imagePullPolicy: IfNotPresent
           command:
-          - /entrypoints/graph-wait-for-dataset.sh
+          - /entrypoints/wait-for-dataset.sh
           env:
             - name: WITNESS_FILE
               value: /srv/graph/test/compressed/.graph-is-initialized
             - name: DATASET_LOCATION
               value: /srv/graph/test/compressed
             - name: PERIOD
               value: "3"
           volumeMounts:
           - name: backend-utils
             mountPath: /entrypoints
@@ -6266,21 +6414,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: graph-rpc-example
       annotations:
         checksum/config: 670b17f81e650742736762ab973d6485c0a8573c2052b1357a1fc94534856b7d
         checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
-        checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
     spec:
       nodeSelector:
         kubernetes.io/hostname: rancher-node-staging-rke2-metal01
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/graph
                 operator: In
@@ -6516,21 +6664,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: indexer-storage-rw
       annotations:
         checksum/config: 5ab3a947db0a5acead8abc2f73699f859d84fcb4aa54856a72245fa1d471f963
         checksum/config-logging: 514b813a6cc082d2a14192b1b6946c52c586735490a886eb2a498eaf2da4e731
-        checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -9015,20 +9163,21 @@
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: provenance-graph-granet
       annotations:
         checksum/config: 3a920cab49ad7bb0f2c6e36ef83ad7764740050a70f204675c9b36eb544a59b1
         checksum/config-logging: 3ec68ca129865387885cf527bf08f90bda9e6d3ae5e50d948534cbe73306d6fb
         checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -9048,21 +9197,21 @@
             mountPath: /etc/swh/configuration-template
           - name: config-utils
             mountPath: /entrypoints
             readOnly: true
       containers:
         - name: provenance-graph-granet
           resources:
             requests:
               memory: 512Mi
               cpu: 500m
-          image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250319.1
+          image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
           imagePullPolicy: IfNotPresent
           ports:
             - containerPort: 5014
               name: rpc
           readinessProbe:
             httpGet:
               path: /
               port: rpc
             initialDelaySeconds: 15
             failureThreshold: 30
@@ -9071,76 +9220,270 @@
             tcpSocket:
               port: rpc
             initialDelaySeconds: 10
             periodSeconds: 5
           command:
           - /bin/bash
           args:
           - -c
           - /opt/swh/entrypoint.sh
           env:
+            - name: PROVENANCE_TYPE
+              value: rpc
+            - name: PORT
+              value: "5014"
             - name: WORKERS
               value: "2"
             - name: THREADS
               value: "2"
             - name: TIMEOUT
               value: "60"
+            - name: SWH_CONFIG_FILENAME
+              value: /etc/swh/config.yml
+            - name: SWH_LOG_CONFIG_JSON
+              value: /etc/swh/logging/logging-gunicorn.json
+            - name: STATSD_SERVICE_TYPE
+              value: provenance-graph-granet
             - name: STATSD_HOST
               value: prometheus-statsd-exporter
             - name: STATSD_PORT
               value: "9125"
             - name: STATSD_TAGS
               value: deployment:provenance-graph-granet
-            - name: STATSD_SERVICE_TYPE
-              value: provenance-graph-granet
             - name: SWH_LOG_LEVEL
-              value: "INFO"
-            - name: SWH_LOG_CONFIG_JSON
-              value: /etc/swh/logging/logging-gunicorn.json
+              value: INFO
             - name: SWH_SENTRY_ENVIRONMENT
               value: staging
             - name: SWH_MAIN_PACKAGE
               value: swh.provenance
             - name: SWH_SENTRY_DSN
               valueFrom:
                 secretKeyRef:
                   name: common-secrets
                   key: provenance-sentry-dsn
                   # 'name' secret should exist & include key
                   # if the setting doesn't exist, sentry pushes will be disabled
                   optional: true
             - name: SWH_SENTRY_DISABLE_LOGGING_EVENTS
               value: "true"
           volumeMounts:
           - name: configuration
             mountPath: /etc/swh
           - name: configuration-logging
             mountPath: /etc/swh/logging
+          
       volumes:
       - name: configuration
         emptyDir: {}
       - name: configuration-template
         configMap:
           name: provenance-graph-granet-configuration-template
           items:
           - key: "config.yml.template"
             path: "config.yml.template"
       - name: configuration-logging
         configMap:
           name: provenance-graph-granet-configuration-logging
           items:
           - key: "logging-gunicorn.json"
             path: "logging-gunicorn.json"
+      
       - name: config-utils
         configMap:
           name: config-utils
           defaultMode: 0555
+      - name: backend-utils
+        configMap:
+          name: backend-utils
+          defaultMode: 0555
+---
+# Source: swh/templates/provenance/deployment.yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  namespace: swh-cassandra-next-version
+  name: provenance-grpc
+  labels:
+    app: provenance-grpc
+spec:
+  revisionHistoryLimit: 2
+  selector:
+    matchLabels:
+      app: provenance-grpc
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 1
+  template:
+    metadata:
+      labels:
+        app: provenance-grpc
+      annotations:
+        checksum/config: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
+        checksum/config-logging: 9fa299d379f661eab9d312fce16ef38fb94e197e908b92ac100aff85b4c36bb4
+        checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
+    spec:
+      affinity:
+        nodeAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            nodeSelectorTerms:
+            - matchExpressions:
+              - key: swh/rpc
+                operator: In
+                values:
+                - "true"
+      priorityClassName: swh-cassandra-next-version-frontend-rpc
+      initContainers:
+        - name: fetch-provenance-dataset
+          image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
+          command:
+          - /entrypoints/provenance-fetch-datasets.sh
+          env:
+          - name: WITNESS_FETCH_FILE
+            value: /srv/dataset/provenance/.provenance-is-initialized
+          - name: SWH_CONFIG_FILENAME
+            value: /etc/swh/config.yml
+          - name: PROVENANCE_PATH
+            value: /srv/dataset/provenance
+          - name: PROVENANCE_DATASET_FULL
+            value: "false"
+          - name: GRAPH_PATH
+            value: /srv/dataset/graph
+          - name: DATASET_VERSION
+            value: 2024-08-23-popular-500-python
+          volumeMounts:
+          - name: configuration
+            mountPath: /etc/swh
+          - name: backend-utils
+            mountPath: /entrypoints
+          - name: dataset-persistent
+            mountPath: /srv/dataset
+            readOnly: false
+          
+        - name: index-provenance-dataset
+          image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
+          imagePullPolicy: IfNotPresent
+          command:
+          - /entrypoints/provenance-index-dataset.sh
+          env:
+            - name: WITNESS_DATASETS_FILE
+              value: /srv/dataset/provenance/.provenance-is-initialized
+            - name: WITNESS_INDEX_FILE
+              value: /srv/dataset/provenance/.provenance-is-indexed
+            - name: PROVENANCE_PATH
+              value: /srv/dataset/provenance
+            - name: PERIOD
+              value: "3"
+          volumeMounts:
+          - name: backend-utils
+            mountPath: /entrypoints
+            readOnly: true
+          - name: dataset-persistent
+            mountPath: /srv/dataset
+            readOnly: false
+          
+        - name: wait-for-dataset
+          image: container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20250211.1
+          imagePullPolicy: IfNotPresent
+          command:
+          - /entrypoints/wait-for-dataset.sh
+          env:
+            - name: WITNESS_FILE
+              value: /srv/dataset/provenance/.provenance-is-initialized
+            - name: PERIOD
+              value: "3"
+          volumeMounts:
+          - name: backend-utils
+            mountPath: /entrypoints
+            readOnly: true
+          - name: dataset-persistent
+            mountPath: /srv/dataset
+            readOnly: false
+          
+      containers:
+        - name: provenance-grpc
+          resources:
+            requests:
+              memory: 512Mi
+              cpu: 500m
+          image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
+          imagePullPolicy: IfNotPresent
+          ports:
+            - containerPort: 50141
+              name: grpc
+          readinessProbe:
+            tcpSocket:
+              port: grpc
+            initialDelaySeconds: 15
+            failureThreshold: 30
+            periodSeconds: 5
+          livenessProbe:
+            tcpSocket:
+              port: grpc
+            initialDelaySeconds: 10
+            periodSeconds: 5
+          command:
+          - /bin/bash
+          args:
+          - -c
+          - /opt/swh/entrypoint.sh
+          env:
+            - name: PROVENANCE_TYPE
+              value: grpc
+            - name: PORT
+              value: "50141"
+            - name: PROVENANCE_PATH
+              value: /srv/dataset/provenance
+            - name: GRAPH_PATH
+              value: /srv/dataset/graph/graph
+            - name: STATSD_HOST
+              value: prometheus-statsd-exporter
+            - name: STATSD_PORT
+              value: "9125"
+            - name: STATSD_TAGS
+              value: deployment:provenance-grpc
+            - name: SWH_LOG_LEVEL
+              value: INFO
+            - name: SWH_SENTRY_ENVIRONMENT
+              value: staging
+            - name: SWH_MAIN_PACKAGE
+              value: swh.provenance
+            - name: SWH_SENTRY_DSN
+              valueFrom:
+                secretKeyRef:
+                  name: common-secrets
+                  key: provenance-sentry-dsn
+                  # 'name' secret should exist & include key
+                  # if the setting doesn't exist, sentry pushes will be disabled
+                  optional: true
+            - name: SWH_SENTRY_DISABLE_LOGGING_EVENTS
+              value: "true"
+          volumeMounts:
+          - name: dataset-persistent
+            mountPath: /srv/dataset
+            readOnly: false
+          
+      volumes:
+      - name: configuration
+        emptyDir: {}
+      - name: config-utils
+        configMap:
+          name: config-utils
+          defaultMode: 0555
+      - name: backend-utils
+        configMap:
+          name: backend-utils
+          defaultMode: 0555
+      - name: dataset-persistent
+        persistentVolumeClaim:
+          claimName: provenance-popular-persistent-pvc
 ---
 # Source: swh/templates/scheduler/extra-services-deployment.yaml
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   namespace: swh-cassandra-next-version
   name: scheduler-listener
   labels:
     app: scheduler-listener
 spec:
@@ -10090,21 +10433,21 @@
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: search-rpc
       annotations:
         checksum/config: 4526bc9eea3c5b071fc8d14d9a9993ae1819e8c7de57d97340b7f3d4a10b8b4f
         checksum/config-logging: 0bc72d1f0a5e779cba1b812d82f00ae1973bb8c5140ff975d94cb4e7f4181000
         checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
-        checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -10290,21 +10633,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin
       annotations:
         checksum/config: 2d12ac61e189624151390a01fbdc28c53b735945f71f4b4fbe01a83a1065bd34
-        checksum/config_utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+        checksum/config_utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: node-role.kubernetes.io/etcd
                 operator: NotIn
                 values:
                 - "true"
@@ -10408,21 +10751,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-ro-cassandra
       annotations:
         checksum/config: 82ddd066ef3b39ba30a9377b2e3d5f34e9c6771d244cb9966170d8821659043e
         checksum/config-logging: 539b1f63c51f751ac609a212d1d29b53ccee1fe8861d4b40cf156cabdbbcc9af
-        checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
         checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:
@@ -10565,21 +10908,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-ro-postgresql
       annotations:
         checksum/config: ee389a0c378431ad964e527086008e8983bd8181007a5cef4bdc6b2f3719b891
         checksum/config-logging: 04d1f9a399d1326e46f5b78ca589b7dc5a6187afba6e61a46a7b333d07cb16aa
-        checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
         checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:
@@ -10722,21 +11065,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-rw-cassandra
       annotations:
         checksum/config: 899b95e866ca9a7e5b053415e3df669b133875d96fbf5d91db52e0b105986297
         checksum/config-logging: c4f758383b8e60062735e5b03cfe1724b60a3a961b95632b75c511025ddd5d28
-        checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
         checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:
@@ -10893,21 +11236,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-rw-postgresql
       annotations:
         checksum/config: 053e84875d8867dbb844f84faa7b55a7d5647ab98845e4c0fd8e83ed06aaaf04
         checksum/config-logging: b6c995f5c5944a279018efc4d112fe1395da696289b71ebf4ea382011400cd03
-        checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+        checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
         checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:
@@ -11597,21 +11940,21 @@
       app: web-cassandra
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: web-cassandra
       annotations:
-        checksum/config: 412e790069b96930e00d38eea4a3ac5fcaac2ea46e0f75f429be7690318080bf
+        checksum/config: 4c81b069c0173732d5e8c5acb7f38671a4daba9145b4f75fc6dee8d19d42fc1c
         checksum/config-logging: f266f784128ac9c57c6d0f154a646e15f06d0ad7557f191487df0d1b385acb48
         checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/web
                 operator: In
@@ -12281,20 +12624,51 @@
     http:
       paths:
       - path: /
         pathType: Prefix
         backend:
           service:
             name: provenance-graph-granet
             port:
               number: 5014
 ---
+# Source: swh/templates/provenance/ingress.yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  namespace: swh-cassandra-next-version
+  name: provenance-grpc-ingress-default
+  labels:
+    app: provenance-grpc
+    endpoint-definition: default
+  annotations: 
+    nginx.ingress.kubernetes.io/backend-protocol: GRPC
+    nginx.ingress.kubernetes.io/client-body-buffer-size: 128K
+    nginx.ingress.kubernetes.io/proxy-body-size: 4G
+    nginx.ingress.kubernetes.io/proxy-buffering: "on"
+    nginx.ingress.kubernetes.io/service-upstream: "true"
+    nginx.ingress.kubernetes.io/ssl-redirect: "true"
+    nginx.ingress.kubernetes.io/whitelist-source-range: 10.42.0.0/16,10.43.0.0/16
+spec:
+  ingressClassName: nginx
+  rules:
+  - host: provenance-grpc-next-version-ingress
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: provenance-grpc
+            port:
+              number: 50141
+---
 # Source: swh/templates/web/ingress.yaml
 apiVersion: networking.k8s.io/v1
 kind: Ingress
 metadata:
   namespace: swh-cassandra-next-version
   name: web-cassandra-ingress-authenticated
   labels:
     app: web-cassandra
     endpoint-definition: authenticated
   annotations: 


------------- diff for environment production namespace swh -------------

--- /tmp/swh-chart.swh.lQqjoZXu/production-swh.before	2025-03-21 17:04:34.629068567 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/production-swh.after	2025-03-21 17:04:35.153048411 +0100
@@ -2585,21 +2585,21 @@
     fi
 
     graph_transposed_name=${GRAPH_NAME}-transposed.graph
     if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
       cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
     fi
 
     # Finally, we make explicit the graph is ready
     touch ${WITNESS_FILE}
 
-  graph-wait-for-dataset.sh: |
+  wait-for-dataset.sh: |
     #!/usr/bin/env bash
     # Uses env variables WITNESS_FILE
     [ -z "${WITNESS_FILE}" ] && \
       echo "<WITNESS_FILE> env variable must be set" && exit 1
 
     while [ ! -f ${WITNESS_FILE} ]; do
         echo "${WITNESS_FILE} not present, wait for it to start the graph..."
         sleep $PERIOD
     done
 
@@ -2681,20 +2681,125 @@
         echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
         sleep $PERIOD
     done
 
     # For old datasets missing a .ef or in the wrong format, this fails with
     # `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
     # reindex the dataset
     swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
       touch $WITNESS_REINDEX_FILE
 
+  provenance-fetch-datasets.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_FETCH_FILE}" ] && \
+      echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+    [ -z "${DATASET_VERSION}" ] && \
+      echo "<DATASET_VERSION> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+    [ -z "${GRAPH_PATH}" ] && \
+      echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+    [ -f ${WITNESS_FETCH_FILE} ] && \
+        echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+        exit 0
+
+    set -e
+
+    # Create destination paths
+    mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+    echo "Fetching datasets..."
+
+    if [ ${PROVENANCE_DATASET_FULL} = true ]; then
+        # Retrieve all the provenance dataset
+        REFS=all
+    else
+        # This excludes revisions not targeted by a snapshot
+        # Ok to use for test purposes
+        REFS=heads
+    fi
+
+    URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+    CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+    echo "1. Fetching provenance dataset (parquet files)..."
+    ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+    echo "1. Provenance datasets installed!"
+
+    echo "2. Fetching extra graph files..."
+    URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+    mkdir -p "${GRAPH_PATH}"
+    for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+                    graph.property.message.bin.zst \
+                    graph.property.message.offset.bin.zst \
+                    graph.property.tag_name.bin.zst \
+                    graph.property.tag_name.offset.bin.zst \
+                    graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+        ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+    done
+    echo "2. Extra graph files installed!"
+
+    echo "3. Uncompressing graph files..."
+    set -x
+    # Uncompress the compressed graph *.zst files
+    for filepath in ${GRAPH_PATH}/*.zst; do
+        # Uncompress and delete the .zst file
+        [ -f "${filepath}" ] && unzstd --force --rm "${filepath}"
+    done
+    set +x
+    echo "3. Graph files uncompressed!"
+
+    # Make explicit the provenance datasets are fetched
+    touch ${WITNESS_FETCH_FILE}
+
+    echo "Provenance datasets installed!"
+
+  provenance-index-dataset.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_DATASETS_FILE}" ] && \
+      echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+    [ -z "${WITNESS_INDEX_FILE}" ] && \
+      echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+    [ -z "${PERIOD}" ] && \
+      echo "<PERIOD> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+    [ -f ${WITNESS_INDEX_FILE} ] && echo "Provenance already indexed, do nothing." && \
+      exit 0
+
+    set -eu
+
+    # Let's wait for the dataset installation
+    while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+        echo "${WITNESS_DATASETS_FILE} missing, waiting for provenance dataset installation..."
+        sleep $PERIOD
+    done
+
+    echo "Dataset witness file installed, building provenance dataset indexes..."
+
+    echo "provenance path: $PROVENANCE_PATH"
+    set -x
+
+    # To make the query faster, the provenance needs to build index out of the
+    # current dataset files. We store the output indexes in the same path as
+    # the dataset.
+    swh-provenance-index \
+      --database file://${PROVENANCE_PATH} && \
+      touch "${WITNESS_INDEX_FILE}" && \
+      echo "Provenance indexes built!" || \
+      echo "Provenance indexes failed!"
+
   initialize-search-backend.sh: |
     #!/usr/bin/env bash
 
     set -eux
 
     # Uses internally the environment variable SWH_CONFIG_FILENAME
     swh search initialize
   register-task-types.sh: |
     #!/usr/bin/env bash
 
@@ -4386,21 +4491,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-content
       annotations:
         checksum/config: ad9969915c9d4f098e176250342e634c3f9950c21b4bfce3c59a756eebd29d5a
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4509,21 +4614,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-directory
       annotations:
         checksum/config: d10ee8d64b973f71f54ac97d9b23a984ddcaf85a14e4e7d0c1ffbe6606745a9f
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4632,21 +4737,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-extid
       annotations:
         checksum/config: d6e51b2acf85824083b41c3fc454e0bde5cda180b13fd0ea0f7a90de3b13dd10
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4755,21 +4860,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-metadata
       annotations:
         checksum/config: 568c27f168a777aa1cd02d52a482105f91ea07a5ca96c490faecb6f0f126d510
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -4878,21 +4983,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin
       annotations:
         checksum/config: 5a0c927d61eea568c15a84881cdcc36061b4c530db7c11f565ede65c5b3936c3
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5001,21 +5106,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin-visit
       annotations:
         checksum/config: 0e471b3f26d83e3d374296d1a6ce7077b31cb935705c692929ee6332dacb03fa
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5124,21 +5229,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-origin-visit-status
       annotations:
         checksum/config: 87f95fa2d03d52fdec4dcbb83ba5e790821fe8d739e8922a97046f0d2a10abae
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5247,21 +5352,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-raw-extrinsic-metadata
       annotations:
         checksum/config: daffe1093b0bb5c08485e25baeb2f10f5f39f0fd826c3a40283693a9d43fae37
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5370,21 +5475,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-release
       annotations:
         checksum/config: 41773bc062731699b038ae98bec197c7290766735e9ae57977da8d0d1b0a82d5
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5493,21 +5598,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-revision
       annotations:
         checksum/config: 24e62d01eeed14fdd9eab5bcbbdfc2165b37c246e93270c6a3593a86760b296f
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5616,21 +5721,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-skipped-content
       annotations:
         checksum/config: 61ed93650af06dedc4ee939164381d2e9ac65098d43af26405e34c2d6fd3cae8
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5739,21 +5844,21 @@
   strategy:
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-replayer-snapshot
       annotations:
         checksum/config: 32cb30621d4c74bac7750080475ba1bece7f561dbc99a798bdbda13b29c5e9c0
-        checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/replayer
                 operator: In
                 values:
                 - "true"
@@ -5863,21 +5968,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-postgresql-azure-readonly
       annotations:
         checksum/config: f82d8c731f4709a9e756911b666f9a28a25429375b97c614a4ef5c7bc231e3c8
         checksum/config-logging: 0ecdc326a2b3e525e21e5743d89eb3c4bfbadc12aee4fbe1a32ba77ab7bde899
-        checksum/backend-utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+        checksum/backend-utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
         checksum/config-utils: d75ca13b805bce6a8ab59c8e24c938f2283108f6a79134f6e71db86308651dc6
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:


------------- diff for environment production namespace swh-cassandra -------------

--- /tmp/swh-chart.swh.lQqjoZXu/production-swh-cassandra.before	2025-03-21 17:04:35.009053950 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/production-swh-cassandra.after	2025-03-21 17:04:35.521034257 +0100
@@ -10509,21 +10509,21 @@
     fi
 
     graph_transposed_name=${GRAPH_NAME}-transposed.graph
     if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
       cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
     fi
 
     # Finally, we make explicit the graph is ready
     touch ${WITNESS_FILE}
 
-  graph-wait-for-dataset.sh: |
+  wait-for-dataset.sh: |
     #!/usr/bin/env bash
     # Uses env variables WITNESS_FILE
     [ -z "${WITNESS_FILE}" ] && \
       echo "<WITNESS_FILE> env variable must be set" && exit 1
 
     while [ ! -f ${WITNESS_FILE} ]; do
         echo "${WITNESS_FILE} not present, wait for it to start the graph..."
         sleep $PERIOD
     done
 
@@ -10605,20 +10605,125 @@
         echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
         sleep $PERIOD
     done
 
     # For old datasets missing a .ef or in the wrong format, this fails with
     # `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
     # reindex the dataset
     swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
       touch $WITNESS_REINDEX_FILE
 
+  provenance-fetch-datasets.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_FETCH_FILE}" ] && \
+      echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+    [ -z "${DATASET_VERSION}" ] && \
+      echo "<DATASET_VERSION> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+    [ -z "${GRAPH_PATH}" ] && \
+      echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+    [ -f ${WITNESS_FETCH_FILE} ] && \
+        echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+        exit 0
+
+    set -e
+
+    # Create destination paths
+    mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+    echo "Fetching datasets..."
+
+    if [ ${PROVENANCE_DATASET_FULL} = true ]; then
+        # Retrieve all the provenance dataset
+        REFS=all
+    else
+        # This excludes revisions not targeted by a snapshot
+        # Ok to use for test purposes
+        REFS=heads
+    fi
+
+    URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+    CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+    echo "1. Fetching provenance dataset (parquet files)..."
+    ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+    echo "1. Provenance datasets installed!"
+
+    echo "2. Fetching extra graph files..."
+    URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+    mkdir -p "${GRAPH_PATH}"
+    for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+                    graph.property.message.bin.zst \
+                    graph.property.message.offset.bin.zst \
+                    graph.property.tag_name.bin.zst \
+                    graph.property.tag_name.offset.bin.zst \
+                    graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+        ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+    done
+    echo "2. Extra graph files installed!"
+
+    echo "3. Uncompressing graph files..."
+    set -x
+    # Uncompress the compressed graph *.zst files
+    for filepath in $(ls ${GRAPH_PATH}/*.zst); do
+        # Uncompress and delete the .zst file
+        [ -f "${filepath}" ] && unzstd --force --rm "${filepath}"
+    done
+    set +x
+    echo "3. Graph files uncompressed!"
+
+    # Make explicit the provenance datasets are fetched
+    touch ${WITNESS_FETCH_FILE}
+
+    echo "Provenance datasets installed!"
+
+  provenance-index-dataset.sh: |
+    #!/usr/bin/env bash
+    [ -z "${WITNESS_DATASETS_FILE}" ] && \
+      echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+    [ -z "${WITNESS_INDEX_FILE}" ] && \
+      echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+    [ -z "${PERIOD}" ] && \
+      echo "<PERIOD> env variable must be set" && exit 1
+    [ -z "${PROVENANCE_PATH}" ] && \
+      echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+    [ -f ${WITNESS_INDEX_FILE} ] && echo "Provenance already indexed, do nothing." && \
+      exit 0
+
+    set -eu
+
+    # Let's wait for the dataset installation
+    while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+        echo "${WITNESS_DATASETS_FILE} missing, waiting provenance dataset installation..."
+        sleep $PERIOD
+    done
+
+    echo "Datasets file installed, build provenance dataset indexes..."
+
+    echo "provenance path: $PROVENANCE_PATH"
+    set -x
+
+    # To make the query faster, the provenance needs to build index out of the
+    # current dataset files. We store the output indexes in the same path as
+    # the dataset.
+    swh-provenance-index \
+      --database file://${PROVENANCE_PATH} && \
+      touch "${WITNESS_INDEX_FILE}" && \
+      echo "Provenance indexes built!" || \
+      echo "Provenance indexes failed!"
+
   initialize-search-backend.sh: |
     #!/usr/bin/env bash
 
     set -eux
 
     # Uses internally the environment variable SWH_CONFIG_FILENAME
     swh search initialize
   register-task-types.sh: |
     #!/usr/bin/env bash
 
@@ -13978,21 +14083,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: graph-grpc-20241206
       annotations:
         checksum/config: b4edb88c0bcb74769dc2f39025a598580a6d6a39cece80ba52904365cd7380eb
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/graph
                 operator: In
                 values:
                 - "true"
@@ -14164,21 +14269,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: graph-rpc-20241206
       annotations:
         checksum/config: 095d223956d75728c8f8a26368053a8882cb3026736517767d8aacfc9895e159
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/graph
                 operator: In
                 values:
                 - "true"
@@ -14523,21 +14628,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: indexer-storage-read-only
       annotations:
         checksum/config: e233fce2b3a7a714653810d4a8084763fa3d456d691d5964f00c546ebbaaa49d
         checksum/config-logging: 3c46e3e49b8224015ed0a6ef21fec2ba66c4af22a8718cd0ad4f61483cd5e8be
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -14670,21 +14775,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: indexer-storage-read-write
       annotations:
         checksum/config: 3b882fc68a6b1c70f8ce6b82965db4903c5f13557d4a0a43fd6d858745c72e90
         checksum/config-logging: 2b18f7d6ed7689e52685ba77e412dffc3ee95be9bddd0aa15d728ee8ef45591d
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -25360,20 +25465,21 @@
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: provenance-graph-granet
       annotations:
         checksum/config: fcc422ce13f035bd4de309693c6044e4eee6a37fdc487ec2f9fef5437dfd954e
         checksum/config-logging: ddcd27d991938c46f4fc0ad7ee028cb3005f186b3db022596c9ae94363881e4f
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -25393,21 +25499,21 @@
             mountPath: /etc/swh/configuration-template
           - name: config-utils
             mountPath: /entrypoints
             readOnly: true
       containers:
         - name: provenance-graph-granet
           resources:
             requests:
               memory: 512Mi
               cpu: 500m
-          image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250319.1
+          image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
           imagePullPolicy: IfNotPresent
           ports:
             - containerPort: 5014
               name: rpc
           readinessProbe:
             httpGet:
               path: /
               port: rpc
             initialDelaySeconds: 15
             failureThreshold: 30
@@ -25416,76 +25522,88 @@
             tcpSocket:
               port: rpc
             initialDelaySeconds: 10
             periodSeconds: 5
           command:
           - /bin/bash
           args:
           - -c
           - /opt/swh/entrypoint.sh
           env:
+            - name: PROVENANCE_TYPE
+              value: rpc
+            - name: PORT
+              value: "5014"
             - name: WORKERS
               value: "4"
             - name: THREADS
               value: "1"
             - name: TIMEOUT
               value: "60"
+            - name: SWH_CONFIG_FILENAME
+              value: /etc/swh/config.yml
+            - name: SWH_LOG_CONFIG_JSON
+              value: /etc/swh/logging/logging-gunicorn.json
+            - name: STATSD_SERVICE_TYPE
+              value: provenance-graph-granet
             - name: STATSD_HOST
               value: prometheus-statsd-exporter
             - name: STATSD_PORT
               value: "9125"
             - name: STATSD_TAGS
               value: deployment:provenance-graph-granet
-            - name: STATSD_SERVICE_TYPE
-              value: provenance-graph-granet
             - name: SWH_LOG_LEVEL
-              value: "INFO"
-            - name: SWH_LOG_CONFIG_JSON
-              value: /etc/swh/logging/logging-gunicorn.json
+              value: INFO
             - name: SWH_SENTRY_ENVIRONMENT
               value: production
             - name: SWH_MAIN_PACKAGE
               value: swh.provenance
             - name: SWH_SENTRY_DSN
               valueFrom:
                 secretKeyRef:
                   name: common-secrets
                   key: provenance-sentry-dsn
                   # 'name' secret should exist & include key
                   # if the setting doesn't exist, sentry pushes will be disabled
                   optional: true
             - name: SWH_SENTRY_DISABLE_LOGGING_EVENTS
               value: "true"
           volumeMounts:
           - name: configuration
             mountPath: /etc/swh
           - name: configuration-logging
             mountPath: /etc/swh/logging
+          
       volumes:
       - name: configuration
         emptyDir: {}
       - name: configuration-template
         configMap:
           name: provenance-graph-granet-configuration-template
           items:
           - key: "config.yml.template"
             path: "config.yml.template"
       - name: configuration-logging
         configMap:
           name: provenance-graph-granet-configuration-logging
           items:
           - key: "logging-gunicorn.json"
             path: "logging-gunicorn.json"
+      
       - name: config-utils
         configMap:
           name: config-utils
           defaultMode: 0555
+      - name: backend-utils
+        configMap:
+          name: backend-utils
+          defaultMode: 0555
 ---
 # Source: swh/templates/scheduler/extra-services-deployment.yaml
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   namespace: swh-cassandra
   name: scheduler-listener
   labels:
     app: scheduler-listener
 spec:
@@ -27252,21 +27370,21 @@
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: search-rpc
       annotations:
         checksum/config: a76abbb3247f00560f78f1f1aaafc30e0c3958dc059e75911400596ddb51b4e2
         checksum/config-logging: 7bffbc6ce2cb11d88208ef0c5f1d8e6822659c361717afb51dcf0f4da02fe1f7
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/rpc
                 operator: In
                 values:
                 - "true"
@@ -27440,21 +27558,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-cassandra-azure-readonly
       annotations:
         checksum/config: 63fbb2e5c758f9faab28192d2a0458eea22410b824e63d0b35de085b50fc3e6e
         checksum/config-logging: 6d3a84a071464bdb72aea996f9c90be8ff89eedb1f09a8cc71c7699e652c8a47
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:
@@ -27693,21 +27811,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-cassandra-readonly
       annotations:
         checksum/config: 7a7e39b703ff92c12c87d9b4d8f0ea91c4d79e96c78f05476f37ab783c4687ff
         checksum/config-logging: 800fc3f5bdfec12955f3689d3c319b74f52a02d09b08fb710cea854d815dfad6
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:
@@ -27946,21 +28064,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-cassandra-readonly-internal
       annotations:
         checksum/config: 0094e08338f11eeeda7a0958ec8402ab9d37544aecc81f62772fddad37c38dfe
         checksum/config-logging: 2ecb5a0cb1eaeb9246bc272bee5df3292a73f3c87134840a682f8c3fb03ac008
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:
@@ -28199,21 +28317,21 @@
     type: RollingUpdate
     rollingUpdate:
       maxSurge: 1
   template:
     metadata:
       labels:
         app: storage-cassandra-winery
       annotations:
         checksum/config: 17b0f2faa5762626ef5c966dc8a2aa810c208c8c86643305a8921e538ea21583
         checksum/config-logging: 21f19120491561669a337c91f8a5b62fb9b081d0c0ca55a9e69fdc26e1a5350a
-        checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+        checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
         checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
     spec:
       affinity:
         nodeAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             nodeSelectorTerms:
             - matchExpressions:
               - key: swh/storage
                 operator: In
                 values:

Refs. swh/infra/sysadm-environment#5608 (closed)

Edited by Antoine R. Dumont

Merge request reports


Activity

  • Antoine R. Dumont added 26 commits


    • db356083...dc5d1f09 - 15 commits from branch production
    • b93e84df - 1 earlier commit
    • 6f0d8907 - provenance: Start adapting deployment template to configure the port
    • 53beb2f3 - deployment/provenance: Refactor configuration parsing
    • 866d0803 - local-cluster: Add dummy provenance instance
    • 7a498aa5 - values-swh-application-versions: Bump to provenance v3.2
    • ccebc43c - deployment/provenance: Add startService configuration entry
    • 74aed5df - deployment/provenance: Unify log level setup
    • 35474a2a - deployment/provenance: Deploy gunicorn setup for rpc service type
    • ab277e3b - provenance/values: Design yaml configuration
    • 2d7d4e0a - provenance/deployment: Compute mandatory configuration per type
    • 2e26cd5b - wip: provenance/deployment: Fetch and prepare volumes with dataset


  • added 1 commit

    • e0a7274d - provenance/deployment: Allow to prepare grpc backend with data


  • Antoine R. Dumont added 10 commits


    • f54fd1da - next-version: Deploy provenance grpc
    • 355c979a - provenance/deployment: Add missing backend-utils volume mount
    • 30dab34d - provenance/deployment: Fix indentation
    • 54f4d819 - provenance/configmap: Only deploy configmap for rpc service
    • ead37538 - provenance/config: Fix missing provenance path introspection
    • 16351bdb - provenance/deployment: Drop unneeded image prefix name
    • d1eb8546 - provenance/deployment: Fix variable issue type and typo
    • f3d20d07 - provenance/deployment: deploy rpc configuration for rpc server
    • 58659568 - provenance/helper-volume: Fix witness file name
    • 56750418 - backend-utils: Make script verbose


  • Antoine R. Dumont added 7 commits


    • 9b410ca8 - backend-utils/provenance-fetch-datasets.sh: Fix script
    • f4b934d1 - provenance/helper-volume: Fix witness file name on provenance index script
    • bc396e58 - provenance/backend-utils: Make script verbose & readable
    • e5a20df9 - provenance/fetch-provenance-dataset: Create the graph directory
    • bc746488 - provenance/provenance-index-dataset: Make it functional
    • bf75e9fa - provenance/script: Iterate to make script resilient
    • 6e8eddae - local-cluster/provenance: Use local-persistent class for pv


  • Antoine R. Dumont added 9 commits


    • e5fdcc56 - provenance: Allow to use a data subset
    • 3d4603b5 - provenance/script: Fix graph files retrieval step
    • fa269d28 - provenance/script: Drop debugging instructions
    • d86aaf80 - provenance/script: Activate back index check
    • 6a8ab8b3 - values-swh-application-versions: Bump to recent provenance image
    • e27c57b2 - provenance: Add missing graph dataset file
    • 1c0ecd28 - deployment/provenance: It's actually the pattern to access graph files
    • b5983668 - provenance/helper-volume: Fix variable name typo
    • 12ac4262 - provenance/fetch-dataset: Add missing graph files


  • Antoine R. Dumont added 39 commits


    • 48e56fd3 - 1 commit from branch production
    • 48e56fd3...58d1470e - 28 earlier commits
    • defa89b2 - provenance: Allow to use a data subset
    • 7f522b42 - provenance/script: Fix graph files retrieval step
    • 3dfe3991 - provenance/script: Drop debugging instructions
    • d85e5e6c - provenance/script: Activate back index check
    • 3aefbce2 - values-swh-application-versions: Bump to recent provenance image
    • fb8c6232 - provenance: Add missing graph dataset file
    • 1b517a96 - deployment/provenance: It's actually the pattern to access graph files
    • c3b80d63 - provenance/helper-volume: Fix variable name typo
    • cbfcaa42 - provenance/fetch-dataset: Add missing graph files
    • 2f025065 - next-version/provenance: Use smaller dataset


  • Antoine R. Dumont changed the description

  • Antoine R. Dumont changed the description

  • added 1 commit

    • 7a9f9d7b - local-cluster/provenance: Add ingress setup


  • added 1 commit

    • e6c6f946 - helper-ingress: Move grpc extra annotations declaration in helper


  • Antoine R. Dumont changed the description

  • Antoine R. Dumont changed the description

  • Antoine R. Dumont added 41 commits


    • 35692512 - 1 commit from branch production
    • 35692512...5d13a8b0 - 30 earlier commits
    • f737f9c2 - provenance/script: Drop debugging instructions
    • 167692d4 - provenance/script: Activate back index check
    • f2a60beb - values-swh-application-versions: Bump to recent provenance image
    • 38298168 - provenance: Add missing graph dataset file
    • ca1beb2b - deployment/provenance: It's actually the pattern to access graph files
    • e0451855 - provenance/helper-volume: Fix variable name typo
    • 02652e1c - provenance/fetch-dataset: Add missing graph files
    • 6fcc96c0 - next-version/provenance: Use smaller dataset
    • f42977d7 - local-cluster/provenance: Add ingress setup
    • eb30d291 - helper-ingress: Move grpc extra annotations declaration in helper


  • Antoine R. Dumont changed the description

  • Antoine R. Dumont added 21 commits


    • eb30d291...256bd1b9 - 11 earlier commits
    • 740ad96f - provenance/deployment: Drop unneeded image prefix name
    • ff197899 - provenance/deployment: Fix variable issue type and typo
    • c6c5f0fb - provenance/deployment: deploy rpc configuration for rpc server
    • da0fbf93 - backend-utils/provenance: Iterate to make scripts functionals
    • a5317f81 - provenance/deployment: Allow to use a data subset
    • aa953c03 - provenance/script: Fix graph files retrieval step
    • fc564ccd - provenance/fetch-dataset: Make the provenance functional
    • 557a553f - helper-ingress: Move grpc extra annotations declaration in helper
    • 729b625c - local-cluster/provenance: Adapt configuration for the instance to run
    • f9480a2d - next-version: Deploy provenance grpc


    • Resolved by Antoine R. Dumont

      Testing the connection to the graph grpc instance raises an issue through an ipython repl [1], and returns a blank response through curl [2].

      @vlorentz Any hints please?

      [1] Talking directly to the provenance grpc by instantiating the client code in charge of it (the code used by the webapp):

      In [1]: from swh.provenance import get_provenance
         ...: from yaml import safe_load
         ...: from swh.model.swhids import CoreSWHID
         ...:
         ...: config = """
         ...:   cls: graph
         ...:   url: provenance-grpc-popular-ingress:80
         ...: """
         ...:
         ...: config_d = safe_load(config)
         ...: provenance=get_provenance(**config_d)
         ...:
         ...: provenance.check_config()
         ...: swhid=CoreSWHID.from_string("swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892")
         ...: unknown_swhid=CoreSWHID.from_string("swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945")
         ...:
         ...: provenance.whereis(swhid=swhid)
         ...:
      ---------------------------------------------------------------------------
      _MultiThreadedRendezvous                  Traceback (most recent call last)
      Cell In[1], line 17
           14 swhid=CoreSWHID.from_string("swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892")
           15 unknown_swhid=CoreSWHID.from_string("swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945")
      ---> 17 provenance.whereis(swhid=swhid)
           18 provenance.whereis(swhid=unknown_swhid)
      
      File /opt/swh/venv/lib/python3.11/site-packages/swh/provenance/backend/graph.py:166, in GraphProvenance.whereis(self, swhid)
          156 def whereis(self, *, swhid: CoreSWHID) -> Optional[QualifiedSWHID]:
          157     """Given a SWHID return a QualifiedSWHID with some provenance info:
          158
          159     - the release or revision containing that content or directory
         (...)    164     be an association release if any.
          165     """
      --> 166     anchor = self._get_anchor(swhid, "rel")
          167     if anchor is None:
          168         anchor = self._get_anchor(swhid, "rev")
      
      File /opt/swh/venv/lib/python3.11/site-packages/swh/provenance/backend/graph.py:77, in GraphProvenance._get_anchor(self, swhid, leaf_type)
           75 try:
           76     t0 = monotonic()
      ---> 77     resp = list(self._stub.Traverse(anchor_search))
           78 except grpc.RpcError as exc:
           79     if exc.code() == grpc.StatusCode.NOT_FOUND:
      
      File /opt/swh/venv/lib/python3.11/site-packages/grpc/_channel.py:543, in _Rendezvous.__next__(self)
          542 def __next__(self):
      --> 543     return self._next()
      
      File /opt/swh/venv/lib/python3.11/site-packages/grpc/_channel.py:969, in _MultiThreadedRendezvous._next(self)
          967     raise StopIteration()
          968 elif self._state.code is not None:
      --> 969     raise self
      
      _MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
              status = StatusCode.UNIMPLEMENTED
              details = ""
              debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"", grpc_status:12, created_time:"2025-03-21T15:14:05.003033412+00:00"}"
      >
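The `UNIMPLEMENTED` (grpc_status 12) status above means the server does not expose the `Traverse` method the client calls, but `GraphProvenance._get_anchor` only maps `NOT_FOUND` back to `None` (graph.py line 79), so the raw `_MultiThreadedRendezvous` bubbles up. A minimal stdlib-only sketch of that error-handling distinction (the `StatusCode`/`RpcError` stand-ins and `get_anchor` helper are hypothetical, mimicking the grpc types):

```python
from enum import Enum

class StatusCode(Enum):
    # Stand-in for grpc.StatusCode; numeric values match the gRPC spec.
    NOT_FOUND = 5
    UNIMPLEMENTED = 12

class RpcError(Exception):
    # Stand-in for grpc.RpcError, which exposes the status via .code().
    def __init__(self, code):
        self._code = code

    def code(self):
        return self._code

def get_anchor(call):
    """Mimic GraphProvenance._get_anchor's error handling:
    NOT_FOUND means "no anchor" and yields None; any other status
    (e.g. UNIMPLEMENTED from a server lacking Traverse) is re-raised."""
    try:
        return list(call())
    except RpcError as exc:
        if exc.code() == StatusCode.NOT_FOUND:
            return None
        raise

def missing_method():
    # A server without the Traverse method surfaces as UNIMPLEMENTED.
    raise RpcError(StatusCode.UNIMPLEMENTED)

try:
    get_anchor(missing_method)
except RpcError as exc:
    print(exc.code().name)  # UNIMPLEMENTED
```

This suggests the local-cluster grpc server is reachable but serving a different service definition than the client expects (e.g. graph vs. provenance proto), rather than the SWHID being unknown.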

      [2] After dropping the hard-coded authentication check [0]

      $ curl -s http://web-local-archive-ingress/api/1/provenance/whereis/swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac
      $ # no response ^

      [0] Internal deployment detail: the authentication check is not configurable, so it is commented out and the modified file is pushed to the pod.

      $ grep "_check_auth_and_permission(request)" ~/swh/swh-environment/swh-web/swh/web/provenance/api_views.py
          # _check_auth_and_permission(request)
          # _check_auth_and_permission(request)
      $ namespace=swh; pod=web-local-archive-5b557fd5b4-9d7wz; swh-kubectl kind cp ~/swh/swh-environment/swh-web/swh/web/provenance/api_views.py $namespace/$pod:/opt/swh/venv/lib/python3.11/site-packages/swh/web/provenance/api_views.py
      + case "$1" in
      + context=kind-local-cluster
      + shift
      + kubectl --context kind-local-cluster cp /home/tony/swh/swh-environment/swh-web/swh/web/provenance/api_views.py swh/web-local-archive-5b557fd5b4-9d7wz:/opt/swh/venv/lib/python3.11/site-packages/swh/web/provenance/api_views.py
      Defaulted container "web-local-archive" out of: web-local-archive, nginx, prepare-configuration (init), do-migration (init), prepare-static (init)
  • Antoine R. Dumont resolved all threads