deployment/provenance: Adapt template to manage grpc or rpc service
This evolves the provenance template so it can declare a new gRPC provenance server; deploying the RPC provenance server remains possible.
For now, the focus is on not touching the provenance RPC servers already deployed in staging and production (some environment variables were moved around, so there is a slight configuration checksum change but no behavioral change).
It also adds 2 gRPC server instances with the smallest dataset possible:
- one in the local-cluster [1] (connection tested through toolbox [2])
- another in the next-version [3]
A follow-up MR will deploy the provenance gRPC server in staging.
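As a quick sanity check, the rendered manifests can be inspected for the new flavor toggle. A minimal sketch (the release name and values file below are hypothetical; PROVENANCE_TYPE is the env var introduced in the diff further down):

# Render the chart locally and check which flavor a provenance deployment got
$ helm template swh ./swh --values values/staging.yaml | grep -A1 'name: PROVENANCE_TYPE'
        - name: PROVENANCE_TYPE
          value: grpc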
TODO:
- local-cluster:
  - Check response of grpc service with grpcurl (determine queries first) [2]
  - Check api provenance response with webapp connected to grpc [3]
  - Adapt webapp configuration to use the new provenance [4]
- next-version: Deploy provenance-grpc & webapp accordingly ^
- provenance: Add downloading witness file (to avoid concurrent downloads)
- graph: same ^
[1] local-cluster: Provenance gRPC running OK
2025-03-20T15:04:06.823292798Z Starting the swh-provenance GRPC server
2025-03-20T15:04:06.828395082Z 2025-03-20T15:04:06.828326Z INFO swh_provenance_grpc_serve: Loading graph properties and database
2025-03-20T15:04:06.843855546Z 2025-03-20T15:04:06.843767Z INFO swh_provenance::utils: Graph loaded
2025-03-20T15:04:06.904579636Z 2025-03-20T15:04:06.904494Z INFO swh_provenance::utils: Database loaded
2025-03-20T15:04:06.904596958Z 2025-03-20T15:04:06.904532Z INFO swh_provenance_grpc_serve: Starting server
2025-03-21T09:15:11.549147527Z graph-grpc-popular-20240823 2025-03-21T09:15:11.549070Z INFO swh_graph_grpc_serve: Loading graph
2025-03-21T09:15:11.556021379Z graph-grpc-popular-20240823 2025-03-21T09:15:11.555943Z INFO swh_graph_grpc_serve: Starting server
2025-03-21T09:15:31.921485587Z graph-grpc-popular-20240823 2025-03-21T09:15:31.921332Z INFO request{id=0}:stats: swh_graph_grpc_server: StatsRequest
2025-03-21T09:15:31.930900836Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930779Z ERROR request{id=0}:stats: swh_graph_grpc_server: Missing compratio in /srv/graph/2024-08-23_popular-4-shell/compressed/graph.properties
2025-03-21T09:15:31.930938605Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930796Z ERROR request{id=0}:stats: swh_graph_grpc_server: Missing bitspernode in /srv/graph/2024-08-23_popular-4-shell/compressed/graph.properties
2025-03-21T09:15:31.930951990Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930802Z ERROR request{id=0}:stats: swh_graph_grpc_server: Missing bitsperlink in /srv/graph/2024-08-23_popular-4-shell/compressed/graph.properties
2025-03-21T09:15:31.930963150Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930808Z ERROR request{id=0}:stats: swh_graph_grpc_server: Missing avglocality in /srv/graph/2024-08-23_popular-4-shell/compressed/graph.stats
2025-03-21T09:15:31.930990010Z graph-grpc-popular-20240823 2025-03-21T09:15:31.930897Z INFO request{id=0}: swh_graph_grpc_server::metrics: 200 OK - /swh.graph.TraversalService/Stats - response: 9.64976ms - streaming: 44.992µs
2025-03-21T09:17:55.699423515Z graph-grpc-popular-20240823 2025-03-21T09:17:55.699342Z INFO request{id=1}:traverse: swh_graph_grpc_server: TraversalRequest { src: ["swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac"], direction: Backward, edges: None, max_edges: None, min_depth: None, max_depth: None, return_nodes: None, mask: None, max_matching_nodes: None }
2025-03-21T09:17:55.704288980Z graph-grpc-popular-20240823 2025-03-21T09:17:55.704221Z INFO request{id=1}:traverse: swh_graph_grpc_server: error=status: NotFound, message: "Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac", details: [], metadata: MetadataMap { headers: {} }
2025-03-21T09:17:55.704312002Z graph-grpc-popular-20240823 2025-03-21T09:17:55.704277Z INFO request{id=1}: swh_graph_grpc_server::metrics: 200 OK - /swh.graph.TraversalService/Traverse - response: 4.972231ms - streaming: 431ns
2025-03-21T09:27:40.574344623Z graph-grpc-popular-20240823 2025-03-21T09:27:40.574270Z INFO request{id=2}:traverse: swh_graph_grpc_server: TraversalRequest { src: ["swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac"], direction: Backward, edges: None, max_edges: None, min_depth: None, max_depth: None, return_nodes: None, mask: None, max_matching_nodes: None }
2025-03-21T09:27:40.574390096Z graph-grpc-popular-20240823 2025-03-21T09:27:40.574319Z INFO request{id=2}:traverse: swh_graph_grpc_server: error=status: NotFound, message: "Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac", details: [], metadata: MetadataMap { headers: {} }
2025-03-21T09:27:40.574395456Z graph-grpc-popular-20240823 2025-03-21T09:27:40.574339Z INFO request{id=2}: swh_graph_grpc_server::metrics: 200 OK - /swh.graph.TraversalService/Traverse - response: 100.985µs - streaming: 230ns
[2] local-cluster: Connection to grpc provenance ok
swh@swh-toolbox-5548445f74-x8vmb:~$ server=provenance-grpc-popular-ingress:80
grpcurl --plaintext $server list swh.provenance.ProvenanceService
swh.provenance.ProvenanceService.WhereAreOne
swh.provenance.ProvenanceService.WhereIsOne
swh@swh-toolbox-5548445f74-x8vmb:~$ unknown_swhid="swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945"
grpcurl -d "{\"swhid\": \"${unknown_swhid}\"}" --plaintext $server swh.provenance.ProvenanceService.WhereIsOne
ERROR:
  Code: Internal
  Message: status: NotFound, message: "Unknown SWHID: swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945", details: [], metadata: MetadataMap { headers: {} }
swh@swh-toolbox-5548445f74-x8vmb:~$ swhid="swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892"
grpcurl -d "{\"swhid\": \"${swhid}\"}" --plaintext $server swh.provenance.ProvenanceService.WhereIsOne
{
  "swhid": "swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892",
  "anchor": "swh:1:rev:aef9e137acd823aa0097f195b613f96aae619923"
}
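The listing above also shows a batched WhereAreOne endpoint, not exercised here; a sketch of querying it the same way (assuming the request carries a repeated swhid field, which is an unverified assumption):

# Hypothetical batched query against the same service
grpcurl -d "{\"swhid\": [\"${swhid}\", \"${unknown_swhid}\"]}" --plaintext $server swh.provenance.ProvenanceService.WhereAreOne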
[4] local-cluster: Connection through webapp to the new grpc provenance
$ curl -s http://web-local-archive-ingress/api/1/provenance/whereis/swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892/ | jq .
"swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892;anchor=swh:1:rev:aef9e137acd823aa0097f195b613f96aae619923"
# Content missing from the small dataset
$ curl -s http://web-local-archive-ingress/api/1/provenance/whereis/swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac/ | jq .
{
  "exception": "_InactiveRpcError",
  "reason": "<_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.INTERNAL\n\tdetails = \"status: NotFound, message: \"Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac\", details: [], metadata: MetadataMap { headers: {} }\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer {grpc_message:\"status: NotFound, message: \\\"Unknown SWHID: swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac\\\", details: [], metadata: MetadataMap { headers: {} }\", grpc_status:13, created_time:\"2025-03-21T15:42:58.066805842+00:00\"}\"\n>"
}
[3] Deploy next-version provenance grpc instance
[swh] Comparing changes between branches production and mr/adapt-provenance-deployment (per environment)...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment staging, namespace swh...
[swh] Generate config in production branch for environment staging, namespace swh-cassandra...
[swh] Generate config in production branch for environment staging, namespace next-version...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment staging...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment staging...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment staging...
Your branch is up to date with 'origin/production'.
[swh] Generate config in production branch for environment production, namespace swh...
[swh] Generate config in production branch for environment production, namespace swh-cassandra...
[swh] Generate config in production branch for environment production, namespace next-version...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment production...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment production...
[swh] Generate config in mr/adapt-provenance-deployment branch for environment production...
------------- diff for environment staging namespace swh -------------
--- /tmp/swh-chart.swh.lQqjoZXu/staging-swh.before 2025-03-21 17:04:33.345117955 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/staging-swh.after 2025-03-21 17:04:33.997092876 +0100
@@ -1914,21 +1914,21 @@
fi
graph_transposed_name=${GRAPH_NAME}-transposed.graph
if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
fi
# Finally, we make explicit the graph is ready
touch ${WITNESS_FILE}
- graph-wait-for-dataset.sh: |
+ wait-for-dataset.sh: |
#!/usr/bin/env bash
# Uses env variables WITNESS_FILE
[ -z "${WITNESS_FILE}" ] && \
echo "<WITNESS_FILE> env variable must be set" && exit 1
while [ ! -f ${WITNESS_FILE} ]; do
echo "${WITNESS_FILE} not present, wait for it to start the graph..."
sleep $PERIOD
done
@@ -2010,20 +2010,125 @@
echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
sleep $PERIOD
done
# For old datasets missing a .ef or in the wrong format, this fails with
# `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
# reindex the dataset
swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
touch $WITNESS_REINDEX_FILE
+ provenance-fetch-datasets.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_FETCH_FILE}" ] && \
+ echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+ [ -z "${DATASET_VERSION}" ] && \
+ echo "<DATASET_VERSION> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+ [ -z "${GRAPH_PATH}" ] && \
+ echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_FETCH_FILE} ] && \
+ echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+ exit 0
+
+ set -e
+
+ # Create destination paths
+ mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+ echo "Fetching datasets..."
+
+ if [ ${PROVENANCE_DATASET_FULL} = true ]; then
+ # Retrieve all the provenance dataset
+ REFS=all
+ else
+ # This excludes revisions not targetted by a snapshot
+ # Ok to use for test purposes
+ REFS=heads
+ fi
+
+ URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+ CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+ echo "1. Fetching provenance dataset (parquet files)..."
+ ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+ echo "1. Provenance datasets installed!"
+
+ echo "2. Fetching extra graph files..."
+ URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+ mkdir -p "${GRAPH_PATH}"
+ for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+ graph.property.message.bin.zst \
+ graph.property.message.offset.bin.zst \
+ graph.property.tag_name.bin.zst \
+ graph.property.tag_name.offset.bin.zst \
+ graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+ ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+ done
+ echo "2. Extra graph files installed!"
+
+ echo "3. Uncompressing graph files..."
+ set -x
+ # Uncompress the compressed graph *.zst files
+ for filepath in $(ls ${GRAPH_PATH}/*.zst); do
+ # Uncompress and delete the .zst file
+ [ -f "${filepath}" ] && unzstd --force --rm "${filepath}"
+ done
+ set +x
+ echo "3. Graph files uncompressed!"
+
+ # Make explicit the provenance datasets are fetched
+ touch ${WITNESS_FETCH_FILE}
+
+ echo "Provenance datasets installed!"
+
+ provenance-index-dataset.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_DATASETS_FILE}" ] && \
+ echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+ [ -z "${WITNESS_INDEX_FILE}" ] && \
+ echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+ [ -z "${PERIOD}" ] && \
+ echo "<PERIOD> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_INDEX_FILE} ] && echo "Provenance already indexed, do nothing." && \
+ exit 0
+
+ set -eu
+
+ # Let's wait for the dataset installation
+ while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+ echo "${WITNESS_DATASETS_FILE} missing, waiting provenance dataset installation..."
+ sleep $PERIOD
+ done
+
+ echo "Datasets file installed, build provenance dataset indexes..."
+
+ echo "provenance path: $PROVENANCE_PATH"
+ set -x
+
+ # To make the query faster, the provenance needs to build index out of the
+ # current dataset files. We store the output indexes in the same path as
+ # the dataset.
+ swh-provenance-index \
+ --database file://${PROVENANCE_PATH} && \
+ touch "${WITNESS_INDEX_FILE}" && \
+ echo "Provenance indexes built!" || \
+
+ echo "Provenance indexes failed!"
+
initialize-search-backend.sh: |
#!/usr/bin/env bash
set -eux
# Uses internally the environment variable SWH_CONFIG_FILENAME
swh search initialize
register-task-types.sh: |
#!/usr/bin/env bash
@@ -3974,21 +4079,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-content
annotations:
checksum/config: 6dab6ee0c1a4a8b88f25c2ae5ae03e8f9123247ab730bc299f03ad5e552fdd2e
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4114,21 +4219,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-directory
annotations:
checksum/config: 6e201e0b3d31c59906f2f6bb40eed69c44f72f5fc46aa224a9115980194888b5
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4254,21 +4359,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-extid
annotations:
checksum/config: fb5ef271dff488758f430d36ab153d3d71c2bcd5466401f8b4b8f56eecdbab09
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4394,21 +4499,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-metadata
annotations:
checksum/config: 59c200997e2837af6c79d357abf2b9d3887ddd9fa1da218e6867372137a6c12c
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4534,21 +4639,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin
annotations:
checksum/config: 8a444fbf8b876ca0413f40eced16d8f67f37e21f9d9af30dc0e9d230f99353f0
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4674,21 +4779,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin-visit
annotations:
checksum/config: 0232863ea6af9728905e510a2c7cda793b30e51282bb61e15951295b1b4ae5be
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4814,21 +4919,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin-visit-status
annotations:
checksum/config: 50ccfa4fef14a26d4e62b4a4a7e9548e96ecd33bff9ed38f4caee11ad4872f50
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4954,21 +5059,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-raw-extrinsic-metadata
annotations:
checksum/config: f7c02005d918fc96845fbe96ca5b80e02cbbe8db33c52304977fdf5775a7b39b
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5094,21 +5199,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-release
annotations:
checksum/config: 2080d9cc684b645df5f3408a379de43116e1cd92dda2b81e90d60dbeee8d4b7a
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5234,21 +5339,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-revision
annotations:
checksum/config: 576ef144c0f7a064b95e0584e23ed8e1bb49d8d54ca2314992484b79b549b116
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5374,21 +5479,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-skipped-content
annotations:
checksum/config: 7bb2dae23e587e50bb4d18ddbd484ea63585e1d728b9dfdcdd27fc62435cee7f
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5514,21 +5619,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-snapshot
annotations:
checksum/config: cddbe0bf9d91c24560eb3af9f843a00986a2d524dba0fdc5c111a6502caeed38
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5655,21 +5760,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-postgresql-read-only
annotations:
checksum/config: 557a29778d601193c743a35a7075315560447fc8740ba7922d5d52ce3f0c621e
checksum/config-logging: cca6d0318bd776cd9bee0901e67e4db9fa401456f6f03f569c15468c5e62bea7
- checksum/backend-utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/backend-utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
checksum/config-utils: d75ca13b805bce6a8ab59c8e24c938f2283108f6a79134f6e71db86308651dc6
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
@@ -5820,21 +5925,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-postgresql-read-write
annotations:
checksum/config: 7673b003223acd4a3a0d130516efd4f89740163d29c4800d53c9afe16b8d21a2
checksum/config-logging: c9f05b677492d0f7443fc8193c82673ce3c550f351b82dcf616a247f7477fae0
- checksum/backend-utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/backend-utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
checksum/config-utils: d75ca13b805bce6a8ab59c8e24c938f2283108f6a79134f6e71db86308651dc6
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
------------- diff for environment staging namespace swh-cassandra -------------
--- /tmp/swh-chart.swh.lQqjoZXu/staging-swh-cassandra.before 2025-03-21 17:04:33.705104109 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/staging-swh-cassandra.after 2025-03-21 17:04:34.321080414 +0100
@@ -8875,21 +8875,21 @@
fi
graph_transposed_name=${GRAPH_NAME}-transposed.graph
if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
fi
# Finally, we make explicit the graph is ready
touch ${WITNESS_FILE}
- graph-wait-for-dataset.sh: |
+ wait-for-dataset.sh: |
#!/usr/bin/env bash
# Uses env variables WITNESS_FILE
[ -z "${WITNESS_FILE}" ] && \
echo "<WITNESS_FILE> env variable must be set" && exit 1
while [ ! -f ${WITNESS_FILE} ]; do
echo "${WITNESS_FILE} not present, wait for it to start the graph..."
sleep $PERIOD
done
@@ -8971,20 +8971,125 @@
echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
sleep $PERIOD
done
# For old datasets missing a .ef or in the wrong format, this fails with
# `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
# reindex the dataset
swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
touch $WITNESS_REINDEX_FILE
+ provenance-fetch-datasets.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_FETCH_FILE}" ] && \
+ echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+ [ -z "${DATASET_VERSION}" ] && \
+ echo "<DATASET_VERSION> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+ [ -z "${GRAPH_PATH}" ] && \
+ echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_FETCH_FILE} ] && \
+ echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+ exit 0
+
+ set -e
+
+ # Create destination paths
+ mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+ echo "Fetching datasets..."
+
+ if [ ${PROVENANCE_DATASET_FULL} = true ]; then
+ # Retrieve all the provenance dataset
+ REFS=all
+ else
+ # This excludes revisions not targetted by a snapshot
+ # Ok to use for test purposes
+ REFS=heads
+ fi
+
+ URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+ CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+ echo "1. Fetching provenance dataset (parquet files)..."
+ ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+ echo "1. Provenance datasets installed!"
+
+ echo "2. Fetching extra graph files..."
+ URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+ mkdir -p "${GRAPH_PATH}"
+ for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+ graph.property.message.bin.zst \
+ graph.property.message.offset.bin.zst \
+ graph.property.tag_name.bin.zst \
+ graph.property.tag_name.offset.bin.zst \
+ graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+ ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+ done
+ echo "2. Extra graph files installed!"
+
+ echo "3. Uncompressing graph files..."
+ set -x
+ # Uncompress the compressed graph *.zst files
+ for filepath in $(ls ${GRAPH_PATH}/*.zst); do
+ # Uncompress and delete the .zst file
+ [ -f "${filepath}" ] && unzstd --force --rm "${filepath}"
+ done
+ set +x
+ echo "3. Graph files uncompressed!"
+
+ # Make explicit the provenance datasets are fetched
+ touch ${WITNESS_FETCH_FILE}
+
+ echo "Provenance datasets installed!"
+
+ provenance-index-dataset.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_DATASETS_FILE}" ] && \
+ echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+ [ -z "${WITNESS_INDEX_FILE}" ] && \
+ echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+ [ -z "${PERIOD}" ] && \
+ echo "<PERIOD> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_INDEX_FILE} ] && echo "Provenance already indexed, do nothing." && \
+ exit 0
+
+ set -eu
+
+ # Let's wait for the dataset installation
+ while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+ echo "${WITNESS_DATASETS_FILE} missing, waiting provenance dataset installation..."
+ sleep $PERIOD
+ done
+
+ echo "Datasets file installed, build provenance dataset indexes..."
+
+ echo "provenance path: $PROVENANCE_PATH"
+ set -x
+
+ # To make the query faster, the provenance needs to build index out of the
+ # current dataset files. We store the output indexes in the same path as
+ # the dataset.
+ swh-provenance-index \
+ --database file://${PROVENANCE_PATH} && \
+ touch "${WITNESS_INDEX_FILE}" && \
+ echo "Provenance indexes built!" || \
+
+ echo "Provenance indexes failed!"
+
initialize-search-backend.sh: |
#!/usr/bin/env bash
set -eux
# Uses internally the environment variable SWH_CONFIG_FILENAME
swh search initialize
register-task-types.sh: |
#!/usr/bin/env bash
@@ -11668,21 +11773,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: graph-grpc-python3k
annotations:
checksum/config: b73b013412ed4679009823fcd1967f46c6c74ce0cac466277d96a7303b1ee5e9
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
nodeSelector:
kubernetes.io/hostname: rancher-node-staging-rke2-metal01
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/graph
operator: In
@@ -11737,21 +11842,21 @@
- name: graph-python3k-persistent
mountPath: /srv/dataset
readOnly: false
- name: wait-for-dataset
image: container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20250211.1
imagePullPolicy: IfNotPresent
command:
- - /entrypoints/graph-wait-for-dataset.sh
+ - /entrypoints/wait-for-dataset.sh
env:
- name: WITNESS_FILE
value: /srv/graph/2021-03-23-popular-3k-python/compressed/.graph-is-initialized
- name: DATASET_LOCATION
value: /srv/graph/2021-03-23-popular-3k-python/compressed
- name: PERIOD
value: "3"
volumeMounts:
- name: backend-utils
mountPath: /entrypoints
@@ -11882,21 +11987,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: graph-rpc-python3k
annotations:
checksum/config: cd2257ef14a7e6adb8b613ab147d995cfc4b4f250daba77c5b5cdbd63dcb1a35
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/graph
operator: In
values:
- "true"
@@ -12129,21 +12234,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: indexer-storage-rpc
annotations:
checksum/config: aba89c9cffc506b56207b7cc0377f0183f9e75493e42ee5a8e199dbc7d573ade
checksum/config-logging: 7d9616b680a77c6ec7ba4c1a1c0f3fbf6343c4fc132847f8f4e313d965014749
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -19984,20 +20089,21 @@
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: provenance-graph-granet
annotations:
checksum/config: e466b9f13c5124eedcaa557de44e64caf7c7ff4aa9ab5dab35b7ceede6a09568
checksum/config-logging: ddcd27d991938c46f4fc0ad7ee028cb3005f186b3db022596c9ae94363881e4f
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -20017,21 +20123,21 @@
mountPath: /etc/swh/configuration-template
- name: config-utils
mountPath: /entrypoints
readOnly: true
containers:
- name: provenance-graph-granet
resources:
requests:
memory: 512Mi
cpu: 500m
- image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250319.1
+ image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
imagePullPolicy: IfNotPresent
ports:
- containerPort: 5014
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
@@ -20040,76 +20146,88 @@
tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
+ - name: PROVENANCE_TYPE
+ value: rpc
+ - name: PORT
+ value: "5014"
- name: WORKERS
value: "4"
- name: THREADS
value: "1"
- name: TIMEOUT
value: "60"
+ - name: SWH_CONFIG_FILENAME
+ value: /etc/swh/config.yml
+ - name: SWH_LOG_CONFIG_JSON
+ value: /etc/swh/logging/logging-gunicorn.json
+ - name: STATSD_SERVICE_TYPE
+ value: provenance-graph-granet
- name: STATSD_HOST
value: prometheus-statsd-exporter
- name: STATSD_PORT
value: "9125"
- name: STATSD_TAGS
value: deployment:provenance-graph-granet
- - name: STATSD_SERVICE_TYPE
- value: provenance-graph-granet
- name: SWH_LOG_LEVEL
- value: "INFO"
- - name: SWH_LOG_CONFIG_JSON
- value: /etc/swh/logging/logging-gunicorn.json
+ value: INFO
- name: SWH_SENTRY_ENVIRONMENT
value: staging
- name: SWH_MAIN_PACKAGE
value: swh.provenance
- name: SWH_SENTRY_DSN
valueFrom:
secretKeyRef:
name: common-secrets
key: provenance-sentry-dsn
# 'name' secret should exist & include key
# if the setting doesn't exist, sentry pushes will be disabled
optional: true
- name: SWH_SENTRY_DISABLE_LOGGING_EVENTS
value: "true"
volumeMounts:
- name: configuration
mountPath: /etc/swh
- name: configuration-logging
mountPath: /etc/swh/logging
+
volumes:
- name: configuration
emptyDir: {}
- name: configuration-template
configMap:
name: provenance-graph-granet-configuration-template
items:
- key: "config.yml.template"
path: "config.yml.template"
- name: configuration-logging
configMap:
name: provenance-graph-granet-configuration-logging
items:
- key: "logging-gunicorn.json"
path: "logging-gunicorn.json"
+
- name: config-utils
configMap:
name: config-utils
defaultMode: 0555
+ - name: backend-utils
+ configMap:
+ name: backend-utils
+ defaultMode: 0555
---
# Source: swh/templates/scheduler/extra-services-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: swh-cassandra
name: scheduler-listener
labels:
app: scheduler-listener
spec:
@@ -21523,21 +21641,21 @@
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: search-rpc
annotations:
checksum/config: cb3458d5372d0f78a475da3ac1b4f474cebe95188ad4f0931eba0a7c9657122e
checksum/config-logging: 7bffbc6ce2cb11d88208ef0c5f1d8e6822659c361717afb51dcf0f4da02fe1f7
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -21679,21 +21797,21 @@
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: search-static-rpc
annotations:
checksum/config: 7a5b75d976f3579a1c831aa35c012f5f15ff2e2b488fe1ae7f1da9ee4bb8ca3e
checksum/config-logging: bc2025f41b3eb8aa28b66033b96fdc1cb963f5c01fe33b2417c2378f715dbc32
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -21879,21 +21997,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-content
annotations:
checksum/config: 524d18d676bdcddf14d63ac8397a0e82dfc86601e69a4684fef07cfaa6953fd8
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -22016,21 +22134,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-directory
annotations:
checksum/config: 0cacef7c155ca3e3df82cb77ae2b6bb8bb9ac49b19209f2e065c96bc4ba76ef8
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -22153,21 +22271,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-extid
annotations:
checksum/config: a05be94c147c42ea8e51b234f50f2049496d777e0a00887817305ca00e31b924
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -22290,21 +22408,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-metadata
annotations:
checksum/config: d237c40e4c2887e8c5268cd357ec129e5ef9d1e9d0fc6ef9603878a0a4f18acf
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -22427,21 +22545,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin
annotations:
checksum/config: 6fb97eb4a6d4f4fa34cb922c7c7a2d54079ade1818d0146e5c7bb9d299b4fd34
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -22564,21 +22682,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin-visit
annotations:
checksum/config: 8feb5282dc24309e969ad447d420801a0cd826820817bcef0862df9805aea973
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -22701,21 +22819,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin-visit-status
annotations:
checksum/config: a345553096d29938201239f63e9d271d80fb1e516a63745da5c165060b2de2b8
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -22838,21 +22956,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-raw-extrinsic-metadata
annotations:
checksum/config: 43306ba69c4c8557fbd4bfca185bd5edfb8f70fc84bee6667019bd333f602054
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -22975,21 +23093,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-release
annotations:
checksum/config: eb1a796fdffbeba1776f6709ddd4855d41e82fd600284d8095e95fe1419c651d
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -23112,21 +23230,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-revision
annotations:
checksum/config: cf1bf19391e5330e49b63bd2a1baf795b6fc08a8229fa4e774ca960fb05eaaf9
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -23249,21 +23367,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-skipped-content
annotations:
checksum/config: 4213e5e6ff3ae51225dc1ec6f9f48a8ee3f206e6b8e69fd61dfe6335437d9bf9
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -23386,21 +23504,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-snapshot
annotations:
checksum/config: d10ea86ff29eebbdd9fbf159d297fed599fe1833a8140149f0229df7eaba1b34
- checksum/config_utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/config_utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -23524,21 +23642,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-cassandra
annotations:
checksum/config: b840eaf8faafacbf7f7f08c78e69c3b7028b992f5681acb5de76b177f0a2b3a9
checksum/config-logging: 2f7a56936b194188f70175c52dc180320fcc071e5c110562a9f031116fadefd2
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
@@ -23695,21 +23813,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-cassandra-read-only
annotations:
checksum/config: 5fe1511b81d6079cf97c2918094816926a283f83778bec40e117a7100441a4c9
checksum/config-logging: 7403d71b4a2e4da28cc9c1af0b9d022e85bf0b5ffcb738dc9f2b6dcfa3e14456
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
------------- diff for environment staging namespace next-version -------------
--- /tmp/swh-chart.swh.lQqjoZXu/staging-next-version.before 2025-03-21 17:04:33.857098261 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/staging-next-version.after 2025-03-21 17:04:34.469074721 +0100
@@ -4005,21 +4005,21 @@
fi
graph_transposed_name=${GRAPH_NAME}-transposed.graph
if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
fi
# Finally, we make explicit the graph is ready
touch ${WITNESS_FILE}
- graph-wait-for-dataset.sh: |
+ wait-for-dataset.sh: |
#!/usr/bin/env bash
# Uses env variables WITNESS_FILE
[ -z "${WITNESS_FILE}" ] && \
echo "<WITNESS_FILE> env variable must be set" && exit 1
while [ ! -f ${WITNESS_FILE} ]; do
echo "${WITNESS_FILE} not present, wait for it to start the graph..."
sleep $PERIOD
done
@@ -4101,20 +4101,125 @@
echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
sleep $PERIOD
done
# For old datasets missing a .ef or in the wrong format, this fails with
# `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
# reindex the dataset
swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
touch $WITNESS_REINDEX_FILE
+ provenance-fetch-datasets.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_FETCH_FILE}" ] && \
+ echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+ [ -z "${DATASET_VERSION}" ] && \
+ echo "<DATASET_VERSION> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+ [ -z "${GRAPH_PATH}" ] && \
+ echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_FETCH_FILE} ] && \
+ echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+ exit 0
+
+ set -e
+
+ # Create destination paths
+ mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+ echo "Fetching datasets..."
+
+ if [ ${PROVENANCE_DATASET_FULL} = true ]; then
+ # Retrieve all the provenance dataset
+ REFS=all
+ else
+ # This excludes revisions not targetted by a snapshot
+ # Ok to use for test purposes
+ REFS=heads
+ fi
+
+ URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+ CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+ echo "1. Fetching provenance dataset (parquet files)..."
+ ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+ echo "1. Provenance datasets installed!"
+
+ echo "2. Fetching extra graph files..."
+ URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+ mkdir -p "${GRAPH_PATH}"
+ for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+ graph.property.message.bin.zst \
+ graph.property.message.offset.bin.zst \
+ graph.property.tag_name.bin.zst \
+ graph.property.tag_name.offset.bin.zst \
+ graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+ ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+ done
+ echo "2. Extra graph files installed!"
+
+ echo "3. Uncompressing graph files..."
+ set -x
+ # Uncompress the compressed graph *.zst files
+ for filepath in $(ls ${GRAPH_PATH}/*.zst); do
+ # Uncompress and delete the .zst file
+ [ -f "${filepath}" ] && unzstd --force --rm "${filepath}"
+ done
+ set +x
+ echo "3. Graph files uncompressed!"
+
+ # Make explicit the provenance datasets are fetched
+ touch ${WITNESS_FETCH_FILE}
+
+ echo "Provenance datasets installed!"
+
+ provenance-index-dataset.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_DATASETS_FILE}" ] && \
+ echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+ [ -z "${WITNESS_INDEX_FILE}" ] && \
+ echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+ [ -z "${PERIOD}" ] && \
+ echo "<PERIOD> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_INDEX_FILE} ] && echo "Provenance already indexed, do nothing." && \
+ exit 0
+
+ set -eu
+
+ # Let's wait for the dataset installation
+ while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+ echo "${WITNESS_DATASETS_FILE} missing, waiting provenance dataset installation..."
+ sleep $PERIOD
+ done
+
+ echo "Datasets file installed, build provenance dataset indexes..."
+
+ echo "provenance path: $PROVENANCE_PATH"
+ set -x
+
+ # To make the query faster, the provenance needs to build index out of the
+ # current dataset files. We store the output indexes in the same path as
+ # the dataset.
+ swh-provenance-index \
+ --database file://${PROVENANCE_PATH} && \
+ touch "${WITNESS_INDEX_FILE}" && \
+ echo "Provenance indexes built!" || \
+
+ echo "Provenance indexes failed!"
+
initialize-search-backend.sh: |
#!/usr/bin/env bash
set -eux
# Uses internally the environment variable SWH_CONFIG_FILENAME
swh search initialize
register-task-types.sh: |
#!/usr/bin/env bash
@@ -4669,23 +4774,22 @@
enable_requests_retry: true
url: http://storage-ro-cassandra:5002
corner_ribbon_text: StagingNextVersion
show_corner_ribbon: "true"
search:
cls: remote
enable_requests_retry: true
url: http://search-rpc:5010
provenance:
- cls: remote
- enable_requests_retry: true
- url: http://webapp-provenance-ingress-next-version
+ cls: grpc
+ url: provenance-grpc-next-version-ingress:80
scheduler:
cls: remote
url: http://scheduler-rpc:5008
vault:
cls: remote
enable_requests_retry: true
url: http://vault-rpc:5005
graph:
max_edges:
anonymous: 1000
@@ -4803,20 +4907,37 @@
- kafka-cluster-kafka-brokers:9092
auto_offset_reset: latest
group_id: staging-next-version-archive-webhooks
object_types:
- origin_visit_status
---
# Source: swh/templates/volumes/persistent-volume-claims.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
+ name: provenance-popular-persistent-pvc
+ namespace: swh-cassandra-next-version
+ labels:
+ app: provenance-grpc
+spec:
+ accessModes:
+ - ReadWriteOnce
+ resources:
+ requests:
+ storage: 1Gi
+ storageClassName: local-persistent
+ volumeMode: Filesystem
+---
+# Source: swh/templates/volumes/persistent-volume-claims.yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
name: swh-graph-grpc-dataset-example-pvc
namespace: swh-cassandra-next-version
labels:
app: graph-grpc-example
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
@@ -5015,20 +5136,30 @@
name: webapp-provenance-ingress-next-version
namespace: swh-cassandra-next-version
spec:
type: ExternalName
externalName: archive-staging-rke2-ingress-nginx-controller.ingress-nginx.svc.cluster.local
---
# Source: swh/templates/external-services/cname.yaml
apiVersion: v1
kind: Service
metadata:
+ name: provenance-grpc-next-version-ingress
+ namespace: swh-cassandra-next-version
+spec:
+ type: ExternalName
+ externalName: archive-staging-rke2-ingress-nginx-controller.ingress-nginx.svc.cluster.local
+---
+# Source: swh/templates/external-services/cname.yaml
+apiVersion: v1
+kind: Service
+metadata:
name: graph-rpc-ingress
namespace: swh-cassandra-next-version
spec:
type: ExternalName
externalName: archive-staging-rke2-ingress-nginx-controller.ingress-nginx.svc.cluster.local
---
# Source: swh/templates/external-services/cname.yaml
apiVersion: v1
kind: Service
metadata:
@@ -5231,20 +5362,37 @@
app: provenance-graph-granet
spec:
type: ClusterIP
selector:
app: provenance-graph-granet
ports:
- port: 5014
targetPort: 5014
name: rpc
---
+# Source: swh/templates/provenance/service.yaml
+apiVersion: v1
+kind: Service
+metadata:
+ name: provenance-grpc
+ namespace: swh-cassandra-next-version
+ labels:
+ app: provenance-grpc
+spec:
+ type: ClusterIP
+ selector:
+ app: provenance-grpc
+ ports:
+ - port: 50141
+ targetPort: 50141
+ name: grpc
+---
# Source: swh/templates/scheduler/rpc-service.yaml
apiVersion: v1
kind: Service
metadata:
name: scheduler-rpc
namespace: swh-cassandra-next-version
labels:
app: scheduler-rpc
spec:
type: ClusterIP
@@ -6052,21 +6200,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: graph-grpc-example
annotations:
checksum/config: e29fc7dba6ab2f3d519be71fcac3361e63dfcbb4b3655a0a78439168961022bb
checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
- checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
spec:
nodeSelector:
kubernetes.io/hostname: rancher-node-staging-rke2-metal01
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/graph
operator: In
@@ -6121,21 +6269,21 @@
- name: swh-graph-grpc-inmemory
mountPath: /srv/graph
readOnly: false
- name: wait-for-dataset
image: container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20250211.1
imagePullPolicy: IfNotPresent
command:
- - /entrypoints/graph-wait-for-dataset.sh
+ - /entrypoints/wait-for-dataset.sh
env:
- name: WITNESS_FILE
value: /srv/graph/test/compressed/.graph-is-initialized
- name: DATASET_LOCATION
value: /srv/graph/test/compressed
- name: PERIOD
value: "3"
volumeMounts:
- name: backend-utils
mountPath: /entrypoints
@@ -6266,21 +6414,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: graph-rpc-example
annotations:
checksum/config: 670b17f81e650742736762ab973d6485c0a8573c2052b1357a1fc94534856b7d
checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
- checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
spec:
nodeSelector:
kubernetes.io/hostname: rancher-node-staging-rke2-metal01
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/graph
operator: In
@@ -6516,21 +6664,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: indexer-storage-rw
annotations:
checksum/config: 5ab3a947db0a5acead8abc2f73699f859d84fcb4aa54856a72245fa1d471f963
checksum/config-logging: 514b813a6cc082d2a14192b1b6946c52c586735490a886eb2a498eaf2da4e731
- checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -9015,20 +9163,21 @@
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: provenance-graph-granet
annotations:
checksum/config: 3a920cab49ad7bb0f2c6e36ef83ad7764740050a70f204675c9b36eb544a59b1
checksum/config-logging: 3ec68ca129865387885cf527bf08f90bda9e6d3ae5e50d948534cbe73306d6fb
checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -9048,21 +9197,21 @@
mountPath: /etc/swh/configuration-template
- name: config-utils
mountPath: /entrypoints
readOnly: true
containers:
- name: provenance-graph-granet
resources:
requests:
memory: 512Mi
cpu: 500m
- image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250319.1
+ image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
imagePullPolicy: IfNotPresent
ports:
- containerPort: 5014
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
@@ -9071,76 +9220,270 @@
tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
+ - name: PROVENANCE_TYPE
+ value: rpc
+ - name: PORT
+ value: "5014"
- name: WORKERS
value: "2"
- name: THREADS
value: "2"
- name: TIMEOUT
value: "60"
+ - name: SWH_CONFIG_FILENAME
+ value: /etc/swh/config.yml
+ - name: SWH_LOG_CONFIG_JSON
+ value: /etc/swh/logging/logging-gunicorn.json
+ - name: STATSD_SERVICE_TYPE
+ value: provenance-graph-granet
- name: STATSD_HOST
value: prometheus-statsd-exporter
- name: STATSD_PORT
value: "9125"
- name: STATSD_TAGS
value: deployment:provenance-graph-granet
- - name: STATSD_SERVICE_TYPE
- value: provenance-graph-granet
- name: SWH_LOG_LEVEL
- value: "INFO"
- - name: SWH_LOG_CONFIG_JSON
- value: /etc/swh/logging/logging-gunicorn.json
+ value: INFO
- name: SWH_SENTRY_ENVIRONMENT
value: staging
- name: SWH_MAIN_PACKAGE
value: swh.provenance
- name: SWH_SENTRY_DSN
valueFrom:
secretKeyRef:
name: common-secrets
key: provenance-sentry-dsn
# 'name' secret should exist & include key
# if the setting doesn't exist, sentry pushes will be disabled
optional: true
- name: SWH_SENTRY_DISABLE_LOGGING_EVENTS
value: "true"
volumeMounts:
- name: configuration
mountPath: /etc/swh
- name: configuration-logging
mountPath: /etc/swh/logging
+
volumes:
- name: configuration
emptyDir: {}
- name: configuration-template
configMap:
name: provenance-graph-granet-configuration-template
items:
- key: "config.yml.template"
path: "config.yml.template"
- name: configuration-logging
configMap:
name: provenance-graph-granet-configuration-logging
items:
- key: "logging-gunicorn.json"
path: "logging-gunicorn.json"
+
- name: config-utils
configMap:
name: config-utils
defaultMode: 0555
+ - name: backend-utils
+ configMap:
+ name: backend-utils
+ defaultMode: 0555
+---
+# Source: swh/templates/provenance/deployment.yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+ namespace: swh-cassandra-next-version
+ name: provenance-grpc
+ labels:
+ app: provenance-grpc
+spec:
+ revisionHistoryLimit: 2
+ selector:
+ matchLabels:
+ app: provenance-grpc
+ strategy:
+ type: RollingUpdate
+ rollingUpdate:
+ maxSurge: 1
+ template:
+ metadata:
+ labels:
+ app: provenance-grpc
+ annotations:
+ checksum/config: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
+ checksum/config-logging: 9fa299d379f661eab9d312fce16ef38fb94e197e908b92ac100aff85b4c36bb4
+ checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
+ spec:
+ affinity:
+ nodeAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ nodeSelectorTerms:
+ - matchExpressions:
+ - key: swh/rpc
+ operator: In
+ values:
+ - "true"
+ priorityClassName: swh-cassandra-next-version-frontend-rpc
+ initContainers:
+ - name: fetch-provenance-dataset
+ image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
+ command:
+ - /entrypoints/provenance-fetch-datasets.sh
+ env:
+ - name: WITNESS_FETCH_FILE
+ value: /srv/dataset/provenance/.provenance-is-initialized
+ - name: SWH_CONFIG_FILENAME
+ value: /etc/swh/config.yml
+ - name: PROVENANCE_PATH
+ value: /srv/dataset/provenance
+ - name: PROVENANCE_DATASET_FULL
+ value: "false"
+ - name: GRAPH_PATH
+ value: /srv/dataset/graph
+ - name: DATASET_VERSION
+ value: 2024-08-23-popular-500-python
+ volumeMounts:
+ - name: configuration
+ mountPath: /etc/swh
+ - name: backend-utils
+ mountPath: /entrypoints
+ - name: dataset-persistent
+ mountPath: /srv/dataset
+ readOnly: false
+
+ - name: index-provenance-dataset
+ image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
+ imagePullPolicy: IfNotPresent
+ command:
+ - /entrypoints/provenance-index-dataset.sh
+ env:
+ - name: WITNESS_DATASETS_FILE
+ value: /srv/dataset/provenance/.provenance-is-initialized
+ - name: WITNESS_INDEX_FILE
+ value: /srv/dataset/provenance/.provenance-is-indexed
+ - name: PROVENANCE_PATH
+ value: /srv/dataset/provenance
+ - name: PERIOD
+ value: "3"
+ volumeMounts:
+ - name: backend-utils
+ mountPath: /entrypoints
+ readOnly: true
+ - name: dataset-persistent
+ mountPath: /srv/dataset
+ readOnly: false
+
+ - name: wait-for-dataset
+ image: container-registry.softwareheritage.org/swh/infra/swh-apps/utils:20250211.1
+ imagePullPolicy: IfNotPresent
+ command:
+ - /entrypoints/wait-for-dataset.sh
+ env:
+ - name: WITNESS_FILE
+ value: /srv/dataset/provenance/.provenance-is-initialized
+ - name: PERIOD
+ value: "3"
+ volumeMounts:
+ - name: backend-utils
+ mountPath: /entrypoints
+ readOnly: true
+ - name: dataset-persistent
+ mountPath: /srv/dataset
+ readOnly: false
+
+ containers:
+ - name: provenance-grpc
+ resources:
+ requests:
+ memory: 512Mi
+ cpu: 500m
+ image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
+ imagePullPolicy: IfNotPresent
+ ports:
+ - containerPort: 50141
+ name: grpc
+ readinessProbe:
+ tcpSocket:
+ port: grpc
+ initialDelaySeconds: 15
+ failureThreshold: 30
+ periodSeconds: 5
+ livenessProbe:
+ tcpSocket:
+ port: grpc
+ initialDelaySeconds: 10
+ periodSeconds: 5
+ command:
+ - /bin/bash
+ args:
+ - -c
+ - /opt/swh/entrypoint.sh
+ env:
+ - name: PROVENANCE_TYPE
+ value: grpc
+ - name: PORT
+ value: "50141"
+ - name: PROVENANCE_PATH
+ value: /srv/dataset/provenance
+ - name: GRAPH_PATH
+ value: /srv/dataset/graph/graph
+ - name: STATSD_HOST
+ value: prometheus-statsd-exporter
+ - name: STATSD_PORT
+ value: "9125"
+ - name: STATSD_TAGS
+ value: deployment:provenance-grpc
+ - name: SWH_LOG_LEVEL
+ value: INFO
+ - name: SWH_SENTRY_ENVIRONMENT
+ value: staging
+ - name: SWH_MAIN_PACKAGE
+ value: swh.provenance
+ - name: SWH_SENTRY_DSN
+ valueFrom:
+ secretKeyRef:
+ name: common-secrets
+ key: provenance-sentry-dsn
+ # 'name' secret should exist & include key
+ # if the setting doesn't exist, sentry pushes will be disabled
+ optional: true
+ - name: SWH_SENTRY_DISABLE_LOGGING_EVENTS
+ value: "true"
+ volumeMounts:
+ - name: dataset-persistent
+ mountPath: /srv/dataset
+ readOnly: false
+
+ volumes:
+ - name: configuration
+ emptyDir: {}
+ - name: config-utils
+ configMap:
+ name: config-utils
+ defaultMode: 0555
+ - name: backend-utils
+ configMap:
+ name: backend-utils
+ defaultMode: 0555
+ - name: dataset-persistent
+ persistentVolumeClaim:
+ claimName: provenance-popular-persistent-pvc
---
# Source: swh/templates/scheduler/extra-services-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: swh-cassandra-next-version
name: scheduler-listener
labels:
app: scheduler-listener
spec:
@@ -10090,21 +10433,21 @@
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: search-rpc
annotations:
checksum/config: 4526bc9eea3c5b071fc8d14d9a9993ae1819e8c7de57d97340b7f3d4a10b8b4f
checksum/config-logging: 0bc72d1f0a5e779cba1b812d82f00ae1973bb8c5140ff975d94cb4e7f4181000
checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
- checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -10290,21 +10633,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin
annotations:
checksum/config: 2d12ac61e189624151390a01fbdc28c53b735945f71f4b4fbe01a83a1065bd34
- checksum/config_utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+ checksum/config_utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-role.kubernetes.io/etcd
operator: NotIn
values:
- "true"
@@ -10408,21 +10751,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-ro-cassandra
annotations:
checksum/config: 82ddd066ef3b39ba30a9377b2e3d5f34e9c6771d244cb9966170d8821659043e
checksum/config-logging: 539b1f63c51f751ac609a212d1d29b53ccee1fe8861d4b40cf156cabdbbcc9af
- checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
@@ -10565,21 +10908,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-ro-postgresql
annotations:
checksum/config: ee389a0c378431ad964e527086008e8983bd8181007a5cef4bdc6b2f3719b891
checksum/config-logging: 04d1f9a399d1326e46f5b78ca589b7dc5a6187afba6e61a46a7b333d07cb16aa
- checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
@@ -10722,21 +11065,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-rw-cassandra
annotations:
checksum/config: 899b95e866ca9a7e5b053415e3df669b133875d96fbf5d91db52e0b105986297
checksum/config-logging: c4f758383b8e60062735e5b03cfe1724b60a3a961b95632b75c511025ddd5d28
- checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
@@ -10893,21 +11236,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-rw-postgresql
annotations:
checksum/config: 053e84875d8867dbb844f84faa7b55a7d5647ab98845e4c0fd8e83ed06aaaf04
checksum/config-logging: b6c995f5c5944a279018efc4d112fe1395da696289b71ebf4ea382011400cd03
- checksum/backend-utils: 3d2301f0fc8b4715e380acef12da66bdd16981f91351f1269a057ac022babc5a
+ checksum/backend-utils: 979dcd12a9ecdb43f1c4c9191012acbc31c14afa348c1b91e8dfb2aa7d105fae
checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
@@ -11597,21 +11940,21 @@
app: web-cassandra
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: web-cassandra
annotations:
- checksum/config: 412e790069b96930e00d38eea4a3ac5fcaac2ea46e0f75f429be7690318080bf
+ checksum/config: 4c81b069c0173732d5e8c5acb7f38671a4daba9145b4f75fc6dee8d19d42fc1c
checksum/config-logging: f266f784128ac9c57c6d0f154a646e15f06d0ad7557f191487df0d1b385acb48
checksum/config-utils: 94d255131467f84bef964a4c72b2b792c5ebaf711bb1c77829d7cd1007a8ac22
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/web
operator: In
@@ -12281,20 +12624,51 @@
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: provenance-graph-granet
port:
number: 5014
---
+# Source: swh/templates/provenance/ingress.yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+ namespace: swh-cassandra-next-version
+ name: provenance-grpc-ingress-default
+ labels:
+ app: provenance-grpc
+ endpoint-definition: default
+ annotations:
+ nginx.ingress.kubernetes.io/backend-protocol: GRPC
+ nginx.ingress.kubernetes.io/client-body-buffer-size: 128K
+ nginx.ingress.kubernetes.io/proxy-body-size: 4G
+ nginx.ingress.kubernetes.io/proxy-buffering: "on"
+ nginx.ingress.kubernetes.io/service-upstream: "true"
+ nginx.ingress.kubernetes.io/ssl-redirect: "true"
+ nginx.ingress.kubernetes.io/whitelist-source-range: 10.42.0.0/16,10.43.0.0/16
+spec:
+ ingressClassName: nginx
+ rules:
+ - host: provenance-grpc-next-version-ingress
+ http:
+ paths:
+ - path: /
+ pathType: Prefix
+ backend:
+ service:
+ name: provenance-grpc
+ port:
+ number: 50141
+---
# Source: swh/templates/web/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
namespace: swh-cassandra-next-version
name: web-cassandra-ingress-authenticated
labels:
app: web-cassandra
endpoint-definition: authenticated
annotations:
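Note: the provenance-grpc ingress added above terminates gRPC (backend-protocol GRPC) towards service provenance-grpc on port 50141 and only admits in-cluster sources (10.42.0.0/16, 10.43.0.0/16). A minimal smoke test from a toolbox pod might look like the sketch below; it assumes the server exposes gRPC reflection (if it does not, grpcurl needs the .proto files passed with -import-path/-proto):

$ # enumerate the services the new endpoint actually exposes
$ grpcurl -plaintext provenance-grpc.swh-cassandra-next-version:50141 list
$ # or through the ingress host declared above
$ grpcurl -plaintext provenance-grpc-next-version-ingress:80 list
$ # then describe one of the returned services before crafting real queries
$ grpcurl -plaintext provenance-grpc.swh-cassandra-next-version:50141 describe <service>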
------------- diff for environment production namespace swh -------------
--- /tmp/swh-chart.swh.lQqjoZXu/production-swh.before 2025-03-21 17:04:34.629068567 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/production-swh.after 2025-03-21 17:04:35.153048411 +0100
@@ -2585,21 +2585,21 @@
fi
graph_transposed_name=${GRAPH_NAME}-transposed.graph
if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
fi
# Finally, we make explicit the graph is ready
touch ${WITNESS_FILE}
- graph-wait-for-dataset.sh: |
+ wait-for-dataset.sh: |
#!/usr/bin/env bash
# Uses env variables WITNESS_FILE
[ -z "${WITNESS_FILE}" ] && \
echo "<WITNESS_FILE> env variable must be set" && exit 1
while [ ! -f ${WITNESS_FILE} ]; do
echo "${WITNESS_FILE} not present, wait for it to start the graph..."
sleep $PERIOD
done
@@ -2681,20 +2681,125 @@
echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
sleep $PERIOD
done
# For old datasets missing a .ef or in the wrong format, this fails with
# `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
# reindex the dataset
swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
touch $WITNESS_REINDEX_FILE
+ provenance-fetch-datasets.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_FETCH_FILE}" ] && \
+ echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+ [ -z "${DATASET_VERSION}" ] && \
+ echo "<DATASET_VERSION> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+ [ -z "${GRAPH_PATH}" ] && \
+ echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_FETCH_FILE} ] && \
+ echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+ exit 0
+
+ set -e
+
+ # Create destination paths
+ mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+ echo "Fetching datasets..."
+
+ if [ ${PROVENANCE_DATASET_FULL} = true ]; then
+ # Retrieve all the provenance dataset
+ REFS=all
+ else
+ # This excludes revisions not targeted by a snapshot
+ # Ok to use for test purposes
+ REFS=heads
+ fi
+
+ URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+ CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+ echo "1. Fetching provenance dataset (parquet files)..."
+ ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+ echo "1. Provenance datasets installed!"
+
+ echo "2. Fetching extra graph files..."
+ URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+ mkdir -p "${GRAPH_PATH}"
+ for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+ graph.property.message.bin.zst \
+ graph.property.message.offset.bin.zst \
+ graph.property.tag_name.bin.zst \
+ graph.property.tag_name.offset.bin.zst \
+ graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+ ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+ done
+ echo "2. Extra graph files installed!"
+
+ echo "3. Uncompressing graph files..."
+ set -x
+ # Uncompress and delete the compressed graph *.zst files
+ for filepath in "${GRAPH_PATH}"/*.zst; do
+ # The -f check skips the unmatched-glob literal (relevant under set -e)
+ if [ -f "${filepath}" ]; then unzstd --force --rm "${filepath}"; fi
+ done
+ set +x
+ echo "3. Graph files uncompressed!"
+
+ # Make explicit the provenance datasets are fetched
+ touch ${WITNESS_FETCH_FILE}
+
+ echo "Provenance datasets installed!"
+
+ provenance-index-dataset.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_DATASETS_FILE}" ] && \
+ echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+ [ -z "${WITNESS_INDEX_FILE}" ] && \
+ echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+ [ -z "${PERIOD}" ] && \
+ echo "<PERIOD> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_INDEX_FILE} ] && echo "Provenance already indexed, do nothing." && \
+ exit 0
+
+ set -eu
+
+ # Let's wait for the dataset installation
+ while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+ echo "${WITNESS_DATASETS_FILE} missing, waiting provenance dataset installation..."
+ sleep $PERIOD
+ done
+
+ echo "Datasets file installed, build provenance dataset indexes..."
+
+ echo "provenance path: $PROVENANCE_PATH"
+ set -x
+
+ # To make the query faster, the provenance needs to build indexes out of the
+ # current dataset files. We store the output indexes in the same path as
+ # the dataset.
+ swh-provenance-index \
+ --database file://${PROVENANCE_PATH} && \
+ touch "${WITNESS_INDEX_FILE}" && \
+ echo "Provenance indexes built!" || \
+
+ echo "Provenance indexes failed!"
+
initialize-search-backend.sh: |
#!/usr/bin/env bash
set -eux
# Uses internally the environment variable SWH_CONFIG_FILENAME
swh search initialize
register-task-types.sh: |
#!/usr/bin/env bash
@@ -4386,21 +4491,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-content
annotations:
checksum/config: ad9969915c9d4f098e176250342e634c3f9950c21b4bfce3c59a756eebd29d5a
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4509,21 +4614,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-directory
annotations:
checksum/config: d10ee8d64b973f71f54ac97d9b23a984ddcaf85a14e4e7d0c1ffbe6606745a9f
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4632,21 +4737,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-extid
annotations:
checksum/config: d6e51b2acf85824083b41c3fc454e0bde5cda180b13fd0ea0f7a90de3b13dd10
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4755,21 +4860,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-metadata
annotations:
checksum/config: 568c27f168a777aa1cd02d52a482105f91ea07a5ca96c490faecb6f0f126d510
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -4878,21 +4983,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin
annotations:
checksum/config: 5a0c927d61eea568c15a84881cdcc36061b4c530db7c11f565ede65c5b3936c3
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5001,21 +5106,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin-visit
annotations:
checksum/config: 0e471b3f26d83e3d374296d1a6ce7077b31cb935705c692929ee6332dacb03fa
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5124,21 +5229,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-origin-visit-status
annotations:
checksum/config: 87f95fa2d03d52fdec4dcbb83ba5e790821fe8d739e8922a97046f0d2a10abae
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5247,21 +5352,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-raw-extrinsic-metadata
annotations:
checksum/config: daffe1093b0bb5c08485e25baeb2f10f5f39f0fd826c3a40283693a9d43fae37
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5370,21 +5475,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-release
annotations:
checksum/config: 41773bc062731699b038ae98bec197c7290766735e9ae57977da8d0d1b0a82d5
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5493,21 +5598,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-revision
annotations:
checksum/config: 24e62d01eeed14fdd9eab5bcbbdfc2165b37c246e93270c6a3593a86760b296f
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5616,21 +5721,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-skipped-content
annotations:
checksum/config: 61ed93650af06dedc4ee939164381d2e9ac65098d43af26405e34c2d6fd3cae8
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5739,21 +5844,21 @@
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-replayer-snapshot
annotations:
checksum/config: 32cb30621d4c74bac7750080475ba1bece7f561dbc99a798bdbda13b29c5e9c0
- checksum/config_utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/config_utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/replayer
operator: In
values:
- "true"
@@ -5863,21 +5968,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-postgresql-azure-readonly
annotations:
checksum/config: f82d8c731f4709a9e756911b666f9a28a25429375b97c614a4ef5c7bc231e3c8
checksum/config-logging: 0ecdc326a2b3e525e21e5743d89eb3c4bfbadc12aee4fbe1a32ba77ab7bde899
- checksum/backend-utils: 82ab9d2291625dd17a30267f551b45870420eefd4a90bb40f14412553c45556a
+ checksum/backend-utils: 233f1b432787895386fcdfff598b35a77ca1f18d4a8f7f0136af55928674c9a9
checksum/config-utils: d75ca13b805bce6a8ab59c8e24c938f2283108f6a79134f6e71db86308651dc6
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
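Aside: both scripts introduced above are driven entirely by environment variables, so they can be exercised outside the init containers. A minimal sketch, assuming the provenance-fetch-datasets.sh body has been saved locally and that aws and unzstd are on PATH; the values mirror the next-version init container, only the destination paths differ:

$ export WITNESS_FETCH_FILE=/tmp/dataset/provenance/.provenance-is-initialized
$ export DATASET_VERSION=2024-08-23-popular-500-python
$ export PROVENANCE_PATH=/tmp/dataset/provenance
$ export GRAPH_PATH=/tmp/dataset/graph
$ export PROVENANCE_DATASET_FULL=false  # heads refs only, enough for tests
$ bash provenance-fetch-datasets.sh
$ # on success the witness file marks the datasets as installed
$ test -f "${WITNESS_FETCH_FILE}" && echo "datasets ready"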
------------- diff for environment production namespace swh-cassandra -------------
--- /tmp/swh-chart.swh.lQqjoZXu/production-swh-cassandra.before 2025-03-21 17:04:35.009053950 +0100
+++ /tmp/swh-chart.swh.lQqjoZXu/production-swh-cassandra.after 2025-03-21 17:04:35.521034257 +0100
@@ -10509,21 +10509,21 @@
fi
graph_transposed_name=${GRAPH_NAME}-transposed.graph
if [ -L ${DATASET_LOCATION}/${graph_transposed_name} ] || ! [ -f ${DATASET_LOCATION}/${graph_transposed_name} ]; then
cp -v --remove-destination ${DATASET_SOURCE}/${graph_transposed_name} ${DATASET_LOCATION}/;
fi
# Finally, we make explicit the graph is ready
touch ${WITNESS_FILE}
- graph-wait-for-dataset.sh: |
+ wait-for-dataset.sh: |
#!/usr/bin/env bash
# Uses env variables WITNESS_FILE
[ -z "${WITNESS_FILE}" ] && \
echo "<WITNESS_FILE> env variable must be set" && exit 1
while [ ! -f ${WITNESS_FILE} ]; do
echo "${WITNESS_FILE} not present, wait for it to start the graph..."
sleep $PERIOD
done
@@ -10605,20 +10605,125 @@
echo "${WITNESS_SOURCE_FILE} missing, waiting graph dataset installation..."
sleep $PERIOD
done
# For old datasets missing a .ef or in the wrong format, this fails with
# `Cannot map Elias-Fano pointer list .../graph.ef`. The solution is to
# reindex the dataset
swh graph reindex --ef ${DATASET_LOCATION}/${GRAPH_NAME} && \
touch $WITNESS_REINDEX_FILE
+ provenance-fetch-datasets.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_FETCH_FILE}" ] && \
+ echo "<WITNESS_FETCH_FILE> env variable must be set" && exit 1
+ [ -z "${DATASET_VERSION}" ] && \
+ echo "<DATASET_VERSION> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+ [ -z "${GRAPH_PATH}" ] && \
+ echo "<GRAPH_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_FETCH_FILE} ] && \
+ echo "Datasets graph & provenance <${DATASET_VERSION}> already present. Skip." && \
+ exit 0
+
+ set -e
+
+ # Create destination paths
+ mkdir -p ${PROVENANCE_PATH} ${GRAPH_PATH}
+
+ echo "Fetching datasets..."
+
+ if [ ${PROVENANCE_DATASET_FULL} = true ]; then
+ # Retrieve all the provenance dataset
+ REFS=all
+ else
+ # This excludes revisions not targeted by a snapshot
+ # Ok to use for test purposes
+ REFS=heads
+ fi
+
+ URL_PROVENANCE="s3://softwareheritage/derived_datasets/${DATASET_VERSION}/provenance/${REFS}/"
+
+ CMD_GET="aws s3 cp --no-progress --no-sign-request"
+
+ echo "1. Fetching provenance dataset (parquet files)..."
+ ${CMD_GET} --recursive "${URL_PROVENANCE}" "${PROVENANCE_PATH}"
+ echo "1. Provenance datasets installed!"
+
+ echo "2. Fetching extra graph files..."
+ URL_GRAPH="s3://softwareheritage/graph/${DATASET_VERSION}/compressed"
+
+ mkdir -p "${GRAPH_PATH}"
+ for filename in graph.pthash graph.pthash.order graph.nodes.count.txt \
+ graph.property.message.bin.zst \
+ graph.property.message.offset.bin.zst \
+ graph.property.tag_name.bin.zst \
+ graph.property.tag_name.offset.bin.zst \
+ graph.node2swhid.bin.zst graph.node2type.bin.zst; do
+ ${CMD_GET} "${URL_GRAPH}/${filename}" "${GRAPH_PATH}"
+ done
+ echo "2. Extra graph files installed!"
+
+ echo "3. Uncompressing graph files..."
+ set -x
+ # Uncompress and delete the compressed graph *.zst files
+ for filepath in "${GRAPH_PATH}"/*.zst; do
+ # The -f check skips the unmatched-glob literal (relevant under set -e)
+ if [ -f "${filepath}" ]; then unzstd --force --rm "${filepath}"; fi
+ done
+ set +x
+ echo "3. Graph files uncompressed!"
+
+ # Make explicit the provenance datasets are fetched
+ touch ${WITNESS_FETCH_FILE}
+
+ echo "Provenance datasets installed!"
+
+ provenance-index-dataset.sh: |
+ #!/usr/bin/env bash
+ [ -z "${WITNESS_DATASETS_FILE}" ] && \
+ echo "<WITNESS_DATASETS_FILE> env variable must be set" && exit 1
+ [ -z "${WITNESS_INDEX_FILE}" ] && \
+ echo "<WITNESS_INDEX_FILE> env variable must be set" && exit 1
+ [ -z "${PERIOD}" ] && \
+ echo "<PERIOD> env variable must be set" && exit 1
+ [ -z "${PROVENANCE_PATH}" ] && \
+ echo "<PROVENANCE_PATH> env variable must be set" && exit 1
+
+ [ -f ${WITNESS_INDEX_FILE} ] && echo "Provenance already indexed, do nothing." && \
+ exit 0
+
+ set -eu
+
+ # Let's wait for the dataset installation
+ while [ ! -f "${WITNESS_DATASETS_FILE}" ]; do
+ echo "${WITNESS_DATASETS_FILE} missing, waiting provenance dataset installation..."
+ sleep $PERIOD
+ done
+
+ echo "Datasets file installed, build provenance dataset indexes..."
+
+ echo "provenance path: $PROVENANCE_PATH"
+ set -x
+
+ # To make the query faster, the provenance needs to build indexes out of the
+ # current dataset files. We store the output indexes in the same path as
+ # the dataset.
+ swh-provenance-index \
+ --database file://${PROVENANCE_PATH} && \
+ touch "${WITNESS_INDEX_FILE}" && \
+ echo "Provenance indexes built!" || \
+
+ echo "Provenance indexes failed!"
+
initialize-search-backend.sh: |
#!/usr/bin/env bash
set -eux
# Uses internally the environment variable SWH_CONFIG_FILENAME
swh search initialize
register-task-types.sh: |
#!/usr/bin/env bash
@@ -13978,21 +14083,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: graph-grpc-20241206
annotations:
checksum/config: b4edb88c0bcb74769dc2f39025a598580a6d6a39cece80ba52904365cd7380eb
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/graph
operator: In
values:
- "true"
@@ -14164,21 +14269,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: graph-rpc-20241206
annotations:
checksum/config: 095d223956d75728c8f8a26368053a8882cb3026736517767d8aacfc9895e159
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/graph
operator: In
values:
- "true"
@@ -14523,21 +14628,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: indexer-storage-read-only
annotations:
checksum/config: e233fce2b3a7a714653810d4a8084763fa3d456d691d5964f00c546ebbaaa49d
checksum/config-logging: 3c46e3e49b8224015ed0a6ef21fec2ba66c4af22a8718cd0ad4f61483cd5e8be
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -14670,21 +14775,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: indexer-storage-read-write
annotations:
checksum/config: 3b882fc68a6b1c70f8ce6b82965db4903c5f13557d4a0a43fd6d858745c72e90
checksum/config-logging: 2b18f7d6ed7689e52685ba77e412dffc3ee95be9bddd0aa15d728ee8ef45591d
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -25360,20 +25465,21 @@
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: provenance-graph-granet
annotations:
checksum/config: fcc422ce13f035bd4de309693c6044e4eee6a37fdc487ec2f9fef5437dfd954e
checksum/config-logging: ddcd27d991938c46f4fc0ad7ee028cb3005f186b3db022596c9ae94363881e4f
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -25393,21 +25499,21 @@
mountPath: /etc/swh/configuration-template
- name: config-utils
mountPath: /entrypoints
readOnly: true
containers:
- name: provenance-graph-granet
resources:
requests:
memory: 512Mi
cpu: 500m
- image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250319.1
+ image: container-registry.softwareheritage.org/swh/infra/swh-apps/provenance:20250321.1
imagePullPolicy: IfNotPresent
ports:
- containerPort: 5014
name: rpc
readinessProbe:
httpGet:
path: /
port: rpc
initialDelaySeconds: 15
failureThreshold: 30
@@ -25416,76 +25522,88 @@
tcpSocket:
port: rpc
initialDelaySeconds: 10
periodSeconds: 5
command:
- /bin/bash
args:
- -c
- /opt/swh/entrypoint.sh
env:
+ - name: PROVENANCE_TYPE
+ value: rpc
+ - name: PORT
+ value: "5014"
- name: WORKERS
value: "4"
- name: THREADS
value: "1"
- name: TIMEOUT
value: "60"
+ - name: SWH_CONFIG_FILENAME
+ value: /etc/swh/config.yml
+ - name: SWH_LOG_CONFIG_JSON
+ value: /etc/swh/logging/logging-gunicorn.json
+ - name: STATSD_SERVICE_TYPE
+ value: provenance-graph-granet
- name: STATSD_HOST
value: prometheus-statsd-exporter
- name: STATSD_PORT
value: "9125"
- name: STATSD_TAGS
value: deployment:provenance-graph-granet
- - name: STATSD_SERVICE_TYPE
- value: provenance-graph-granet
- name: SWH_LOG_LEVEL
- value: "INFO"
- - name: SWH_LOG_CONFIG_JSON
- value: /etc/swh/logging/logging-gunicorn.json
+ value: INFO
- name: SWH_SENTRY_ENVIRONMENT
value: production
- name: SWH_MAIN_PACKAGE
value: swh.provenance
- name: SWH_SENTRY_DSN
valueFrom:
secretKeyRef:
name: common-secrets
key: provenance-sentry-dsn
# 'name' secret should exist & include key
# if the setting doesn't exist, sentry pushes will be disabled
optional: true
- name: SWH_SENTRY_DISABLE_LOGGING_EVENTS
value: "true"
volumeMounts:
- name: configuration
mountPath: /etc/swh
- name: configuration-logging
mountPath: /etc/swh/logging
+
volumes:
- name: configuration
emptyDir: {}
- name: configuration-template
configMap:
name: provenance-graph-granet-configuration-template
items:
- key: "config.yml.template"
path: "config.yml.template"
- name: configuration-logging
configMap:
name: provenance-graph-granet-configuration-logging
items:
- key: "logging-gunicorn.json"
path: "logging-gunicorn.json"
+
- name: config-utils
configMap:
name: config-utils
defaultMode: 0555
+ - name: backend-utils
+ configMap:
+ name: backend-utils
+ defaultMode: 0555
---
# Source: swh/templates/scheduler/extra-services-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: swh-cassandra
name: scheduler-listener
labels:
app: scheduler-listener
spec:
@@ -27252,21 +27370,21 @@
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: search-rpc
annotations:
checksum/config: a76abbb3247f00560f78f1f1aaafc30e0c3958dc059e75911400596ddb51b4e2
checksum/config-logging: 7bffbc6ce2cb11d88208ef0c5f1d8e6822659c361717afb51dcf0f4da02fe1f7
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/rpc
operator: In
values:
- "true"
@@ -27440,21 +27558,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-cassandra-azure-readonly
annotations:
checksum/config: 63fbb2e5c758f9faab28192d2a0458eea22410b824e63d0b35de085b50fc3e6e
checksum/config-logging: 6d3a84a071464bdb72aea996f9c90be8ff89eedb1f09a8cc71c7699e652c8a47
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
@@ -27693,21 +27811,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-cassandra-readonly
annotations:
checksum/config: 7a7e39b703ff92c12c87d9b4d8f0ea91c4d79e96c78f05476f37ab783c4687ff
checksum/config-logging: 800fc3f5bdfec12955f3689d3c319b74f52a02d09b08fb710cea854d815dfad6
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
@@ -27946,21 +28064,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-cassandra-readonly-internal
annotations:
checksum/config: 0094e08338f11eeeda7a0958ec8402ab9d37544aecc81f62772fddad37c38dfe
checksum/config-logging: 2ecb5a0cb1eaeb9246bc272bee5df3292a73f3c87134840a682f8c3fb03ac008
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
@@ -28199,21 +28317,21 @@
type: RollingUpdate
rollingUpdate:
maxSurge: 1
template:
metadata:
labels:
app: storage-cassandra-winery
annotations:
checksum/config: 17b0f2faa5762626ef5c966dc8a2aa810c208c8c86643305a8921e538ea21583
checksum/config-logging: 21f19120491561669a337c91f8a5b62fb9b081d0c0ca55a9e69fdc26e1a5350a
- checksum/backend-utils: 498bb7b35f4e2d6996251c8615bff7661ffa9981f2f69734acb80824fc37d2b1
+ checksum/backend-utils: bfb79c03c5f47eacbf1cedde17860a255223193cdeb60f9e3455084af3bab88c
checksum/config-utils: 13a26f6add17e96ce01550153c77dcd48de60241a3f4db3c93d5467234be2a7f
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: swh/storage
operator: In
values:
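Before pointing anything at the new grpc instance, it is worth checking that the init-container chain completed and left both witness files behind. A possible check, with names taken from the provenance-grpc deployment above:

$ kubectl -n swh-cassandra-next-version get pods -l app=provenance-grpc
$ pod=$(kubectl -n swh-cassandra-next-version get pods -l app=provenance-grpc \
    -o jsonpath='{.items[0].metadata.name}')
$ kubectl -n swh-cassandra-next-version exec "${pod}" -- \
    ls -l /srv/dataset/provenance/.provenance-is-initialized \
          /srv/dataset/provenance/.provenance-is-indexed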
Merge request reports
Activity
added 26 commits
- db356083...dc5d1f09 - 15 commits from branch production
- b93e84df - 1 earlier commit
- 6f0d8907 - provenance: Start adapting deployment template to configure the port
- 53beb2f3 - deployment/provenance: Refactor configuration parsing
- 866d0803 - local-cluster: Add dummy provenance instance
- 7a498aa5 - values-swh-application-versions: Bump to provenance v3.2
- ccebc43c - deployment/provenance: Add startService configuration entry
- 74aed5df - deployment/provenance: Unify log level setup
- 35474a2a - deployment/provenance: Deploy gunicorn setup for rpc service type
- ab277e3b - provenance/values: Design yaml configuration
- 2d7d4e0a - provenance/deployment: Compute mandatory configuration per type
- 2e26cd5b - wip: provenance/deployment: Fetch and prepare volumes with dataset
added 1 commit
- e0a7274d - provenance/deployment: Allow to prepare grpc backend with data
added 10 commits
- f54fd1da - next-version: Deploy provenance grpc
- 355c979a - provenance/deployment: Add missing backend-utils volume mount
- 30dab34d - provenance/deployment: Fix indentation
- 54f4d819 - provenance/configmap: Only deploy configmap for rpc service
- ead37538 - provenance/config: Fix missing provenance path introspection
- 16351bdb - provenance/deployment: Drop unneeded image prefix name
- d1eb8546 - provenance/deployment: Fix variable issue type and typo
- f3d20d07 - provenance/deployment: deploy rpc configuration for rpc server
- 58659568 - provenance/helper-volume: Fix witness file name
- 56750418 - backend-utils: Make script verbose
added 7 commits
- 9b410ca8 - backend-utils/provenance-fetch-datasets.sh: Fix script
- f4b934d1 - provenance/helper-volume: Fix witness file name on provenance index script
- bc396e58 - provenance/backend-utils: Make script verbose & readable
- e5a20df9 - provenance/fetch-provenance-dataset: Create the graph directory
- bc746488 - provenance/provenance-index-dataset: Make it functional
- bf75e9fa - provenance/script: Iterate to make script resilient
- 6e8eddae - local-cluster/provenance: Use local-persistent class for pv
added 9 commits
- e5fdcc56 - provenance: Allow to use a data subset
- 3d4603b5 - provenance/script: Fix graph files retrieval step
- fa269d28 - provenance/script: Drop debugging instructions
- d86aaf80 - provenance/script: Activate back index check
- 6a8ab8b3 - values-swh-application-versions: Bump to recent provenance image
- e27c57b2 - provenance: Add missing graph dataset file
- 1c0ecd28 - deployment/provenance: It's actually the pattern to access graph files
- b5983668 - provenance/helper-volume: Fix variable name typo
- 12ac4262 - provenance/fetch-dataset: Add missing graph files
added 39 commits
- 48e56fd3 - 1 commit from branch production
- 48e56fd3...58d1470e - 28 earlier commits
- defa89b2 - provenance: Allow to use a data subset
- 7f522b42 - provenance/script: Fix graph files retrieval step
- 3dfe3991 - provenance/script: Drop debugging instructions
- d85e5e6c - provenance/script: Activate back index check
- 3aefbce2 - values-swh-application-versions: Bump to recent provenance image
- fb8c6232 - provenance: Add missing graph dataset file
- 1b517a96 - deployment/provenance: It's actually the pattern to access graph files
- c3b80d63 - provenance/helper-volume: Fix variable name typo
- cbfcaa42 - provenance/fetch-dataset: Add missing graph files
- 2f025065 - next-version/provenance: Use smaller dataset
mentioned in issue swh/infra/sysadm-environment#5608 (closed)
added 1 commit
- e6c6f946 - helper-ingress: Move grpc extra annotations declaration in helper
added 41 commits
- 35692512 - 1 commit from branch production
- 35692512...5d13a8b0 - 30 earlier commits
- f737f9c2 - provenance/script: Drop debugging instructions
- 167692d4 - provenance/script: Activate back index check
- f2a60beb - values-swh-application-versions: Bump to recent provenance image
- 38298168 - provenance: Add missing graph dataset file
- ca1beb2b - deployment/provenance: It's actually the pattern to access graph files
- e0451855 - provenance/helper-volume: Fix variable name typo
- 02652e1c - provenance/fetch-dataset: Add missing graph files
- 6fcc96c0 - next-version/provenance: Use smaller dataset
- f42977d7 - local-cluster/provenance: Add ingress setup
- eb30d291 - helper-ingress: Move grpc extra annotations declaration in helper
added 21 commits
- eb30d291...256bd1b9 - 11 earlier commits
- 740ad96f - provenance/deployment: Drop unneeded image prefix name
- ff197899 - provenance/deployment: Fix variable issue type and typo
- c6c5f0fb - provenance/deployment: deploy rpc configuration for rpc server
- da0fbf93 - backend-utils/provenance: Iterate to make scripts functionals
- a5317f81 - provenance/deployment: Allow to use a data subset
- aa953c03 - provenance/script: Fix graph files retrieval step
- fc564ccd - provenance/fetch-dataset: Make the provenance functional
- 557a553f - helper-ingress: Move grpc extra annotations declaration in helper
- 729b625c - local-cluster/provenance: Adapt configuration for the instance to run
- f9480a2d - next-version: Deploy provenance grpc
- Resolved by Antoine R. Dumont
Testing the connection to the graph grpc instance through an ipython repl [1] raises an error, and through curl there is just a blank response [2].
@vlorentz Any hints please?
[1] Trying to talk directly to the provenance grpc service by instantiating the code in charge of it (the one used in the webapp):
In [1]: from swh.provenance import get_provenance
   ...: from yaml import safe_load
   ...: from swh.model.swhids import CoreSWHID
   ...:
   ...: config = """
   ...: cls: graph
   ...: url: provenance-grpc-popular-ingress:80
   ...: """
   ...:
   ...: config_d = safe_load(config)
   ...: provenance=get_provenance(**config_d)
   ...:
   ...: provenance.check_config()
   ...: swhid=CoreSWHID.from_string("swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892")
   ...: unknown_swhid=CoreSWHID.from_string("swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945")
   ...:
   ...: provenance.whereis(swhid=swhid)
   ...:
---------------------------------------------------------------------------
_MultiThreadedRendezvous                  Traceback (most recent call last)
Cell In[1], line 17
     14 swhid=CoreSWHID.from_string("swh:1:cnt:07d9e8c75f4f7e7dba04b5b4e8589a158c8a6892")
     15 unknown_swhid=CoreSWHID.from_string("swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945")
---> 17 provenance.whereis(swhid=swhid)
     18 provenance.whereis(swhid=unknown_swhid)

File /opt/swh/venv/lib/python3.11/site-packages/swh/provenance/backend/graph.py:166, in GraphProvenance.whereis(self, swhid)
    156 def whereis(self, *, swhid: CoreSWHID) -> Optional[QualifiedSWHID]:
    157     """Given a SWHID return a QualifiedSWHID with some provenance info:
    158
    159     - the release or revision containing that content or directory
    (...)
    164     be an association release if any.
    165     """
--> 166 anchor = self._get_anchor(swhid, "rel")
    167 if anchor is None:
    168     anchor = self._get_anchor(swhid, "rev")

File /opt/swh/venv/lib/python3.11/site-packages/swh/provenance/backend/graph.py:77, in GraphProvenance._get_anchor(self, swhid, leaf_type)
     75 try:
     76     t0 = monotonic()
---> 77     resp = list(self._stub.Traverse(anchor_search))
     78 except grpc.RpcError as exc:
     79     if exc.code() == grpc.StatusCode.NOT_FOUND:

File /opt/swh/venv/lib/python3.11/site-packages/grpc/_channel.py:543, in _Rendezvous.__next__(self)
    542 def __next__(self):
--> 543     return self._next()

File /opt/swh/venv/lib/python3.11/site-packages/grpc/_channel.py:969, in _MultiThreadedRendezvous._next(self)
    967     raise StopIteration()
    968 elif self._state.code is not None:
--> 969     raise self

_MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNIMPLEMENTED
    details = ""
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"", grpc_status:12, created_time:"2025-03-21T15:14:05.003033412+00:00"}"
>
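(grpc_status 12, StatusCode.UNIMPLEMENTED, means the endpoint answered but does not register the called method. A plausible reading of the traceback above: the cls: graph backend calls Traverse on the graph traversal service, while the configured url points at the provenance grpc server, which would not implement it. A quick way to see what the endpoint actually serves, assuming it exposes gRPC reflection:)

$ grpcurl -plaintext provenance-grpc-popular-ingress:80 list
$ # if the graph TraversalService is absent from the output, the Traverse call
$ # in swh/provenance/backend/graph.py can only terminate with UNIMPLEMENTED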
[2] After dropping the hard-coded authentication code [0]:
$ curl -s http://web-local-archive-ingress/api/1/provenance/whereis/swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac
$ # no response ^
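(A blank body alone does not distinguish a 200 with an empty payload from an error; rerunning with -i, or -v, would surface the status line and headers:)

$ curl -si http://web-local-archive-ingress/api/1/provenance/whereis/swh:1:cnt:dcb2d732994e615aab0777bfe625bd1f07e486ac | head -n 5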
[0] Internal deployment detail (comment out the non-configurable authentication code and push the changes to the pod):
$ grep "_check_auth_and_permission(request)" ~/swh/swh-environment/swh-web/swh/web/provenance/api_views.py # _check_auth_and_permission(request) # _check_auth_and_permission(request) $ namespace=swh; pod=web-local-archive-5b557fd5b4-9d7wz; swh-kubectl kind cp ~/swh/swh-environment/swh-web/swh/web/provenance/api_views.py $namespace/$pod:/opt/swh/venv/lib/python3.11/site-packages/swh/web/provenance/api_views.py + case "$1" in + context=kind-local-cluster + shift + kubectl --context kind-local-cluster cp /home/tony/swh/swh-environment/swh-web/swh/web/provenance/api_views.py swh/web-local-archive-5b557fd5b4-9d7wz:/opt/swh/venv/lib/python3.11/site-packages/swh/web/provenance/api_views.py Defaulted container "web-local-archive" out of: web-local-archive, nginx, prepare-configuration (init), do-migration (init), prepare-static (init)