
vault: Redirect to bundle download link if cache backend supports it

Some vault cache backends (the Azure one, for instance) can provide a direct download link for a cooked bundle. In that case, redirect requests for the /api/1/vault/(bundle_type)/(swhid)/raw/ endpoint to that link, in order to offer efficient downloads and avoid connection errors when dealing with large bundles.
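The redirect decision described above can be sketched as follows. This is a minimal, framework-free illustration and not the actual implementation: the `VaultCache` interface, its `download_url` and `get` methods, and the `raw_bundle_response` helper are hypothetical names introduced here.

```python
from typing import Dict, Optional, Protocol, Tuple


class VaultCache(Protocol):
    """Hypothetical cache interface, for illustration only."""

    def download_url(self, bundle_type: str, swhid: str) -> Optional[str]:
        """Return a direct download link, or None if the backend has none."""
        ...

    def get(self, bundle_type: str, swhid: str) -> bytes:
        """Return the cooked bundle content."""
        ...


def raw_bundle_response(
    cache: VaultCache, bundle_type: str, swhid: str
) -> Tuple[int, Dict[str, str], bytes]:
    """Build a (status, headers, body) triple for the raw bundle endpoint."""
    url = cache.download_url(bundle_type, swhid)
    if url is not None:
        # The backend can serve the bundle itself: redirect the client to it.
        return 302, {"Location": url}, b""
    # Otherwise fall back to streaming the bundle through the web application.
    body = cache.get(bundle_type, swhid)
    return 200, {"Content-Type": "application/octet-stream"}, body
```

With this shape, backends without direct links keep their current behaviour, and only those that expose a link trigger the redirect.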

Before we can deploy this change to production, we need to update the current vault production cache, which uses an Azure backend, so that already cooked bundles are stored uncompressed: all bundles in the current blob container are gzipped. We should create a new blob container on Azure and copy the existing bundles into it uncompressed, for instance using the following script.

import gzip

from azure.storage.blob import ContainerClient

AZURE_CONNECTION_STRING = """DefaultEndpointsProtocol=https;\
AccountName=***;\
AccountKey=***;\
BlobEndpoint=***;"""

SRC_CONTAINER_NAME = "contents"
TGT_CONTAINER_NAME = "contents-uncompressed"

src_container_client = ContainerClient.from_connection_string(
    AZURE_CONNECTION_STRING, SRC_CONTAINER_NAME
)

tgt_container_client = ContainerClient.from_connection_string(
    AZURE_CONNECTION_STRING, TGT_CONTAINER_NAME
)

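# Copy every gzipped bundle to the target container in uncompressed form.
# Each blob is fully loaded in memory before being decompressed and uploaded.
for blob in src_container_client.list_blobs():
    blob_data = src_container_client.download_blob(blob.name).readall()
    tgt_container_client.upload_blob(blob.name, gzip.decompress(blob_data))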
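If the copy is interrupted, it may be convenient to resume it without transferring the same blobs again. The sketch below is one possible variant, not part of the original script: `copy_uncompressed` is a hypothetical helper that takes two objects behaving like `ContainerClient` and skips blobs already present in the target container.

```python
import gzip


def copy_uncompressed(src_client, tgt_client) -> int:
    """Copy blobs missing from the target container, gunzipping each one.

    Both arguments are expected to behave like
    azure.storage.blob.ContainerClient (list_blobs, download_blob,
    upload_blob). Returns the number of blobs copied by this call.
    """
    # Names already present in the target; kept in memory, which assumes
    # the listing fits comfortably in RAM.
    already_copied = {blob.name for blob in tgt_client.list_blobs()}
    copied = 0
    for blob in src_client.list_blobs():
        if blob.name in already_copied:
            continue  # copied by a previous run
        data = src_client.download_blob(blob.name).readall()
        tgt_client.upload_blob(blob.name, gzip.decompress(data))
        copied += 1
    return copied
```

It would be called as `copy_uncompressed(src_container_client, tgt_container_client)` with the two clients created in the script above.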

Once done, we should update the vault service configuration in production to use an uncompressed Azure objstorage as cache backend, targeting the newly created container (see swh-environment!275 (diffs)).

We must also enable CORS on the Azure storage account to prevent browsers from blocking bundle downloads.
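CORS can be enabled from the Azure portal or programmatically. As a rough sketch of the latter, assuming the `azure-storage-blob` Python package: the helper names, the allowed methods, the max age, and the example origin are all illustrative assumptions, not values taken from the production setup.

```python
from typing import Any, Dict, List


def build_cors_rule_kwargs(allowed_origins: List[str]) -> Dict[str, Any]:
    """Return keyword arguments for a read-only CORS rule (illustrative values)."""
    return {
        "allowed_origins": allowed_origins,
        "allowed_methods": ["GET", "HEAD", "OPTIONS"],
        "allowed_headers": ["*"],
        "exposed_headers": ["*"],
        "max_age_in_seconds": 3600,
    }


def enable_cors(connection_string: str, allowed_origins: List[str]) -> None:
    """Apply the CORS rule to the blob service of the storage account."""
    # Imported here so the rule can be built and inspected without the SDK.
    from azure.storage.blob import BlobServiceClient, CorsRule

    service = BlobServiceClient.from_connection_string(connection_string)
    rule = CorsRule(**build_cors_rule_kwargs(allowed_origins))
    service.set_service_properties(cors=[rule])
```

For example, `enable_cors(AZURE_CONNECTION_STRING, ["https://archive.example.org"])` would allow browser downloads initiated from that (placeholder) origin.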

Depends on swh-vault!186 (merged)

Related to swh-vault#885 (closed).

Edited by Antoine Lambert
