# Changes here will be overwritten by Copier
_commit: v0.3.3
_src_path: https://gitlab.softwareheritage.org/swh/devel/swh-py-template.git
description: Software Heritage vault
distribution_name: swh-vault
have_cli: true
have_workers: true
package_root: swh/vault
project_name: swh.vault
python_minimal_version: '3.7'
readme_format: rst
# python: Reformat code with black
be318c7fc864410fb44187fdaeade22ca3ee9914
19fc56a7ffa2a7715b8b0dcb1673f0d6f697313a
d746a27c972076801a7a217261443f10b186d15b
*.egg-info/
*.pyc
*.sw?
*~
.coverage
.eggs/
.hypothesis
.mypy_cache
.tox
__pycache__
dist
*.egg-info
version.txt
build/
dist/
# these are symlinks created by a hook in swh-docs' main sphinx conf.py
docs/README.rst
docs/README.md
# this should be a symlink for people who want to build the sphinx doc
# without using tox, generally created by the swh-env/bin/update script
docs/Makefile.sphinx
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: check-json
      - id: check-yaml
  - repo: https://github.com/python/black
    rev: 25.1.0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 6.0.0
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 7.1.1
    hooks:
      - id: flake8
        additional_dependencies: [flake8-bugbear==24.12.12, flake8-pyproject]
  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.1
    hooks:
      - id: codespell
        name: Check source code spelling
        stages: [pre-commit]
      - id: codespell
        name: Check commit message spelling
        stages: [commit-msg]
  - repo: local
    hooks:
      - id: mypy
        name: mypy
        entry: mypy
        args: [swh]
        pass_filenames: false
        language: system
        types: [python]
      - id: twine-check
        name: twine check
        description: call twine check when pushing an annotated release tag
        entry: bash -c "ref=$(git describe) &&
          [[ $ref =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]] &&
          (python3 -m build --sdist && twine check $(ls -t dist/* | head -1)) || true"
        pass_filenames: false
        stages: [pre-push]
        language: python
        additional_dependencies: [twine, build]
# Software Heritage Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as Software
Heritage contributors and maintainers pledge to making participation in our
project and our community a harassment-free experience for everyone, regardless
of age, body size, disability, ethnicity, sex characteristics, gender identity
and expression, level of experience, education, socioeconomic status,
nationality, personal appearance, race, religion, or sexual identity and
orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at `conduct@softwareheritage.org`. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an
incident. Further details of specific enforcement policies may be posted
separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
Quentin Campos
include Makefile
include Makefile.local
include README.db_testing
include README.dev
include requirements.txt
include requirements-swh.txt
include version.txt
recursive-include sql *
Software Heritage - Vault
=========================
User-facing service for retrieving parts of the archive as self-contained
bundles (e.g., individual releases or entire repository snapshots).
The creation of a bundle is called "cooking" a bundle.
Architecture
------------
The vault is made of two main parts:
1. a stateful RPC server called the **backend**
2. Celery tasks, called **cookers**
# Copyright (C) 2020-2023 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
pytest_plugins = [
    "swh.storage.pytest_plugin",
]
swh-vault (0.0.1-1) unstable; urgency=low
* Create swh-vault package
-- Antoine Pietri <antoine.pietri@inria.fr> Fri, 05 May 2017 16:08:15 +0200
9
Source: swh-vault
Maintainer: Software Heritage developers <swh-devel@inria.fr>
Section: python
Priority: optional
Build-Depends: debhelper (>= 9),
dh-python (>= 2),
python3-all,
python3-click,
python3-dateutil,
python3-dulwich,
python3-fastimport,
python3-flask,
python3-nose,
python3-psycopg2,
python3-setuptools,
python3-swh.core (>= 0.0.28~),
python3-swh.model (>= 0.0.18~),
python3-swh.objstorage (>= 0.0.17~),
python3-swh.scheduler (>= 0.0.11~),
python3-swh.storage (>= 0.0.92~),
python3-vcversioner
Standards-Version: 3.9.6
Homepage: https://forge.softwareheritage.org/diffusion/DVAU/
Package: python3-swh.vault
Architecture: all
Depends: python3-swh.core (>= 0.0.28~),
python3-swh.model (>= 0.0.18~),
python3-swh.objstorage (>= 0.0.17~),
python3-swh.scheduler (>= 0.0.11~),
python3-swh.storage (>= 0.0.92~),
${misc:Depends},
${python3:Depends}
Description: Software Heritage Vault
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Files: *
Copyright: 2015 The Software Heritage developers
License: GPL-3+
License: GPL-3+
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
.
On Debian systems, the complete text of the GNU General Public
License version 3 can be found in `/usr/share/common-licenses/GPL-3'.
#!/usr/bin/make -f
export PYBUILD_NAME=swh.vault
export PYBUILD_TEST_ARGS=--with-doctest -sv -a !db,!fs
%:
dh $@ --with python3 --buildsystem=pybuild
override_dh_install:
dh_install
rm -v $(CURDIR)/debian/python3-*/usr/lib/python*/dist-packages/swh/__init__.py
3.0 (quilt)
include ../../swh-docs/Makefile.sphinx
include Makefile.sphinx
../README.rst
\ No newline at end of file
.. _vault-api-ref:
Vault API Reference
===================
Software source code **objects**---e.g., individual files, directories,
commits, tagged releases, etc.---are stored in the Software Heritage (SWH)
Archive in fully deduplicated form. This allows direct access to individual
artifacts, but requires some preparation ("cooking") when fast access to a large
set of related objects (e.g., an entire repository) is required.
The **Software Heritage Vault** takes care of that preparation by
asynchronously assembling **bundles** of related source code objects, caching,
and garbage collecting them as needed.
The Vault is accessible via an RPC API documented below.
All endpoints are mounted at API root, which is currently at :swh_web:`api/1/`.
Unless otherwise stated, API endpoints respond to HTTP GET method.
Object identification
---------------------
The vault stores bundles corresponding to different kinds of objects (see
:ref:`data-model`).
The URL fragment ``:bundletype/:swhid`` is used throughout the vault API to
identify vault objects. See :ref:`persistent-identifiers` for details on
the syntax and meaning of ``:swhid``.
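As an illustration of the ``:bundletype/:swhid`` fragment, here is a minimal Python sketch that validates both parts before joining them. The helper name ``vault_fragment`` is hypothetical, the bundle type spellings are assumed to match the section names used in this document, and the regular expression only covers core SWHIDs (no qualifiers).

```python
import re

# Core SWHID syntax: swh:1:<object type>:<40 lowercase hex digits>.
SWHID_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")

# Bundle types described in this document (URL spelling is an assumption).
BUNDLE_TYPES = ("flat", "gitfast", "git-bare")


def vault_fragment(bundle_type: str, swhid: str) -> str:
    """Return the ``:bundletype/:swhid`` fragment after basic validation."""
    if bundle_type not in BUNDLE_TYPES:
        raise ValueError(f"unknown bundle type: {bundle_type!r}")
    if SWHID_RE.fullmatch(swhid) is None:
        raise ValueError(f"malformed core SWHID: {swhid!r}")
    return f"{bundle_type}/{swhid}"
```

Checking the identifier on the client side is optional; the API itself reports a malformed SWHID with a 400 status code.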
Bundle types
------------
Flat
~~~~
Flat bundles are simple tarballs that can be read without any specialized software.
When cooking directories, they are (very close to) the original directories that
were ingested.
When cooking other types of objects, they have multiple root directories,
each corresponding to an original object (revision, ...)
This is typically only useful to cook directories; cooking other types of objects
(revisions, releases, snapshots) is usually done with ``git-bare`` as it is
more efficient and closer to the original repository.
You can extract the resulting bundle using:

.. code:: shell

    tar xaf bundle.tar.gz
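When extraction has to happen from a script rather than a shell, the same step can be sketched with Python's standard ``tarfile`` module. The function name ``extract_flat_bundle`` is hypothetical, and the path check is an extra precaution added here, not something the Vault requires.

```python
import os
import tarfile


def extract_flat_bundle(archive_path: str, dest_dir: str) -> list:
    """Extract a flat bundle into dest_dir and return the member names."""
    dest_root = os.path.realpath(dest_dir)
    # "r:*" lets tarfile detect the compression (gzip here) on its own.
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Refuse members that would be written outside dest_dir.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_root, members=members)
    return [member.name for member in members]
```

A call such as ``extract_flat_bundle("bundle.tar.gz", ".")`` mirrors the ``tar xaf`` command above.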
gitfast
~~~~~~~
A gzip-compressed `git fast-export
<https://git-scm.com/docs/git-fast-export>`_. You can extract the resulting
bundle using:

.. code:: shell

    git init
    zcat bundle.gitfast.gz | git fast-import
    git checkout HEAD
git-bare
~~~~~~~~
A tarball that can be decompressed to get a real git repository.
It is without a checkout, so it is the equivalent of what one would get
with ``git clone --bare``.
This is the most flexible bundle type, as it allows original git repositories
to be recreated exactly, including their branches.
You can extract the resulting bundle using:

.. code:: shell

    tar xaf bundle.tar.gz

Then explore its content like a normal ("non-bare") git repository by cloning it:

.. code:: shell

    git clone path/to/extracted/:swhid
Cooking and status checking
---------------------------
Vault bundles might be ready for retrieval or not. When they are not, they will
need to be **cooked** before they can be retrieved. A cooked bundle will remain
around until it expires; after expiration, it will need to be cooked again
before it can be retrieved. Cooking is idempotent, and a no-op in between a
previous cooking operation and expiration.
.. http:post:: /vault/:bundletype/:swhid
.. http:get:: /vault/:bundletype/:swhid

    **Request body**: optionally, an ``email`` POST parameter containing an
    e-mail to notify when the bundle cooking has ended.

    **Allowed HTTP Methods:**

    - :http:method:`post` to **request** a bundle cooking
    - :http:method:`get` to check the progress and status of the cooking
    - :http:method:`head`
    - :http:method:`options`

    **Response:**

    :statuscode 200: bundle available for cooking, status of the cooking
    :statuscode 400: malformed SWHID
    :statuscode 404: unavailable bundle or object not found

    .. sourcecode:: http

        HTTP/1.1 200 OK
        Content-Type: application/json

        {
            "id": 42,
            "fetch_url": "/api/1/vault/flat/:swhid/raw/",
            "swhid": ":swhid",
            "progress_message": "Creating tarball...",
            "status": "pending"
        }

    After a cooking request has been started, all subsequent GET and POST
    requests to the cooking URL return some JSON data containing information
    about the progress of the bundle creation. The JSON contains the
    following keys:

    - ``id``: the ID of the cooking request
    - ``fetch_url``: the URL that can be used for the retrieval of the bundle
    - ``swhid``: the identifier of the requested bundle
    - ``progress_message``: a string describing the current progress of the
      cooking. If the cooking failed, ``progress_message`` will contain the
      reason of the failure.
    - ``status``: one of the following values:

      - ``new``: the bundle request was created
      - ``pending``: the bundle is being cooked
      - ``done``: the bundle has been cooked and is ready for retrieval
      - ``failed``: the bundle cooking failed and can be retried
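The status values listed above suggest a simple client-side decision. The sketch below maps a decoded status document to a next step; the function name ``next_action`` and its return values (``"wait"``, ``"fetch"``, ``"retry"``) are illustrative choices made here, not part of the API.

```python
import json


def next_action(status_doc: dict) -> str:
    """Map a cooking status document to what a client might do next."""
    status = status_doc["status"]
    if status in ("new", "pending"):
        return "wait"  # cooking has not finished: check again later
    if status == "done":
        return "fetch"  # the bundle can be downloaded from fetch_url
    if status == "failed":
        return "retry"  # progress_message holds the reason for the failure
    raise ValueError(f"unexpected status: {status!r}")


# The sample response body shown in this section.
sample = json.loads(
    """
    {
        "id": 42,
        "fetch_url": "/api/1/vault/flat/:swhid/raw/",
        "swhid": ":swhid",
        "progress_message": "Creating tarball...",
        "status": "pending"
    }
    """
)
print(next_action(sample))
```

Raising on an unknown status keeps the client from silently looping if new values are ever introduced.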
Retrieval
---------
Retrieve a specific bundle from the vault with:
.. http:get:: /vault/:bundletype/:swhid/raw

    **Allowed HTTP Methods:** :http:method:`get`, :http:method:`head`,
    :http:method:`options`

    **Response**:

    :statuscode 200: bundle available; response body is the bundle.
    :statuscode 404: unavailable bundle; client should request its cooking.
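The two status codes above can be handled as follows. This is a minimal sketch using only the Python standard library; the function name ``fetch_bundle`` and the injectable ``opener`` parameter (there to allow testing without network access) are assumptions of this example.

```python
import shutil
import urllib.error
import urllib.request


def fetch_bundle(raw_url: str, dest_path: str, opener=urllib.request.urlopen) -> bool:
    """Download a cooked bundle; return False if the vault answers 404."""
    try:
        with opener(raw_url) as response, open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)  # stream to disk in chunks
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False  # bundle unavailable: request its cooking first
        raise
    return True
```

A ``False`` result tells the caller to issue a cooking request and try again once the status is ``done``.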
.. _swh-vault-cli:
Command-line interface
======================
.. click:: swh.vault.cli:vault
   :prog: swh vault
   :nested: full
.. _vault-primer:
Getting started
===============
The Vault is a service in charge of reconstructing parts of the archive
as self-contained bundles that can then be imported locally, for
instance into a Git repository. This is basically where you can do a
``git clone`` of a repository stored in Software Heritage.

The Vault is asynchronous: you first send a request to prepare the bundle
you need, then a second request to fetch the bundle once the Vault has
finished reconstructing it.
Example: retrieving a directory
-------------------------------
First, ask the Vault to prepare your bundle:

.. code:: shell

    curl -X POST https://archive.softwareheritage.org/api/1/vault/flat/:swhid/
where ``:swhid`` is a :ref:`persistent-identifiers`. This initial request and all
subsequent requests to this endpoint will return some JSON data containing
information about the progress of bundle creation:

.. code:: json

    {
        "id": 42,
        "fetch_url": "/api/1/vault/flat/:swhid/raw/",
        "swhid": ":swhid",
        "progress_message": "Creating tarball...",
        "status": "pending"
    }
Once the status is ``done``, you can fetch the bundle at the address
given in the ``fetch_url`` field.

.. code:: shell

    curl -o bundle.tar.gz https://archive.softwareheritage.org/api/1/vault/flat/:swhid/raw
    tar xaf bundle.tar.gz
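The request-then-poll sequence of this example could be scripted roughly as follows. This is a sketch under stated assumptions: ``http_json`` and ``cook_and_wait`` are hypothetical helper names, the polling interval and the poll limit are arbitrary, and error handling for HTTP failures is left out.

```python
import json
import time
import urllib.request

API_ROOT = "https://archive.softwareheritage.org/api/1"


def http_json(method: str, url: str) -> dict:
    """Send a request and decode the JSON body of the response."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def cook_and_wait(swhid, request_json=http_json, poll_seconds=10.0, max_polls=60):
    """Request a flat bundle, then poll until cooking ends.

    Returns the last status document, whose status is "done" or "failed".
    """
    url = f"{API_ROOT}/vault/flat/{swhid}/"
    doc = request_json("POST", url)  # first request: ask for the cooking
    polls = 0
    while doc["status"] not in ("done", "failed"):
        if polls >= max_polls:
            raise TimeoutError(f"bundle not ready after {max_polls} polls")
        time.sleep(poll_seconds)
        doc = request_json("GET", url)  # later requests: check progress
        polls += 1
    return doc
```

Once the returned document has status ``done``, its ``fetch_url`` field gives the address to download from, as in the ``curl -o`` command above.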
E-mail notifications
--------------------
You can also ask to be notified by e-mail once the bundle you requested is
ready, by giving an ``email`` POST parameter:

.. code:: shell

    curl -X POST -d 'email=example@example.com' \
        https://archive.softwareheritage.org/api/1/vault/flat/:swhid/
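For scripted use, the same form-encoded ``email`` parameter can be prepared with Python's standard library. The helper name ``cooking_request`` is hypothetical; it only builds the request object and sends nothing.

```python
import urllib.parse
import urllib.request


def cooking_request(url: str, email: str) -> urllib.request.Request:
    """Build (without sending) a POST request carrying the ``email`` parameter."""
    # Form-encode the parameter, as curl's -d option does.
    body = urllib.parse.urlencode({"email": email}).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")
```

Passing the returned object to ``urllib.request.urlopen`` would then submit the cooking request.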
API reference
~~~~~~~~~~~~~
For a more exhaustive overview of the Vault API, see the :ref:`vault-api-ref`.