# Changes here will be overwritten by Copier
_commit: v0.3.3
_src_path: https://gitlab.softwareheritage.org/swh/devel/swh-py-template.git
description: Software Heritage deposit server
distribution_name: swh-deposit
have_cli: true
have_workers: true
package_root: swh/deposit
project_name: swh.deposit
python_minimal_version: '3.7'
readme_format: rst
# python: Reformat code with black
f5426d6722826972e2d611d4e7040abbf40c49a1
8a006aeebf7d0cf52abc71b07cd560cbd098349e
7b0fac22d29db6ad27cb650f835cae2f8786ad70
# isort
9c0d0496369828c8fad882d5d676978fb76105f8
*.egg-info/
*.pyc
*.sw?
*~
/.coverage
/.coverage.*
.coverage
.eggs/
.hypothesis
.mypy_cache
.tox
__pycache__
*.egg-info/
version.txt
build/
dist/
/analysis.org
/swh/deposit/fixtures/private_data.yaml
/swh/deposit.json
/test.json
/swh/test
db.sqlite3
/.noseids
*.tgz
*.zip
*.tar.gz
*.tar.bz2
*.tar.lzma
.tox/
# these are symlinks created by a hook in swh-docs' main sphinx conf.py
docs/README.rst
docs/README.md
# this should be a symlink for people who want to build the sphinx doc
# without using tox, generally created by the swh-env/bin/update script
docs/Makefile.sphinx
exclude: ^swh/deposit/tests/data/atom/.*$
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: check-json
      - id: check-yaml

  - repo: https://github.com/python/black
    rev: 25.1.0
    hooks:
      - id: black

  - repo: https://github.com/PyCQA/isort
    rev: 6.0.0
    hooks:
      - id: isort

  - repo: https://github.com/pycqa/flake8
    rev: 7.1.1
    hooks:
      - id: flake8
        additional_dependencies: [flake8-bugbear==24.12.12, flake8-pyproject]

  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.1
    hooks:
      - id: codespell
        name: Check source code spelling
        args: [-L sur]
        stages: [pre-commit]
      - id: codespell
        name: Check commit message spelling
        stages: [commit-msg]

  - repo: local
    hooks:
      - id: mypy
        name: mypy
        entry: env DJANGO_SETTINGS_MODULE=swh.deposit.settings.testing mypy
        args: [swh]
        pass_filenames: false
        language: system
        types: [python]
      - id: twine-check
        name: twine check
        description: call twine check when pushing an annotated release tag
        entry: bash -c "ref=$(git describe) &&
          [[ $ref =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]] &&
          (python3 -m build --sdist && twine check $(ls -t dist/* | head -1)) || true"
        pass_filenames: false
        stages: [pre-push]
        language: python
        additional_dependencies: [twine, build]
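
The ``twine-check`` hook above only runs ``twine check`` when ``git describe`` prints a bare release tag. By default ``git describe`` only considers annotated tags and appends a ``-N-g<hash>`` suffix when HEAD is past the tag, so the regular expression rejects those cases. A minimal Python sketch of the same test, reusing the hook's pattern; ``is_release_tag`` is a hypothetical helper, not part of the project:

```python
import re

# Same pattern as the [[ $ref =~ ... ]] test in the twine-check hook;
# fullmatch() plays the role of the ^...$ anchors.
RELEASE_TAG = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")


def is_release_tag(ref: str) -> bool:
    """Return True if `ref` (the output of `git describe`) is exactly vX.Y.Z."""
    return RELEASE_TAG.fullmatch(ref) is not None


for ref in ["v1.2.3", "v1.2.3-4-gabc1234", "1.2.3"]:
    print(ref, is_release_tag(ref))
```

Only the first ref passes: the second is a commit after the tag, the third lacks the ``v`` prefix.
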
# Software Heritage Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as Software
Heritage contributors and maintainers pledge to making participation in our
project and our community a harassment-free experience for everyone, regardless
of age, body size, disability, ethnicity, sex characteristics, gender identity
and expression, level of experience, education, socioeconomic status,
nationality, personal appearance, race, religion, or sexual identity and
orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at `conduct@softwareheritage.org`. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an
incident. Further details of specific enforcement policies may be posted
separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
Ishan Bhanuka
include Makefile
include requirements.txt
include requirements-test.txt
include requirements-swh.txt
include requirements-server.txt
include requirements-swh-server.txt
include version.txt
recursive-include swh/deposit/static *
recursive-include swh/deposit/fixtures *
recursive-include swh/deposit/templates *
run:
	gunicorn3 -b 127.0.0.1:5006 swh.deposit.wsgi

test:
	./swh/deposit/manage.py test

# Override default rule to make sure DJANGO env var is properly set. It
# *should* work without any override thanks to the mypy django-stubs plugin,
# but it currently doesn't; see
# https://github.com/typeddjango/django-stubs/issues/166
check-mypy:
	DJANGO_SETTINGS_MODULE=swh.deposit.settings.testing $(MYPY) $(MYPYFLAGS) swh
# swh-deposit
This is [Software Heritage](https://www.softwareheritage.org)'s
[SWORD 2.0](http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html) Server
implementation, as well as a simple client to upload deposits on the server.
**S.W.O.R.D** (**S**imple **W**eb-Service **O**ffering **R**epository
**D**eposit) is an interoperability standard for digital file deposit.
This implementation allows interaction between a client (a
repository) and a server (the SWH repository) to deposit
software source code archives and associated metadata.
The documentation is at ./docs/README-specification.md
Software Heritage - Deposit
===========================

Simple Web-Service Offering Repository Deposit (S.W.O.R.D) is an interoperability
standard for digital file deposit.

This repository provides both the `SWORD v2`_ server implementation and a deposit
command-line client.

This implementation allows interaction between a client (a repository) and a server (the
SWH repository) to deposit software source code archives and associated metadata.

Description
-----------

Most of the software source code artifacts present in the SWH Archive are gathered by
means of `loader`_ workers run by the SWH project from source code origins identified by
`lister`_ workers. This is a pull mechanism: it is the responsibility of the SWH project
to gather and collect source code artifacts that way.

Alternatively, SWH allows its partners to push source code artifacts and metadata
directly into the Archive with a push-based mechanism. Using this mechanism, different
actors holding software artifacts or metadata can preserve their assets without having
to go through an intermediate collaborative development platform that is already
harvested by SWH (e.g. GitHub, GitLab, etc.).

This mechanism is the ``deposit``.

The main idea is that the deposit is an authenticated API allowing the user to provide
source code artifacts, with metadata, to be ingested in the SWH Archive. The result is a
`SWHID`_ that can be used to uniquely and persistently identify that very piece of
source code.

This unique identifier can then be used to `reference the source code
<https://hal.archives-ouvertes.fr/hal-02446202>`_ (e.g. in a `scientific paper
<https://www.softwareheritage.org/2020/05/26/citing-software-with-style/>`_) and
retrieve it using the `vault`_ feature of the SWH Archive platform.

The differences between uploading a piece of code with the deposit and simply asking
SWH to archive a repository using the `save code now`_ feature are:

- a deposited artifact is provided by one of the SWH partners, which is regarded as a
  trusted authority,
- a deposited artifact requires metadata properties describing the source code artifact,
- a deposited artifact has a codemeta_ metadata entry attached to it,
- a deposited artifact has the same visibility on the SWH Archive as a collected
  repository,
- a deposited artifact can be searched with its provided url property on the SWH
  Archive,
- the deposit API uses the `SWORD v2`_ API, and thus requires some tooling to send
  deposits to SWH. These tools are provided with this repository.

See the `User Manual`_ page for more details on how to use the deposit client
command line tools to push a deposit in the SWH Archive.

See the `API Documentation`_ reference pages of the SWORDv2 API implementation
in ``swh.deposit`` if you want to upload deposits using HTTP requests.

Read the `Deposit metadata`_ chapter to get more details on what metadata
are supported when doing a deposit.

See `Running swh-deposit locally`_ if you want to hack the code of the ``swh.deposit`` module.

See `Production deployment`_ if you want to deploy your own copy of the
``swh.deposit`` stack.

.. _codemeta: https://codemeta.github.io/
.. _SWORD v2: http://swordapp.org/sword-v2/
.. _loader: https://docs.softwareheritage.org/devel/glossary.html#term-loader
.. _lister: https://docs.softwareheritage.org/devel/glossary.html#term-lister
.. _SWHID: https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#persistent-identifiers
.. _vault: https://docs.softwareheritage.org/devel/swh-vault/index.html#swh-vault
.. _save code now: https://archive.softwareheritage.org/save/
.. _User Manual: https://docs.softwareheritage.org/devel/swh-deposit/api/user-manual.html#deposit-user-manual
.. _API Documentation: https://docs.softwareheritage.org/devel/swh-deposit/api/api-documentation.html#deposit-api-specifications
.. _Deposit metadata: https://docs.softwareheritage.org/devel/swh-deposit/api/metadata.html#deposit-metadata
.. _Running swh-deposit locally: https://docs.softwareheritage.org/devel/swh-deposit/internals/dev-environment.html#swh-deposit-dev-env
.. _Production deployment: https://docs.softwareheritage.org/devel/swh-deposit/internals/prod-environment.html#swh-deposit-prod-env
# Copyright (C) 2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import pytest

pytest_plugins = [
    "swh.scheduler.pytest_plugin",
    "swh.storage.pytest_plugin",
    "swh.core.pytest_plugin",
]


@pytest.fixture(scope="session")
def swh_scheduler_celery_includes(swh_scheduler_celery_includes):
    return swh_scheduler_celery_includes + [
        "swh.deposit.loader.tasks",
    ]
_build/
apidoc/
*-stamp
include ../../swh-docs/Makefile.sphinx
include Makefile.sphinx

APIDOC_EXCLUDES += ../swh/*/settings/*

sphinx/html: images
sphinx/clean: clean-images

images:
	make -C images/
clean-images:
	make -C images/ clean

clean: clean-images

.PHONY: images clean-images
../README.rst
.. _deposit-api-specifications:

API Documentation
=================

This is `Software Heritage <https://www.softwareheritage.org>`__'s
`SWORD 2.0 <http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html>`__
Server implementation.

**S.W.O.R.D** (**S**\ imple **W**\ eb-Service **O**\ ffering
**R**\ epository **D**\ eposit) is an interoperability standard for
digital file deposit.

This implementation allows interaction between a client (a repository) and
a server (the SWH repository) to push deposits of software source code archives
with associated metadata.

*Note:*

* In the following document, the terms ``archive`` and ``software source code
  archive`` are used interchangeably.
* The supported archive formats are:

  * zip: common zip archive (no multi-disk zip files).
  * tar: tar archive without compression, or optionally compressed with one of
    the following algorithms: gzip (.tar.gz, .tgz), bzip2 (.tar.bz2), or lzma
    (.tar.lzma)
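
To catch unsupported files before uploading, a client could check the file name against the formats listed above. A minimal sketch, assuming the file suffix reflects the actual format; the server inspects the archive itself, so this is only a client-side convenience, and the format labels are hypothetical:

```python
from typing import Optional

# Suffix -> hypothetical format label, following the list above.
# Longer suffixes come first so ".tar.gz" is matched before ".tar".
SUPPORTED_SUFFIXES = [
    (".tar.lzma", "tar+lzma"),
    (".tar.bz2", "tar+bzip2"),
    (".tar.gz", "tar+gzip"),
    (".tgz", "tar+gzip"),
    (".tar", "tar"),
    (".zip", "zip"),
]


def archive_format(filename: str) -> Optional[str]:
    """Return a format label for `filename`, or None if it is unsupported."""
    name = filename.lower()
    for suffix, label in SUPPORTED_SUFFIXES:
        if name.endswith(suffix):
            return label
    return None


print(archive_format("project.tar.gz"))  # tar+gzip
print(archive_format("project.7z"))  # None
```
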

.. _swh-deposit-collection:

Collection
----------

SWORD defines a ``collection`` concept. In SWH's case, this collection
refers to a group of deposits. A ``deposit`` is some form of software
source code archive(s) associated with metadata.
By default, the client's collection will have the client's name.

Limitations
-----------

* upload size limit of 100 MiB
* no mediation
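
A client-side pre-flight check for the upload limit could look like the sketch below. It assumes the limit is inclusive and hard-codes the documented 100 MiB value; the limit actually enforced is the one configured on the server, which answers with an HTTP 403 when it is exceeded. The helper names are hypothetical:

```python
import os

# 100 MiB, as stated above (1 MiB = 1024 * 1024 bytes).
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def fits_upload_limit(size_in_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if an archive of the given size is within the upload limit."""
    return 0 <= size_in_bytes <= limit


def file_fits_upload_limit(path: str) -> bool:
    """Check an archive on disk against the limit before uploading it."""
    return fits_upload_limit(os.path.getsize(path))
```

Larger archives can be split and sent in several requests using ``In-Progress: true``, as described in the use cases.
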

API overview
------------

API access is over HTTPS.

The API is protected through basic authentication.

Endpoints
---------

The API endpoints are rooted at https://deposit.softwareheritage.org/1/.

Data is sent and received as XML (as specified in the SWORD 2.0
specification).
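
As an illustration of basic authentication against these endpoints, here is a minimal sketch using only the Python standard library that prepares an authenticated GET on the service document (``/1/servicedocument/``). The helper names are hypothetical, and the request is built without being sent so it can be inspected offline:

```python
import base64
import urllib.request

API_ROOT = "https://deposit.softwareheritage.org/1/"


def basic_auth_header(login: str, password: str) -> str:
    """Build the value of an HTTP basic authentication header."""
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def service_document_request(login: str, password: str) -> urllib.request.Request:
    """Prepare (without sending) a GET request for the service document."""
    return urllib.request.Request(
        API_ROOT + "servicedocument/",
        headers={"Authorization": basic_auth_header(login, password)},
        method="GET",
    )


# To send it (needs valid credentials and network access):
# with urllib.request.urlopen(service_document_request("<name>", "<pass>")) as resp:
#     print(resp.read().decode("utf-8"))  # XML service document
```
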

.. toctree::

   ../endpoints/service-document.rst
   ../endpoints/collection.rst
   ../endpoints/update-media.rst
   ../endpoints/update-metadata.rst
   ../endpoints/status.rst
   ../endpoints/content.rst

Possible errors
---------------

* common errors:

  * :http:statuscode:`401` if a client does not provide credentials or provides
    wrong ones
  * :http:statuscode:`403` if a client tries to access a collection it does not own
  * :http:statuscode:`404` if a client tries to access an unknown collection
  * :http:statuscode:`404` if a client tries to access an unknown deposit
  * :http:statuscode:`415` if a wrong media type is provided to the endpoint

* archive/binary deposit:

  * :http:statuscode:`403` if the length of the archive exceeds the configured
    maximum size
  * :http:statuscode:`412` if the provided length or hash does not match the
    actual archive
  * :http:statuscode:`415` if a wrong media type is provided

* multipart deposit:

  * :http:statuscode:`412` if the provided md5 hash does not match the actual
    archive
  * :http:statuscode:`415` if a wrong media type is provided

* Atom entry deposit:

  * :http:statuscode:`400` if the request's body is empty (for creation only)
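
A deposit client may want to turn these status codes into readable messages. A minimal sketch; the mapping only paraphrases the list above, is keyed on the status code alone (so the 403 and 412 hints cover several cases), and is not the text returned by the server:

```python
from typing import Dict, Optional

# Hints paraphrased from the error list above (hypothetical client-side texts).
DOCUMENTED_ERRORS: Dict[int, str] = {
    400: "empty request body (Atom entry creation)",
    401: "missing or wrong credentials",
    403: "collection not owned by the client, or archive exceeds the maximum size",
    404: "unknown collection or deposit",
    412: "provided length or hash does not match the archive",
    415: "wrong media type",
}


def explain_status(status: int) -> Optional[str]:
    """Return a hint for a documented error status, or None if not listed."""
    return DOCUMENTED_ERRORS.get(status)
```
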
Sources
-------
* `SWORD v2 specification
<http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html>`__
* `arxiv documentation <https://arxiv.org/help/submit_sword>`__
* `Dataverse example <http://guides.dataverse.org/en/4.3/api/sword.html>`__
* `SWORD used on HAL <https://api.archives-ouvertes.fr/docs/sword>`__
* `xml examples for CCSD <https://github.com/CCSDForge/HAL/tree/master/Sword>`__
.. _swh-deposit-api:

Deposit API
===========

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   user-manual
   api-documentation
   metadata
   use-cases
   register-account
.. _deposit-metadata:

Deposit metadata
================

When making a software deposit into the SWH archive, one can add
information describing the software artifact and the software project.

.. _metadata-requirements:

Metadata requirements
---------------------

- **the schema/vocabulary** used *MUST* be specified with a persistent URL
  (DublinCore, DOAP, CodeMeta, etc.)

  .. code:: xml

     <entry xmlns="http://www.w3.org/2005/Atom">
     or
     <entry xmlns="http://www.w3.org/2005/Atom"
            xmlns:dcterms="http://purl.org/dc/terms/">
     or
     <entry xmlns="http://www.w3.org/2005/Atom"
            xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0">

- **the name** of the software deposit *MUST* be provided [atom:title,
  codemeta:name, dcterms:title]
- **the authors** of the software deposit *MUST* be provided
- **the url** representing the location of the source *MAY* be provided under
  the url tag. The url will be used for creating an origin object in the
  archive.

  .. code:: xml

     <codemeta:url>http://example.com/my_project</codemeta:url>

- **the create\_origin** tag *SHOULD* be used to specify the URL of the origin
  to create (otherwise, a fallback is created using the slug, or a random
  string if missing)
- **the description** of the software deposit *SHOULD* be provided
  [codemeta:description]: short or long description of the software
- **the license(s)** of the software deposit *SHOULD* be provided
  [codemeta:license]
- other metadata *MAY* be added with terms defined by the schema in use.
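
The mandatory fields can be assembled programmatically. A minimal sketch using ``xml.etree.ElementTree``, following the element names of the "Using Atom with CodeMeta" example below; ``minimal_entry`` is a hypothetical helper, and a real deposit would usually add the *SHOULD* fields (description, license) as well:

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URIs taken from the examples on this page.
ATOM = "http://www.w3.org/2005/Atom"
CODEMETA = "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
SWHDEPOSIT = "https://www.softwareheritage.org/schema/2018/deposit"

# Serialize with the same prefixes as the examples (Atom as default namespace).
ET.register_namespace("", ATOM)
ET.register_namespace("codemeta", CODEMETA)
ET.register_namespace("swhdeposit", SWHDEPOSIT)


def minimal_entry(name: str, authors: List[str], origin_url: str) -> str:
    """Build an Atom entry with the MUST fields plus a create_origin element."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    # The name of the software (atom:title).
    ET.SubElement(entry, f"{{{ATOM}}}title").text = name
    # The authors, as codemeta:author elements with a codemeta:name child.
    for author in authors:
        author_elem = ET.SubElement(entry, f"{{{CODEMETA}}}author")
        ET.SubElement(author_elem, f"{{{CODEMETA}}}name").text = author
    # The origin to create, wrapped in swhdeposit:deposit/create_origin.
    deposit = ET.SubElement(entry, f"{{{SWHDEPOSIT}}}deposit")
    create_origin = ET.SubElement(deposit, f"{{{SWHDEPOSIT}}}create_origin")
    ET.SubElement(create_origin, f"{{{SWHDEPOSIT}}}origin", url=origin_url)
    return ET.tostring(entry, encoding="unicode")


print(minimal_entry("Awesome Compiler", ["some awesome author"], "http://example.com/my_project"))
```
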

Examples
--------

Using only Atom
^^^^^^^^^^^^^^^

.. code:: xml

    <?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom"
           xmlns:swhdeposit="https://www.softwareheritage.org/schema/2018/deposit">
        <title>Awesome Compiler</title>
        <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
        <updated>2017-10-07T15:17:08Z</updated>
        <author>some awesome author</author>
        <swhdeposit:deposit>
            <swhdeposit:create_origin>
                <swhdeposit:origin url="http://example.com/my_project" />
            </swhdeposit:create_origin>
        </swhdeposit:deposit>
    </entry>

Using Atom with CodeMeta
^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: xml

    <?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom"
           xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
           xmlns:swhdeposit="https://www.softwareheritage.org/schema/2018/deposit">
        <title>Awesome Compiler</title>
        <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
        <swhdeposit:deposit>
            <swhdeposit:create_origin>
                <swhdeposit:origin url="http://example.com/1785io25c695" />
            </swhdeposit:create_origin>
        </swhdeposit:deposit>
        <codemeta:id>1785io25c695</codemeta:id>
        <codemeta:url>origin url</codemeta:url>
        <codemeta:identifier>other identifier, DOI, ARK</codemeta:identifier>
        <codemeta:applicationCategory>Domain</codemeta:applicationCategory>
        <codemeta:description>description</codemeta:description>
        <codemeta:keywords>key-word 1</codemeta:keywords>
        <codemeta:keywords>key-word 2</codemeta:keywords>
        <codemeta:dateCreated>creation date</codemeta:dateCreated>
        <codemeta:datePublished>publication date</codemeta:datePublished>
        <codemeta:releaseNotes>comment</codemeta:releaseNotes>
        <codemeta:referencePublication>
            <codemeta:name> article name</codemeta:name>
            <codemeta:identifier> article id </codemeta:identifier>
        </codemeta:referencePublication>
        <codemeta:isPartOf>
            <codemeta:type> Collaboration/Project </codemeta:type>
            <codemeta:name> project name</codemeta:name>
            <codemeta:identifier> id </codemeta:identifier>
        </codemeta:isPartOf>
        <codemeta:relatedLink>see also </codemeta:relatedLink>
        <codemeta:funding>Sponsor A </codemeta:funding>
        <codemeta:funding>Sponsor B</codemeta:funding>
        <codemeta:operatingSystem>Platform/OS </codemeta:operatingSystem>
        <codemeta:softwareRequirements>dependencies </codemeta:softwareRequirements>
        <codemeta:softwareVersion>Version</codemeta:softwareVersion>
        <codemeta:developmentStatus>active </codemeta:developmentStatus>
        <codemeta:license>
            <codemeta:name>license</codemeta:name>
            <codemeta:url>url spdx</codemeta:url>
        </codemeta:license>
        <codemeta:runtimePlatform>.Net Framework 3.0 </codemeta:runtimePlatform>
        <codemeta:runtimePlatform>Python2.3</codemeta:runtimePlatform>
        <codemeta:author>
            <codemeta:name> author1 </codemeta:name>
            <codemeta:affiliation> Inria </codemeta:affiliation>
            <codemeta:affiliation> UPMC </codemeta:affiliation>
        </codemeta:author>
        <codemeta:author>
            <codemeta:name> author2 </codemeta:name>
            <codemeta:affiliation> Inria </codemeta:affiliation>
            <codemeta:affiliation> UPMC </codemeta:affiliation>
        </codemeta:author>
        <codemeta:codeRepository>http://code.com</codemeta:codeRepository>
        <codemeta:programmingLanguage>language 1</codemeta:programmingLanguage>
        <codemeta:programmingLanguage>language 2</codemeta:programmingLanguage>
        <codemeta:issueTracker>http://issuetracker.com</codemeta:issueTracker>
    </entry>

Using Atom with DublinCore and CodeMeta (multi-schema entry)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: xml

    <?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom"
           xmlns:dcterms="http://purl.org/dc/terms/"
           xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
           xmlns:swhdeposit="https://www.softwareheritage.org/schema/2018/deposit">
        <title>Awesome Compiler</title>
        <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
        <swhdeposit:deposit>
            <swhdeposit:create_origin>
                <swhdeposit:origin url="http://example.com/225c695-cfb8-4ebb-aaaa-80da344efa6a" />
            </swhdeposit:create_origin>
        </swhdeposit:deposit>
        <dcterms:identifier>hal-01587361</dcterms:identifier>
        <dcterms:identifier>doi:10.5281/zenodo.438684</dcterms:identifier>
        <dcterms:title xml:lang="en">The assignment problem</dcterms:title>
        <dcterms:title xml:lang="fr">AffectationRO</dcterms:title>
        <dcterms:creator>author</dcterms:creator>
        <dcterms:subject>[INFO] Computer Science [cs]</dcterms:subject>
        <dcterms:subject>[INFO.INFO-RO] Computer Science [cs]/Operations Research [cs.RO]</dcterms:subject>
        <dcterms:type>SOFTWARE</dcterms:type>
        <dcterms:abstract xml:lang="en">Project in OR: The assignment problem. A Java implementation for the assignment problem, first release</dcterms:abstract>
        <dcterms:abstract xml:lang="fr">description fr</dcterms:abstract>
        <dcterms:created>2015-06-01</dcterms:created>
        <dcterms:available>2017-10-19</dcterms:available>
        <dcterms:language>en</dcterms:language>
        <codemeta:url>origin url</codemeta:url>
        <codemeta:softwareVersion>1.0.0</codemeta:softwareVersion>
        <codemeta:keywords>key word</codemeta:keywords>
        <codemeta:releaseNotes>Comment</codemeta:releaseNotes>
        <codemeta:referencePublication>Référence interne </codemeta:referencePublication>
        <codemeta:relatedLink>link </codemeta:relatedLink>
        <codemeta:funding>Sponsor </codemeta:funding>
        <codemeta:operatingSystem>Platform/OS </codemeta:operatingSystem>
        <codemeta:softwareRequirements>dependencies </codemeta:softwareRequirements>
        <codemeta:developmentStatus>Ended </codemeta:developmentStatus>
        <codemeta:license>
            <codemeta:name>license</codemeta:name>
            <codemeta:url>url spdx</codemeta:url>
        </codemeta:license>
        <codemeta:codeRepository>http://code.com</codemeta:codeRepository>
        <codemeta:programmingLanguage>language 1</codemeta:programmingLanguage>
        <codemeta:programmingLanguage>language 2</codemeta:programmingLanguage>
    </entry>

Note
----

We aim to harmonize the metadata from different origins, and thus
metadata will be translated to the `CodeMeta
v2.0 <https://doi.org/10.5063/SCHEMA/CODEMETA-2.0>`__ vocabulary where
possible.

See :ref:`deposit-protocol` for details on the content of ``<swh:deposit>``
elements.
.. _swh-deposit-register-account:

.. admonition:: Intended audience
   :class: important

   - deposit clients
   - sysadm staff members

Register account
================

.. _swh-deposit-register-account-as-deposit-client:

Becoming a deposit client is easy: write to deposit@softwareheritage.org
to set up the deposit partner agreement. Once the agreement is signed, you can
follow the steps below.

As a deposit client
-------------------

As a client, you need to register an account on the SWH Keycloak `production
<https://archive.softwareheritage.org/oidc/login/>`_
or `staging
<https://webapp.staging.swh.network/oidc/login/>`_
instance.

.. _swh-deposit-register-account-as-sysadm:

As a sysadm
-----------

1. Retrieve the deposit client login (through email exchange or any other medium).
2. Request a :ref:`provider url <swh-deposit-provider-url-definition>` from the deposit
   client (through email exchange or any other medium).
3. Within the Keycloak `production instance
   <https://auth.softwareheritage.org/auth/admin/SoftwareHeritage/console/#/realms/SoftwareHeritage>`_
   or `staging instance
   <https://auth.softwareheritage.org/auth/admin/SoftwareHeritageStaging/console/#/realms/SoftwareHeritageStaging>`_,
   add the ``swh.deposit.api`` role to the deposit client login.
4. Create an :ref:`associated deposit collection
   <swh-deposit-add-client-and-collection>` in the deposit instance.
5. Create :ref:`a deposit client <swh-deposit-add-client-and-collection>` with the
   provider url in the deposit instance.
6. To ensure everything works, ask the deposit client to check that they can access
   at least the (authenticated) service document iri.
.. _deposit-use-cases:

Use cases
=========

The general idea is that a deposit can be created either in a single request
or by multiple requests to allow the user to add elements to the deposit piece
by piece (be it the deposited data or the metadata describing it).

An update request that does not have the ``In-Progress: true`` HTTP header will
de facto declare the deposit as *completed* (aka in the ``deposited`` status; see
below) and thus ready for ingestion.

Once the deposit is declared *complete* by the user, the server performs a few
validation checks. Then, if the deposit is valid, the server schedules the
ingestion of the deposited data in the Software Heritage Archive (SWH).

There is a ``status`` property attached to a deposit that allows following the
processing workflow of the deposit. For example, when this ingestion task
completes successfully, the deposit is marked as ``done``.

Possible deposit statuses are:

partial
    The deposit is partially received, since it can be done in
    multiple requests.

expired
    Deposit has been there too long and is now deemed ready to be
    garbage-collected.

deposited
    Deposit is complete, ready to be checked.

rejected
    Deposit failed the checks.

verified
    Deposit passed the checks and is ready for loading.

loading
    Loading is ongoing on SWH's side.

done
    Loading is successful.

failed
    Loading failed.

.. figure:: ../images/status.svg
   :alt:
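
Clients polling a deposit usually need to know when to stop. A minimal sketch modelling the statuses above as a Python ``Enum``; which statuses count as final is an assumption made for this example (``done`` is treated as final even though its metadata can still be updated later, see "Deposit metadata updates"):

```python
from enum import Enum


class DepositStatus(Enum):
    """The deposit statuses listed above."""

    PARTIAL = "partial"
    EXPIRED = "expired"
    DEPOSITED = "deposited"
    REJECTED = "rejected"
    VERIFIED = "verified"
    LOADING = "loading"
    DONE = "done"
    FAILED = "failed"


# Assumption: statuses after which a polling client can stop waiting.
FINAL_STATUSES = {
    DepositStatus.EXPIRED,
    DepositStatus.REJECTED,
    DepositStatus.DONE,
    DepositStatus.FAILED,
}


def is_final(status: str) -> bool:
    """Return True if `status` is final; raise ValueError if it is unknown."""
    return DepositStatus(status) in FINAL_STATUSES
```
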

This document describes the possible scenarios for creating or updating a
deposit.

Deposit creation
----------------

From client's deposit repository server to SWH's repository server:

1. The client requests the server's abilities and its associated
   :ref:`collections <swh-deposit-collection>` using the *SD/service document uri*
   (:http:get:`/1/servicedocument/`).

2. The server answers the client with the service document, which lists the
   *collections* linked to the user account (most of the time, there will be one
   and only one collection linked to the user's account). Each of these collections
   can be used to push a deposit via its *COL/collection IRI*.

3. The client sends a deposit (a zip archive, some metadata or both) through
   the *COL/collection uri*.

   This can be done in:

   * one POST request (metadata + archive) without the ``In-Progress: true`` header:

     - :http:post:`/1/(str:collection-name)/`

   * one POST request (metadata or archive) **with** the ``In-Progress: true`` header:

     - :http:post:`/1/(str:collection-name)/`

     plus one or more PUT or POST requests *to the update uris*
     (*edit-media iri* or *edit iri*):

     - :http:post:`/1/(str:collection-name)/(int:deposit-id)/media/`
     - :http:put:`/1/(str:collection-name)/(int:deposit-id)/media/`
     - :http:post:`/1/(str:collection-name)/(int:deposit-id)/metadata/`
     - :http:put:`/1/(str:collection-name)/(int:deposit-id)/metadata/`

   Then:

   a. The server validates the client's input or returns a detailed error if any.
   b. The server stores the information received (metadata or software archive
      source code or both).

4. The server creates a loading task and submits it to the
   :ref:`Job Scheduler <swh-scheduler>`.

5. The server notifies the client that it acknowledged the client's request. An
   ``http 201 Created`` response with a deposit receipt in the body response is
   sent back. That deposit receipt will hold the necessary information to
   eventually complete the deposit later on if it was incomplete (also known as
   status ``partial``).
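
Step 3 can be sketched with the Python standard library: the request below is a binary (zip) deposit POSTed to a collection IRI, adding ``In-Progress: true`` only when more requests will follow. This is a minimal sketch, not the ``swh deposit`` client: the helper name is hypothetical, only zip archives are handled, the collection IRI is expected to come from the service document, and checksum headers a real client may send are omitted:

```python
import base64
import urllib.request


def archive_deposit_request(
    collection_iri: str,
    login: str,
    password: str,
    archive: bytes,
    filename: str,
    in_progress: bool = False,
) -> urllib.request.Request:
    """Prepare (without sending) a binary deposit POST to a collection IRI."""
    credentials = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {credentials}",
        "Content-Type": "application/zip",
        # SWORD v2 carries the archive file name in Content-Disposition.
        "Content-Disposition": f"attachment; filename={filename}",
    }
    if in_progress:
        # Keep the deposit "partial" so it can be completed by later requests.
        headers["In-Progress"] = "true"
    return urllib.request.Request(
        collection_iri, data=archive, headers=headers, method="POST"
    )
```

Note that ``urllib.request.Request`` stores header names with ``str.capitalize()``, so ``In-Progress`` is looked up as ``In-progress`` on the request object; HTTP header names are case-insensitive on the wire.
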

Schema representation
^^^^^^^^^^^^^^^^^^^^^

Scenario: pushing a deposit via the SWORDv2_ protocol (nominal scenario):

.. figure:: ../images/deposit-create-chart.svg
   :alt:

Deposit update
--------------

6. The client updates an existing deposit through the *update uris* (one or more
   POST or PUT requests to either the *edit-media iri* or *edit iri*).

   1. The server validates the client's input or returns a detailed error if any.
   2. The server stores the information received (metadata or software archive
      source code or both).

   This would be the case, for example, if the client initially posted a
   ``partial`` deposit (e.g. only metadata with no archive, or an archive
   without metadata, or a split archive because the initial one exceeded
   the size limit imposed by the SWH deposit server).

The content of a deposit can only be updated while it is in the ``partial``
state; this causes the content to be **replaced** (the old version is discarded).
Its metadata, however, can also be updated while in the ``done`` state; see below.
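
The update rules above can be summarized as a small predicate. A minimal sketch; ``update_allowed`` and its ``"content"``/``"metadata"`` targets are hypothetical names for this example:

```python
def update_allowed(status: str, target: str) -> bool:
    """Return True if a deposit in `status` accepts an update of `target`.

    `target` is "content" (the deposited archive) or "metadata".
    """
    if target == "content":
        # Content can only be replaced while the deposit is still partial.
        return status == "partial"
    if target == "metadata":
        # Metadata may additionally be updated once the deposit is done.
        return status in ("partial", "done")
    raise ValueError(f"unknown update target: {target!r}")
```
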

Schema representation
^^^^^^^^^^^^^^^^^^^^^

Scenario: updating a deposit via SWORDv2_ protocol:

.. figure:: ../images/deposit-update-chart.svg
   :alt:

Deposit deletion (or associated archive, or associated metadata)
----------------------------------------------------------------

7. Deposit deletion is possible as long as the deposit is still in the ``partial``
   state.

   1. The server validates the client's input or returns a detailed error if any.
   2. The server deletes the information according to the request.
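
A client can guard deletion on the last known status before sending the request. A minimal sketch; the IRI passed in is whichever edit or edit-media IRI the deposit receipt returned, the helper name is hypothetical, and authentication headers are omitted for brevity:

```python
import urllib.request


def delete_request(iri: str, status: str) -> urllib.request.Request:
    """Prepare (without sending) a DELETE request for a partial deposit."""
    if status != "partial":
        # The server only accepts deletion while the deposit is partial.
        raise ValueError(f"cannot delete a deposit in status {status!r}")
    return urllib.request.Request(iri, method="DELETE")
```
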

Schema representation
^^^^^^^^^^^^^^^^^^^^^

Scenario: deleting a deposit via SWORDv2_ protocol:

.. figure:: ../images/deposit-delete-chart.svg
   :alt:

Client asks for operation status
--------------------------------

At any time during the next step, the operation status can be read through
a GET query to the *state iri*.
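
Reading the status is typically done in a polling loop. A minimal sketch, assuming a caller-supplied ``fetch_status`` function that performs the authenticated GET on the *state iri* and returns the status string; the interval, attempt count and set of final statuses are illustrative choices, and the ``sleep`` parameter exists only so the loop can be tested without waiting:

```python
import time
from typing import Callable

# Assumption: statuses after which polling can stop.
FINAL = {"done", "failed", "rejected", "expired"}


def wait_for_final_status(
    fetch_status: Callable[[], str],
    interval: float = 10.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the deposit status until it reaches a final status."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status in FINAL:
            return status
        sleep(interval)
    raise TimeoutError("deposit did not reach a final status in time")
```
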

Deposit loading
---------------

In one of the previous steps, when a deposit was created or updated without
``In-Progress: true``, the deposit server created a load task and submitted it
to :ref:`swh-scheduler <swh-scheduler>`.

This triggers the following steps:

Server: Triggering deposit checks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the status ``deposited`` is reached for a deposit, checks for the
associated archive(s) and metadata will be triggered. If those checks
fail, the status is changed to ``rejected`` and nothing more happens
there. Otherwise, the status is changed to ``verified``.

Server: Triggering deposit load
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once the status ``verified`` is reached for a deposit, loading the
deposit with its associated metadata will be triggered.

The loading will result in a status update, either ``done`` or ``failed``
(depending on the loading's status).

This is described in the :ref:`loading specifications document <swh-loading-specs>`.
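
The server-side status transitions described in this section can be written as a small table. A minimal sketch covering only the steps above: client-driven transitions such as ``partial`` to ``deposited`` are left out, metadata-only deposits skip ``loading``, and the names are hypothetical:

```python
from typing import Dict, Set

# Server-side transitions from "Deposit loading" (hypothetical table).
TRANSITIONS: Dict[str, Set[str]] = {
    "deposited": {"verified", "rejected"},  # checks pass or fail
    "verified": {"loading"},  # the load is triggered
    "loading": {"done", "failed"},  # outcome of the loading
}


def is_valid_transition(current: str, new: str) -> bool:
    """Return True if the server may move a deposit from `current` to `new`."""
    return new in TRANSITIONS.get(current, set())
```
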

Completing the deposit
----------------------

When this is all done, the loaders notify the deposit server, which sets
the deposit status to ``done``.

This can then be polled by deposit clients, using the *state iri*.
Deposit metadata updates
------------------------
We saw earlier that a deposit can only be updated when in ``partial`` state.
This is one exception to this rule: its metadata can be updated while in the
``done`` state; which adds a new version of the metadata in the SWH archive,
**in addition to** the old one(s).
In this state, ``In-Progress`` is not allowed, so the deposit cannot go back
in the ``partial`` state, but only to ``deposited``.
As a failsafe, to avoid accidentally updating the wrong deposit, this requires
the ``X-Check-SWHID`` HTTP header to be set to the value of the SWHID of the
deposit's content (returned after the deposit finished loading).
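The Python sketch below only builds the request headers for such a metadata update; the HTTP method and target IRI are not covered here. It assumes the value expected in ``X-Check-SWHID`` is the core SWHID reported as ``deposit_swh_id`` once loading is done, and reuses the Atom entry content type described in the user manual. The function name is hypothetical.

```python
import re
from typing import Dict

# Core SWHID: "swh:1:<object type>:<40 hex digits>", with no qualifiers.
CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def metadata_update_headers(deposit_swhid: str) -> Dict[str, str]:
    """Headers for a metadata update on a deposit in the ``done`` state (sketch)."""
    if CORE_SWHID.fullmatch(deposit_swhid) is None:
        raise ValueError(f"expected a core SWHID, got {deposit_swhid!r}")
    return {
        # Atom entry carrying the new metadata.
        "Content-Type": "application/atom+xml;type=entry",
        # Failsafe: must match the SWHID of the deposit's loaded content.
        "X-Check-SWHID": deposit_swhid,
        # No "In-Progress" header: it is not allowed in this state.
    }
```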
.. _use-case-metadata-only-deposit:
Metadata-only deposit
---------------------
Finally, as an extension to the SWORD protocol, swh-deposit allows a special
type of deposit: metadata-only deposits.
Unlike regular deposits (described above), they do not have a code archive.
Instead, they describe an existing :term:`software artifact` present in the
archive.
This use case is triggered by a ``<reference>`` tag in the Atom document,
see the :ref:`protocol reference <metadata-only-deposit>` for details.
In the current implementation, these deposits are loaded (or rejected)
immediately after a request without ``In-Progress: true`` is made,
i.e. they skip the ``loading`` state. This may change in a future version.
.. _SWORDv2: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html
.. _deposit-user-manual:
User Manual
===========
This is a guide on how to prepare and push a software deposit with
the ``swh deposit`` commands.
Requirements
------------
You need to :ref:`have an account on the Software Heritage deposit application
<swh-deposit-register-account-as-deposit-client>` to be able to use the service.
Please `contact the Software Heritage team <mailto:deposit@softwareheritage.org>`_ for
more information on how to get access to this service.
For testing purposes, a test instance `is available
<https://deposit.staging.swh.network>`_ [#f1]_ and will be used in the examples below.
Once you have an account, you should get a set of access credentials as a
``login`` and a ``password`` (identified as ``<name>`` and ``<secret>`` in the
remainder of this document). A deposit account also comes with a "provider URL",
which is used by SWH to build the :term:`Origin URL<origin>` of deposits
created using this account.
Installation
------------
To install the ``swh.deposit`` command line tools, you need a working Python 3.7+
environment. It is strongly recommended you use a `virtualenv
<https://virtualenv.pypa.io/en/stable/>`_ for this.
.. code:: console
$ python3 -m virtualenv deposit
[...]
$ source deposit/bin/activate
(deposit)$ pip install swh.deposit
[...]
(deposit)$ swh deposit --help
Usage: swh deposit [OPTIONS] COMMAND [ARGS]...
Deposit main command
Options:
-h, --help Show this message and exit.
Commands:
admin Server administration tasks (manipulate user or...
status Deposit's status
upload Software Heritage Public Deposit Client Create/Update...
(deposit)$
Note: in the examples below, we use the `jq`_ tool to make JSON outputs more readable.
If you do not have it already, you may install it using your distribution's
packaging system. For example, on a Debian system:
.. _jq: https://stedolan.github.io/jq/
.. code:: console
$ sudo apt install jq
.. _prepare-deposit:
Prepare a deposit
-----------------
* compress the files in a supported archive format:

  - zip: common zip archive (no multi-disk zip files).
  - tar: tar archive without compression, or optionally compressed with one of
    the following algorithms: gzip (``.tar.gz``, ``.tgz``), bzip2
    (``.tar.bz2``), or lzma (``.tar.lzma``)

* (Optional) prepare a metadata file (more details in :ref:`deposit-metadata`).
Example:

Assuming you want to deposit the source code of `belenios
<https://gitlab.inria.fr/belenios/belenios>`_ version 1.12, first download its archive:
.. code:: console
(deposit)$ wget https://gitlab.inria.fr/belenios/belenios/-/archive/1.12/belenios-1.12.zip
[...]
2020-10-28 11:40:37 (4,56 MB/s) - ‘belenios-1.12.zip’ saved [449880/449880]
(deposit)$
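If your code is not already available as a downloadable archive, it can be packed with Python's standard library. The following is a minimal sketch; the function name and directory layout are illustrative. It covers zip and the tar variants that ``tarfile`` writes directly, and deliberately leaves out ``.tar.lzma``, since ``tarfile``'s ``w:xz`` mode produces the xz container format rather than legacy lzma.

```python
import os
import tarfile
import zipfile

# tarfile write modes for the tar formats listed above (``.tar.lzma`` omitted).
TAR_MODES = {".tar": "w", ".tar.gz": "w:gz", ".tgz": "w:gz", ".tar.bz2": "w:bz2"}


def make_deposit_archive(src_dir: str, dest: str) -> str:
    """Pack src_dir into dest; the format is chosen from dest's extension."""
    top = os.path.basename(os.path.normpath(src_dir))
    if dest.endswith(".zip"):
        with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for root, _dirs, files in os.walk(src_dir):
                for name in sorted(files):
                    path = os.path.join(root, name)
                    # Store paths relative to src_dir, under one top-level directory.
                    zf.write(path, os.path.join(top, os.path.relpath(path, src_dir)))
        return dest
    for ext, mode in TAR_MODES.items():
        if dest.endswith(ext):
            with tarfile.open(dest, mode) as tf:
                tf.add(src_dir, arcname=top)
            return dest
    raise ValueError(f"unsupported archive extension: {dest}")
```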
Then you need to prepare a metadata file giving detailed information on your
deposited source code. A minimal Atom with CodeMeta file could be:
.. code:: console
(deposit)$ cat metadata.xml
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
<title>Verifiable online voting system</title>
<id>belenios-01243065</id>
<codemeta:url>https://gitlab.inria.fr/belenios/belenios</codemeta:url>
<codemeta:applicationCategory>test</codemeta:applicationCategory>
<codemeta:keywords>Online voting</codemeta:keywords>
<codemeta:description>Verifiable online voting system</codemeta:description>
<codemeta:version>1.12</codemeta:version>
<codemeta:runtimePlatform>opam</codemeta:runtimePlatform>
<codemeta:developmentStatus>stable</codemeta:developmentStatus>
<codemeta:programmingLanguage>ocaml</codemeta:programmingLanguage>
<codemeta:license>
<codemeta:name>GNU Affero General Public License</codemeta:name>
</codemeta:license>
<author>
<name>Belenios</name>
<email>belenios@example.com</email>
</author>
<codemeta:author>
<codemeta:name>Belenios Test User</codemeta:name>
</codemeta:author>
<swh:deposit>
<swh:create_origin>
<swh:origin url="http://has.archives-ouvertes.fr/test-01243065" />
</swh:create_origin>
</swh:deposit>
</entry>
(deposit)$
Please read the :ref:`deposit-metadata` page for a more detailed view of the
metadata file formats and semantics, and :ref:`deposit-create_origin` for
a description of the ``<swh:create_origin>`` tag.
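Rather than writing the XML by hand, the metadata file can also be generated. The Python sketch below uses ``xml.etree.ElementTree`` with the namespaces of the example above; the function name and parameters are illustrative, and it only covers the few fields shown (title, id, flat CodeMeta terms and ``<swh:create_origin>``). Note that ``register_namespace`` changes the prefixes globally for the whole process.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict

ATOM = "http://www.w3.org/2005/Atom"
CODEMETA = "https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
SWH = "https://www.softwareheritage.org/schema/2018/deposit"

# Serialize with the same prefixes as the hand-written example.
ET.register_namespace("", ATOM)
ET.register_namespace("codemeta", CODEMETA)
ET.register_namespace("swh", SWH)


def build_metadata(title: str, external_id: str, codemeta: Dict[str, str], origin_url: str) -> bytes:
    """Return an Atom entry with flat CodeMeta terms and an <swh:create_origin> tag."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}id").text = external_id
    for term, value in codemeta.items():
        ET.SubElement(entry, f"{{{CODEMETA}}}{term}").text = value
    deposit = ET.SubElement(entry, f"{{{SWH}}}deposit")
    create_origin = ET.SubElement(deposit, f"{{{SWH}}}create_origin")
    ET.SubElement(create_origin, f"{{{SWH}}}origin", url=origin_url)
    out = io.BytesIO()
    ET.ElementTree(entry).write(out, encoding="utf-8", xml_declaration=True)
    return out.getvalue()
```

The returned bytes can be written to ``metadata.xml`` and passed to ``--metadata``.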
Push a deposit
--------------
You can push a deposit as:

* a single deposit (archive + metadata):

  The user posts a software source code archive and its associated metadata
  in one query. The deposit is directly marked with status ``deposited``.

* a multi-step deposit:

  1. Create an incomplete deposit (marked with status ``partial``)
  2. Add data to the deposit (in multiple requests if needed)
  3. Finalize the deposit (the status becomes ``deposited``)

* a metadata-only deposit:

  The user posts, in one query, a metadata file about an object identified by
  a :ref:`SWHID <persistent-identifiers>`. The deposit is directly marked with
  status ``done``.
Overall, a deposit goes through a series of states, as follows:
.. figure:: ../images/status.svg
:alt:
The important thing to notice for now is that a deposit can be:

partial:
   the deposit is partially received
expired:
   the deposit has been there too long and is now deemed
   ready to be garbage collected
deposited:
   the deposit is complete and ready to be checked to ensure data consistency
rejected:
   the checks failed, and the deposit will not be loaded
verified:
   the deposit is fully received, checked, and ready for loading
loading:
   loading is ongoing on SWH's side
done:
   loading is successful
failed:
   loading failed
When you push a deposit, it is either in the ``deposited`` state or in the
``partial`` state if you asked for a partial upload.
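To make the lifecycle explicit, the states above can be encoded as a small transition table. This Python sketch is derived from the prose of this document rather than from the server code, so treat it as an approximation: it assumes only ``partial`` deposits expire, does not cover whether ``failed`` deposits can be retried, and ignores metadata-only deposits (which are directly marked ``done``).

```python
from typing import Dict, FrozenSet, Sequence

# Status transitions of a regular deposit, as described in this document.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "partial": frozenset({"partial", "deposited", "expired"}),
    "deposited": frozenset({"verified", "rejected"}),
    "verified": frozenset({"loading"}),
    "loading": frozenset({"done", "failed"}),
    # A metadata update on a loaded deposit sends it back to ``deposited``.
    "done": frozenset({"deposited"}),
    "rejected": frozenset(),
    "failed": frozenset(),
    "expired": frozenset(),
}


def is_valid_history(statuses: Sequence[str]) -> bool:
    """Check that each consecutive pair of statuses is an allowed transition."""
    return all(
        after in TRANSITIONS.get(before, frozenset())
        for before, after in zip(statuses, statuses[1:])
    )
```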
Single deposit
^^^^^^^^^^^^^^
Once the files are ready, we can make the actual deposit in one shot,
i.e. send both the archive file and the metadata file:

* 1 archive (content-type ``application/zip`` or ``application/x-tar``)
* 1 metadata file in Atom XML format (``content-type: application/atom+xml;type=entry``)

For this, provide the following to the ``swh deposit upload`` command:

* the credentials: ``--username 'name' --password 'pass'``
* the archive's path (example: ``--archive path/to/archive-name.tgz``)
* the metadata file's path (example: ``--metadata path/to/metadata.xml``)
Example:
To push the Belenios 1.12 we prepared previously on the testing instance of the
deposit:
.. code:: console

   (deposit)$ ls
   belenios-1.12.zip  metadata.xml  deposit
   (deposit)$ swh deposit upload --username <name> --password <secret> \
         --url https://deposit.staging.swh.network/1 \
         --create-origin http://has.archives-ouvertes.fr/test-01243065 \
         --archive belenios-1.12.zip \
         --metadata metadata.xml \
         --format json | jq
   {
     "deposit_status": "deposited",
     "deposit_id": "1",
     "deposit_date": "Oct. 28, 2020, 1:52 p.m.",
     "deposit_status_detail": null
   }
   (deposit)$
You just posted a deposit to your main collection on Software Heritage (staging
area)!
The returned value is a JSON dict, in which you will notably find the deposit
id (needed to check for its status later on) and the current status, which
should be ``deposited`` if no error has occurred.
Note: as the deposit is in ``deposited`` status, you can no longer update it
after this query: any further update request will be answered with a 403
(Forbidden) response.
If something went wrong, an equivalent response is returned, with the
``error`` and ``detail`` keys explaining the issue, e.g.:

.. code:: json

   {
     "error": "Unknown collection name xyz",
     "detail": null,
     "deposit_status": null,
     "deposit_status_detail": null,
     "deposit_swh_id": null,
     "status": 404
   }
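A script consuming the ``--format json`` output can tell success from failure by the presence of the ``error`` key shown above. The following Python sketch assumes exactly the keys of these example responses; the exception and function names are hypothetical.

```python
import json
from typing import Any, Dict


class DepositError(Exception):
    """Raised when the deposit server reported an error."""


def parse_deposit_output(raw: str) -> Dict[str, Any]:
    """Parse the JSON printed with ``--format json`` and surface errors."""
    data = json.loads(raw)
    if data.get("error"):
        raise DepositError(
            f"{data['error']} (HTTP status {data.get('status')}, detail: {data.get('detail')})"
        )
    return data
```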
Once the deposit has been made, you can check its status using the ``swh deposit
status`` command:
.. code:: console
(deposit)$ swh deposit status --username <name> --password <secret> \
--url https://deposit.staging.swh.network/1 \
--deposit-id 1 -f json | jq
{
"deposit_id": "1",
"deposit_status": "done",
"deposit_status_detail": "The deposit has been successfully loaded into the Software Heritage archive",
"deposit_swh_id": "swh:1:dir:63a6fc0ed8f69bf66ccbf99fc0472e30ef0a895a",
"deposit_swh_id_context": "swh:1:dir:63a6fc0ed8f69bf66ccbf99fc0472e30ef0a895a;origin=https://softwareheritage.org/belenios-01234065;visit=swh:1:snp:0ae536667689da7047bfb7aa9f37f5958e9f4647;anchor=swh:1:rev:17ad98c940104d45b6b6bd6fba9aa832eeb95638;path=/",
"deposit_external_id": "belenios-01234065"
}
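The ``deposit_swh_id_context`` value is a qualified SWHID: the core SWHID followed by ``;``-separated ``key=value`` qualifiers. A minimal Python sketch for splitting it, assuming qualifier values contain no literal ``;`` (the function name is hypothetical):

```python
from typing import Dict, Tuple


def split_swhid_context(swhid: str) -> Tuple[str, Dict[str, str]]:
    """Split a qualified SWHID into its core SWHID and a dict of qualifiers.

    Assumption: no qualifier value contains a literal ";".
    """
    core, *qualifiers = swhid.split(";")
    parsed: Dict[str, str] = {}
    for qualifier in qualifiers:
        # Split on the first "=" only; values such as URLs may contain ":" or "/".
        key, _, value = qualifier.partition("=")
        parsed[key] = value
    return core, parsed
```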
Metadata-only deposit
^^^^^^^^^^^^^^^^^^^^^
This allows depositing only metadata about an object identified by a
:ref:`SWHID reference <persistent-identifiers>`. Prepare a metadata file as
described in the :ref:`prepare deposit section <prepare-deposit>`.
Ensure this metadata file also declares a :ref:`SWHID reference
<persistent-identifiers>`:
.. code:: xml
<entry xmlns="..."
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit"
>
<!-- ... -->
<swh:deposit>
<swh:reference>
<swh:object swhid="swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49" />
</swh:reference>
</swh:deposit>
<!-- ... -->
</entry>
For this, we then need to provide the following information:
* arguments: ``--username 'name' --password 'pass'`` as credentials
* metadata file path (example: ``--metadata path/to/metadata.xml``)
to the ``swh deposit metadata-only`` command.
Example:
.. code:: console

   (deposit)$ swh deposit metadata-only --username <name> --password <secret> \
         --url https://deposit.staging.swh.network/1 \
         --metadata ../deposit-swh.metadata-only.xml \
         --format json | jq .
   {
     "deposit_id": "29",
     "deposit_status": "done",
     "deposit_date": "Dec. 15, 2020, 11:37 a.m."
   }
For details on the metadata-only deposit, see the
:ref:`metadata-only deposit protocol reference <metadata-only-deposit>`.
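The ``<swh:reference>`` block shown above can also be added programmatically to an existing Atom entry. This Python sketch uses ``xml.etree.ElementTree``; the helper name is hypothetical, and it reuses an existing ``<swh:deposit>`` element when there is one.

```python
import xml.etree.ElementTree as ET

SWH = "https://www.softwareheritage.org/schema/2018/deposit"
ET.register_namespace("swh", SWH)


def add_swhid_reference(entry: ET.Element, swhid: str) -> ET.Element:
    """Add <swh:deposit><swh:reference><swh:object swhid="..."/> to an Atom entry."""
    deposit = entry.find(f"{{{SWH}}}deposit")
    if deposit is None:
        deposit = ET.SubElement(entry, f"{{{SWH}}}deposit")
    reference = ET.SubElement(deposit, f"{{{SWH}}}reference")
    ET.SubElement(reference, f"{{{SWH}}}object", swhid=swhid)
    return entry


# Usage: reference the directory from the example above.
entry = ET.Element("{http://www.w3.org/2005/Atom}entry")
add_swhid_reference(entry, "swh:1:dir:31b5c8cc985d190b5a7ef4878128ebfdc2358f49")
xml_text = ET.tostring(entry, encoding="unicode")
```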
Multi-step deposit
^^^^^^^^^^^^^^^^^^
In this case, the deposit is created through several requests, uploading
objects piece by piece. The steps to create a multi-step deposit are:
1. Create a partial deposit
"""""""""""""""""""""""""""
First, use the ``--partial`` argument to declare that there is more to come:
.. code:: console
$ swh deposit upload --username name --password secret \
--archive foo.tar.gz \
--partial
2. Add content or metadata to the deposit
"""""""""""""""""""""""""""""""""""""""""
Continue the deposit by passing the ``--deposit-id`` argument, set to the
deposit id returned by the first step. You can keep adding content or metadata
as long as you use the ``--partial`` argument.
To only add one new archive to the deposit:
.. code:: console
$ swh deposit upload --username name --password secret \
--archive add-foo.tar.gz \
--deposit-id 42 \
--partial
To only add metadata to the deposit:
.. code:: console
$ swh deposit upload --username name --password secret \
--metadata add-foo.tar.gz.metadata.xml \
--deposit-id 42 \
--partial
3. Finalize deposit
"""""""""""""""""""
For your last addition, run the same command as before but without
``--partial``: the deposit will then be considered complete, and its status
will be changed to ``deposited``:
.. code:: console
$ swh deposit upload --username name --password secret \
--metadata add-foo.tar.gz.metadata.xml \
--deposit-id 42
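When scripting a multi-step deposit, the three commands above can be generated from one helper. The following Python sketch only uses the options shown in this section; the helper names are hypothetical, and the deposit id (42 in the examples) must be read from the output of the first command, for instance by adding ``--format json``.

```python
import subprocess
from typing import List, Optional


def upload_command(
    username: str,
    password: str,
    archive: Optional[str] = None,
    metadata: Optional[str] = None,
    deposit_id: Optional[int] = None,
    partial: bool = False,
) -> List[str]:
    """Build an ``swh deposit upload`` command line from the options shown above."""
    argv = ["swh", "deposit", "upload", "--username", username, "--password", password]
    if archive is not None:
        argv += ["--archive", archive]
    if metadata is not None:
        argv += ["--metadata", metadata]
    if deposit_id is not None:
        argv += ["--deposit-id", str(deposit_id)]
    if partial:
        # Keep the deposit in the ``partial`` state.
        argv.append("--partial")
    return argv


def run(argv: List[str]) -> str:
    """Run the command and return its standard output (raises on failure)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout
```

For example, step 1 is ``run(upload_command("name", "secret", archive="foo.tar.gz", partial=True))``, and the final step drops ``partial=True``.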
Update deposit
--------------
* Update deposit metadata:

  - only possible if the deposit status is ``done`` and both
    ``--deposit-id <id>`` and ``--swhid <swhid>`` are provided
  - by using the ``--metadata`` option with the path to an XML file:
.. code:: console
$ swh deposit upload \
--username name --password secret \
--deposit-id 11 \
--swhid swh:1:dir:2ddb1f0122c57c8479c28ba2fc973d18508e6420 \
--metadata ../deposit-swh.update-metadata.xml
* Replace deposit:

  - only possible if the deposit status is ``partial`` and
    ``--deposit-id <id>`` is provided
  - by using the ``--replace`` flag

    - ``--metadata-deposit`` replaces the associated existing metadata
    - ``--archive-deposit`` replaces the associated archive(s)
    - by default, with no flag or with both, you will replace the associated
      metadata and archive(s):
.. code:: console
$ swh deposit upload --username name --password secret \
--deposit-id 11 \
--archive updated-je-suis-gpl.tgz \
--replace
* Update a loaded deposit with a new version (this creates a new deposit):
- by using ``--add-to-origin`` with an origin URL previously created with
``--create-origin``, you will link the new deposit with its parent deposit:
.. code:: console
$ swh deposit upload --username name --password secret \
--archive je-suis-gpl-v2.tgz \
--add-to-origin 'http://example.org/je-suis-gpl'
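The rules above (and the 403 answer mentioned earlier for a ``deposited`` deposit) can be summarized as a lookup table, for example to validate a request before sending it. This Python sketch is derived from this page only; the operation names are hypothetical labels, not CLI or API terms, and adding a new version with ``--add-to-origin`` is left out since it creates a new deposit.

```python
# Which deposit statuses accept which update operations, as described above.
ALLOWED_STATUSES = {
    "add": {"partial"},           # --deposit-id <id> [--partial]
    "replace": {"partial"},       # --deposit-id <id> --replace
    "update_metadata": {"done"},  # --deposit-id <id> --swhid <swhid> --metadata
}


def can_update(status: str, operation: str) -> bool:
    """Return True if `operation` is accepted for a deposit in `status`."""
    try:
        return status in ALLOWED_STATUSES[operation]
    except KeyError:
        raise ValueError(f"unknown operation: {operation!r}") from None
```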
Check the deposit's status
--------------------------
You can check the status of the deposit by using the ``--deposit-id`` argument:
.. code:: console
$ swh deposit status --username name --password secret \
--deposit-id 11
.. code:: json

   {
     "deposit_id": 11,
     "deposit_status": "deposited",
     "deposit_swh_id": null,
     "deposit_status_detail": "Deposit is ready for additional checks (tarball ok, metadata, etc...)"
   }
When the deposit has been loaded into the archive, its status is marked
``done``, and the response also includes the ``deposit_swh_id`` and
``deposit_swh_id_context`` fields. For example:
.. code:: json

   {
     "deposit_id": 11,
     "deposit_status": "done",
     "deposit_swh_id": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9",
     "deposit_swh_id_context": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/;visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/",
     "deposit_status_detail": "The deposit has been successfully loaded into the Software Heritage archive"
   }
.. rubric:: Footnotes
.. [#f1] The test instance of the deposit is not yet available to external users,
   but it should be available soon.