Commit b5fab550 authored by Renaud Boyer

rework wording and refactor howtos

parent af835d43
Pipeline #14163 failed
What's a deposit?
=================
Most of the software source code artifacts present in the SWH Archive are gathered by
means of `loader`_ workers run by the SWH project from source code origins identified
by `lister`_ workers. This is a pull mechanism: it's the responsibility of the SWH
project to gather and collect source code artifacts that way.
Most of the software source code artifacts present in the Software Heritage Archive are
gathered by tools run by the SWH project. This is a pull mechanism: it is the
responsibility of the SWH project to gather and collect source code artifacts that way.
Alternatively, SWH allows its partners to push source code artifacts and metadata
directly into the Archive with a push-based mechanism. By using this possibility
different actors, holding software artifacts or metadata, can preserve their assets
without having to pass through an intermediate collaborative development platform, which
is already harvested by SWH (e.g. GitHub, GitLab, etc.).
Alternatively, SWH allows its trusted partners to send source code artifacts and/or
metadata directly into the Archive with a push-based mechanism. This lets different
actors holding software artifacts or metadata preserve their assets without going
through an intermediate collaborative development platform that SWH already harvests
(e.g. GitHub, GitLab, etc.).
**This mechanism is the code deposit.**
**This mechanism is the deposit.**
The main idea is that the deposit is an authenticated access to an API allowing the user to
provide source code artifacts -- with metadata -- to be ingested in the SWH Archive. The
result of that is a `SWHID`_ that can be used to uniquely and persistently identify that
very piece of source code.
The result of this action is a `SWHID`_ that can be used to uniquely and persistently
identify that very piece of source code.
This unique identifier can then be used to `reference the source code
<https://hal.archives-ouvertes.fr/hal-02446202>`_ (e.g. in a `scientific paper
<https://www.softwareheritage.org/2020/05/26/citing-software-with-style/>`_) and
retrieve it using the `vault`_ feature of the SWH Archive platform.
This unique identifier can then be used to reference the source code (e.g. in a
scientific paper) and retrieve it using the features of the SWH Archive platform.
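To illustrate what such an identifier looks like, here is a minimal Python sketch that
splits a SWHID into its core part and its optional qualifiers. It is not part of the
deposit tooling: the ``parse_swhid`` helper and the ``Swhid`` type are hypothetical
names, and the origin URL in the usage example is a placeholder.

.. code-block:: python

   import re
   from typing import NamedTuple

   # Core SWHID layout: swh:1:<object type>:<40 lowercase hex digits>
   CORE_SWHID = re.compile(r"swh:1:(snp|rel|rev|dir|cnt):([0-9a-f]{40})")


   class Swhid(NamedTuple):
       object_type: str
       object_id: str
       qualifiers: dict


   def parse_swhid(text: str) -> Swhid:
       """Split a SWHID into object type, object id and qualifiers."""
       core, *raw_qualifiers = text.split(";")
       match = CORE_SWHID.fullmatch(core)
       if match is None:
           raise ValueError(f"not a valid core SWHID: {core!r}")
       qualifiers = {}
       for item in raw_qualifiers:
           # Only split on the first "=", qualifier values may contain more.
           key, separator, value = item.partition("=")
           if not separator:
               raise ValueError(f"malformed qualifier: {item!r}")
           qualifiers[key] = value
       return Swhid(match.group(1), match.group(2), qualifiers)


   # Usage with a placeholder origin URL (hypothetical value).
   swhid = parse_swhid(
       "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9"
       ";origin=https://example.org/project;path=/"
   )
   print(swhid.object_type, swhid.qualifiers["origin"])

The sketch only checks the shape of the identifier; it does not verify that the
object actually exists in the archive.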
The differences between a piece of code uploaded using the deposit rather than simply
asking SWH to archive a repository using the `save code now`_ feature are:
The differences between a deposit and simply asking SWH to archive a repository using the pull features of the Archive are:
- a deposited artifact is provided from one of the SWH partners which is regarded as a
trusted authority,
- a deposited artifact requires metadata properties describing the source code artifact,
- a deposited artifact has a CodeMeta_ metadata entry attached to it,
- a deposited artifact has the same visibility on the SWH Archive as a collected
repository,
- a deposited artifact can be searched with its provided url property on the SWH
Archive,
- the deposit API uses the `SWORD v2`_ API, thus requires some tooling to send deposits
to SWH. These tools are provided with this repository.
Archive
- it is possible to make a metadata-only deposit about an artefact already
present in the Software Heritage archive.
A partner may wish to deposit only metadata about an origin or object already present in the Software Heritage archive.
Metadata?
---------
The **metadata-only deposit** is a special deposit where no content is provided and the data transferred to Software Heritage is only the metadata about an object in the archive.
The metadata of a software artefact is the real added value of the deposit service: it
allows a partner to provide extra information about source code (details about the
authors and their affiliations, external ids, mentions in scientific publications,
etc.) that is usually not present in the code itself.
Metadata is indexed by our search engine and provides new ways of finding content in the
archive.
To understand why metadata is so important to us read
:ref:`Why do we need metadata? <deposit-why-metadata>`.
Is it useful for me?
--------------------
Source code is fragile; it can disappear. It is important to note that software source code has an essential role in research and should be archived properly, alongside data and publications. Software that was built for research as part of the open science ecosystem should be archived, referenced, described and cited.
Source code is fragile; it can disappear.
If you are a repository for research software, you know software source code has an
essential role in research and should be archived properly, alongside data and
publications. Software that was built for research as part of the open science
ecosystem should be archived, referenced, described and cited.
When depositing in Software Heritage you can describe a software artifact properly with specific metadata properties and it will be safely saved in the universal software archive.
Also, as a metadata producer you can attach to an existing entry in the archive all the
TODO
As a metadata producer or aggregator TODO...
To understand why metadata is so important to us read
:ref:`Why do we need metadata? <deposit-why-metadata>`.
\ No newline at end of file
Ready to use our deposit services?
----------------------------------
Start by :ref:`requesting a user account <deposit-account>`.
\ No newline at end of file
......@@ -3,15 +3,21 @@
Request an account
==================
Becoming a deposit client is very easy: just write to deposit@softwareheritage.org
to set up the deposit partner agreement. With the agreement signed you can follow the
steps below.
.. admonition:: Deposit partner agreement
:class: warning
For this, as a client, you need to register an account on the swh keycloak
`production <https://archive.softwareheritage.org/oidc/login/>`_
or `staging <https://webapp.staging.swh.network/oidc/login/>`_ instance.
Access to the deposit services is restricted to organizations that have signed the
deposit partner agreement. To learn more about this agreement please write to
deposit@softwareheritage.org.
Once you have an account, you should get a set of access credentials: a login, a password and a collection name (identified as <username>, <pass> and <collection> in the remainder of this documentation).
With the agreement signed you will be able to register an account on our
`production <https://archive.softwareheritage.org/oidc/login/>`_ and
`staging <https://webapp.staging.swh.network/oidc/login/>`_ instances.
Once you have an account, you will get a set of access credentials: a login, a
password and a collection name (identified as <username>, <password> and <collection>
in the remainder of this documentation).
A deposit account also comes with a “provider URL” which is used by SWH to build the Origin URL of deposits created using this account and confirm its ownership.
You are now ready to :ref:`prepare your artefacts and metadata <deposit-prepare>`.
.. _deposit-first:
Make a first code & metadata deposit
====================================
Checklist
---------
- You have access to your :ref:`account credentials <deposit-account>`
- You have a software artefact at hand and its associated metadata (if not you need to
:ref:`prepare your artefacts and metadata <deposit-prepare>`.)
- This is the first time you're depositing for this origin (if you already made
deposits for this origin you want to
:ref:`make a new deposit for an existing origin <deposit-version>`)
- The software artefact is not larger than 100 MB (if it is, you need to
:ref:`make a multi-step deposit <deposit-partial>`)
- You have either the CLI installed or a tool to make API calls. We will use curl
here, but the commands can easily be adapted to another application
Send the artefact and the metadata
----------------------------------
.. admonition:: Deposit instance URL
:class: warning
In the examples below the staging deposit url https://deposit.staging.swh.network
is used to avoid experimenting on the production instance of the deposit server.
Make sure you switch to https://deposit.softwareheritage.org when you are ready.
.. tab-set::
.. tab-item:: API
.. code-block:: console
# 1) Note the 'In-Progress: false' header
# 2) Make sure the mimetype matches your file, here <softwareartefact> is a zip
curl -i -u <username>:<password> \
-F "file=@<softwareartefact>;type=application/zip;filename=payload" \
-F "atom=@<metadatafile>;type=application/atom+xml;charset=UTF-8" \
-H 'In-Progress: false' \
-XPOST https://deposit.staging.swh.network/1/<collection>/
.. tab-item:: CLI
.. code-block:: console
# 1) Note the 'no-partial' flag
swh deposit upload \
--username <username> --password <password> \
--url https://deposit.staging.swh.network/1 \
--create-origin <origin> \
--archive <softwareartefact> \
--metadata <metadatafile> \
--no-partial \
--format json
Will return the following response:
.. tab-set::
.. tab-item:: API
.. code-block:: http
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:sword="http://purl.org/net/sword/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:swhdeposit="https://www.softwareheritage.org/schema/2018/deposit"
>
<swhdeposit:deposit_id><deposit_id></swhdeposit:deposit_id>
<swhdeposit:deposit_date>Jan. 1, 2025, 09:00 a.m.</swhdeposit:deposit_date>
<swhdeposit:deposit_archive>None</swhdeposit:deposit_archive>
<!-- Note the 'deposited' status -->
<swhdeposit:deposit_status>deposited</swhdeposit:deposit_status>
<!-- Edit-IRI -->
<link rel="edit" href="/1/<collection>/<deposit_id>/metadata/" />
<!-- EM-IRI -->
<link rel="edit-media" href="/1/<collection>/<deposit_id>/media/"/>
<!-- SE-IRI -->
<link rel="http://purl.org/net/sword/terms/add" href="/1/<collection>/<deposit_id>/metadata/" />
<!-- State-IRI -->
<link rel="alternate" href="/1/<collection>/<deposit_id>/status/"/>
<sword:packaging>http://purl.org/net/sword/package/SimpleZip</sword:packaging>
</entry>
.. tab-item:: CLI
.. code-block:: json
{
"deposit_status": "deposited",
"deposit_id": "<deposit_id>",
"deposit_date": "Jan. 1, 2025, 09:00 a.m.",
"deposit_status_detail": null
}
A ``deposited`` status means the deposit is complete but still needs to be checked to
ensure data consistency. You can check your deposit status to follow the process.
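If you call the API from a script, you will probably want to extract the deposit id,
the status and the link IRIs from the Atom receipt shown above. The following Python
sketch does this with the standard library only. ``parse_receipt`` is a hypothetical
helper, not part of the deposit tooling, and the sample receipt uses made-up values
(collection ``mycollection``, deposit id ``42``).

.. code-block:: python

   import xml.etree.ElementTree as ET

   # XML namespaces used by the elements read from the receipt.
   NAMESPACES = {
       "atom": "http://www.w3.org/2005/Atom",
       "swhdeposit": "https://www.softwareheritage.org/schema/2018/deposit",
   }


   def parse_receipt(xml_text: str) -> dict:
       """Return the deposit id, status and link IRIs found in an Atom receipt."""
       entry = ET.fromstring(xml_text)
       # Map each link relation ("edit", "edit-media", "alternate", ...) to its href.
       links = {
           link.get("rel"): link.get("href")
           for link in entry.findall("atom:link", NAMESPACES)
       }
       return {
           "deposit_id": entry.findtext(
               "swhdeposit:deposit_id", namespaces=NAMESPACES
           ),
           "deposit_status": entry.findtext(
               "swhdeposit:deposit_status", namespaces=NAMESPACES
           ),
           "links": links,
       }


   # Shortened sample receipt with made-up values.
   SAMPLE_RECEIPT = """\
   <entry xmlns="http://www.w3.org/2005/Atom"
          xmlns:swhdeposit="https://www.softwareheritage.org/schema/2018/deposit">
     <swhdeposit:deposit_id>42</swhdeposit:deposit_id>
     <swhdeposit:deposit_status>deposited</swhdeposit:deposit_status>
     <link rel="edit-media" href="/1/mycollection/42/media/"/>
     <link rel="alternate" href="/1/mycollection/42/status/"/>
   </entry>
   """

   receipt = parse_receipt(SAMPLE_RECEIPT)
   print(receipt["deposit_status"], receipt["links"]["alternate"])

The ``alternate`` link is the State-IRI, which is the URL to query when checking the
deposit status.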
Check a deposit status
----------------------
Your deposit will go :doc:`through multiple steps </references/workflow>` before appearing in the archive. You can check the status of your deposit and get its SWHID:
.. tab-set::
.. tab-item:: API
.. code-block:: console
curl -i -u <username>:<password> \
-XGET https://deposit.staging.swh.network/1/<collection>/<deposit_id>/status/
.. tab-item:: CLI
.. code-block:: console
swh deposit status \
--username <username> --password <password> \
--url https://deposit.staging.swh.network/1 \
--deposit-id <deposit_id> \
--format json
Will return the following response:
.. tab-set::
.. tab-item:: API
.. code-block:: http
HTTP/1.1 200 OK
Vary: Accept, Cookie
Allow: GET, POST, PUT, DELETE, HEAD, OPTIONS
Location: /1/<collection>/<deposit_id>/status/
Content-Type: application/xml
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:sword="http://purl.org/net/sword/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:swhdeposit="https://www.softwareheritage.org/schema/2018/deposit"
>
<swhdeposit:deposit_id><deposit_id></swhdeposit:deposit_id>
<swhdeposit:deposit_status>done</swhdeposit:deposit_status>
<swhdeposit:deposit_status_detail>The deposit has been successfully loaded into the Software Heritage archive</swhdeposit:deposit_status_detail>
<swhdeposit:deposit_swh_id>swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9</swhdeposit:deposit_swh_id>
<swhdeposit:deposit_swh_id_context>swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=<origin>;visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/</swhdeposit:deposit_swh_id_context>
</entry>
.. tab-item:: CLI
.. code-block:: json
{
"deposit_id": <deposit_id>,
"deposit_status": "done",
"deposit_swh_id": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9",
"deposit_swh_id_context": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=<origin>;visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/",
"deposit_status_detail": "The deposit has been successfully loaded into the Software Heritage archive"
}
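Since a deposit goes through several steps, a script usually has to repeat the status
check until the deposit reaches a final status. Below is a minimal polling sketch in
Python. ``wait_for_final_status`` is a hypothetical helper and ``fetch_status`` stands
for any function of yours that performs the status call above and returns the
``deposit_status`` value. Treating ``failed`` and ``rejected`` as final statuses in
addition to ``done`` is an assumption, please check the workflow reference for the
authoritative list.

.. code-block:: python

   import time
   from typing import Callable, Collection


   def wait_for_final_status(
       fetch_status: Callable[[], str],
       # Assumption: statuses after which the deposit will not change anymore.
       final_statuses: Collection[str] = ("done", "failed", "rejected"),
       interval: float = 30.0,
       max_attempts: int = 20,
       sleep: Callable[[float], None] = time.sleep,
   ) -> str:
       """Call fetch_status until it returns a final status, then return it."""
       for attempt in range(1, max_attempts + 1):
           status = fetch_status()
           if status in final_statuses:
               return status
           # Do not wait after the last attempt, we are about to give up.
           if attempt < max_attempts:
               sleep(interval)
       raise TimeoutError(
           f"deposit still not in a final status after {max_attempts} checks"
       )


   # Usage with canned statuses instead of real API calls.
   canned = iter(["deposited", "deposited", "done"])
   final = wait_for_final_status(lambda: next(canned), sleep=lambda seconds: None)
   print(final)

Passing ``sleep`` as a parameter keeps the helper easy to try out without waiting
between checks.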
......@@ -6,8 +6,9 @@ How to guides
account.rst
prepare.rst
deposit-code.rst
deposit-metadata.rst
first-deposit.rst
multistep-deposit.rst
metadata-deposit.rst
versions.rst
integrations.rst
participate.rst
\ No newline at end of file
.. _deposit-code-metadata:
.. _deposit-partial:
Make a code & metadata deposit
==============================
Make a multi-step code & metadata deposit
=========================================
You have a software artefact and its associated metadata you want to archive on
Software Heritage: you want to make code & metadata deposit.
.. admonition:: Partial deposits
:class: Note
If you only have metadata to add to an existing entry in the Software Heritage archive
you want to make a `metadata-only deposit <deposit-metadata-only>`
This method of depositing artefacts to the archive is a bit more complicated than
the :ref:`regular one <deposit-first>`. If your artefacts are not larger than 100 MB we
recommend sticking to the simpler (one-shot) method.
Requisites
----------
1. Access to :ref:`account credentials <deposit-account>`
2. Have the origin url and prepared artefacts at hand, we will refer to it as
``<origin>``, ``<softwareartefact>`` ``<metadatafile>`` hereafter
3. Either the CLI installed or a tool to make API calls, we will use curl here, but
commands could be easily adapted to another application.
.. admonition:: Deposit instance URL
:class: warning
In the examples below the staging deposit url https://deposit.staging.swh.network
is used to avoid experimenting on the production instance of the deposit server.
Make sure you switch to https://deposit.softwareheritage.org when you are ready.
Make the deposit in one shot
----------------------------
If you have all the code artefacts ready in a single archive (and your metadata), it is
easy to deposit both in a single command:
.. tab-set::
.. tab-item:: API
.. code-block:: console
# 1) Note the 'In-Progress: false' header
# 2) Make sure the mimetype matches your file, here <softwareartefact> is a zip
curl -i -u <username>:<pass> \
-F "file=@<softwareartefact>;type=application/zip;filename=payload" \
-F "atom=@<metadatafile>;type=application/atom+xml;charset=UTF-8" \
-H 'In-Progress: false' \
-XPOST https://deposit.staging.swh.network/1/<collection>/
.. tab-item:: CLI
.. code-block:: console
# 1) Note the 'no-partial' flag
swh deposit upload \
--username <username> --password <pass> \
--url https://deposit.staging.swh.network/1 \
--create-origin <origin> \
--archive <softwareartefact> \
--metadata <metadatafile> \
--no-partial \
--format json
Will return the following response:
.. tab-set::
.. tab-item:: API
.. code-block:: http
HTTP/1.1 201 Created
Vary: Accept, Cookie
Allow: GET, POST, PUT, DELETE, HEAD, OPTIONS
Location: /1/<collection>/<deposit_id>/metadata/
Content-Type: application/xml
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:sword="http://purl.org/net/sword/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:swhdeposit="https://www.softwareheritage.org/schema/2018/deposit"
>
<swhdeposit:deposit_id><deposit_id></swhdeposit:deposit_id>
<swhdeposit:deposit_date>Jan. 1, 2025, 09:00 a.m.</swhdeposit:deposit_date>
<swhdeposit:deposit_archive>None</swhdeposit:deposit_archive>
<!-- Note the 'deposited' status -->
<swhdeposit:deposit_status>deposited</swhdeposit:deposit_status>
<!-- Edit-IRI -->
<link rel="edit" href="/1/<collection>/<deposit_id>/metadata/" />
<!-- EM-IRI -->
<link rel="edit-media" href="/1/<collection>/<deposit_id>/media/"/>
<!-- SE-IRI -->
<link rel="http://purl.org/net/sword/terms/add" href="/1/<collection>/<deposit_id>/metadata/" />
<!-- State-IRI -->
<link rel="alternate" href="/1/<collection>/<deposit_id>/status/"/>
<sword:packaging>http://purl.org/net/sword/package/SimpleZip</sword:packaging>
</entry>
.. tab-item:: CLI
.. code-block:: json
If you have multiple code artefacts or if you need to make your deposit in two or
more steps, you can use the partial deposit functionality.
{
# Note the 'deposited' status
'deposit_status': 'deposited',
'deposit_id': '<deposit_id>',
'deposit_date': 'Jan. 1, 2025, 09:00 a.m.',
'deposit_status_detail': None
}
Checklist
---------
A `deposited` status means the deposit is complete but still needs to be checked to ensure data consistency. See :ref:`Check a deposit status` to follow your deposit process.
- You have access to your :ref:`account credentials <deposit-account>`
- You have a software artefact at hand and its associated metadata (if not you need to
:ref:`prepare your artefacts and metadata <deposit-prepare>`.)
- This is the first time you're depositing for this origin (if you already made
deposits for this origin you want to
:ref:`make a new deposit for an existing origin <deposit-version>`)
- You have either the CLI installed or a tool to make API calls, we will use curl
here, but commands could be easily adapted to another application
And if you got an error message instead please check
:doc:`our troubleshooting reference </references/errors>` to sort it out.
Make the deposit in multiple steps
----------------------------------
If you have multiple code artefacts or if you need to make your deposit in two or
more steps, you can use the partial deposit functionality. Use cases:
- the code artefact is larger than 100 MB (the maximum file size allowed by our
server): you could split it into smaller archives and send them in multiple calls
- different services of your infrastructure will call our API
- etc.
In the example below we will make a first deposit with a code artefact then a second
one and finally a third one with the metadata.
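The sequence of calls can be summarized with a small Python sketch that lists, for
each step, the URL to POST to, the form field to use and the value of the
``In-Progress`` header. It only builds the plan and sends nothing.
``plan_multistep_deposit`` and ``Step`` are hypothetical names, and the file names in
the usage example are placeholders. The ``<deposit_id>`` part of the URLs is only
known once the first call has returned.

.. code-block:: python

   from typing import List, NamedTuple, Sequence


   class Step(NamedTuple):
       url: str           # where to POST
       form_field: str    # "file" for an archive, "atom" for the metadata file
       path: str          # local file to send
       in_progress: bool  # value of the 'In-Progress' header


   def plan_multistep_deposit(
       base_url: str,
       collection: str,
       archives: Sequence[str],
       metadata_file: str,
       deposit_id: str = "<deposit_id>",
   ) -> List[Step]:
       """List the POST calls of a multi-step deposit, metadata last."""
       if not archives:
           raise ValueError("at least one archive is required")
       collection_url = f"{base_url.rstrip('/')}/1/{collection}/"
       deposit_url = f"{collection_url}{deposit_id}/"
       # The first archive creates the deposit on the collection URL.
       steps = [Step(collection_url, "file", archives[0], True)]
       # Each further archive is appended through the '/media/' URL.
       steps += [
           Step(f"{deposit_url}media/", "file", archive, True)
           for archive in archives[1:]
       ]
       # The metadata goes last, with 'In-Progress: false' to close the deposit.
       steps.append(Step(f"{deposit_url}metadata/", "atom", metadata_file, False))
       return steps


   # Usage with placeholder file names and the staging instance.
   for step in plan_multistep_deposit(
       "https://deposit.staging.swh.network",
       "<collection>",
       ["part1.zip", "part2.zip"],
       "metadata.xml",
   ):
       print(step.in_progress, step.url)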
......@@ -139,7 +43,7 @@ First partial deposit
.. code-block:: console
# Note the 'In-Progress: true' header
curl -i -u <username>:<pass> \
curl -i -u <username>:<password> \
-F "file=@<softwareartefact1>;type=application/zip;filename=payload" \
-H 'In-Progress: true' \
-XPOST https://deposit.staging.swh.network/1/<collection>/
......@@ -151,7 +55,7 @@ First partial deposit
# 1) Note the '--partial' flag
# 2) Note the `--create-origin` flag
swh deposit upload \
--username <username> --password <pass> \
--username <username> --password <password> \
--url https://deposit.staging.swh.network/1 \
--create-origin <origin> \
--archive <softwareartefact1> \
......@@ -226,7 +130,7 @@ last one.
# 1) Note the 'In-Progress: true' header
# 2) Note the '<deposit_id>' in the URL
# 3) Note the '/media/' in the URL (we're appending a new software artefact)
curl -i -u <username>:<pass> \
curl -i -u <username>:<password> \
-F "file=@<softwareartefact2>;type=application/zip;filename=payload" \
-H 'In-Progress: true' \
-XPOST https://deposit.staging.swh.network/1/<collection>/<deposit_id>/media/
......@@ -239,7 +143,7 @@ last one.
# 2) Note the `--deposit-id` argument
# 3) Note the '--archive' argument as we're sending a new software artefact
swh deposit upload \
--username <username> --password <pass> \
--username <username> --password <password> \
--url https://deposit.staging.swh.network/1 \
--archive <softwareartefact2> \
--deposit-id <deposit_id> \
......@@ -263,7 +167,7 @@ will send include "not partial anymore" parameter in our call.
# 1) Note the 'In-Progress: false' header
# 2) Note the '<deposit_id>' in the URL
# 3) Note the '/metadata/' in the URL (we're appending metadata not code)
curl -i -u <username>:<pass> \
curl -i -u <username>:<password> \
-F "atom=@<metadatafile>;type=application/atom+xml;charset=UTF-8" \
-H 'In-Progress: false' \
-XPOST https://deposit.staging.swh.network/1/<collection>/<deposit_id>/metadata/
......@@ -276,7 +180,7 @@ will send include "not partial anymore" parameter in our call.
# 2) Note the `--deposit-id` argument
# 3) Note the '--metadata' argument, as we're pushing metadata
swh deposit upload \
--username <username> --password <pass> \
--username <username> --password <password> \
--url https://deposit.staging.swh.network/1 \
--metadata <metadatafile> \
--deposit-id <deposit_id> \
......@@ -295,7 +199,72 @@ Your deposit will go :doc:`through multiple steps </references/workflow>` before
.. code-block:: console
curl -i -u <username>:<pass> \
curl -i -u <username>:<password> \
-XGET https://deposit.staging.swh.network/1/<collection>/<deposit_id>/status/
.. tab-item:: CLI
.. code-block:: console
swh deposit status \
--username <username> --password <password> \
--url https://deposit.staging.swh.network/1 \
--deposit-id <deposit_id> \
--format json
Will return the following response:
.. tab-set::
.. tab-item:: API
.. code-block:: http
HTTP/1.1 200 OK
Vary: Accept, Cookie
Allow: GET, POST, PUT, DELETE, HEAD, OPTIONS
Location: /1/<collection>/<deposit_id>/status/
Content-Type: application/xml
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:sword="http://purl.org/net/sword/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:swhdeposit="https://www.softwareheritage.org/schema/2018/deposit"
>
<swhdeposit:deposit_id><deposit_id></swhdeposit:deposit_id>
<swhdeposit:deposit_status>done</swhdeposit:deposit_status>
<swhdeposit:deposit_status_detail>The deposit has been successfully loaded into the Software Heritage archive</swhdeposit:deposit_status_detail>
<swhdeposit:deposit_swh_id>swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9</swhdeposit:deposit_swh_id>
<swhdeposit:deposit_swh_id_context>swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=<origin>;visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/</swhdeposit:deposit_swh_id_context>
</entry>
.. tab-item:: CLI
.. code-block:: json
{
"deposit_id": <deposit_id>,
"deposit_status": "done",
"deposit_swh_id": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9",
"deposit_swh_id_context": "swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=<origin>;visit=swh:1:snp:68c0d26104d47e278dd6be07ed61fafb561d0d20;anchor=swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;path=/",
"deposit_status_detail": "The deposit has been successfully loaded into the Software Heritage archive"
}
A ``deposited`` status means the deposit is complete but still needs to be checked to
ensure data consistency. You can check your deposit status to follow the process.
Check a deposit status
----------------------
Your deposit will go :doc:`through multiple steps </references/workflow>` before appearing in the archive. You can check the status of your deposit and get its SWHID:
.. tab-set::
.. tab-item:: API
.. code-block:: console
curl -i -u <username>:<password> \
-XGET https://deposit.staging.swh.network/1/<collection>/<deposit_id>/status/
.. tab-item:: CLI
......@@ -303,7 +272,7 @@ Your deposit will go :doc:`through multiple steps </references/workflow>` before
.. code-block:: console
swh deposit status \
--username <username> --password <pass> \
--username <username> --password <password> \
--url https://deposit.staging.swh.network/1 \
--deposit-id <deposit_id> \
--format json
......
.. _deposit-prepare:
Prepare your metadata and artifacts
===================================
......
......@@ -14,9 +14,12 @@ Deposit
.. thumbnail:: images/software_life_cycle_en-1024x810.png
The deposit service allows a client (a repository, e.g. HAL) to submit software source archives and its associated metadata to the Software Heritage archive.
(we need a better picture)
Metadata can be also submitted referencing a repository url (origin) or a
The deposit service allows a partner to submit software source archives and their
associated metadata to the Software Heritage archive.
Metadata can also be submitted referencing an existing repository URL (origin) or a
:ref:`SWHID <persistent-identifiers>`.
Explanations
......@@ -41,8 +44,9 @@ To assist informed users with their deposits.
howto/account.rst
howto/prepare.rst
howto/deposit-code.rst
howto/deposit-metadata.rst
howto/first-deposit.rst
howto/multistep-deposit.rst
howto/metadata-deposit.rst
howto/versions.rst
howto/integrations.rst
howto/participate.rst
......