The metadata-deposit
====================
Goal
----
A client wishes to deposit only metadata about an object in the Software
Heritage archive.
The metadata-deposit is a special deposit where no content is
provided and the data transferred to Software Heritage is only
the metadata about an object or several objects in the archive.
Requirements
------------
The scope of the metadata-deposit differs from that of the
sparse-deposit. While a sparse-deposit creates a revision with referenced
directories and content files, the metadata-deposit references one of the
following:
- origin
- snapshot
- revision
- release
A complete metadata example
---------------------------
The reference element is included in the metadata xml atomEntry under the
swh namespace:
TODO: publish schema at https://www.softwareheritage.org/schema/2018/deposit
.. code:: xml
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
<author>
<name>HAL</name>
<email>hal@ccsd.cnrs.fr</email>
</author>
<client>hal</client>
<external_identifier>hal-01243573</external_identifier>
<codemeta:name>The assignment problem</codemeta:name>
<codemeta:url>https://hal.archives-ouvertes.fr/hal-01243573</codemeta:url>
<codemeta:identifier>other identifier, DOI, ARK</codemeta:identifier>
<codemeta:applicationCategory>Domain</codemeta:applicationCategory>
<codemeta:description>description</codemeta:description>
<codemeta:author>
<codemeta:name> author1 </codemeta:name>
<codemeta:affiliation> Inria </codemeta:affiliation>
<codemeta:affiliation> UPMC </codemeta:affiliation>
</codemeta:author>
<codemeta:author>
<codemeta:name> author2 </codemeta:name>
<codemeta:affiliation> Inria </codemeta:affiliation>
<codemeta:affiliation> UPMC </codemeta:affiliation>
</codemeta:author>
<swh:deposit>
<swh:reference>
<swh:origin url='https://github.com/user/repo'/>
</swh:reference>
</swh:deposit>
</entry>
Examples by target type
^^^^^^^^^^^^^^^^^^^^^^^
Reference an origin:
.. code:: xml
<swh:deposit>
<swh:reference>
<swh:origin url="https://github.com/user/repo"/>
</swh:reference>
</swh:deposit>
Reference a snapshot, revision or release, with ``${type}`` being one of
``snp`` (snapshot), ``rev`` (revision) or ``rel`` (release):
.. code:: xml
<swh:deposit>
<swh:reference>
<swh:object id="swh:1:${type}:aaaaaaaaaaaaaa..."/>
</swh:reference>
</swh:deposit>
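
The two reference forms above can also be produced programmatically. The following is a minimal sketch using only the Python standard library; the helper name ``build_reference`` and its rule for telling an origin URL from an swh-id (a ``swh:1:`` prefix) are assumptions made for illustration and are not part of swh-deposit.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the examples in this document.
SWH_NS = "https://www.softwareheritage.org/schema/2018/deposit"
ET.register_namespace("swh", SWH_NS)


def build_reference(target: str) -> ET.Element:
    """Return a <swh:deposit> element referencing an origin URL or an swh-id.

    Hypothetical helper: a target starting with "swh:1:" is treated as an
    object identifier, anything else as an origin URL.
    """
    deposit = ET.Element(f"{{{SWH_NS}}}deposit")
    reference = ET.SubElement(deposit, f"{{{SWH_NS}}}reference")
    if target.startswith("swh:1:"):
        ET.SubElement(reference, f"{{{SWH_NS}}}object", {"id": target})
    else:
        ET.SubElement(reference, f"{{{SWH_NS}}}origin", {"url": target})
    return deposit


if __name__ == "__main__":
    element = build_reference("https://github.com/user/repo")
    print(ET.tostring(element, encoding="unicode"))
```

The resulting element would still have to be embedded in a full Atom entry, as in the complete example, before being sent.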
Loading procedure
------------------
In this case, the metadata-deposit will be injected as a metadata entry at the
appropriate level (origin_metadata, revision_metadata, etc.) along with
information about the contributor of the deposit. Unlike the complete and
sparse deposits, no object is created.
The sparse-deposit
==================
Goal
----
A client wishes to transfer a tarball for which part of the content is
already in the SWH archive.
Requirements
------------
To do so, the metadata must provide a list of paths with their targets, and
the directories/content already present in the archive should not be included
in the tarball. This list is referred to as the manifest list and uses the
entry name 'bindings' in the metadata.
+----------------------+-------------------------------------+
| path                 | swh-id                              |
+======================+=====================================+
| path/to/file.txt     | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa...  |
+----------------------+-------------------------------------+
| path/to/dir/         | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa...  |
+----------------------+-------------------------------------+
Note: the *name* of the file or the directory is given by the path and is not
part of the identified object.
TODO: see if a trailing "/" is mandatory for implementation.
A concrete example
------------------
The manifest list is included in the metadata xml atomEntry under the
swh namespace:
TODO: publish schema at https://www.softwareheritage.org/schema/2018/deposit
.. code:: xml
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0"
xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
<author>
<name>HAL</name>
<email>hal@ccsd.cnrs.fr</email>
</author>
<client>hal</client>
<external_identifier>hal-01243573</external_identifier>
<codemeta:name>The assignment problem</codemeta:name>
<codemeta:url>https://hal.archives-ouvertes.fr/hal-01243573</codemeta:url>
<codemeta:identifier>other identifier, DOI, ARK</codemeta:identifier>
<codemeta:applicationCategory>Domain</codemeta:applicationCategory>
<codemeta:description>description</codemeta:description>
<codemeta:author>
<codemeta:name> author1 </codemeta:name>
<codemeta:affiliation> Inria </codemeta:affiliation>
<codemeta:affiliation> UPMC </codemeta:affiliation>
</codemeta:author>
<codemeta:author>
<codemeta:name> author2 </codemeta:name>
<codemeta:affiliation> Inria </codemeta:affiliation>
<codemeta:affiliation> UPMC </codemeta:affiliation>
</codemeta:author>
<swh:deposit>
<swh:bindings>
<swh:binding source="path/to/file.txt" destination="swh:1:cnt:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"/>
<swh:binding source="path/to/second_file.txt" destination="swh:1:cnt:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"/>
<swh:binding source="path/to/dir/" destination="swh:1:dir:ddddddddddddddddddddddddddddddddd"/>
</swh:bindings>
</swh:deposit>
</entry>
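
To illustrate how a server or client might read the manifest list back out of such an entry, here is a minimal sketch based on the standard library. The function name ``read_bindings`` and the trimmed-down sample entry are assumptions for illustration; the identifiers are placeholders, not real archive objects.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example above.
NS = {"swh": "https://www.softwareheritage.org/schema/2018/deposit"}

# Placeholder identifiers (40 hexadecimal characters), for illustration only.
CNT_ID = "swh:1:cnt:" + "a" * 40
DIR_ID = "swh:1:dir:" + "d" * 40

SAMPLE = f"""<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
  <swh:deposit>
    <swh:bindings>
      <swh:binding source="path/to/file.txt" destination="{CNT_ID}"/>
      <swh:binding source="path/to/dir/" destination="{DIR_ID}"/>
    </swh:bindings>
  </swh:deposit>
</entry>"""


def read_bindings(entry_xml: str) -> list:
    """Return the (source path, destination swh-id) pairs of an Atom entry."""
    root = ET.fromstring(entry_xml)
    bindings = root.findall("swh:deposit/swh:bindings/swh:binding", NS)
    return [(b.get("source"), b.get("destination")) for b in bindings]


if __name__ == "__main__":
    for path, swhid in read_bindings(SAMPLE):
        print(path, "->", swhid)
```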
Deposit verification
--------------------
After checking the integrity of the deposit content and
metadata, the following checks should be added:
1. validate the manifest list structure with a correct swh-id for each path (syntax check on the swh-id format)
2. verify that the path name corresponds to the object type
3. locate the identifiers in the SWH archive
Each failing check should return a distinct error and result in a
'rejected' deposit.
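
The first two checks could be sketched as follows. This is not the swh-deposit implementation: the exception names are hypothetical, the identifier pattern assumes 40 hexadecimal characters and only the ``cnt`` and ``dir`` types shown in the table, and the path/type rule assumes that a trailing "/" marks a directory, which this specification still lists as an open question. The third check (locating the identifiers in the archive) needs archive access and is left out.

```python
import re

# Assumed syntax: swh:1:<cnt|dir>:<40 hexadecimal characters>.
SWHID_RE = re.compile(r"^swh:1:(cnt|dir):[0-9a-f]{40}$")


class InvalidIdentifier(ValueError):
    """Check 1 failed: the swh-id is not syntactically valid (hypothetical)."""


class TypeMismatch(ValueError):
    """Check 2 failed: the path does not match the object type (hypothetical)."""


def check_binding(path: str, swhid: str) -> None:
    """Raise a distinct error for each failing check, return None otherwise."""
    match = SWHID_RE.match(swhid)
    if match is None:
        raise InvalidIdentifier(f"malformed swh-id for {path!r}: {swhid!r}")
    # Assumption: directory paths end with "/", file paths do not.
    path_is_dir = path.endswith("/")
    id_is_dir = match.group(1) == "dir"
    if path_is_dir != id_is_dir:
        raise TypeMismatch(f"{path!r} does not match the type of {swhid!r}")


if __name__ == "__main__":
    check_binding("path/to/file.txt", "swh:1:cnt:" + "a" * 40)
    check_binding("path/to/dir/", "swh:1:dir:" + "d" * 40)
    print("bindings look well-formed")
```

Using one exception class per check keeps it easy to map each failure to its own error message when the deposit is rejected.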
Loading procedure
------------------
The injection procedure should include:
- load the new data from the tarball
- create new objects using the path name and create links from the path to the
SWH object using the identifier
- calculate the identifier of the new objects at each level
- return the final swh-id of the new revision
Invariant: the same content should yield the same swh-id. A complete
deposit with all the content and a sparse-deposit with the correct links
should therefore result in the same root directory swh-id.
The same is expected of the revision swh-id if the metadata provided is
identical.
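
The invariant rests on identifiers being derived from the content alone. As a minimal illustration, the sketch below computes a content identifier under the assumption that ``cnt`` identifiers use the Git blob hashing scheme (SHA-1 over a ``blob <length>`` header followed by the bytes); the function name is hypothetical, and directory and revision identifiers, which are built on top of such hashes, are not covered.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Compute a content identifier, assuming Git-style blob hashing."""
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The same bytes give the same identifier, however they were deposited.
    from_tarball = content_swhid(b"print('hello')\n")
    from_archive = content_swhid(b"print('hello')\n")
    print(from_tarball == from_archive)
```

Because the file name is not part of the hashed data, this also matches the earlier note that the name is given by the path and not by the identified object.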
.. _swh-deposit-specs:
Blueprint Specifications
=========================
.. toctree::
:maxdepth: 1
:caption: Contents:
blueprint.rst
spec-loading.rst
spec-sparse-deposit.rst
spec-meta-deposit.rst
<?xml version="1.0" encoding="iso-8859-1"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="deposit">
<xsd:complexType>
<xsd:choice>
<xsd:element name="reference">
<xsd:complexType>
<xsd:choice>
<xsd:element name="object">
<xsd:complexType>
<xsd:attribute type="xsd:string" name="id"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="origin">
<xsd:complexType>
<xsd:attribute type="xsd:string" name="url"/>
</xsd:complexType>
</xsd:element>
</xsd:choice>
</xsd:complexType>
</xsd:element>
<xsd:element name="bindings">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="binding" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:attribute type="xsd:string" name="source"/>
<xsd:attribute type="xsd:string" name="destination"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:choice>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Deployment of the swh-deposit
=============================
As usual, the Debian package is created and uploaded to the swh Debian
repository. Once the package is installed, we need to do a few things with
regard to the database.
Prepare the database setup (existence, connection, etc...).
-----------------------------------------------------------
This is defined through the packaged ``swh.deposit.settings.production``
module and the expected **/etc/softwareheritage/deposit/server.yml**.
As usual, the expected configuration files are deployed through our
puppet manifest (cf. puppet-environment/swh-site,
puppet-environment/swh-role, puppet-environment/swh-profile)
Migrate/bootstrap the db schema
-------------------------------
.. code:: shell
sudo django-admin migrate --settings=swh.deposit.settings.production
Load minimum defaults data
--------------------------
.. code:: shell
sudo django-admin loaddata \
--settings=swh.deposit.settings.production deposit_data
This adds the minimal data:
- deposit request type 'archive' and 'metadata'
- 'hal' collection
Note: swh.deposit.fixtures.deposit\_data is packaged
Add client and collection
-------------------------
.. code:: shell
swh deposit admin \
--config-file /etc/softwareheritage/deposit/server.yml \
--platform production \
user create \
--collection <collection-name> \
--username <client-name> \
--password <to-define>
This adds a user ``<client-name>`` which can access the collection
``<collection-name>``. The password is used to authenticate against the
deposit API.
Note:
- If the collection does not exist, it is created along with the user
- The password is provided in plain text but stored encrypted (so yes, for now
we know the user's password)
- For the production platform, you must either set the
``SWH_CONFIG_FILENAME`` environment variable or pass the
``--config-file`` parameter
Reschedule a deposit
---------------------
.. code:: shell
swh deposit admin \
--config-file /etc/softwareheritage/deposit/server.yml \
--platform production \
deposit reschedule \
--deposit-id <deposit-id>
This will:
- check that the deposit's status is a reasonable one (failed or done). That
means the checks passed but something went wrong during the loading (failed:
the loading failed; done: the loading succeeded but, for some reason such as
a bug, the deposit needs to be rescheduled)
- reset the deposit's status to 'verified' (the state prior to any loading but
after the checks have passed) and remove the archive identifiers
(swh-id, ...)
- trigger the loading task again through the scheduler
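
The three steps above can be summarised by the following sketch. It is not the code behind ``swh deposit admin deposit reschedule``: the ``Deposit`` dataclass is a hypothetical stand-in for the real model with a single identifier field, and ``schedule_loading`` is a hypothetical callback standing in for the scheduler.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Statuses from which a reschedule is considered reasonable.
RESCHEDULABLE = {"failed", "done"}


@dataclass
class Deposit:
    """Hypothetical stand-in for the deposit model."""

    id: int
    status: str
    swh_id: Optional[str] = None


def reschedule(deposit: Deposit, schedule_loading: Callable[[int], None]) -> None:
    """Reset a deposit to 'verified' and trigger its loading again."""
    # Step 1: only deposits that already went through loading qualify.
    if deposit.status not in RESCHEDULABLE:
        raise ValueError(f"cannot reschedule a deposit in status {deposit.status!r}")
    # Step 2: back to the state after the checks, without archive identifiers.
    deposit.status = "verified"
    deposit.swh_id = None
    # Step 3: hand the deposit back to the scheduler (hypothetical callback).
    schedule_loading(deposit.id)


if __name__ == "__main__":
    scheduled = []
    deposit = Deposit(id=1, status="failed", swh_id="swh:1:rev:" + "a" * 40)
    reschedule(deposit, scheduled.append)
    print(deposit, scheduled)
```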
[pytest]
norecursedirs = docs
DJANGO_SETTINGS_MODULE = swh.deposit.settings.testing
# 200 MiB max size
max_upload_size: 209715200
[egg_info]
tag_build =
tag_date = 0
Metadata-Version: 2.1
Name: swh.deposit
Version: 0.0.71
Summary: Software Heritage Deposit Server
Home-page: https://forge.softwareheritage.org/source/swh-deposit/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Source, https://forge.softwareheritage.org/source/swh-deposit
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Description: # swh-deposit
This is [Software Heritage](https://www.softwareheritage.org)'s
[SWORD 2.0](http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html) Server
implementation, as well as a simple client to upload deposits on the server.
**S.W.O.R.D** (**S**imple **W**eb-Service **O**ffering **R**epository
**D**eposit) is an interoperability standard for digital file deposit.
This implementation will permit interaction between a client (a
repository) and a server (SWH repository) to permit deposits of
software source code archives and associated metadata.
The documentation is at ./docs/README-specification.md
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Description-Content-Type: text/markdown
Provides-Extra: testing
Provides-Extra: server
MANIFEST.in
Makefile
README.md
requirements-server.txt
requirements-swh-server.txt
requirements-swh.txt
requirements-test.txt
requirements.txt
setup.py
version.txt
swh/__init__.py
swh.deposit.egg-info/PKG-INFO
swh.deposit.egg-info/SOURCES.txt
swh.deposit.egg-info/dependency_links.txt
swh.deposit.egg-info/entry_points.txt
swh.deposit.egg-info/requires.txt
swh.deposit.egg-info/top_level.txt
swh/deposit/__init__.py
swh/deposit/apps.py
swh/deposit/auth.py
swh/deposit/config.py
swh/deposit/errors.py
swh/deposit/manage.py
swh/deposit/models.py
swh/deposit/parsers.py
swh/deposit/signals.py
swh/deposit/urls.py
swh/deposit/utils.py
swh/deposit/wsgi.py
swh/deposit/api/__init__.py
swh/deposit/api/common.py
swh/deposit/api/converters.py
swh/deposit/api/deposit.py
swh/deposit/api/deposit_content.py
swh/deposit/api/deposit_status.py
swh/deposit/api/deposit_update.py
swh/deposit/api/service_document.py
swh/deposit/api/urls.py
swh/deposit/api/private/__init__.py
swh/deposit/api/private/deposit_check.py
swh/deposit/api/private/deposit_list.py
swh/deposit/api/private/deposit_read.py
swh/deposit/api/private/deposit_update_status.py
swh/deposit/api/private/urls.py
swh/deposit/cli/__init__.py
swh/deposit/cli/admin.py
swh/deposit/cli/client.py
swh/deposit/client/__init__.py
swh/deposit/fixtures/__init__.py
swh/deposit/fixtures/deposit_data.yaml
swh/deposit/loader/__init__.py
swh/deposit/loader/checker.py
swh/deposit/loader/loader.py
swh/deposit/loader/tasks.py
swh/deposit/migrations/0001_initial.py
swh/deposit/migrations/0002_depositrequest_archive.py
swh/deposit/migrations/0003_temporaryarchive.py
swh/deposit/migrations/0004_delete_temporaryarchive.py
swh/deposit/migrations/0005_auto_20171019_1436.py
swh/deposit/migrations/0006_depositclient_url.py
swh/deposit/migrations/0007_auto_20171129_1609.py
swh/deposit/migrations/0008_auto_20171130_1513.py
swh/deposit/migrations/0009_deposit_parent.py
swh/deposit/migrations/0010_auto_20180110_0953.py
swh/deposit/migrations/0011_auto_20180115_1510.py
swh/deposit/migrations/0012_deposit_status_detail.py
swh/deposit/migrations/0013_depositrequest_raw_metadata.py
swh/deposit/migrations/0014_auto_20180720_1221.py
swh/deposit/migrations/0015_depositrequest_typemigration.py
swh/deposit/migrations/0016_auto_20190507_1408.py
swh/deposit/migrations/__init__.py
swh/deposit/settings/__init__.py
swh/deposit/settings/common.py
swh/deposit/settings/development.py
swh/deposit/settings/production.py
swh/deposit/settings/testing.py
swh/deposit/static/robots.txt
swh/deposit/static/css/bootstrap-responsive.min.css
swh/deposit/static/css/style.css
swh/deposit/static/img/arrow-up-small.png
swh/deposit/static/img/swh-logo-deposit.png
swh/deposit/static/img/swh-logo-deposit.svg
swh/deposit/static/img/icons/swh-logo-32x32.png
swh/deposit/static/img/icons/swh-logo-deposit-180x180.png
swh/deposit/static/img/icons/swh-logo-deposit-192x192.png
swh/deposit/static/img/icons/swh-logo-deposit-270x270.png
swh/deposit/templates/__init__.py
swh/deposit/templates/homepage.html
swh/deposit/templates/layout.html
swh/deposit/templates/deposit/__init__.py
swh/deposit/templates/deposit/content.xml
swh/deposit/templates/deposit/deposit_receipt.xml
swh/deposit/templates/deposit/error.xml
swh/deposit/templates/deposit/service_document.xml
swh/deposit/templates/deposit/status.xml
swh/deposit/templates/rest_framework/api.html
swh/deposit/tests/__init__.py
swh/deposit/tests/common.py
swh/deposit/tests/test_utils.py
swh/deposit/tests/api/__init__.py
swh/deposit/tests/api/test_common.py
swh/deposit/tests/api/test_converters.py
swh/deposit/tests/api/test_deposit.py
swh/deposit/tests/api/test_deposit_atom.py
swh/deposit/tests/api/test_deposit_binary.py
swh/deposit/tests/api/test_deposit_check.py
swh/deposit/tests/api/test_deposit_delete.py
swh/deposit/tests/api/test_deposit_list.py
swh/deposit/tests/api/test_deposit_multipart.py
swh/deposit/tests/api/test_deposit_read_archive.py
swh/deposit/tests/api/test_deposit_read_metadata.py
swh/deposit/tests/api/test_deposit_status.py
swh/deposit/tests/api/test_deposit_update.py
swh/deposit/tests/api/test_deposit_update_status.py
swh/deposit/tests/api/test_parser.py
swh/deposit/tests/api/test_service_document.py
swh/deposit/tests/loader/__init__.py
swh/deposit/tests/loader/common.py
swh/deposit/tests/loader/conftest.py
swh/deposit/tests/loader/test_checker.py
swh/deposit/tests/loader/test_client.py
swh/deposit/tests/loader/test_loader.py
swh/deposit/tests/loader/test_tasks.py
[console_scripts]
swh-deposit=swh.deposit.cli:main
[swh.cli.subcommands]
deposit=swh.deposit.cli:deposit
vcversioner
click
xmltodict
iso8601
requests
swh.core>=0.0.60
[server]
Django<2.0
djangorestframework
swh.core[http]
swh.loader.tar>=0.0.39
swh.loader.core>=0.0.32
swh.scheduler>=0.0.39
swh.model>=0.0.26
[testing]
pytest<4
pytest-django
swh.scheduler[testing]
Django<2.0
djangorestframework
swh.core[http]
swh.loader.tar>=0.0.39
swh.loader.core>=0.0.32
swh.scheduler>=0.0.39
swh.model>=0.0.26
swh
[tox]
envlist=flake8,py3
[testenv:py3]
deps =
# the dependency below is needed for now as a workaround for
# https://github.com/pypa/pip/issues/6239
swh.core[http] >= 0.0.61
.[testing]
pytest-cov
pifpaf
pytest-django
commands =
pifpaf run postgresql -- pytest --cov {envsitepackagesdir}/swh/deposit --cov-branch {posargs} {envsitepackagesdir}/swh/deposit
[testenv:flake8]
skip_install = true
deps =
flake8
commands =
{envpython} -m flake8 \
--exclude=.tox,.git,__pycache__,.tox,.eggs,*.egg,swh/deposit/migrations
v0.0.71-0-gc1e6ffa