diff --git a/PKG-INFO b/PKG-INFO index 89536499e7ff39d69be47708efc3f14ea4960645..98d649b9acbc4e4a72589469187740f346437b52 100644 --- a/PKG-INFO +++ b/PKG-INFO @@ -1,6 +1,6 @@ Metadata-Version: 1.0 Name: swh.deposit -Version: 0.0.58 +Version: 0.0.59 Summary: Software Heritage Deposit Server Home-page: https://forge.softwareheritage.org/source/swh-deposit/ Author: Software Heritage developers diff --git a/docs/endpoints/status.rst b/docs/endpoints/status.rst index ad8a0e747e851a605b817e870536904d5e36ef9b..ed17ac87d048964cd1d736deb891a08bdff6c60e 100644 --- a/docs/endpoints/status.rst +++ b/docs/endpoints/status.rst @@ -54,10 +54,13 @@ Sample response <entry xmlns="http://www.w3.org/2005/Atom" xmlns:sword="http://purl.org/net/sword/" xmlns:dcterms="http://purl.org/dc/terms/"> - <deposit_id>150</deposit_id> + <deposit_id>160</deposit_id> <deposit_status>done</deposit_status> <deposit_status_detail>The deposit has been successfully loaded into the Software Heritage archive</deposit_status_detail> - <deposit_swh_id>swh:1:rev:c648730299c2a4f4df3c1fe6e527ef3681f9527e</deposit_swh_id> + <deposit_swh_id>swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9</deposit_swh_id> + <deposit_swh_id_context>swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/</deposit_swh_id_context> + <deposit_swh_anchor_id>swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb</deposit_swh_anchor_id> + <deposit_swh_anchor_id_context>swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;origin=https://forge.softwareheritage.org/source/jesuisgpl/</deposit_swh_anchor_id_context> </entry> Rejected deposit: diff --git a/docs/getting-started.rst b/docs/getting-started.rst index d6288ac0db08f9d89113785c4e7f28f38ec8dc91..59d7f858128d7abf78f8fe53f65ca97563df003d 100644 --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -175,7 +175,7 @@ multisteps deposit The steps to create a multisteps deposit: 1. 
Create an incomplete deposit -~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First use the ``--partial`` argument to declare there is more to come .. code:: shell @@ -186,7 +186,7 @@ First use the ``--partial`` argument to declare there is more to come 2. Add content or metadata to the deposit -~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Continue the deposit by using the ``--deposit-id`` argument given as a response for the first step. You can continue adding content or metadata while you use the ``--partial`` argument. @@ -235,7 +235,7 @@ Update deposit ``--deposit-id <id>`` is provided - by using the ``--replace`` flag - + - ``--metadata-deposit`` replaces associated existing metadata - ``--archive-deposit`` replaces associated archive(s) - by default, with no flag or both, you'll replace associated @@ -268,7 +268,7 @@ You can check the status of the deposit by using the ``--deposit-id`` argument: .. code:: shell -$ swh-deposit --username name --password secret --deposit-id '11' --status + $ swh-deposit --username name --password secret --deposit-id '11' --status .. code:: json @@ -292,14 +292,18 @@ The different statuses: When the deposit has been loaded into the archive, the status will be marked ``done``. In the response, will also be available the -<deposit_swh_id>. For example: +<deposit_swh_id>, <deposit_swh_id_context>, <deposit_swh_anchor_id>, +<deposit_swh_anchor_id_context>. For example: .. 
code:: json { 'deposit_id': '11', 'deposit_status': 'done', - 'deposit_swh_id': 'swh:1:rev:34898aa991c90b447c27d2ac1fc09f5c8f12783e', + 'deposit_swh_id': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9', + 'deposit_swh_id_context': 'swh:1:dir:d83b7dda887dc790f7207608474650d4344b8df9;origin=https://forge.softwareheritage.org/source/jesuisgpl/', + 'deposit_swh_anchor_id': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb', + 'deposit_swh_anchor_id_context': 'swh:1:rev:e76ea49c9ffbb7f73611087ba6e999b19e5d71eb;origin=https://forge.softwareheritage.org/source/jesuisgpl/', 'deposit_status_detail': 'The deposit has been successfully \ loaded into the Software Heritage archive' } diff --git a/docs/index.rst b/docs/index.rst index 23e304b520fcefbbebda1b3175f2f936f94f2c77..e8ffe3ef101a8139886de436b941a47ba53bd779 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -12,6 +12,7 @@ Software Heritage Deposit metadata.rst dev-info.rst sys-info.rst + specs/specs.rst Indices and tables ================== diff --git a/docs/blueprint.rst b/docs/specs/blueprint.rst similarity index 84% rename from docs/blueprint.rst rename to docs/specs/blueprint.rst index 1fa91cd9c125009f85fe71e6f73c74ea894cd339..e0b93e8f0e303882b39fcc250c8b4323a8bab41f 100644 --- a/docs/blueprint.rst +++ b/docs/specs/blueprint.rst @@ -8,13 +8,13 @@ Deposit creation From client's deposit repository server to SWH's repository server: 1. The client requests for the server's abilities and its associated collection - (GET query to the *SD/service document uri*) + (GET query to the *SD/service document uri*) 2. The server answers the client with the service document which gives the - *collection uri* (also known as *COL/collection IRI*). + *collection uri* (also known as *COL/collection IRI*). 3. The client sends a deposit (optionally a zip archive, some metadata or both) - through the *collection uri*. + through the *collection uri*. 
This can be done in: @@ -22,16 +22,16 @@ From client's deposit repository server to SWH's repository server: * one POST request (metadata or archive) + other PUT or POST request to the *update uris* (*edit-media iri* or *edit iri*) - 1. Server validates the client's input or returns detailed error if any + a. Server validates the client's input or returns detailed error if any - 2. Server stores information received (metadata or software archive source + b. Server stores information received (metadata or software archive source code or both) 4. The server notifies the client it acknowledged the client's request. An - ``http 201 Created`` response with a deposit receipt in the body response is - sent back. That deposit receipt will hold the necessary information to - eventually complete the deposit later on if it was incomplete (also known as - status ``partial``). + ``http 201 Created`` response with a deposit receipt in the body response is + sent back. That deposit receipt will hold the necessary information to + eventually complete the deposit later on if it was incomplete (also known as + status ``partial``). 
Schema representation ^^^^^^^^^^^^^^^^^^^^^ diff --git a/docs/specs/metadata_example.xml b/docs/specs/metadata_example.xml new file mode 100644 index 0000000000000000000000000000000000000000..c681e5598b22eb45d054c00a8e4ae27246e4fcdf --- /dev/null +++ b/docs/specs/metadata_example.xml @@ -0,0 +1,35 @@ +<?xml version="1.0"?> + <entry xmlns="http://www.w3.org/2005/Atom" + xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0" + xmlns:swh="swh.xsd"> + "{http://www.w3.org/2005/Atom}author": { + "{http://www.w3.org/2005/Atom}email": "hal@ccsd.cnrs.fr", + "{http://www.w3.org/2005/Atom}name": "HAL" + }, + <author> + <name>HAL</name> + <email>hal@ccsd.cnrs.fr</email> + </author> + <client>hal</client> + <external_identifier>hal-01243573</external_identifier> + <codemeta:name>The assignment problem</codemeta:name> + <codemeta:url>https://hal.archives-ouvertes.fr/hal-01243573</codemeta:url> + <codemeta:identifier>other identifier, DOI, ARK</codemeta:identifier> + <codemeta:applicationCategory>Domain</codemeta:applicationCategory> + <codemeta:description>description</codemeta:description> + <codemeta:author> + <codemeta:name> author1 </codemeta:name> + <codemeta:affiliation> Inria </codemeta:affiliation> + <codemeta:affiliation> UPMC </codemeta:affiliation> + </codemeta:author> + <codemeta:author> + <codemeta:name> author2 </codemeta:name> + <codemeta:affiliation> Inria </codemeta:affiliation> + <codemeta:affiliation> UPMC </codemeta:affiliation> + </codemeta:author> + <swh:deposit> + <swh:bindings> + <swh:binding source="path/to/file.txt" destination="aaaaaaaaaaa..."/> + </swh:bindings> + </swh:deposit> + </entry> diff --git a/docs/spec-loading.rst b/docs/specs/spec-loading.rst similarity index 100% rename from docs/spec-loading.rst rename to docs/specs/spec-loading.rst diff --git a/docs/specs/spec-meta-deposit.rst b/docs/specs/spec-meta-deposit.rst new file mode 100644 index 0000000000000000000000000000000000000000..517757fc9d0bf208f29c069087bd38e16d48f686 --- 
/dev/null +++ b/docs/specs/spec-meta-deposit.rst @@ -0,0 +1,84 @@ +The metadata-deposit +==================== + +Goal +---- +A client wishes to deposit only metadata about an object in the Software +Heritage archive. + +The meta-deposit is a special deposit where no content is +provided and the data transferred to Software Heritage is only +the metadata about an object or several objects in the archive. + +Requirements +------------ +The scope of the meta-deposit is different than the +sparse-deposit. While a sparse-deposit creates a revision with referenced +directories and content files, the meta-deposit references one of the following: + +- origin +- snapshot +- revision +- release + + +A complete metadata example +--------------------------- +The reference element is included in the metadata xml atomEntry under the +swh namespace: + +.. code:: xml + + <?xml version="1.0"?> + <entry xmlns="http://www.w3.org/2005/Atom" + xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0" + xmlns:swh="swh.xsd"> + <author> + <name>HAL</name> + <email>hal@ccsd.cnrs.fr</email> + </author> + <client>hal</client> + <external_identifier>hal-01243573</external_identifier> + <codemeta:name>The assignment problem</codemeta:name> + <codemeta:url>https://hal.archives-ouvertes.fr/hal-01243573</codemeta:url> + <codemeta:identifier>other identifier, DOI, ARK</codemeta:identifier> + <codemeta:applicationCategory>Domain</codemeta:applicationCategory> + <codemeta:description>description</codemeta:description> + <codemeta:author> + <codemeta:name> author1 </codemeta:name> + <codemeta:affiliation> Inria </codemeta:affiliation> + <codemeta:affiliation> UPMC </codemeta:affiliation> + </codemeta:author> + <codemeta:author> + <codemeta:name> author2 </codemeta:name> + <codemeta:affiliation> Inria </codemeta:affiliation> + <codemeta:affiliation> UPMC </codemeta:affiliation> + </codemeta:author> + <swh:deposit> + <swh:reference> + <swh:origin url='https://github.com/user/repo'/> + 
</swh:reference> + </swh:deposit> + </entry> + +Examples by target type +^^^^^^^^^^^^^^^^^^^^^^^ + +With ${type} in {snp (snapshot), rev (revision), rel (release) }: + +.. code:: xml + + <swh:deposit> + <swh:reference> + <swh:object id="swh:1:${type}:aaaaaaaaaaaaaa..."/> + </swh:reference> + </swh:deposit> + + + +Loading procedure +------------------ + +In this case, the meta-deposit will be injected as a metadata entry at the +appropriate level (origin_metadata, revision_metadata, etc.). Contrary to the +complete and sparse deposit, there will be no object creation. diff --git a/docs/specs/spec-sparse-deposit.rst b/docs/specs/spec-sparse-deposit.rst new file mode 100644 index 0000000000000000000000000000000000000000..e08f5728179da682d5083e1d20479bd810308e71 --- /dev/null +++ b/docs/specs/spec-sparse-deposit.rst @@ -0,0 +1,101 @@ +The sparse-deposit +================== + +Goal +---- +A client wishes to transfer a tarball for which part of the content is +already in the SWH archive. + +Requirements +------------ +To do so, a list of paths with targets must be provided in the metadata and +the paths to the missing directories/content should not be included +in the tarball. The list will be referred to +as the manifest list using the entry name 'bindings' in the metadata. + ++----------------------+-------------------------------------+ +| path | swh-id | ++======================+=====================================+ +| path/to/file.txt | swh:1:cnt:aaaaaaaaaaaaaaaaaaaaa... | ++----------------------+-------------------------------------+ +| path/to/dir/ | swh:1:dir:aaaaaaaaaaaaaaaaaaaaa... | ++----------------------+-------------------------------------+ + +Note: the *name* of the file or the directory is given by the path and is not +part of the identified object. + +A concrete example +------------------ +The manifest list is included in the metadata xml atomEntry under the +swh namespace: + +.. 
code:: xml + + <?xml version="1.0"?> + <entry xmlns="http://www.w3.org/2005/Atom" + xmlns:codemeta="https://doi.org/10.5063/SCHEMA/CODEMETA-2.0" + xmlns:swh="swh.xsd"> + <author> + <name>HAL</name> + <email>hal@ccsd.cnrs.fr</email> + </author> + <client>hal</client> + <external_identifier>hal-01243573</external_identifier> + <codemeta:name>The assignment problem</codemeta:name> + <codemeta:url>https://hal.archives-ouvertes.fr/hal-01243573</codemeta:url> + <codemeta:identifier>other identifier, DOI, ARK</codemeta:identifier> + <codemeta:applicationCategory>Domain</codemeta:applicationCategory> + <codemeta:description>description</codemeta:description> + <codemeta:author> + <codemeta:name> author1 </codemeta:name> + <codemeta:affiliation> Inria </codemeta:affiliation> + <codemeta:affiliation> UPMC </codemeta:affiliation> + </codemeta:author> + <codemeta:author> + <codemeta:name> author2 </codemeta:name> + <codemeta:affiliation> Inria </codemeta:affiliation> + <codemeta:affiliation> UPMC </codemeta:affiliation> + </codemeta:author> + <swh:deposit> + <swh:bindings> + <swh:binding source="path/to/file.txt" + destination="swh:1:cnt:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"/> + <swh:binding source="path/to/second_file.txt" + destination="swh:1:cnt:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"/> + <swh:binding source="path/to/dir/" + destination="swh:1:dir:ddddddddddddddddddddddddddddddddd"/> + + </swh:bindings> + </swh:deposit> + </entry> + + +Deposit verification +-------------------- + +After checking the integrity of the deposit content and +metadata, the following checks should be added: + +1. validate the manifest list structure with a correct swh-id for each path (syntax check on the swh-id format) +2. verify that the path name corresponds to the object type +3. locate the identifiers in the SWH archive + +Each failing check should return a different error with the deposit +and result in a 'rejected' deposit. 
+ +Loading procedure +------------------ +The injection procedure should include: + +- load the tarball new data +- create new objects using the path name and create links from the path to the + SWH object using the identifier +- calculate identifier of the new objects at each level +- return final swh-id of the new revision + +Invariant: the same content should yield the same swh-id, +that's why a complete deposit with all the content and +a sparse-deposit with the correct links will result +with the same root directory swh-id. +The same is expected with the revision swh-id if the metadata provided is +identical. diff --git a/docs/specs/specs.rst b/docs/specs/specs.rst new file mode 100644 index 0000000000000000000000000000000000000000..bb86993d19eb8a5876d78c7e0e1c8e1426914930 --- /dev/null +++ b/docs/specs/specs.rst @@ -0,0 +1,13 @@ +.. _swh-deposit-specs: + +Blueprint Specifications +========================= + +.. toctree:: + :maxdepth: 1 + :caption: Contents: + + blueprint.rst + spec-loading.rst + spec-sparse-deposit.rst + spec-meta-deposit.rst diff --git a/docs/specs/swh.xsd b/docs/specs/swh.xsd new file mode 100644 index 0000000000000000000000000000000000000000..4dbf0ac6e961939c20a0b579b9941bf00a099544 --- /dev/null +++ b/docs/specs/swh.xsd @@ -0,0 +1,41 @@ +<?xml version="1.0" encoding="iso-8859-1"?> +<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> + <xsd:element name="deposit"> + <xsd:complexType> + <xsd:choice> + + <xsd:element name="reference"> + <xsd:complexType> + <xsd:choice> + <xsd:element name="object"> + <xsd:complexType> + <xsd:attribute type="xsd:string" name="id"/> + </xsd:complexType> + </xsd:element> + <xsd:element name="origin"> + <xsd:complexType> + <xsd:attribute type="xsd:string" name="url"/> + </xsd:complexType> + </xsd:element> + </xsd:choice> + </xsd:complexType> + </xsd:element> + + + <xsd:element name="bindings"> + <xsd:complexType> + <xsd:sequence> + <xsd:element name="binding" minOccurs="0" maxOccurs="unbounded"> + 
<xsd:complexType> + <xsd:attribute type="xsd:string" name="source"/> + <xsd:attribute type="xsd:string" name="destination"/> + </xsd:complexType> + </xsd:element> + </xsd:sequence> + </xsd:complexType> + </xsd:element> + + </xsd:choice> + </xsd:complexType> + </xsd:element> +</xsd:schema> diff --git a/swh.deposit.egg-info/PKG-INFO b/swh.deposit.egg-info/PKG-INFO index 89536499e7ff39d69be47708efc3f14ea4960645..98d649b9acbc4e4a72589469187740f346437b52 100644 --- a/swh.deposit.egg-info/PKG-INFO +++ b/swh.deposit.egg-info/PKG-INFO @@ -1,6 +1,6 @@ Metadata-Version: 1.0 Name: swh.deposit -Version: 0.0.58 +Version: 0.0.59 Summary: Software Heritage Deposit Server Home-page: https://forge.softwareheritage.org/source/swh-deposit/ Author: Software Heritage developers diff --git a/swh.deposit.egg-info/SOURCES.txt b/swh.deposit.egg-info/SOURCES.txt index 7e2c81175b7d4bc68341668f015b0f6159b85ba3..0c14a3264d1bf790d022f3dfee24f173ea39f820 100644 --- a/swh.deposit.egg-info/SOURCES.txt +++ b/swh.deposit.egg-info/SOURCES.txt @@ -31,14 +31,12 @@ debian/rules debian/source/format docs/.gitignore docs/Makefile -docs/blueprint.rst docs/conf.py docs/dev-info.rst docs/getting-started.rst docs/index.rst docs/metadata.rst docs/spec-api.rst -docs/spec-loading.rst docs/sys-info.rst docs/_static/.placeholder docs/_templates/.placeholder @@ -51,6 +49,13 @@ docs/endpoints/update-metadata.rst docs/images/deposit-create-chart.png docs/images/deposit-delete-chart.png docs/images/deposit-update-chart.png +docs/specs/blueprint.rst +docs/specs/metadata_example.xml +docs/specs/spec-loading.rst +docs/specs/spec-meta-deposit.rst +docs/specs/spec-sparse-deposit.rst +docs/specs/specs.rst +docs/specs/swh.xsd resources/deposit/server.yml swh/__init__.py swh/manage.py @@ -69,6 +74,7 @@ swh/deposit/models.py swh/deposit/parsers.py swh/deposit/signals.py swh/deposit/urls.py +swh/deposit/utils.py swh/deposit/wsgi.py swh/deposit/api/__init__.py swh/deposit/api/common.py @@ -134,6 +140,7 @@ 
swh/deposit/templates/deposit/status.xml swh/deposit/templates/rest_framework/api.html swh/deposit/tests/__init__.py swh/deposit/tests/common.py +swh/deposit/tests/test_utils.py swh/deposit/tests/api/__init__.py swh/deposit/tests/api/test_common.py swh/deposit/tests/api/test_deposit.py diff --git a/swh/deposit/api/private/__init__.py b/swh/deposit/api/private/__init__.py index f4acbba4e154117a39f509f32f648e5dfba58ff9..b1c5fb9888e141871bef2bacdd2c9f20368fb074 100644 --- a/swh/deposit/api/private/__init__.py +++ b/swh/deposit/api/private/__init__.py @@ -3,6 +3,7 @@ # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information +from swh.deposit import utils from ...config import METADATA_TYPE from ...models import DepositRequest, Deposit @@ -45,7 +46,6 @@ class DepositReadMixin: metadata dict from the deposit. """ - metadata = {} - for dr in self._deposit_requests(deposit, request_type=METADATA_TYPE): - metadata.update(dr.metadata) - return metadata + metadata = (m.metadata for m in self._deposit_requests( + deposit, request_type=METADATA_TYPE)) + return utils.merge(*metadata) diff --git a/swh/deposit/api/private/deposit_read.py b/swh/deposit/api/private/deposit_read.py index f34903af4bbbec708845d2ada04ad92f47f17b40..833dcf297b0b6851b8c79fc74cd2c3227b7cb4ff 100644 --- a/swh/deposit/api/private/deposit_read.py +++ b/swh/deposit/api/private/deposit_read.py @@ -82,18 +82,6 @@ class SWHDepositReadArchives(SWHGetDepositAPI, SWHPrivateAPIView, if not os.path.exists(self.extraction_dir): os.makedirs(self.extraction_dir) - def retrieve_archives(self, deposit_id): - """Given a deposit identifier, returns its associated archives' path. 
- - Yields: - path to deposited archives - - """ - deposit_requests = self._deposit_requests( - deposit_id, request_type=ARCHIVE_TYPE) - for deposit_request in deposit_requests: - yield deposit_request.archive.path - def process_get(self, req, collection_name, deposit_id): """Build a unique tarball from the multiple received and stream that content to the client. @@ -107,9 +95,9 @@ class SWHDepositReadArchives(SWHGetDepositAPI, SWHPrivateAPIView, Tuple status, stream of content, content-type """ - archive_paths = list(self.retrieve_archives(deposit_id)) - with aggregate_tarballs(self.extraction_dir, - archive_paths) as path: + archive_paths = [r.archive.path for r in self._deposit_requests( + deposit_id, request_type=ARCHIVE_TYPE)] + with aggregate_tarballs(self.extraction_dir, archive_paths) as path: return FileResponse(open(path, 'rb'), status=status.HTTP_200_OK, content_type='application/octet-stream') diff --git a/swh/deposit/tests/api/test_deposit_read_metadata.py b/swh/deposit/tests/api/test_deposit_read_metadata.py index a55018f5391b7da527be9009937e776720324c9e..bcc546f3513f69e42db06113ae6ad37634d1282f 100644 --- a/swh/deposit/tests/api/test_deposit_read_metadata.py +++ b/swh/deposit/tests/api/test_deposit_read_metadata.py @@ -51,8 +51,8 @@ class DepositReadMetadataTest(APITestCase, WithAuthTestCase, BasicTestCase, }, 'origin_metadata': { 'metadata': { - '@xmlns': 'http://www.w3.org/2005/Atom', - 'author': 'some awesome author', + '@xmlns': ['http://www.w3.org/2005/Atom'], + 'author': ['some awesome author', 'another one', 'no one'], 'external_identifier': 'some-external-id', 'url': 'https://hal-test.archives-ouvertes.fr/' + 'some-external-id' @@ -79,8 +79,8 @@ class DepositReadMetadataTest(APITestCase, WithAuthTestCase, BasicTestCase, 'committer': SWH_PERSON, 'date': None, 'metadata': { - '@xmlns': 'http://www.w3.org/2005/Atom', - 'author': 'some awesome author', + '@xmlns': ['http://www.w3.org/2005/Atom'], + 'author': ['some awesome author', 'another one', 
'no one'], 'external_identifier': 'some-external-id', 'url': 'https://hal-test.archives-ouvertes.fr/' + 'some-external-id' @@ -137,8 +137,8 @@ class DepositReadMetadataTest(APITestCase, WithAuthTestCase, BasicTestCase, }, 'origin_metadata': { 'metadata': { - '@xmlns': 'http://www.w3.org/2005/Atom', - 'author': 'some awesome author', + '@xmlns': ['http://www.w3.org/2005/Atom'], + 'author': ['some awesome author', 'another one', 'no one'], 'external_identifier': 'some-external-id', 'url': 'https://hal-test.archives-ouvertes.fr/' + 'some-external-id' @@ -166,8 +166,8 @@ class DepositReadMetadataTest(APITestCase, WithAuthTestCase, BasicTestCase, 'type': 'tar', 'message': 'hal: Deposit %s in collection hal' % deposit_id, 'metadata': { - '@xmlns': 'http://www.w3.org/2005/Atom', - 'author': 'some awesome author', + '@xmlns': ['http://www.w3.org/2005/Atom'], + 'author': ['some awesome author', 'another one', 'no one'], 'external_identifier': 'some-external-id', 'url': 'https://hal-test.archives-ouvertes.fr/' + 'some-external-id' @@ -177,7 +177,7 @@ class DepositReadMetadataTest(APITestCase, WithAuthTestCase, BasicTestCase, 'branch_name': 'master', } - self.assertEquals(data, expected_meta) + self.assertEqual(data, expected_meta) @istest def access_to_nonexisting_deposit_returns_404_response(self): diff --git a/swh/deposit/tests/common.py b/swh/deposit/tests/common.py index e595a783b18c6ae78dee441f501ed611b124c133..c27997174b0de9826c2bbd472374fe6257c09678 100644 --- a/swh/deposit/tests/common.py +++ b/swh/deposit/tests/common.py @@ -348,12 +348,13 @@ class CommonCreationRoutine(TestCase): <entry xmlns="http://www.w3.org/2005/Atom"> <external_identifier>some-external-id</external_identifier> <url>https://hal-test.archives-ouvertes.fr/some-external-id</url> + <author>some awesome author</author> </entry>""" self.atom_entry_data1 = b"""<?xml version="1.0"?> <entry xmlns="http://www.w3.org/2005/Atom"> - <author>some awesome author</author> - + <author>another one</author> + 
<author>no one</author> </entry>""" self.atom_entry_data2 = b"""<?xml version="1.0"?> diff --git a/swh/deposit/tests/test_utils.py b/swh/deposit/tests/test_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1dfe46e6e4683b9e56669d98e02773b1e042fc43 --- /dev/null +++ b/swh/deposit/tests/test_utils.py @@ -0,0 +1,138 @@ +# Copyright (C) 2018 The Software Heritage developers +# See the AUTHORS file at the top-level directory of this distribution +# License: GNU General Public License version 3, or any later version +# See top-level LICENSE file for more information + +import unittest + +from nose.tools import istest + +from swh.deposit import utils + + +class UtilsTestCase(unittest.TestCase): + """Utils library + + """ + @istest + def merge(self): + """Calling utils.merge on dicts should merge without losing information + + """ + d0 = { + 'author': 'someone', + 'license': [['gpl2']], + 'a': 1 + } + + d1 = { + 'author': ['author0', {'name': 'author1'}], + 'license': [['gpl3']], + 'b': { + '1': '2' + } + } + + d2 = { + 'author': map(lambda x: x, ['else']), + 'license': 'mit', + 'b': { + '2': '3', + } + } + + d3 = { + 'author': (v for v in ['no one']), + } + + actual_merge = utils.merge(d0, d1, d2, d3) + + expected_merge = { + 'a': 1, + 'license': [['gpl2'], ['gpl3'], 'mit'], + 'author': [ + 'someone', 'author0', {'name': 'author1'}, 'else', 'no one'], + 'b': { + '1': '2', + '2': '3', + } + } + self.assertEqual(actual_merge, expected_merge) + + @istest + def merge_2(self): + d0 = { + 'license': 'gpl2', + 'runtime': { + 'os': 'unix derivative' + } + } + + d1 = { + 'license': 'gpl3', + 'runtime': 'GNU/Linux' + } + + expected = { + 'license': ['gpl2', 'gpl3'], + 'runtime': [ + { + 'os': 'unix derivative' + }, + 'GNU/Linux' + ], + } + + actual = utils.merge(d0, d1) + self.assertEqual(actual, expected) + + @istest + def merge_edge_cases(self): + input_dict = { + 'license': ['gpl2', 'gpl3'], + 'runtime': [ + { + 'os': 'unix derivative' + }, + 
'GNU/Linux' + ], + } + # against empty dict + actual = utils.merge(input_dict, {}) + self.assertEqual(actual, input_dict) + + # against oneself + actual = utils.merge(input_dict, input_dict, input_dict) + self.assertEqual(actual, input_dict) + + @istest + def merge_one_dict(self): + """Merge one dict should result in the same dict value + + """ + input_and_expected = {'anything': 'really'} + actual = utils.merge(input_and_expected) + self.assertEqual(actual, input_and_expected) + + @istest + def merge_raise(self): + """Calling utils.merge with any non-dict argument should raise + + """ + d0 = { + 'author': 'someone', + 'a': 1 + } + + d1 = ['not a dict'] + + with self.assertRaises(ValueError): + utils.merge(d0, d1) + + with self.assertRaises(ValueError): + utils.merge(d1, d0) + + with self.assertRaises(ValueError): + utils.merge(d1) + + self.assertEqual(utils.merge(d0), d0) diff --git a/swh/deposit/utils.py b/swh/deposit/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..7979ec5b9e5411896fc379e2a896da8602495536 --- /dev/null +++ b/swh/deposit/utils.py @@ -0,0 +1,55 @@ +# Copyright (C) 2018 The Software Heritage developers +# See the AUTHORS file at the top-level directory of this distribution +# License: GNU General Public License version 3, or any later version +# See top-level LICENSE file for more information + +from types import GeneratorType + + +def merge(*dicts): + """Given an iterator of dicts, merge them losing no information. + + Args: + *dicts: arguments are all supposed to be dict to merge into one + + Returns: + dict merged without losing information + + """ + def _extend(existing_val, value): + """Given an existing value and a value (as potential lists), merge + them together without repetition. 
+ + """ + if isinstance(value, (list, map, GeneratorType)): + vals = value + else: + vals = [value] + for v in vals: + if v in existing_val: + continue + existing_val.append(v) + return existing_val + + d = {} + for data in dicts: + if not isinstance(data, dict): + raise ValueError( + 'dicts is supposed to be a variable arguments of dict') + + for key, value in data.items(): + existing_val = d.get(key) + if not existing_val: + d[key] = value + continue + if isinstance(existing_val, (list, map, GeneratorType)): + new_val = _extend(existing_val, value) + elif isinstance(existing_val, dict): + if isinstance(value, dict): + new_val = merge(existing_val, value) + else: + new_val = _extend([existing_val], value) + else: + new_val = _extend([existing_val], value) + d[key] = new_val + return d diff --git a/version.txt b/version.txt index e54d1d4cb4dcb0278ff983199d42384f7a0b0408..61d91691d34e3de8be087672fc82fd40be06c15e 100644 --- a/version.txt +++ b/version.txt @@ -1 +1 @@ -v0.0.58-0-gf264ef1 \ No newline at end of file +v0.0.59-0-g19ca52e \ No newline at end of file
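
Note on the new ``utils.merge`` helper: the sketch below is a condensed, standalone copy of the function from the ``swh/deposit/utils.py`` hunk, so its merge rules can be tried without installing ``swh.deposit``. The dict names ``first`` and ``second`` and their contents are illustrative assumptions, not data taken from the patch.

```python
from types import GeneratorType


def merge(*dicts):
    """Merge dicts without dropping values (condensed from the utils.py hunk)."""
    def _extend(existing_val, value):
        # Append value, or each of its items, unless it is already present.
        if isinstance(value, (list, map, GeneratorType)):
            vals = value
        else:
            vals = [value]
        for v in vals:
            if v not in existing_val:
                existing_val.append(v)
        return existing_val

    d = {}
    for data in dicts:
        if not isinstance(data, dict):
            raise ValueError(
                'dicts is supposed to be a variable arguments of dict')
        for key, value in data.items():
            existing_val = d.get(key)
            if not existing_val:
                # First (or falsy) value seen for this key: store it as is.
                d[key] = value
                continue
            if isinstance(existing_val, (list, map, GeneratorType)):
                new_val = _extend(existing_val, value)
            elif isinstance(existing_val, dict) and isinstance(value, dict):
                # Two dicts under the same key are merged recursively.
                new_val = merge(existing_val, value)
            else:
                # Scalars (or dict vs. non-dict) are collected into a list.
                new_val = _extend([existing_val], value)
            d[key] = new_val
    return d


# Hypothetical metadata fragments, e.g. from two deposit requests.
first = {'author': 'someone', 'runtime': {'os': 'unix'}, 'license': 'gpl2'}
second = {'author': ['someone', 'another one'],
          'runtime': {'arch': 'x86'},
          'license': 'gpl3'}

merged = merge(first, second)
print(merged)
```

Repeated scalars are collected into a list without duplicates, and dicts found under the same key are merged recursively. One thing that may be worth checking in review: the first value seen for a key is stored by reference, so a list held by the first input dict is extended in place by later merges.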