- Feb 28, 2022
-
-
vlorentz authored
For now, this only checks they aren't just a string
-
vlorentz authored
This only checks the name is a string.
-
vlorentz authored
-
vlorentz authored
For now this increases code complexity, but it will make it easier to add other checks.
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
The leading newline prevented textwrap.dedent from removing them.
-
vlorentz authored
It was never actually tested...
-
vlorentz authored
This fixes crashes when running 'pytest -k migration', because swh/deposit/tests_migration/ lacks a conftest to initialize the database.
-
- Feb 24, 2022
-
-
Antoine R. Dumont authored
This will refuse the metadata-only deposit if the metadata provenance does not match. It mirrors the check already done for deposits whose origin url does not match that same (client) provider url. Related to T3677
-
vlorentz authored
We don't have much use for it anymore, let's use ElementTree everywhere for consistency.
-
Antoine R. Dumont authored
This should ease deposit listing in whatever form (backend db read or client consuming deposit listing).
Deposit types stand for:
  - meta: metadata-only deposit
  - code: content deposit
This commit includes a migration schema script which adds a new column 'type'. The script is also in charge of migrating existing data with the right type values.
Related to T3677
-
Antoine R. Dumont authored
Related to T3677
-
vlorentz authored
xmltodict was already on the way out for the deposit, and the latest libexpat security update broke it entirely when dealing with namespaces, which means we cannot use it until this is addressed.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006317
Functional changes of this commit:
  1. No more writes to the 'metadata' jsonb column in the DB (as it strongly depends on xmltodict)
  2. ServiceDocumentDepositClient always outputs a list of collections, instead of None/dict/List[dict] depending on the number of collections (artefact of using xmltodict, which is replaced by proper parsing)
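To illustrate the second functional change, here is a minimal sketch of how parsing with ElementTree yields a list whatever the number of collections. The `list_collections` helper and the shape of the returned dicts are hypothetical, not the actual ServiceDocumentDepositClient code; only the AtomPub namespace URI is standard.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Standard AtomPub namespace used by SWORD service documents.
APP_NS = {"app": "http://www.w3.org/2007/app"}


def list_collections(service_document: str) -> List[Dict[str, str]]:
    """Hypothetical helper: always return a list, even for 0 or 1 collection."""
    root = ET.fromstring(service_document)
    # findall() returns a (possibly empty) list, so callers never have to
    # distinguish between None, a single dict, and a list of dicts.
    return [
        {"href": collection.attrib.get("href", "")}
        for collection in root.findall("app:workspace/app:collection", APP_NS)
    ]


# Illustrative documents; the URLs are made up.
NO_COLLECTION = (
    '<service xmlns="http://www.w3.org/2007/app"><workspace/></service>'
)
ONE_COLLECTION = (
    '<service xmlns="http://www.w3.org/2007/app"><workspace>'
    '<collection href="https://example.org/1/test/"/>'
    "</workspace></service>"
)
TWO_COLLECTIONS = (
    '<service xmlns="http://www.w3.org/2007/app"><workspace>'
    '<collection href="https://example.org/1/test/"/>'
    '<collection href="https://example.org/1/other/"/>'
    "</workspace></service>"
)

for document in (NO_COLLECTION, ONE_COLLECTION, TWO_COLLECTIONS):
    print(list_collections(document))
```

The return type stays `List[Dict[str, str]]` in all three cases, which is the property the commit message describes.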
-
vlorentz authored
No one uses that, and it's redundant, as we provide the original XML
-
vlorentz authored
-
- Feb 23, 2022
-
-
Antoine R. Dumont authored
This now lists the deposits with their associated raw metadata if any is present. This will allow adaptations in the moderation view [1] to display the metadata provenance url (provided it's parsed out of the raw metadata). [1] The moderation view consumes this internal api. Related to T3677
-
vlorentz authored
-
vlorentz authored
This will be useful for metadata-only deposits, as there is not necessarily an origin in the referenced SWHID; and even when there is, it is usually not the actual source of the metadata. Therefore, we need this new field to link back to the provenance of the metadata.
-
- Feb 22, 2022
-
-
vlorentz authored
This commit does not touch the external API though; ie. `metadata_dict` is still present in the JSON API, and the equivalent `jsonb` field remains in the database. They will probably be removed in a future commit because they are not very useful, though.
Rationale: I find xmltodict's approach of translating XML trees to native structures to be intrinsically flawed for non-trivial handling of XML, because the data structure is:
* implementation-defined (by xmltodict, which is python-only) and it may change across versions
* does not intrinsically store namespaces, and relies on an internal prefix map (though it isn't much of an issue right now, as we do not need composability and all the changed APIs are private)
* not stable; for example, `<a><b>foo</b></a>` and `<a><b>foo</b><b>bar</b></a>` are encoded completely differently (the former is a `Dict[str, str]`, the latter is `Dict[str, list]`). And every operation manipulating this data structure needs to check presence, number *and* type on every access.
Consider this part of this commit for example:
```
-    swh_deposit = metadata.get("swh:deposit")
-    if not swh_deposit:
-        return None
-
-    swh_reference = swh_deposit.get("swh:reference")
-    if not swh_reference:
-        return None
-
-    swh_origin = swh_reference.get("swh:origin")
-    if swh_origin:
-        url = swh_origin.get("@url")
-        if url:
-            return url
+    ref_origin = metadata.find(
+        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
+    )
+    if ref_origin is not None:
+        return ref_origin.attrib["url"]
```
the use of XPath makes it considerably shorter; and the original version did not even check number/type (ie. it would crash if an element was duplicated).
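A self-contained sketch of the XPath lookup from the diff above, for readers who want to run it. The `NAMESPACES` mapping, the `reference_origin_url` function name and the sample documents are assumptions made for illustration; the project's real prefix map may differ.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed prefix map (illustrative; not copied from the project).
NAMESPACES = {
    "swh": "https://www.softwareheritage.org/schema/2018/deposit",
}


def reference_origin_url(metadata: ET.Element) -> Optional[str]:
    """Return the url of swh:deposit/swh:reference/swh:origin, if present."""
    # The [@url] predicate only matches origin elements carrying a url
    # attribute, so a single None check replaces the nested dict checks.
    ref_origin = metadata.find(
        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
    )
    if ref_origin is not None:
        return ref_origin.attrib["url"]
    return None


# Hypothetical sample documents.
WITH_ORIGIN = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
  <swh:deposit>
    <swh:reference>
      <swh:origin url="https://example.org/project"/>
    </swh:reference>
  </swh:deposit>
</entry>"""

WITHOUT_ORIGIN = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>no reference here</title>
</entry>"""

print(reference_origin_url(ET.fromstring(WITH_ORIGIN)))
print(reference_origin_url(ET.fromstring(WITHOUT_ORIGIN)))
```

A missing `swh:deposit`, `swh:reference` or `swh:origin` element all lead to the same `None` result, without any intermediate presence or type checks.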
-
Antoine R. Dumont authored
If the user is providing the `--metadata-provenance-url`, the xml generated will forward that information to the deposit server. If the user is providing the metadata file directly, a warning will be logged to notify the user of the missing metadata provenance url (if it is missing). Related to T3677
-
vlorentz authored
Either is valid according to https://schema.org/docs/gs.html ; but we need to pick one, as they are opaque identifiers. And codemeta chose http:// (because it was the only one to be valid back then), so we should stick to this one.
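Why picking one matters can be seen with a short sketch: XML namespace URIs are compared as plain strings, so `http://schema.org/` and `https://schema.org/` name two unrelated namespaces. The sample document and the `s` prefix below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document declaring the http:// form of the namespace.
DOC = (
    '<entry xmlns:schema="http://schema.org/">'
    "<schema:name>demo</schema:name>"
    "</entry>"
)
root = ET.fromstring(DOC)

# Same local name, two different namespace URIs in the lookup.
http_match = root.find("s:name", namespaces={"s": "http://schema.org/"})
https_match = root.find("s:name", namespaces={"s": "https://schema.org/"})

print(http_match is not None)   # the http:// lookup finds the element
print(https_match is None)      # the https:// lookup does not
```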
-
vlorentz authored
We don't use that feature at all as far as I am aware. I also find that it complicates any metadata handling (especially the validation I would like to add in the near future), and probably does not match semantics intended by SWORD (merging occurs on PUT requests, as we don't implement PATCH)
-
- Feb 21, 2022
-
-
Antoine R. Dumont authored
Prior to this commit, only rejected deposits stored problem details. Now that we can have warnings even for 'verified' deposits, we need to store those details for post-analysis. Note that this also fixes the docstring of the overall class, which had been out of date since the beginning (duplicated from another class). Related to T3677
-
Antoine R. Dumont authored
This introduces a new check about the metadata provenance. While it's a suggested field, it's definitely something that we want deposit clients to send us. So warn when it's not the case. That does not reject the deposit but it's worth keeping that detail in the backend. Related to T3677
-
Antoine R. Dumont authored
Related to T3677
-
Antoine Lambert authored
Tests still pass and it aligns the test requirements with other swh modules.
-
- Feb 10, 2022
-
-
Antoine Lambert authored
To install the new hook: $ pre-commit install -t commit-msg
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
-
- Feb 07, 2022
-
-
Antoine R. Dumont authored
Related to T3916
-
- Jan 21, 2022
- Jan 18, 2022
-
-
Antoine R. Dumont authored
instead of not being detected and crashing with an internal server error (500). Related to T3856
-
Antoine R. Dumont authored
This fixes the build [1] [2]
[1]
```
10:49:23 Warning, treated as error:
10:49:23 /var/lib/jenkins/workspace/DDEP/tests-on-diff/docs/README.rst:40:hardcoded link 'https://archive.softwareheritage.org/save/' could be replaced by an extlink (try using ':swh_web:`save/`' instead)
```
[2] https://jenkins.softwareheritage.org/view/swh-draft/job/DDEP/job/tests/1741/console
-
- Jan 12, 2022
-
-
vlorentz authored
The next release of swh-model will remove it from the constructor, attributes, and dict. However, this keeps using 'offset' despite its removal from swh-model, because it needs to be JSON-serializable.
-
- Jan 11, 2022
- Jan 10, 2022
-