- Mar 08, 2022
-
- Feb 28, 2022
-
-
Nicolas Dandrimont authored
This function is only used by server-side API checks. Having it defined in the main utils module makes the deposit client transitively depend on Django (via swh.deposit.errors), which does not seem necessary.
-
vlorentz authored
For now this increases code complexity, but it will make it easier to add other checks.
-
vlorentz authored
-
- Feb 22, 2022
-
-
vlorentz authored
This commit does not touch the external API though; i.e. `metadata_dict` is still present in the JSON API, and the equivalent `jsonb` field remains in the database. They will probably be removed in a future commit because they are not very useful, though.

Rationale: I find xmltodict's approach of translating XML tree to native structures to be intrinsically flawed for non-trivial handling of XML, because the data structure is:

* implementation-defined (by xmltodict, which is python-only) and it may change across versions
* does not intrinsically store namespaces, and relies on an internal prefix map (though it isn't much of an issue right now, as we do not need composability and all the changed APIs are private)
* not stable; for example, `<a><b>foo</b></a>` and `<a><b>foo</b><b>bar</b></a>` are encoded completely differently (the former is a `Dict[str, str]`, the latter is `Dict[str, list]`).

And every operation manipulating this data structure needs to check presence, number *and* type on every access.

Consider this part of this commit for example:

```
-    swh_deposit = metadata.get("swh:deposit")
-    if not swh_deposit:
-        return None
-
-    swh_reference = swh_deposit.get("swh:reference")
-    if not swh_reference:
-        return None
-
-    swh_origin = swh_reference.get("swh:origin")
-    if swh_origin:
-        url = swh_origin.get("@url")
-        if url:
-            return url
+    ref_origin = metadata.find(
+        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
+    )
+    if ref_origin is not None:
+        return ref_origin.attrib["url"]
```

The use of XPath makes it considerably shorter, and the original version did not even check number/type (i.e. it would crash if an element was duplicated).
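As a rough, self-contained illustration of the XPath approach, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The sample document, the helper name `get_reference_origin_url`, and the namespace URIs in `NAMESPACES` are assumptions made for the example; they are not copied from the actual code.

```python
import xml.etree.ElementTree as ET

# Assumed prefix map for this sketch; the real NAMESPACES constant may differ.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "swh": "https://www.softwareheritage.org/schema/2018/deposit",
}

# Hypothetical deposit metadata, written only for this example.
SAMPLE = """\
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
            xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">
  <atom:title>Example</atom:title>
  <swh:deposit>
    <swh:reference>
      <swh:origin url="https://example.org/repo"/>
    </swh:reference>
  </swh:deposit>
</atom:entry>
"""


def get_reference_origin_url(metadata):
    """Return the referenced origin URL, or None if it is absent."""
    # The [@url] predicate only matches elements that carry a url attribute.
    ref_origin = metadata.find(
        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
    )
    if ref_origin is not None:
        return ref_origin.attrib["url"]
    return None


entry = ET.fromstring(SAMPLE)
url = get_reference_origin_url(entry)

# findall() always returns a list, whether the element occurs once or twice,
# so callers do not need to branch on the type of the result.
one = ET.fromstring("<a><b>foo</b></a>").findall("b")
two = ET.fromstring("<a><b>foo</b><b>bar</b></a>").findall("b")
```

The last two lines show the stability point: the result has the same type in both cases, and only its length changes.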
-
- Feb 21, 2022
-
-
Antoine R. Dumont authored
This introduces a new check about the metadata provenance. While it's a suggested field, it's definitely something that we want deposit clients to send us. So warn when it's not the case. That does not reject the deposit but it's worth keeping that detail in the backend. Related to T3677
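A minimal sketch of a check that warns without rejecting, assuming the metadata is available as an `xml.etree.ElementTree` element (at the time of this commit it may have been a different structure). The function name `check_provenance`, the element path `swh:deposit/swh:metadata-provenance/schema:url`, the namespace URIs, and the shape of the warning are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed prefix map for this sketch.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "swh": "https://www.softwareheritage.org/schema/2018/deposit",
    "schema": "http://schema.org/",
}

# Assumed location of the provenance URL inside the deposit metadata.
PROVENANCE_PATH = "swh:deposit/swh:metadata-provenance/schema:url"


def check_provenance(metadata):
    """Return a list of warnings; the deposit is never rejected here."""
    warnings = []
    url = metadata.find(PROVENANCE_PATH, namespaces=NAMESPACES)
    if url is None or not (url.text or "").strip():
        # Suggested field missing: record it, but do not fail the deposit.
        warnings.append(
            {
                "summary": "Missing suggested field",
                "fields": ["swh:deposit/swh:metadata-provenance"],
            }
        )
    return warnings


HEADER = (
    '<atom:entry xmlns:atom="http://www.w3.org/2005/Atom" '
    'xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit" '
    'xmlns:schema="http://schema.org/">'
)
with_provenance = ET.fromstring(
    HEADER
    + "<swh:deposit><swh:metadata-provenance>"
    + "<schema:url>https://example.org/record/1</schema:url>"
    + "</swh:metadata-provenance></swh:deposit></atom:entry>"
)
without_provenance = ET.fromstring(HEADER + "<swh:deposit/></atom:entry>")
```

Returning warnings separately from a pass/fail result is one way to keep the detail in the backend while still accepting the deposit.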
-
- Dec 21, 2020
-
-
vlorentz authored
Otherwise, querying /1/private/<deposit_id>/meta/ will crash because it fails to parse the date. Resolves T2906.
-
- Dec 10, 2020
-
-
vlorentz authored
Accept <codemeta:name> and <codemeta:author> as alternatives to <atom:name>/<atom:title> and <atom:author>. This was broken by a8e86a92. check_metadata() checks whether there is a tag that *contains* the expected name, so checking for 'author' (the old name of 'atom:author') accidentally matched 'codemeta:author' as well. This resulted in the right behavior in the majority of cases ('codemeta:author' was accepted), but for the wrong reason, and explicitly renaming the expected name to 'atom:author' broke it. Ditto for name. A future commit will remove the substring matching to eliminate false positives (e.g. 'atom:authorblahblah' should not be accepted as 'atom:author').
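The difference between substring matching and exact matching can be sketched as follows. The helpers `contains_match` and `exact_match` are hypothetical and only illustrate the behavior described above; they are not the actual check_metadata() code.

```python
def contains_match(tags, expected):
    """Substring matching: true if any tag contains the expected name."""
    return any(expected in tag for tag in tags)


def exact_match(tags, accepted):
    """Exact matching: true if any tag is one of the accepted names."""
    return any(tag in accepted for tag in tags)


ACCEPTED_AUTHOR = {"atom:author", "codemeta:author"}

# 'author' is a substring of 'codemeta:author', so the old check passed
# for the wrong reason.
old_accidental = contains_match(["codemeta:author"], "author")

# After renaming the expected name to 'atom:author', the same input fails.
after_rename = contains_match(["codemeta:author"], "atom:author")

# Substring matching also accepts tags it should not.
false_positive = contains_match(["atom:authorblahblah"], "atom:author")

# Exact matching against an explicit set of alternatives avoids both issues.
fixed_alternative = exact_match(["codemeta:author"], ACCEPTED_AUTHOR)
fixed_rejects = exact_match(["atom:authorblahblah"], ACCEPTED_AUTHOR)
```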
- Nov 20, 2020
-
-
vlorentz authored
This mostly does not change the protocol used (except in the error messages), it's just an internal change for the server and for the client. The only change in the protocol is that local tags (eg. `<entry>...</entry>`) are no longer assumed to be in the Atom namespace (like `<entry xmlns="http://www.w3.org/2005/Atom">...</entry>`), but they never should have been in the first place. Default namespaces / unprefixed tags are a footgun because it's too easy to add tags in the default namespace without noticing, or use the wrong namespace.
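To illustrate the namespace point, here is a small sketch using the standard library's `xml.etree.ElementTree`, which exposes the resolved namespace in each element's tag. The sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# A local (unprefixed, no default namespace) tag is in no namespace at all.
local = ET.fromstring("<entry><title>t</title></entry>")

# With a default namespace, every unprefixed descendant silently inherits it.
default_ns = ET.fromstring(f'<entry xmlns="{ATOM}"><title>t</title></entry>')

# With an explicit prefix, the namespace of each tag is visible in the source.
prefixed = ET.fromstring(
    f'<atom:entry xmlns:atom="{ATOM}"><atom:title>t</atom:title></atom:entry>'
)

# Forgetting the prefix on a child leaves it outside the Atom namespace.
mixed = ET.fromstring(
    f'<atom:entry xmlns:atom="{ATOM}"><title>t</title></atom:entry>'
)

print(local.tag)          # entry
print(default_ns.tag)     # {http://www.w3.org/2005/Atom}entry
print(default_ns[0].tag)  # {http://www.w3.org/2005/Atom}title
print(prefixed.tag)       # {http://www.w3.org/2005/Atom}entry
print(mixed[0].tag)       # title
```

A namespace-aware lookup such as `local.find("atom:title", {"atom": ATOM})` returns `None` for the first document, which is the behavior the protocol change describes: unprefixed tags are no longer treated as Atom.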
-
- Sep 28, 2020
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
This refactors the common code executed by the checker so that every check is functionally performed the same way.
-