  1. Mar 15, 2022
  2. Mar 08, 2022
  3. Mar 04, 2022
  4. Feb 28, 2022
  5. Feb 24, 2022
  6. Feb 23, 2022
  7. Feb 22, 2022
    • vlorentz's avatar
      server: Use xml.etree.ElementTree instead of nested dicts internally · 55ae87b1
      vlorentz authored
      This commit does not touch the external API; i.e. `metadata_dict`
      is still present in the JSON API, and the equivalent `jsonb` field remains
      in the database. They will probably be removed in a future commit,
      because they are not very useful.
      
      Rationale:
      
      I find xmltodict's approach of translating XML tree to native structures
      to be intrinsically flawed for non-trivial handling of XML, because the
      data structure:
      
      * is implementation-defined (by xmltodict, which is Python-only) and
        may change across versions
      * does not intrinsically store namespaces, and relies on an internal
        prefix map (though it isn't much of an issue right now, as we do not need
        composability and all the changed APIs are private)
      * is not stable; for example, `<a><b>foo</b></a>` and `<a><b>foo</b><b>bar</b></a>`
        are encoded completely differently (the former is a `Dict[str, str]`,
        the latter is a `Dict[str, list]`).
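
To illustrate the stability point, here is a minimal sketch using only the standard library. The helper name `b_texts` is hypothetical and not part of this commit. With `xml.etree.ElementTree`, `findall()` returns a list regardless of how many `<b>` children exist, so the caller never has to branch on the type of the result.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def b_texts(xml_source: str) -> List[Optional[str]]:
    """Return the text of every <b> child of the root element."""
    root = ET.fromstring(xml_source)
    # findall() always returns a list: empty, one item, or many items.
    return [b.text for b in root.findall("b")]


one = b_texts("<a><b>foo</b></a>")
two = b_texts("<a><b>foo</b><b>bar</b></a>")
none = b_texts("<a/>")

print(one)   # a one-element list
print(two)   # a two-element list
print(none)  # an empty list
```

The same calling code handles all three documents, whereas the dict encoding described above switches between `str` and `list` depending on the input.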
      
      And every operation manipulating xmltodict's data structure needs to check
      presence, number, *and* type on every access. Consider this part of this
      commit, for example:
      
      ```
      -    swh_deposit = metadata.get("swh:deposit")
      -    if not swh_deposit:
      -        return None
      -
      -    swh_reference = swh_deposit.get("swh:reference")
      -    if not swh_reference:
      -        return None
      -
      -    swh_origin = swh_reference.get("swh:origin")
      -    if swh_origin:
      -        url = swh_origin.get("@url")
      -        if url:
      -            return url
      +    ref_origin = metadata.find(
      +        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
      +    )
      +    if ref_origin is not None:
      +        return ref_origin.attrib["url"]
      ```
      
      The use of XPath makes it considerably shorter, and the original version
      did not even check number/type (i.e. it would crash if an element was
      duplicated).
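
For readers who want to try the XPath variant in isolation, the following is a self-contained sketch rather than the code from this commit. The contents of `NAMESPACES`, the function name `origin_url`, and the sample documents are assumptions made for illustration; only the `find()` call mirrors the diff above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed prefix map; the real NAMESPACES constant is defined in the code base.
NAMESPACES = {"swh": "https://www.softwareheritage.org/schema/2018/deposit"}


def origin_url(metadata: ET.Element) -> Optional[str]:
    """Return the url attribute of swh:origin, or None if it is absent."""
    # The [@url] predicate only matches swh:origin elements that carry
    # a url attribute, so attrib["url"] cannot raise KeyError below.
    ref_origin = metadata.find(
        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
    )
    if ref_origin is not None:
        return ref_origin.attrib["url"]
    return None


# Hypothetical sample documents.
WITH_ORIGIN = ET.fromstring(
    '<entry xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit">'
    "<swh:deposit><swh:reference>"
    '<swh:origin url="https://example.org/repo"/>'
    "</swh:reference></swh:deposit>"
    "</entry>"
)
WITHOUT_ORIGIN = ET.fromstring(
    '<entry xmlns:swh="https://www.softwareheritage.org/schema/2018/deposit"/>'
)

print(origin_url(WITH_ORIGIN))
print(origin_url(WITHOUT_ORIGIN))
```

A missing `swh:deposit`, `swh:reference`, or `swh:origin` element all collapse into the single `is not None` check, which is where the saving over the nested `.get()` chain comes from.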
    • Antoine R. Dumont's avatar
      deposit.cli.client: Allow user to define the metadata provenance url · b9f565aa
      Antoine R. Dumont authored
      If the user provides `--metadata-provenance-url`, the generated XML forwards
      that information to the deposit server. If the user provides the metadata file
      directly and it lacks a metadata provenance URL, a warning is logged to
      notify the user.
      
      Related to T3677
    • vlorentz's avatar
      Fix URI of schema.org · a10ed57b
      vlorentz authored
      Either is valid according to https://schema.org/docs/gs.html,
      but we need to pick one, as they are opaque identifiers.
      Codemeta chose http:// (because it was the only valid one
      back then), so we should stick to it.
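
As a small illustration of why the choice matters, the sketch below (hypothetical document and prefix maps, not taken from this commit) shows that an XML parser compares namespace URIs as plain strings, so the `http://` and `https://` forms do not match each other.

```python
import xml.etree.ElementTree as ET

# Two prefix maps that differ only in the URI scheme.
HTTP_NS = {"schema": "http://schema.org/"}
HTTPS_NS = {"schema": "https://schema.org/"}

# Hypothetical document that declares the http:// form.
doc = ET.fromstring(
    '<entry xmlns:schema="http://schema.org/">'
    "<schema:name>demo</schema:name>"
    "</entry>"
)

# Namespace URIs are compared character by character, so only the
# lookup using the exact same string finds the element.
found_http = doc.find("schema:name", namespaces=HTTP_NS)
found_https = doc.find("schema:name", namespaces=HTTPS_NS)

print(found_http is not None)
print(found_https is None)
```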
    • vlorentz's avatar
      Remove metadata merging; use only the latest document · 7727a9c0
      vlorentz authored
      We don't use that feature at all, as far as I am aware.
      
      I also find that it complicates any metadata handling (especially the validation
      I would like to add in the near future), and it probably does not match the semantics
      intended by SWORD (merging occurs on PUT requests, as we don't implement PATCH).
  8. Feb 21, 2022
  9. Feb 10, 2022