  1. Mar 08, 2022
  2. Feb 28, 2022
  3. Feb 22, 2022
    • server: Use xml.etree.ElementTree instead of nested dicts internally · 55ae87b1
      vlorentz authored
      This commit does not touch the external API; ie. `metadata_dict`
      is still present in the JSON API, and the equivalent `jsonb` field remains
      in the database. They will probably be removed in a future commit,
      though, because they are not very useful.
      
      Rationale:
      
      I find xmltodict's approach of translating an XML tree to native structures
      to be intrinsically flawed for non-trivial handling of XML, because the
      data structure:
      
      * is implementation-defined (by xmltodict, which is python-only) and
        may change across versions
      * does not intrinsically store namespaces, and relies on an internal
        prefix map (though it isn't much of an issue right now, as we do not need
        composability and all the changed APIs are private)
      * is not stable; for example, `<a><b>foo</b></a>` and `<a><b>foo</b><b>bar</b></a>`
        are encoded completely differently (the former is a `Dict[str, str]`,
        the latter is a `Dict[str, list]`).
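
For contrast, a minimal sketch (not part of this commit) of how `xml.etree.ElementTree` keeps the same shape for both documents from the last bullet. The helper name `b_texts` is hypothetical:

```python
import xml.etree.ElementTree as ET

# The two documents from the bullet above.
one = ET.fromstring("<a><b>foo</b></a>")
two = ET.fromstring("<a><b>foo</b><b>bar</b></a>")


def b_texts(root):
    """Return the text of every <b> child, however many there are."""
    # findall() always returns a list of elements, so callers never
    # have to branch on "single value" versus "list of values".
    return [b.text for b in root.findall("b")]


one_texts = b_texts(one)
two_texts = b_texts(two)
print(one_texts)
print(two_texts)
```

Both calls go through the same code path; only the length of the result differs.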
      
      And every operation manipulating this data structure needs to check
      presence, number *and* type on every access. Consider this part of this
      commit for example:
      
      ```
      -    swh_deposit = metadata.get("swh:deposit")
      -    if not swh_deposit:
      -        return None
      -
      -    swh_reference = swh_deposit.get("swh:reference")
      -    if not swh_reference:
      -        return None
      -
      -    swh_origin = swh_reference.get("swh:origin")
      -    if swh_origin:
      -        url = swh_origin.get("@url")
      -        if url:
      -            return url
      +    ref_origin = metadata.find(
      +        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
      +    )
      +    if ref_origin is not None:
      +        return ref_origin.attrib["url"]
      ```
      
      The use of XPath makes it considerably shorter, and the original version
      did not even check number/type (ie. it would crash if an element was
      duplicated).
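
A self-contained sketch of the new lookup, for readers who want to run it. The namespace URI, the sample documents, and the function name `origin_url` are assumptions made for illustration; the real `NAMESPACES` mapping is not shown in this commit message:

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix map (the real NAMESPACES value is not shown here).
NAMESPACES = {"swh": "https://example.org/ns/deposit"}


def origin_url(metadata):
    """Return the url attribute of swh:origin, or None if it is absent."""
    # The [@url] predicate only selects swh:origin elements that carry
    # a url attribute, so attrib["url"] cannot raise KeyError below.
    ref_origin = metadata.find(
        "swh:deposit/swh:reference/swh:origin[@url]", namespaces=NAMESPACES
    )
    if ref_origin is not None:
        return ref_origin.attrib["url"]
    return None


WITH_URL = """
<entry xmlns:swh="https://example.org/ns/deposit">
  <swh:deposit>
    <swh:reference>
      <swh:origin url="https://example.org/repo"/>
    </swh:reference>
  </swh:deposit>
</entry>
"""
WITHOUT_ATTR = """
<entry xmlns:swh="https://example.org/ns/deposit">
  <swh:deposit><swh:reference><swh:origin/></swh:reference></swh:deposit>
</entry>
"""

with_url = origin_url(ET.fromstring(WITH_URL))
without_attr = origin_url(ET.fromstring(WITHOUT_ATTR))
without_deposit = origin_url(ET.fromstring("<entry/>"))
```

A missing intermediate element and a missing attribute both end up as `None`, with no chain of `.get()` checks.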
  4. Feb 21, 2022
  5. Dec 21, 2020
  6. Dec 10, 2020
    • Use string equality instead of substring search to check for mandatory fields. · c436adcf
      vlorentz authored
      eg. 'atom:authorblahblah' should not be accepted when we expect 'atom:author'
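
A minimal sketch of the difference between the two checks, assuming the tag names are available as a plain list of strings. Both helper names are hypothetical and are not the functions changed by this commit:

```python
def has_field_substring(tags, expected):
    """Old-style check: true if any tag merely contains the expected name."""
    return any(expected in tag for tag in tags)


def has_field_exact(tags, expected):
    """New-style check: true only if a tag is exactly the expected name."""
    return expected in tags


tags = ["atom:authorblahblah", "atom:title"]

substring_result = has_field_substring(tags, "atom:author")
exact_result = has_field_exact(tags, "atom:author")
print(substring_result, exact_result)
```

With substring search the bogus tag is accepted as `atom:author`; with equality it is rejected.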
      v0.7.1
    • Accept <codemeta:name> and <codemeta:author> as alternatives to... · 00795b41
      vlorentz authored
      Accept <codemeta:name> and <codemeta:author> as alternatives to <atom:name>/<atom:title> and <atom:author>.
      
      This was broken by a8e86a92: check_metadata() checks whether there is
      a tag that *contains* the expected name, so checking for 'author'
      (the old name of 'atom:author') accidentally matched 'codemeta:author'
      as well.
      
      This resulted in the right behavior in the majority of the cases
      (accepted 'codemeta:author'), but for the wrong reason, and explicitly
      renaming to atom:author broke this.
      
      Ditto for name.
      
      A future commit will remove the substring matching to remove
      false positives (eg. 'atom:authorblahblah' should not be accepted
      as 'atom:author')
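
The accidental match, and one way to accept the alternatives explicitly, can be sketched as follows. The `ACCEPTED` table and `missing_fields` are hypothetical names for illustration; the actual `check_metadata()` implementation is not shown here:

```python
# Hypothetical table: each mandatory field and the tags accepted for it.
ACCEPTED = {
    "author": {"atom:author", "codemeta:author"},
    "name": {"atom:name", "atom:title", "codemeta:name"},
}


def missing_fields(tags):
    """Return the mandatory fields for which no accepted tag is present."""
    present = set(tags)
    return sorted(
        field for field, accepted in ACCEPTED.items() if not (accepted & present)
    )


# Before the rename, the substring check happened to accept codemeta:author...
old_check_matches = "author" in "codemeta:author"
# ...but after renaming the expected name to atom:author it no longer did.
renamed_check_matches = "atom:author" in "codemeta:author"

codemeta_only = missing_fields(["codemeta:name", "codemeta:author"])
no_author = missing_fields(["atom:title"])
print(old_check_matches, renamed_check_matches, codemeta_only, no_author)
```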
  7. Nov 20, 2020
    • Explicitly use the atom: prefix internally. · a8e86a92
      vlorentz authored
      This mostly does not change the protocol used (except in the error messages);
      it's just an internal change for the server and for the client.
      
      The only change in the protocol is that local tags (eg. `<entry>...</entry>`)
      are no longer assumed to be in the Atom namespace (like
      `<entry xmlns="http://www.w3.org/2005/Atom">...</entry>`), but they
      never should have been in the first place.
      
      Default namespaces / unprefixed tags are a footgun because it's too
      easy to add tags in the default namespace without noticing,
      or use the wrong namespace.
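
A short sketch of why unprefixed tags are easy to get wrong, using `xml.etree.ElementTree` (which this commit does not itself use; the sample documents are made up for illustration):

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# A local tag: no namespace at all.
local = ET.fromstring("<entry><title>x</title></entry>")

# A default namespace: every unprefixed descendant silently joins it.
default_ns = ET.fromstring(f'<entry xmlns="{ATOM}"><title>x</title></entry>')

# An explicit prefix: the namespace is visible on every tag.
prefixed = ET.fromstring(
    f'<atom:entry xmlns:atom="{ATOM}"><atom:title>x</atom:title></atom:entry>'
)

local_tag = local.tag
default_tag = default_ns.tag
default_child_tag = default_ns[0].tag
prefixed_tag = prefixed.tag
print(local_tag, default_tag, default_child_tag, prefixed_tag)
```

The first two documents look almost identical in source form but name different elements, while the second and third name the same ones.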
  8. Sep 28, 2020