apidoc: Stop parsing docutils trees with regexps on its pseudo-XML

Motivation:

This commit started as a simple change: I wanted to replace:

`<type> <IRI>`

with:

``<type> <IRI>``

Unfortunately, this syntax looks too much like XML for its own good, so it was stripped by the process_paragraph method, which reads the docutils pseudo-XML representation and strips every tag it doesn't know about. (I say pseudo-XML because my poor <type> <IRI> string was not escaped with XML entities, so it was in fact indistinguishable from actual XML tags.)
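To illustrate the failure mode, here is a minimal sketch using only the standard library. It is not the actual process_paragraph code: the function name `strip_unknown_tags`, the `KNOWN_TAGS` allow-list and the one-line pseudo-XML string are all assumptions made up for the example.

```python
import re

# Hypothetical allow-list; the real process_paragraph may know other tags.
KNOWN_TAGS = {"paragraph", "literal"}


def strip_unknown_tags(pseudo_xml):
    """Drop every <tag> or </tag> whose name is not in KNOWN_TAGS."""

    def keep_or_drop(match):
        # Keep the tag verbatim if it is known, otherwise remove it.
        return match.group(0) if match.group(1) in KNOWN_TAGS else ""

    return re.sub(r"</?([A-Za-z_]+)[^>]*>", keep_or_drop, pseudo_xml)


# Simplified stand-in for the pseudo-XML dump: the text content is not
# entity-escaped, so "<type>" and "<IRI>" look exactly like markup.
rendered = "<paragraph>Expects <literal><type> <IRI></literal>.</paragraph>"
stripped = strip_unknown_tags(rendered)
print(stripped)
```

Under these assumptions, the literal's content is reduced to a single space: the regexp has no way to tell that `<type>` and `<IRI>` were text rather than tags.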

Changes:

Therefore, this commit stops using the XML-like string representation of docutils trees and visits tree nodes directly instead. Conveniently, this code already runs inside a node visitor, so we can reuse it simply by iterating recursively instead of stopping the recursion as soon as we see a known node (i.e. the visitors previously visited only nodes very close to the root).

This means that we need to add a method to handle each node type and produce its ReST output. And since we don't have a global view anymore, each handler needs to return the ReST it produced instead of appending directly to self.data["description"], because handlers of parent nodes may need to re-indent their children's output.

This results in cleaner code (which is also closer to what we expect from a visitor transformer), so it's a win too.

This has some other nice side-effects:

  • our custom role code is now neatly restricted to visit_problematic, so it can't overflow, because docutils runs visit_problematic with only the role's string as child
  • it detects unexpected nodes, such as title_reference nodes, which are usually produced when accidentally using single backquotes instead of double backquotes to wrap inline code (it happens a lot when one is used to Markdown)
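The approach described under Changes can be sketched roughly as follows. This is a standalone illustration, not the code from this commit: `Node` is a hypothetical stand-in for docutils nodes so that the example runs without docutils, and `RestRenderer` with its `render_*` methods are made-up names. It shows handlers returning their ReST so that a parent can re-indent it, and unknown node types being reported instead of silently dropped.

```python
import textwrap
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Hypothetical stand-in for a docutils node (not the docutils class)."""

    tagname: str
    children: List[Union["Node", str]] = field(default_factory=list)


class RestRenderer:
    """Each handler returns ReST text instead of appending to shared state."""

    def render(self, node):
        if isinstance(node, str):  # plain text leaf
            return node
        handler = getattr(self, "render_" + node.tagname, None)
        if handler is None:
            # Unknown node types fail loudly instead of being stripped.
            raise ValueError("unexpected node type: " + node.tagname)
        return handler(node)

    def render_children(self, node):
        return "".join(self.render(child) for child in node.children)

    def render_paragraph(self, node):
        return self.render_children(node) + "\n\n"

    def render_literal(self, node):
        return "``" + self.render_children(node) + "``"

    def render_bullet_list(self, node):
        return self.render_children(node)

    def render_list_item(self, node):
        # The parent re-indents its children's output, which is only
        # possible because the children returned their text.
        body = self.render_children(node).strip()
        indented = textwrap.indent(body, "  ")
        return "* " + indented[2:] + "\n"


tree = Node("bullet_list", [
    Node("list_item", [
        Node("paragraph", ["takes ", Node("literal", ["<type> <IRI>"])]),
    ]),
    Node("list_item", [Node("paragraph", ["first line\nsecond line"])]),
])
output = RestRenderer().render(tree)
print(output)
```

In this sketch, a node such as `Node("title_reference", ["foo"])` has no handler and therefore raises `ValueError`, which mirrors how an accidental single-backquoted string would be noticed rather than rendered incorrectly.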

Test Plan

I checked the rendered documentation of most endpoints, and it is visually identical to what it was before this change.


Migrated from D5971 (view on Phabricator)
