Allow for autodoc to parse Markdown docstrings #228

Open
chrisjsewell opened this issue Aug 25, 2020 · 49 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@chrisjsewell
Member

Originally posted by @asmeurer in #163 (comment)

This issue will be of relevance here: sphinx-doc/sphinx#8018

@chrisjsewell chrisjsewell added the enhancement New feature or request label Aug 25, 2020
@asmeurer
Contributor

There's also the question of numpydoc, which defines its own syntax for some things like parameters. Should myst use the same syntax, but just using Markdown markup in the text? Or should it use something more markdownic?

@chrisjsewell
Member Author

I have just added definition list syntax rendering 😄 : see https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html#definition-lists

I think this could come in handy for an autodoc extension. Something like:

# Parameters

param1
: Description of param1

param2
: Description of param2

That's maybe more markdownic?
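
To make the proposed layout concrete, here is a minimal sketch that reads it into a name-to-description mapping. It is plain Python rather than markdown-it-py, and `parse_definition_list` is a hypothetical helper, not part of MyST; a real extension would presumably work on the parsed tokens instead.

```python
def parse_definition_list(text):
    """Map each term to its description(s) in a 'term / : description' layout."""
    params = {}
    term = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and the section heading
        if stripped.startswith(":"):
            if term is not None:
                params.setdefault(term, []).append(stripped[1:].strip())
        else:
            term = stripped  # any other non-blank line starts a new term
    return {name: " ".join(parts) for name, parts in params.items()}


docstring = """\
# Parameters

param1
: Description of param1

param2
: Description of param2
"""
print(parse_definition_list(docstring))
```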

@asmeurer
Contributor

I don't know. There's also the Google docstring style, which is a little different (and preferred by many people). It would probably be a good idea to get broader community feedback on these things.

@chrisjsewell
Member Author

chrisjsewell commented Aug 25, 2020

It would probably be a good idea to get broader community feedback on these things.

Yep absolutely

But note, numpydoc and Google formats are both built around rST syntax.
A markdown extension would use markdown-it-py to initially parse the docstring, and so any format has to be compatible with it in some fashion: utilising existing syntax plugins, or writing new ones.

@Carreau

Carreau commented Sep 8, 2020

If it matters, I'm working on decoupling parsing from rendering of docstrings in IPython/Jupyter; basically saying that if you can write a parser that goes from __doc__ to some well-defined data structure with the right fields/info, then IPython (and by extension Jupyter) will know how to render it properly/nicely. (This could also pull some information out of __signature__.)

So, if the raw rendering to users in IPython/Jupyter is bothering you and influencing the syntax you are choosing, this will likely become less of an issue for users.
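
As a rough illustration of the idea of going from `__doc__` and `__signature__` to a well-defined data structure, here is a small sketch using only the standard library. The `ParamDoc` and `ObjectDoc` classes and the `describe` function are hypothetical names invented for this example; they are not IPython's API.

```python
import inspect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParamDoc:
    name: str
    annotation: Optional[str]
    default: Optional[str]


@dataclass
class ObjectDoc:
    summary: str
    body: str  # left as raw text, so it can be Markdown or rST
    params: List[ParamDoc]


def describe(obj) -> ObjectDoc:
    """Collect the docstring and signature of a callable into one structure."""
    doc = inspect.getdoc(obj) or ""
    summary, _, body = doc.partition("\n")
    params = []
    for p in inspect.signature(obj).parameters.values():
        annotation = None
        if p.annotation is not p.empty:
            annotation = getattr(p.annotation, "__name__", str(p.annotation))
        default = None if p.default is p.empty else repr(p.default)
        params.append(ParamDoc(p.name, annotation, default))
    return ObjectDoc(summary.strip(), body.strip(), params)


def send_message(sender: str, priority: int = 1):
    """Send a message.

    Longer *Markdown* text.
    """


print(describe(send_message))
```

A renderer would then only need to understand `ObjectDoc`, not the markup the docstring was written in.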

@chrisjsewell
Member Author

chrisjsewell commented Sep 9, 2020

Thanks @Carreau, I'll bear that in mind 😄

While you're here; I just added https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html#auto-generated-header-anchors, so that you can write e.g. [](path/to/doc.md#heading-anchor) and it will work correctly both directly on GitHub and building via sphinx.

These anchor slugs, I've found, are a bit changeable in their implementation across renderers, but generally they are converging to the GitHub "specification".

Jupyter Notebook/Lab seems to be a bit outdated in this respect (or at least the versions I tested)?
They don't lower-case or remove punctuation, etc.

I'm surprised by this, because I thought they were both generally built around markedjs at the moment (please move to markdown-it 😉), which does implement this behaviour: https://github.com/styfle/marked/blob/a41d8f9aa69a4095aedae93c6e6ee5522a588217/lib/marked.js#L1991
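
For readers unfamiliar with the slug behaviour being discussed (lower-casing, removing punctuation, de-duplicating repeats), a minimal approximation looks like the following. This is a sketch of the commonly described GitHub-style rules, not the exact implementation used by GitHub, MyST, or marked; `github_slug` is a hypothetical name.

```python
import re


def github_slug(heading, seen=None):
    """Approximate a GitHub-style anchor slug for a heading.

    ``seen`` is an optional dict used to de-duplicate repeated headings.
    """
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # drop punctuation, keep word chars, '-', ' '
    slug = slug.replace(" ", "-")
    if seen is not None:
        count = seen.get(slug, 0)
        seen[slug] = count + 1
        if count:
            slug = f"{slug}-{count}"
    return slug


print(github_slug("Auto-generated Header Anchors!"))
```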

@john-hen
Contributor

I'm very much interested in this feature as I've been using Markdown doc-strings for a while and would like to move from recommonmark to MyST.

By the way, it took me quite a while to get to this GitHub issue here. It would have helped if the section in the docs regarding the autodoc extension clearly stated that Markdown is not supported in doc-strings.

@choldgraf
Member

That's a great point @John-Hennig - any interest in opening a PR to add a {warning} block there that also links to this issue, in case folks want to give feedback?

@dmwyatt

dmwyatt commented May 19, 2021

Originally posted by @asmeurer in #163 (comment)

This issue will be of relevance here: sphinx-doc/sphinx#8018

From the feedback on the autodoc issue, it sounds like it might just be better to write a replacement for autodoc rather than trying to extend it?

@oricou

oricou commented May 25, 2021

Here is a trick to have Markdown docstring with commonmark. I guess it could be done with myst_parser.

https://stackoverflow.com/questions/56062402/force-sphinx-to-interpret-markdown-in-python-docstrings-instead-of-restructuredt

Sphinx's Autodoc extension emits an event named autodoc-process-docstring every time it processes a doc-string. You can hook into that mechanism to convert the syntax from Markdown to reStructuredText.

import commonmark

def docstring(app, what, name, obj, options, lines):
    md  = '\n'.join(lines)
    ast = commonmark.Parser().parse(md)
    rst = commonmark.ReStructuredTextRenderer().render(ast)
    lines.clear()
    lines += rst.splitlines()

def setup(app):
    app.connect('autodoc-process-docstring', docstring)

@dmwyatt

dmwyatt commented May 26, 2021

It's funny that you posted that as I made this comment on that a few hours ago.

@john-hen
Contributor

Here is a trick to have Markdown docstring with commonmark. I guess it could be done with myst_parser.

Yes, it does work with MyST. Since my earlier comment here, I have replaced Recommonmark with MyST in my projects and, as before, I'm using Commonmark.py to render the Markdown doc-strings. I've also updated my Stackoverflow answer to reflect that and mention MyST now that Recommonmark has been deprecated.

This works great for me, actually. But all I need in doc-strings is syntax highlighting of code examples. So nothing fancy. People who want advanced features such as math rendering, cross references, or possibly NumPy style, will have to wait for native doc-string support in MyST.

@oricou

oricou commented May 26, 2021

@John-Hennig Great, could you share your code with MyST? TIA.

@astrojuanlu
Contributor

Today I found https://github.com/mkdocstrings/mkdocstrings, is it related to the scope of this issue?

@choldgraf
Member

@astrojuanlu mmmm probably not, because that seems to work with the mkdocs documentation engine, not Sphinx, no? Or is it usable for Sphinx as well?

@astrojuanlu
Contributor

Right, it's based on MkDocs - I brought it up because it could inform the format of the docstring, regardless of the implementation.

@chrisjsewell
Member Author

If anyone is motivated to tackle this, I would say an initial step would be to implement a https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#field-lists plugin within https://github.com/executablebooks/mdit-py-plugins.

Using this, we could implement the classic docstring structure:

def func(a):
    """Function description.

    :param a: Parameter description, but with *Markdown* syntax
    """

@chrisjsewell
Member Author

chrisjsewell commented Dec 6, 2021

UPDATE:

With #455 implemented, it is now fully possible to use sphinx's python-domain directives in MyST 🎉 (see https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#field-lists).
For example, this will be properly parsed:

```{py:function} send_message(sender, priority)

Send a message to a recipient

:param str sender: The person sending the message
:return: the message id
:rtype: int
```

The sticking point now for autodoc (and similarly for readthedocs/sphinx-autoapi#287) is that the auto directives first use Documenter sub-classes to generate source text (which is subsequently parsed), but the source text generation is currently hard-coded to RST
(see https://github.com/sphinx-doc/sphinx/blob/edd14783f3cc6222066fd63efbe28c2728617e18/sphinx/ext/autodoc/__init__.py#L299)

For example,

```{autoclass} myst_parser.docutils_renderer.DocutilsRenderer
```

Is first converted to the text

.. py:class:: DocutilsRenderer(*args, **kwds)
   :module: myst_parser.docutils_renderer

   A markdown-it-py renderer to ...

which MyST cannot parse.

Primarily you just need to override some aspects of these documenters, to handle converting to MyST, something like:

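from sphinx.ext.autodoc import FunctionDocumenter

class MystFunctionDocumenter(FunctionDocumenter):
    def add_directive_header(self, sig: str) -> None:
        # parser_is_rst / parser_is_myst are placeholders for however
        # the active parser ends up being detected.
        if parser_is_rst:
            super().add_directive_header(sig)
        elif parser_is_myst:
            ...  # emit a MyST directive header instead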

then you load them via an extension:

def setup(app: Sphinx) -> Dict[str, Any]:
    app.add_autodocumenter(MystFunctionDocumenter)

this is certainly achievable.

One final thing (as noted in sphinx-doc/sphinx#8018 (comment)) is that ideally you would also be able to switch the parser based on whether your docstrings were written in RST or Markdown, i.e. it would not matter whether you called autoclass from an RST or a Markdown file; a Markdown docstring would always be parsed as Markdown.

@john-hen
Contributor

john-hen commented Dec 6, 2021

Converting the directive header may be fairly straightforward, but some of the domain directives will have a body with content that contains domain directives again. So these directives will be nested. It's quite a bit easier to do that in reST than it is in Markdown.

For example, let's say we have this module.py:

"""Doc-string of the module."""

class Class:
    """Doc-string of the class."""

    def method(self):
        """Doc-string of the method."""

We document it like so in index.rst:

.. automodule:: module
    :members:

And conf.py is simply:

extensions = ['sphinx.ext.autodoc']
import sys
sys.path.insert(0, '.')

When running sphinx-build . html -vv we see in the build log that Autodoc replaces the automodule directive with the following output:

.. py:module:: module

Doc-string of the module.


.. py:class:: Class()
   :module: module

   Doc-string of the class.


   .. py:method:: Class.method()
      :module: module

      Doc-string of the method.

It is already possible to render this with MyST:

```{py:module} module
```

Doc-string of the module.

````{py:class} Class()

Doc-string of the class.


```{py:method} Class.method()
:module: module

Doc-string of the method.
```
````

This produces the exact same HTML. But I had to put quadruple back-ticks at the outer scope to achieve the nesting. With reST, Autodoc just needs to increase the indentation level as it generates the body content of the directive line by line.

Maybe it's enough to just start with some extra back-ticks at the outer scope, for good measure. Nesting is usually not more than one level deep anyway. But the indentation also breaks the Markdown build. That's possibly an easy fix too, like override the content_indent attribute of the Documenter class. But Autodoc adds lines to the output in many different places, and often the indentation is just part of the string literal. That's where I gave up the last time I looked into this. I might give this another shot, but this could easily get quite complicated.
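
One way to sidestep the "extra back-ticks for good measure" guess is to compute the fence length from the nesting depth before emitting anything. The following is a minimal sketch of that idea on a toy directive tree; the `Directive` class and `render_myst` function are hypothetical and are not part of Autodoc or MyST, which generate their output line by line rather than from a tree.

```python
from dataclasses import dataclass, field
from typing import List

TICK = "`"


@dataclass
class Directive:
    name: str
    argument: str
    body: List[str] = field(default_factory=list)
    children: List["Directive"] = field(default_factory=list)


def depth(directive):
    """Number of directive levels at or below this one (a leaf has depth 1)."""
    return 1 + max((depth(child) for child in directive.children), default=0)


def render_myst(directive):
    """Render nested directives, giving outer fences more back-ticks than inner ones."""
    fence = TICK * (2 + depth(directive))  # leaf: 3 ticks, one level of nesting: 4, ...
    lines = [f"{fence}{{{directive.name}}} {directive.argument}", "", *directive.body]
    for child in directive.children:
        lines.append("")
        lines.extend(render_myst(child))
    lines.append(fence)
    return lines


method = Directive("py:method", "Class.method()", ["Doc-string of the method."])
cls = Directive("py:class", "Class()", ["Doc-string of the class."], [method])
print("\n".join(render_myst(cls)))
```

No indentation is added for nested content, which avoids the indentation problem described above, at the cost of needing the whole tree before the outer fence can be written.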

@chrisjsewell
Member Author

Now integrated into the documentation 😄 https://myst-parser.readthedocs.io/en/latest/syntax/code_and_apis.html#documenting-whole-apis

@astrojuanlu
Contributor

Are there examples of MyST docstrings in the wild? The ones I see in the docs borrow the :param style from reST, but I'd love to see more distinct MyST features being showcased.

Fantastic job everyone!

@chrisjsewell
Member Author

The ones I see in the docs borrow the :param style from reST, but I'd love to see more distinct MyST features being showcased.

Well that is a syntax available in MyST: https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#field-lists

Personally, I find that the best, most concise way, to document parameters, so I don't have any problem "borrowing" it.
Plus it potentially makes it easier for people to transition.

Did you have anything else in mind?

@hmgaudecker

This looks great, many thanks!

In order to decide which project to try it out on: I could not find anything on whether numpy- and/or google-style docstrings should work with MyST? Apologies if I missed that.

@chrisjsewell
Member Author

I could not find anything on whether numpy- and/or google-style docstrings should work with MyST?

You tell me 😅
I haven't tried. As for why the sphinx style works: it's just acting on the "pre-parsed" AST.

numpy looks like it is acting on definition lists, which are slightly different in myst, so it probably won't work automatically. The parsing of headings may also require a "fix" in autodoc2.
Maybe similar with google docstrings

@chrisjsewell
Member Author

chrisjsewell commented Mar 1, 2023

Yeh no actually, looking at the napoleon code, it does horrible parsing of the whole docstring and turns it into rst, so that is a no go, in terms of it "just working" out of the box

But I'm sure we can come up with something better 😄

@hmgaudecker

Wow, thanks for the quick replies and research!

As far as I am concerned, it would be great to have support for Google-style, just so much more readable than the :param-style. But I'll definitely try autodoc2 in some smaller projects not requiring this.

The numpy-style probably would require some discussion on whether one wants to stick to the rst-style underlining of headers or markdown headers. I would not have a strong opinion (and may convert everything I have to Google-style, anyhow).

@chrisjsewell
Member Author

chrisjsewell commented Mar 1, 2023

So just to explain a little

here: https://github.com/sphinx-extensions2/sphinx-autodoc2/blob/13933a5b25a780e03f227414d432420706962212/src/autodoc2/sphinx/docstring.py#L125
you have your "python object", with the docstring, and then you can parse that (to docutils nodes/AST) however you want

in autodoc+napoleon, the key point is here: https://github.com/sphinx-doc/sphinx/blob/30f347226fa4ccf49b3f7ef860fd962af5bdf4f0/sphinx/ext/napoleon/__init__.py#L320
napoleon takes the docstring from autodoc, and then mutates it into a different string, before giving it back to autodoc to create the final text, which it eventually parses to AST similarly: https://github.com/sphinx-doc/sphinx/blob/30f347226fa4ccf49b3f7ef860fd962af5bdf4f0/sphinx/ext/autodoc/directive.py#L147

The problem being that both napoleon and autodoc only generate RST

@astrojuanlu
Contributor

Oh, didn't realize :param is now MyST, thanks! I was also interested in NumPy-style docstrings

@chrisjsewell
Member Author

When we talk about numpy/google style, I would start by asking;
would you agree that, given we now have type annotations and type checking, it is no longer good practice to put types in the docstring?
That would simplify things a little

@hmgaudecker

I would.

@chrisjsewell
Member Author

chrisjsewell commented Mar 1, 2023

Then, if you don't want sphinx style, I would suggest a bit of a hybrid, that would work for both rst and myst:
basically a heading followed by a field list, e.g.

for MyST

# Parameters
:x: a description
:y: a description

or for RST

Parameters
-----------
:x: a description
:y: a description

this would be very easy to parse, just with the standard rst/myst parser,
then you just run a "transform" on the AST, that finds these headings and "propagates" them down to the field list, i.e. to get back to the sphinx style

:param x: a description
:param y: a description
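
The heading-to-field-list propagation described above can be sketched at the text level as follows. This is only an illustration: the proposal is an AST transform, whereas this works on raw lines, handles only Markdown-style headings, and uses a hypothetical `SECTION_TO_FIELD` mapping and `propagate_headings` function.

```python
import re

# Hypothetical mapping from section heading to sphinx-style field name
SECTION_TO_FIELD = {"Parameters": "param", "Raises": "raises"}

HEADING = re.compile(r"^#+\s+(\w+)\s*$")
FIELD = re.compile(r"^:(\w+):\s*(.*)$")


def propagate_headings(lines):
    """Rewrite '# Parameters' + ':x: text' into ':param x: text'."""
    out = []
    prefix = None
    for line in lines:
        heading = HEADING.match(line)
        if heading:
            prefix = SECTION_TO_FIELD.get(heading.group(1))
            if prefix is None:
                out.append(line)  # unknown headings are kept as they are
            continue
        item = FIELD.match(line)
        if item and prefix:
            out.append(f":{prefix} {item.group(1)}: {item.group(2)}")
        else:
            if line.strip():
                prefix = None  # any other content ends the section
            out.append(line)
    return out


source = ["Summary.", "", "# Parameters", ":x: a description", ":y: a description"]
print("\n".join(propagate_headings(source)))
```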

@rmorshea

rmorshea commented Mar 1, 2023

That field list syntax would be perfectly acceptable for me personally coming from Google style docstrings. With that said, it would certainly be helpful for projects trying to transition to MyST if Google/Numpy styles were supported as it would require less work and receive less push back from those who might already find RST->MyST to be an uncomfortable change.

@hmgaudecker

That field list syntax would be perfectly acceptable for me personally coming from Google style docstrings. With that said, it would certainly be helpful for projects trying to transition to MyST if Google/Numpy styles were supported as it would require less work and receive less push back from those who might already find RST->MyST to be an uncomfortable change.

Agreed. Though a converter script might do the job and ease the maintenance burden.

@chrisjsewell
Member Author

Note something like this may be of use: https://pypi.org/project/docstring-parser/

Google/numpy are a bit weird, in that they are "pseudo rst", with effectively a bespoke "structure" with nested rst. But I guess one could parse the structure first, even with myst, then parse properly

@naquiroz

naquiroz commented Mar 7, 2023

Is there a possible workaround (maybe using/adding other dependencies) for auto parsing docstrings that use both myst and napoleon?

@chrisjsewell
Member Author

chrisjsewell commented Mar 10, 2023

Is there a possible workaround (maybe using/adding other dependencies) for auto parsing docstrings that use both myst and napoleon?

Oh indeed, that's what I mean by the above, you just need a "hook" in: https://github.com/sphinx-extensions2/sphinx-autodoc2/blob/13933a5b25a780e03f227414d432420706962212/src/autodoc2/sphinx/docstring.py#L125
to allow for "re-interpretation" of the docstring

@naquiroz

That solution however requires using sphinx-autodoc2 which is not as popular as I would like, so I would rather wait. I need a more battle-tested approach.

@chrisjsewell
Member Author

That solution however requires using sphinx-autodoc2 which is not as popular as I would like, so I would rather wait. I need a more battle-tested approach.

Ah well, that's a chicken-and-egg problem 😅 I only created it a few weeks ago, and need you guys' help to test/improve it; all issues/PRs welcome 🙏

@mj023

mj023 commented Apr 20, 2023

Great Project! But we also ran into the issue of not wanting to use the Sphinx-Style when I started transitioning one of our projects to autodoc2. I was able to convert some of our Numpy-Docstrings using Pyment, but we ultimately put it on hold. I think the proposed Heading + Fieldlist style would already be enough for us.

@pawamoy

pawamoy commented May 25, 2024

@chrisjsewell

Note something like this may be of use: https://pypi.org/project/docstring-parser/
Google/numpy are a bit weird, in that they are "pseudo rst", with effectively a bespoke "structure" with nested rst. But I guess one could parse the structure first, even with myst, then parse properly

That's the approach I took with Griffe: it parses the different styles into the same data structures/classes. Basically, it parses a docstring into a list of sections, each section having its own specific kind and contents (regular text, arguments, returns, exceptions, etc.).

It's (almost) markup agnostic: regular text sections as well as any item description (parameter, returned value, etc.) can be written in Markdown, rST, Asciidoc, whatever the end user prefers. I wrote almost because Griffe's parsers still check for fenced code blocks (using triple-backticks) to prevent parsing of sections inside Markdown code blocks. This is not an issue for rST since they would be indented and therefore not matched.

Anyway, just a shameless plug 😄 See usage examples here: https://mkdocstrings.github.io/griffe/parsing_docstrings/
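
The "parse the structure first, then parse the markup" approach can be illustrated with a small stand-alone sketch. This is not Griffe's API: `split_sections` is a hypothetical function that recognises only three Google-style section headers, and it mimics the fenced-code-block check mentioned above so that a header inside a Markdown code block is not treated as a section.

```python
import re

SECTION_HEADER = re.compile(r"^(Args|Returns|Raises):\s*$")  # assumed subset of headers
TICKS = "`" * 3  # the Markdown fence marker


def split_sections(docstring):
    """Split a docstring into (kind, text) sections, leaving the markup untouched."""
    sections = [("text", [])]
    in_fence = False
    for line in docstring.splitlines():
        if line.strip().startswith(TICKS):
            in_fence = not in_fence
        header = None if in_fence else SECTION_HEADER.match(line)
        if header:
            sections.append((header.group(1).lower(), []))
        else:
            sections[-1][1].append(line)
    return [(kind, "\n".join(body).strip()) for kind, body in sections]


doc = "\n".join([
    "Send a message.",
    "",
    TICKS,
    "Args:",
    TICKS,
    "",
    "Args:",
    "    sender: The *person* sending.",
    "",
    "Returns:",
    "    The message id.",
])
for kind, text in split_sections(doc):
    print(kind, "->", repr(text))
```

Each section's text can then be handed to whichever markup parser the project uses.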
