Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pluggable system for generating types from docstrings #2240

Closed
wants to merge 8 commits into from

Conversation

chadrik
Copy link
Contributor

@chadrik chadrik commented Oct 11, 2016

This is a rough draft of a system that would allow third-party tools to extend mypy with the ability to parse PEP484 type annotations within function docstrings.

The idea is that these changes would eventually be extensible via a plugin system (issue #1240), but for now, one can use them by monkey-patching mypy.docstrings.parse_docstring prior to calling mypy.main.main from within their own scripts:

import mypy.docstrings
mypy.docstrings.parse_docstring = my_parse_docstring

if __name__ == '__main__':
    from mypy.main import main
    main(None)

Before I invest more time in this, I'm interested to know if the general idea is acceptable.

Related to issue #1840.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 11, 2016

I'm not really excited about having an API based on monkey patching. However, if we explicitly warn about this being an experimental and temporary hack, this would not be very likely to cause too much trouble, unless Guido is philosophically against it :-)

This is probably better than a command line argument, since those are more visible and we already have too many of them. Another alternative would a semi-hidden config file option, but that would likely require dynamic loading of code, which might easily be somewhat tricky to get right.

Pluses of this approach:

  • Simple code, easy to review
  • Doesn't seem too fragile
  • Should be (just) powerful enough to unblock the docstring use case with minimal impact to mypy

Negatives:

  • Need a custom mypy entry point script
  • Not composable with potential future monkey patching based APIs (may need an entry point script for every set of monkey patches) -- not suitable as a general extension mechanism
  • Generally ugly

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that parsing types out of docstrings is an ill-fated project, but if you insist, this patch is pretty minimal, and lays the responsibility squarely where it belongs (i.e. not the mypy developers).

I would rephrase it as using a hook rather than a monkey-patch though, and add an interface you have to use to register the hook.

Finally, if you do proceed with this, it needs unittests.


Returns a 2-tuple: dictionary of arg name to Type, and return Type.
"""
return None, None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the proper Optional to the return type (it's not enforced yet, but it will be in the future, and it's needed here).

@@ -1,4 +1,5 @@
from functools import wraps
from inspect import cleandoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer "import inspect" here. So the code that uses it is clear about the origin of the cleandoc() function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do. I noticed that nearly all functions in these modules were imported using from x import y so I was just mimicking the prevailing style, but glad to change it.

Parse a docstring and return type representations. This function can
be overridden by third-party tools which aim to add typing via docstrings.

Returns a 2-tuple: dictionary of arg name to Type, and return Type.
Copy link
Member

@gvanrossum gvanrossum Oct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if Type is the right thing to return here. Turning a string representing a type as found in the source code into a Type object is a fairly complicated process, and we already have code for that (the code that parses type comments). Maybe this should just return strings?

As a nit, I'd use the same convention for the return type as used in __annotations__ and a few of inspect's APIs -- just use a key "return" since that can't be an argument name anyway. (OTOH maybe the structure used here is more compatible with mypy's internal conventions, so I don't insist on this.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if Type is the right thing to return here. Turning a string representing a type as found in the source code into a Type object is a fairly complicated process, and we already have code for that (the code that parses type comments).

I'm using mypy.parsetype.parse_str_as_type to handle the conversion within my little experimental project, so it's pretty straight-forward. Returning strings means that mypy will have to have more of an opinion about how to deal with error handling and such, but it would avoid the need for the parse_docstring function to take a line number (which is there explicitly for passing to parse_str_as_type). I'll look into whether I can return Dict[str, str] while still maintaining the current functionality in my project.

As a nit, I'd use the same convention for the return type as used in annotations and a few of inspect's APIs -- just use a key "return" since that can't be an argument name anyway.

Yeah, I like that. It's a simpler structure to document.

@@ -5,6 +5,7 @@
"""

import re
from inspect import cleandoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version of the parser is doomed, we'll remove it as soon as we can use the fast parser on all platforms. Maybe you could skip changing this file and just state that your feature depends on --fast-parser?


def make_callable(args: List[Argument], type_map: Dict[str, Type],
ret_type: Type) -> CallableType:
if type_map is not None:
Copy link
Member

@gvanrossum gvanrossum Oct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function in this file? It seems to be used only by parse.py so it really belongs there, unless you also plan to monkey-patch this.

There's a missing Optional in the signature. But it almost feels like the caller should check whether type_map is None and not even call this.

(Also, why is this only needed by parser.py and not by fastparse*.py?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function in this file?

I was trying to keep the docstring-related footprint in the parser modules to a minimum, and I thought it could also be useful to override, although in my personal experiments I haven't needed it.

why is this only needed by parse.py and not by fastparse*.py?

Conceptually these docstring-annotations are alternatives to comment-annotations, so I was able to minimize changes to code by using the same structures that would have delivered comment-annotations to instead deliver docstring-annotations. From the perspective of the calling code it's the same thing. In parse.py comment-annotations are expected to provide a "kind" for each argument (and there's a bit of code to ensure that the kinds are compatible with those parsed from the actual function signature) whereas fastparse always uses the kind from the function signature. I guess the authors of parse.py found it convenient to use an existing object -- CallableType -- to hold the extra data.

If we ditch parse.py support then it's a moot point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it's because you're emulating the result of parse_type_comment() which in turn calls parse_signature() which returns a Callable.

So here's another exhortation to return strings and let mypy parse the strings. Strings are what you have in your docstring and strings provide a much more stable API between an app (e.g. mypy) and an extension (your code for parsing docstrings) than Type objects. If we change the signature of Callable or we change the meaning of Type, your extension will break, and we can't make any promises that such internals won't change (not until we've had much more experience with extensions). But strings are pretty stable and you can always construct strings. So I'm saying you should just return a string that's got the same format as a type comment, i.e. # type: (blah, blah) -> blah (but without the # type: prefix).

doc = cleandoc(doc.decode('unicode_escape'))
type_map, rtype = docstrings.parse_docstring(doc, n.lineno)
if type_map is not None:
arg_types = [type_map.get(name) for name in arg_names]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assert that this has the right length, otherwise you may get crazy crashes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The length of arg_types comes indirectly, via arg_names, from the from the ast-parsed args so it should always be correct:

        args = self.transform_args(n.args, n.lineno)
        arg_kinds = [arg.kind for arg in args]
        arg_names = [arg.variable.name() for arg in args]

Although this reminds me that I did want to have something that checks that all types in the type map were used exactly once.

cur = self.current()
if type is None and isinstance(cur, StrLit):
type_map, ret_type = docstrings.parse_docstring(
cleandoc(cur.parsed()), cur.line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that you call cleandoc() in all parse_docstring() calls. So move that into parse_docstring().

@chadrik
Copy link
Contributor Author

chadrik commented Oct 15, 2016

  • Removed parse.py support
  • Allowed the docstring parsing hook to return strings instead of Type
  • The type mapping uses a 'return' key for return values, so there's no need for a 2-tuple result
  • cleandoc is used as inspect.cleandoc
  • Added a basic registration system for hooks

I went out on a limb and made a general purpose mypy.hooks instead of mypy.docstrings. The helper function parse_docstring, which processes the results returned by the hook for use by the parser, now lives in fastparse because I don't think there's enough for a dedicated docstrings module. That means that import inspect remains in fastparse.

@gvanrossum
Copy link
Member

DO you need help debugging the CI failures?


docstring_parser = hooks.get_docstring_parser()
if docstring_parser is not None:
type_map = docstring_parser(inspect.cleandoc(docstring), line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still believe calling cleandoc() is up to the custom parser. They may have some other approach to parsing the contents of the docstring.

@@ -15,6 +15,7 @@
two in a typesafe way.
"""
from functools import wraps
from inspect import cleandoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer used.



def get_docstring_parser() -> Optional[docstring_parser_type]:
return hooks.get('docstring_parser')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get and set functions look an afwul lot like Java-style accessor methods. If you want type-safety it's probably better to define a Hooks class whose instance variables are the known hooks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, are you looking for a single registry for all hooks, like this:

class Hooks:
    # The docstring_parser hook must take a docstring for a function [...etc...]
    docstring_parser = None  # type: Callable[[str], Optional[Dict[str, str]]]

    # another explanation...
    future_hook = None  # type: Callable[whatever]

registry = Hooks()

Where the end user would then override the attribute:

import mypy.hooks
mypy.hooks.registry.docstring_parser = my_parser

If not, can you clarify a bit more.


hooks = {} # type: Dict[str, Callable]

docstring_parser_type = Callable[[str, int], Optional[Dict[str, Union[str, Type]]]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A type alias like this should have a CapWords name. I would also prefer this to always return a string.

@gvanrossum
Copy link
Member

gvanrossum commented Oct 19, 2016 via email

@chadrik
Copy link
Contributor Author

chadrik commented Oct 19, 2016

Addressed the latest round of notes and hopefully fixed the CI issues. Unittests are forthcoming.

@chadrik
Copy link
Contributor Author

chadrik commented Oct 19, 2016

The current issue with CI is that the stubs for typed_ast.ast27.get_docstring are incorrect:

def get_docstring(node: AST, clean: bool = ...) -> str: ...

The actual result is bytes. The same function returns str in python 3.x. Should this be fixed in typed_ast or the stubs?

@gvanrossum
Copy link
Member

@ddfisher can you look at the ast27 stub issue above?

@ddfisher
Copy link
Collaborator

Went and double-checked the str return values in typed_ast; made this PR: python/typeshed#624.

@@ -22,6 +22,8 @@
)
from mypy import defaults
from mypy import experiments
from mypy import hooks
from mypy.parsetype import parse_str_as_type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This calls into the old parser, which we want to avoid doing from the fast parser, as it'll eventually replace the old one. You can use the fast parser to do this instead: see TypeConverter below.

@rowillia
Copy link
Contributor

rowillia commented Nov 3, 2016

@chadrik I'm slated to implement something similar, and I'm wondering if I can convince you to take a slightly different approach so we can join forces 😄 .

We've got big chunks of code using the rst format described here - https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy . Our plan was to write a parser that would convert these Docstring comments into PEP 484 compliant comments so they'll light up in all tools who parse PEP 484 types, not just MyPy. This will of course require a big codemod, but it should be fairly mechanical.

Would that approach work for you?

@chadrik
Copy link
Contributor Author

chadrik commented Nov 3, 2016

@rowillia the tool that I am writing can parse rst, google, and numpy formatted docstrings, extract the type annotations from them, and return them to mypy. Regardless of which markup/formatting is used for the docstrings, the types contained therein must be valid PEP484 strings.

Our plan was to write a parser that would convert these Docstring comments into PEP 484 compliant comments so they'll light up in all tools who parse PEP 484 types, not just MyPy.

After your converted files are saved, they should be fully compatible with the "plugin" that I'm writing.

I would actually really love to use such a parser since we have a similar problem, except our docstrings are using numpy-formatting. It would be great if your parser could be extended, possibly by me, to add support for other formats. Just something to keep in mind when you're designing it. Oh, another thing to remember, your parser will need to insert from typing import <insert types here> at the top of each converted file.

@chadrik
Copy link
Contributor Author

chadrik commented Nov 3, 2016

New tests are coming soon.

@gvanrossum
Copy link
Member

In response to the discussion going on here, there's a fledgling "2to3 fixer" in the mypy repo under misc/fix_annotate.py. It currently just looks at the arguments and inserts enough Any values to match those (less self or cls) but might be a nice starting point for something like this?

@chadrik
Copy link
Contributor Author

chadrik commented Nov 3, 2016

That looks very useful, thanks. In fact, it probably makes sense to connect this tool up to mypy.hooks.docstring_parser. I imagine that would be incredibly useful for just about anyone who would consider using this.

It's unclear how this is supposed to be used. Is it a plugin for lib2to3?

@gvanrossum
Copy link
Member

Hey @chadrik, are you still planning to work on this? You promised new tests.

(In response to your question about misc/fix_annotate.py, that's indeed a lib2to3 plugin. But I don't think the answer to this question should be needed to unblock this PR.)

@chadrik
Copy link
Contributor Author

chadrik commented Dec 16, 2016

Hi @gvanrossum, yes I'm still planning to do this. My free time has been devoured by offspring :)
I just have to dig into the test suite and learn how to add a new fixture.

@chadrik
Copy link
Contributor Author

chadrik commented Feb 4, 2017

I had a look at writing the tests today.

I'd like to propose that we add a way to install a hook from the command-line and/or env var. It not only makes this feature easier to use, it also makes it far easier to test. In its current state, I can't use a .test fixture because the only way to install a hook is by writing your own entry-point script to do the monkey-patching.

Suggested use would be:

mypy --hook=docstring_parser:module.submodule.func1 mypy --hook=future_hook:module.submodule.func2

Alternate flag name ideas: --add-hook, --with-hook

@gvanrossum
Copy link
Member

That's a good point. Perhaps a config file option makes more sense though? (Or both, if you really think a command-line flag has a use case that's not covered by the config file.)

@chadrik
Copy link
Contributor Author

chadrik commented Feb 5, 2017

Yes, a config file option would be better. I'll put something together and get this rebased against master.

@gvanrossum
Copy link
Member

Mind if we just close this? I'm no fan of keeping lingering old PRs open that have requested changes and merge conflicts. This one is now over half a year old and repeated promises to continue to work on it have not born fruit. If you find the time to work on this again, just open a new PR, we'll look at it with fresh eyes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants