Pluggable system for generating types from docstrings #2240

chadrik · 2016-10-11T05:39:57Z

This is a rough draft of a system that would allow third-party tools to extend mypy with the ability to parse PEP 484 type annotations within function docstrings.

The idea is that these changes would eventually be extensible via a plugin system (issue #1240), but for now, one can use them by monkey-patching mypy.docstrings.parse_docstring prior to calling mypy.main.main from within their own scripts:

import mypy.docstrings
mypy.docstrings.parse_docstring = my_parse_docstring

if __name__ == '__main__':
    from mypy.main import main
    main(None)

Before I invest more time in this, I'm interested to know if the general idea is acceptable.

Related to issue #1840.

JukkaL · 2016-10-11T12:35:23Z

I'm not really excited about having an API based on monkey patching. However, if we explicitly warn about this being an experimental and temporary hack, this would not be very likely to cause too much trouble, unless Guido is philosophically against it :-)

This is probably better than a command line argument, since those are more visible and we already have too many of them. Another alternative would be a semi-hidden config file option, but that would likely require dynamic loading of code, which could easily be tricky to get right.

Pluses of this approach:

Simple code, easy to review
Doesn't seem too fragile
Should be (just) powerful enough to unblock the docstring use case with minimal impact to mypy

Negatives:

Need a custom mypy entry point script
Not composable with potential future monkey patching based APIs (may need an entry point script for every set of monkey patches) -- not suitable as a general extension mechanism
Generally ugly

gvanrossum

I still think that parsing types out of docstrings is an ill-fated project, but if you insist, this patch is pretty minimal, and lays the responsibility squarely where it belongs (i.e. not the mypy developers).

I would rephrase it as using a hook rather than a monkey-patch though, and add an interface you have to use to register the hook.

Finally, if you do proceed with this, it needs unittests.
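A minimal sketch of what "an interface you have to use to register the hook" could look like, as opposed to monkey-patching a module function. The names `register_hook`, `get_hook`, `KNOWN_HOOKS` and the `"docstring_parser"` key are illustrative assumptions, not mypy's API.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: hook name -> callable supplied by a third-party tool.
_hooks = {}  # type: Dict[str, Callable]

# Hook names the host application knows how to call; others are rejected.
KNOWN_HOOKS = {"docstring_parser"}


def register_hook(name: str, func: Callable) -> None:
    """Register func under name, refusing unknown hook names."""
    if name not in KNOWN_HOOKS:
        raise ValueError("unknown hook: %r" % name)
    _hooks[name] = func


def get_hook(name: str) -> Optional[Callable]:
    """Return the registered hook, or None if nothing was registered."""
    return _hooks.get(name)
```

An entry-point script would then call `register_hook("docstring_parser", my_parse_docstring)` before starting mypy; rejecting unknown names turns a misspelled hook into an error instead of a silent no-op.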

gvanrossum · 2016-10-11T15:51:48Z

mypy/docstrings.py

+
+    Returns a 2-tuple: dictionary of arg name to Type, and return Type.
+    """
+    return None, None


Please add the proper Optional to the return type (it's not enforced yet, but it will be in the future, and it's needed here).

gvanrossum · 2016-10-11T15:55:48Z

mypy/fastparse.py

@@ -1,4 +1,5 @@
 from functools import wraps
+from inspect import cleandoc


I prefer "import inspect" here, so that the code that uses it is clear about the origin of the cleandoc() function.

Will do. I noticed that nearly all functions in these modules were imported using from x import y, so I was just mimicking the prevailing style, but I'm glad to change it.

gvanrossum · 2016-10-11T15:58:37Z

mypy/docstrings.py

+    Parse a docstring and return type representations.  This function can
+    be overridden by third-party tools which aim to add typing via docstrings.
+
+    Returns a 2-tuple: dictionary of arg name to Type, and return Type.


I wonder if Type is the right thing to return here. Turning a string representing a type as found in the source code into a Type object is a fairly complicated process, and we already have code for that (the code that parses type comments). Maybe this should just return strings?

As a nit, I'd use the same convention for the return type as used in __annotations__ and a few of inspect's APIs -- just use a key "return" since that can't be an argument name anyway. (OTOH maybe the structure used here is more compatible with mypy's internal conventions, so I don't insist on this.)

I wonder if Type is the right thing to return here. Turning a string representing a type as found in the source code into a Type object is a fairly complicated process, and we already have code for that (the code that parses type comments).

I'm using mypy.parsetype.parse_str_as_type to handle the conversion within my little experimental project, so it's pretty straightforward. Returning strings means that mypy will have to have more of an opinion about how to deal with error handling and such, but it would avoid the need for the parse_docstring function to take a line number (which is there explicitly for passing to parse_str_as_type). I'll look into whether I can return Dict[str, str] while still maintaining the current functionality in my project.

As a nit, I'd use the same convention for the return type as used in __annotations__ and a few of inspect's APIs -- just use a key "return" since that can't be an argument name anyway.

Yeah, I like that. It's a simpler structure to document.
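For reference, Python's own `__annotations__` stores the return annotation under the key `"return"`, which can never clash with a parameter name because `return` is a keyword. A docstring hook could mirror that shape using type strings instead of type objects; the `parsed` dict below is an illustrative example, not the output of any real parser.

```python
def scale(value: int, factor: float) -> float:
    return value * factor

# The interpreter keeps the return annotation under the "return" key.
print(scale.__annotations__)
# {'value': <class 'int'>, 'factor': <class 'float'>, 'return': <class 'float'>}

# A docstring hook could return the same shape, with strings as values
# (hypothetical example of such a hook's result).
parsed = {"value": "int", "factor": "float", "return": "float"}
assert set(parsed) == set(scale.__annotations__)
```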

gvanrossum · 2016-10-11T16:02:10Z

mypy/parse.py

@@ -5,6 +5,7 @@
 """

 import re
+from inspect import cleandoc


This version of the parser is doomed, we'll remove it as soon as we can use the fast parser on all platforms. Maybe you could skip changing this file and just state that your feature depends on --fast-parser?

gvanrossum · 2016-10-11T16:05:09Z

mypy/docstrings.py

+
+def make_callable(args: List[Argument], type_map: Dict[str, Type],
+                  ret_type: Type) -> CallableType:
+    if type_map is not None:


Why is this function in this file? It seems to be used only by parse.py so it really belongs there, unless you also plan to monkey-patch this.

There's a missing Optional in the signature. But it almost feels like the caller should check whether type_map is None and not even call this.

(Also, why is this only needed by parse.py and not by fastparse*.py?)

Why is this function in this file?

I was trying to keep the docstring-related footprint in the parser modules to a minimum, and I thought it could also be useful to override, although in my personal experiments I haven't needed it.

why is this only needed by parse.py and not by fastparse*.py?

Conceptually these docstring-annotations are alternatives to comment-annotations, so I was able to minimize changes to code by using the same structures that would have delivered comment-annotations to instead deliver docstring-annotations. From the perspective of the calling code it's the same thing. In parse.py comment-annotations are expected to provide a "kind" for each argument (and there's a bit of code to ensure that the kinds are compatible with those parsed from the actual function signature) whereas fastparse always uses the kind from the function signature. I guess the authors of parse.py found it convenient to use an existing object -- CallableType -- to hold the extra data.

If we ditch parse.py support then it's a moot point.

I see, it's because you're emulating the result of parse_type_comment() which in turn calls parse_signature() which returns a Callable.

So here's another exhortation to return strings and let mypy parse the strings. Strings are what you have in your docstring and strings provide a much more stable API between an app (e.g. mypy) and an extension (your code for parsing docstrings) than Type objects. If we change the signature of Callable or we change the meaning of Type, your extension will break, and we can't make any promises that such internals won't change (not until we've had much more experience with extensions). But strings are pretty stable and you can always construct strings. So I'm saying you should just return a string that's got the same format as a type comment, i.e. # type: (blah, blah) -> blah (but without the # type: prefix).
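To illustrate the string-based contract suggested here: assuming a hook returns a `Dict[str, str]` keyed by argument name plus `"return"`, the host could assemble a type-comment-style signature such as `(str, int) -> bool` and pass it to its existing type-comment parser. `to_type_comment` is a hypothetical helper, and falling back to `Any` for undocumented arguments is likewise an assumption.

```python
from typing import Dict, List


def to_type_comment(arg_names: List[str], type_map: Dict[str, str]) -> str:
    """Build a "(T1, T2) -> R" string in the same format as a type comment,
    without the leading "# type:" prefix."""
    # Arguments missing from the docstring fall back to Any (an assumption).
    arg_types = [type_map.get(name, "Any") for name in arg_names]
    return_type = type_map.get("return", "Any")
    return "({}) -> {}".format(", ".join(arg_types), return_type)


print(to_type_comment(["path", "mode"],
                      {"path": "str", "mode": "int", "return": "bool"}))
# (str, int) -> bool
```

Since type comments on methods leave out `self`, the caller would pass argument names without it.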

gvanrossum · 2016-10-11T16:07:55Z

mypy/fastparse.py

+                    doc = cleandoc(doc.decode('unicode_escape'))
+                    type_map, rtype = docstrings.parse_docstring(doc, n.lineno)
+                    if type_map is not None:
+                        arg_types = [type_map.get(name) for name in arg_names]


I'd assert that this has the right length, otherwise you may get crazy crashes.

The length of arg_types comes indirectly, via arg_names, from the ast-parsed args, so it should always be correct:

args = self.transform_args(n.args, n.lineno)
arg_kinds = [arg.kind for arg in args]
arg_names = [arg.variable.name() for arg in args]

Although this reminds me that I did want to have something that checks that all types in the type map were used exactly once.
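One way such a check might look, as a sketch: the argument types are taken in signature order (so the length matches by construction), and any docstring entry that names no real parameter is reported rather than silently dropped. `match_arg_types` is a hypothetical helper operating on the string-valued type map.

```python
from typing import Dict, List, Optional


def match_arg_types(arg_names: List[str],
                    type_map: Dict[str, str]) -> List[Optional[str]]:
    """Return one type string (or None) per argument, in signature order.

    Raises ValueError if the docstring documents names that are not
    parameters of the function, so typos are reported instead of ignored.
    """
    unknown = set(type_map) - set(arg_names) - {"return"}
    if unknown:
        raise ValueError("docstring types for unknown arguments: "
                         + ", ".join(sorted(unknown)))
    arg_types = [type_map.get(name) for name in arg_names]
    # The length follows arg_names, so this always holds.
    assert len(arg_types) == len(arg_names)
    return arg_types
```

Because dictionary keys and parameter names are both unique, "no unknown keys" is equivalent to every documented type being used exactly once.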

gvanrossum · 2016-10-11T17:58:00Z

mypy/parse.py

+                cur = self.current()
+                if type is None and isinstance(cur, StrLit):
+                    type_map, ret_type = docstrings.parse_docstring(
+                        cleandoc(cur.parsed()), cur.line)


I just realized that you call cleandoc() in all parse_docstring() calls. So move that into parse_docstring().

chadrik · 2016-10-15T02:16:38Z

Removed parse.py support
Allowed the docstring parsing hook to return strings instead of Type
The type mapping uses a 'return' key for return values, so there's no need for a 2-tuple result
cleandoc is used as inspect.cleandoc
Added a basic registration system for hooks

I went out on a limb and made a general-purpose mypy.hooks module instead of mypy.docstrings. The helper function parse_docstring, which processes the results returned by the hook for use by the parser, now lives in fastparse because I don't think there's enough code for a dedicated docstrings module. That means that import inspect remains in fastparse.

gvanrossum · 2016-10-17T23:26:48Z

Do you need help debugging the CI failures?

gvanrossum · 2016-10-17T23:28:03Z

mypy/fastparse.py

+
+    docstring_parser = hooks.get_docstring_parser()
+    if docstring_parser is not None:
+        type_map = docstring_parser(inspect.cleandoc(docstring), line)


I still believe calling cleandoc() is up to the custom parser. They may have some other approach to parsing the contents of the docstring.
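As a sketch of leaving normalization to the hook, here is a hypothetical parser with the `(docstring, line)` signature from the diff that calls `inspect.cleandoc()` itself and reads reST-style `:type name: T` and `:rtype: T` fields (the style of the PyCharm format mentioned later in this thread). `rst_docstring_parser` and its regular expressions are assumptions, not part of the patch.

```python
import inspect
import re
from typing import Dict, Optional

# ":type name: T" and ":rtype: T" fields, as in reST-style docstrings.
_TYPE_FIELD = re.compile(r"^:type\s+(\w+):\s*(.+)$")
_RTYPE_FIELD = re.compile(r"^:rtype:\s*(.+)$")


def rst_docstring_parser(docstring: str, line: int) -> Optional[Dict[str, str]]:
    """Hypothetical hook: map argument names (and "return") to type strings.

    The hook, not mypy, decides how to normalize indentation; here it calls
    inspect.cleandoc() itself. `line` is unused in this sketch.
    """
    type_map = {}  # type: Dict[str, str]
    for text in inspect.cleandoc(docstring).splitlines():
        m = _TYPE_FIELD.match(text)
        if m:
            type_map[m.group(1)] = m.group(2).strip()
            continue
        m = _RTYPE_FIELD.match(text)
        if m:
            type_map["return"] = m.group(1).strip()
    # None signals "no type information in this docstring".
    return type_map or None
```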

gvanrossum · 2016-10-17T23:29:39Z

mypy/fastparse2.py

@@ -15,6 +15,7 @@
 two in a typesafe way.
 """
 from functools import wraps
+from inspect import cleandoc


No longer used.

gvanrossum · 2016-10-17T23:33:16Z

mypy/hooks.py

+
+
+def get_docstring_parser() -> Optional[docstring_parser_type]:
+    return hooks.get('docstring_parser')


The get and set functions look an awful lot like Java-style accessor methods. If you want type safety, it's probably better to define a Hooks class whose instance variables are the known hooks.

To clarify, are you looking for a single registry for all hooks, like this:

class Hooks:
    # The docstring_parser hook must take a docstring for a function [...etc...]
    docstring_parser = None  # type: Callable[[str], Optional[Dict[str, str]]]
    # another explanation...
    future_hook = None  # type: Callable[whatever]

registry = Hooks()

Where the end user would then override the attribute:

import mypy.hooks
mypy.hooks.registry.docstring_parser = my_parser

If not, could you clarify a bit more?

gvanrossum · 2016-10-17T23:34:35Z

mypy/hooks.py

+
+hooks = {}  # type: Dict[str, Callable]
+
+docstring_parser_type = Callable[[str, int], Optional[Dict[str, Union[str, Type]]]]


A type alias like this should have a CapWords name. I would also prefer this to always return a string.

gvanrossum · 2016-10-19T18:47:10Z

Yes, that's what I meant. Thinking more about it, you could also make these just module attributes. It's a little different from the original monkey-patching approach because it's a variable declared to be a Callable of a specific type, not just a function definition (to mypy these are very different). (In fact, module attributes may be better, because mypy sometimes confuses class variables that have a Callable type with methods.)

chadrik · 2016-10-19T22:26:39Z

Addressed the latest round of notes and hopefully fixed the CI issues. Unittests are forthcoming.

chadrik · 2016-10-19T22:55:45Z

The current issue with CI is that the stubs for typed_ast.ast27.get_docstring are incorrect:

def get_docstring(node: AST, clean: bool = ...) -> str: ...

The actual return value is bytes, whereas the same function returns str in Python 3.x. Should this be fixed in typed_ast or in the stubs?
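Independently of where the stub is fixed, the calling code has to cope with both types. A minimal sketch, assuming the Python 2 docstring arrives as UTF-8 encoded bytes (the patch itself used `decode('unicode_escape')`); `docstring_text` is a hypothetical helper.

```python
from typing import Optional, Union


def docstring_text(raw: Optional[Union[bytes, str]]) -> Optional[str]:
    """Return the docstring as str whether the AST gave us bytes or str."""
    if raw is None:
        return None  # the function has no docstring
    if isinstance(raw, bytes):
        # Assumption: source bytes are UTF-8; undecodable bytes are replaced.
        return raw.decode("utf-8", errors="replace")
    return raw
```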

gvanrossum · 2016-10-19T23:07:57Z

@ddfisher can you look at the ast27 stub issue above?

ddfisher · 2016-10-19T23:32:19Z

Went and double-checked the str return values in typed_ast; made this PR: python/typeshed#624.

ddfisher · 2016-10-19T23:34:16Z

mypy/fastparse.py

@@ -22,6 +22,8 @@
 )
 from mypy import defaults
 from mypy import experiments
+from mypy import hooks
+from mypy.parsetype import parse_str_as_type


This calls into the old parser, which we want to avoid doing from the fast parser, as it'll eventually replace the old one. You can use the fast parser to do this instead: see TypeConverter below.
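mypy's `TypeConverter` is internal, so this sketch does not use it. It only illustrates the underlying idea of routing a type string through an AST-based parser rather than the old tokenizer-based one, using the standard library `ast` module in `eval` mode to check that a docstring type string is a well-formed expression and to list the names it references. `referenced_names` is a hypothetical helper.

```python
import ast
from typing import Set


def referenced_names(type_str: str) -> Set[str]:
    """Parse a type string as a Python expression and collect bare names.

    Raises SyntaxError for strings that are not valid expressions, which is
    where a real implementation would report a type error.
    """
    tree = ast.parse(type_str, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


print(sorted(referenced_names("Dict[str, List[int]]")))
# ['Dict', 'List', 'int', 'str']
```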

rowillia · 2016-11-03T04:57:52Z

@chadrik I'm slated to implement something similar, and I'm wondering if I can convince you to take a slightly different approach so we can join forces 😄 .

We've got big chunks of code using the rst format described here - https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy . Our plan was to write a parser that would convert these Docstring comments into PEP 484 compliant comments so they'll light up in all tools who parse PEP 484 types, not just MyPy. This will of course require a big codemod, but it should be fairly mechanical.

Would that approach work for you?

chadrik · 2016-11-03T05:45:25Z

@rowillia the tool that I am writing can parse rst, google, and numpy formatted docstrings, extract the type annotations from them, and return them to mypy. Regardless of which markup/formatting is used for the docstrings, the types contained therein must be valid PEP 484 strings.

Our plan was to write a parser that would convert these Docstring comments into PEP 484 compliant comments so they'll light up in all tools who parse PEP 484 types, not just MyPy.

After your converted files are saved, they should be fully compatible with the "plugin" that I'm writing.

I would actually really love to use such a parser since we have a similar problem, except our docstrings are using numpy-formatting. It would be great if your parser could be extended, possibly by me, to add support for other formats. Just something to keep in mind when you're designing it. Oh, another thing to remember, your parser will need to insert from typing import <insert types here> at the top of each converted file.
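For the last point, a converter could collect the `typing` names used by the type strings it writes out and emit the matching import line. A minimal sketch using the standard `ast` and `typing` modules; `typing_import_line` is a hypothetical helper and it ignores names that a file may already import.

```python
import ast
import typing
from typing import Iterable


def typing_import_line(type_strings: Iterable[str]) -> str:
    """Return the "from typing import ..." line a converted file would need,
    or "" if none of the type strings use names from the typing module."""
    needed = set()
    for type_str in type_strings:
        tree = ast.parse(type_str, mode="eval")
        for node in ast.walk(tree):
            # Only bare names exported by typing (Dict, List, Optional, ...).
            if isinstance(node, ast.Name) and node.id in typing.__all__:
                needed.add(node.id)
    if not needed:
        return ""
    return "from typing import " + ", ".join(sorted(needed))
```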

- simplify and improve registration of mypy.hooks - do not call cleandoc (that's up to the hook user) - doc parser hooks must return strings and not Type

chadrik · 2016-11-03T06:39:42Z

New tests are coming soon.

gvanrossum · 2016-11-03T16:25:04Z

In response to the discussion going on here, there's a fledgling "2to3 fixer" in the mypy repo under misc/fix_annotate.py. It currently just looks at the arguments and inserts enough Any values to match those (less self or cls) but might be a nice starting point for something like this?
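`misc/fix_annotate.py` itself is a lib2to3 fixer. The sketch below only illustrates the same idea with the standard `ast` module instead: for each function, produce a type comment with one `Any` per argument, leaving out a leading `self` or `cls`. `any_type_comments` is a hypothetical helper.

```python
import ast
from typing import Dict


def any_type_comments(source: str) -> Dict[str, str]:
    """Map each function name to a "# type: (Any, ...) -> Any" comment."""
    comments = {}  # type: Dict[str, str]
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Only plain positional parameters are counted; *args, **kwargs and
        # keyword-only parameters are ignored in this sketch.
        names = [a.arg for a in node.args.args]
        # Skip self/cls (simplification: checks only the first name).
        if names and names[0] in ("self", "cls"):
            names = names[1:]
        args = ", ".join("Any" for _ in names)
        comments[node.name] = "# type: ({}) -> Any".format(args)
    return comments
```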

chadrik · 2016-11-03T16:57:51Z

That looks very useful, thanks. In fact, it probably makes sense to connect this tool up to mypy.hooks.docstring_parser. I imagine that would be incredibly useful for just about anyone who would consider using this.

It's unclear how this is supposed to be used. Is it a plugin for lib2to3?

gvanrossum · 2016-12-13T22:12:40Z

Hey @chadrik, are you still planning to work on this? You promised new tests.

(In response to your question about misc/fix_annotate.py, that's indeed a lib2to3 plugin. But I don't think the answer to this question should be needed to unblock this PR.)

chadrik · 2016-12-16T02:28:06Z

Hi @gvanrossum, yes I'm still planning to do this. My free time has been devoured by offspring :)
I just have to dig into the test suite and learn how to add a new fixture.

chadrik · 2017-02-04T22:20:51Z

I had a look at writing the tests today.

I'd like to propose that we add a way to install a hook from the command-line and/or env var. It not only makes this feature easier to use, it also makes it far easier to test. In its current state, I can't use a .test fixture because the only way to install a hook is by writing your own entry-point script to do the monkey-patching.

Suggested use would be:

mypy --hook=docstring_parser:module.submodule.func1
mypy --hook=future_hook:module.submodule.func2

Alternate flag name ideas: --add-hook, --with-hook
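Whatever the flag ends up being called, the proposed `HOOK:module.submodule.func` value has to be split and imported. A sketch using `importlib`; the `--hook` flag is only a proposal in this thread, and `resolve_hook_spec` is a hypothetical helper.

```python
import importlib
from typing import Callable, Tuple


def resolve_hook_spec(spec: str) -> Tuple[str, Callable]:
    """Split "hook_name:package.module.func" and import the callable."""
    hook_name, sep, target = spec.partition(":")
    if not sep or not hook_name or "." not in target:
        raise ValueError("expected HOOK:module.func, got %r" % spec)
    module_name, _, attr = target.rpartition(".")
    module = importlib.import_module(module_name)
    func = getattr(module, attr)
    if not callable(func):
        raise TypeError("%s is not callable" % target)
    return hook_name, func
```

Importing the target at startup means a bad spec fails immediately with an `ImportError` or `AttributeError`, rather than partway through type checking.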

gvanrossum · 2017-02-05T19:29:14Z

That's a good point. Perhaps a config file option makes more sense though? (Or both, if you really think a command-line flag has a use case that's not covered by the config file.)

chadrik · 2017-02-05T23:20:25Z

Yes, a config file option would be better. I'll put something together and get this rebased against master.
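A sketch of the config-file variant using the standard `configparser` module. Only the `[mypy]` section name comes from mypy's existing config file; the `hooks` key, its one-`name: module.func`-per-line format, `read_hook_specs`, and `mytools.docparse.parse` are assumptions for illustration.

```python
import configparser
from typing import Dict


def read_hook_specs(config_text: str) -> Dict[str, str]:
    """Return {hook_name: "module.func"} from a hypothetical `hooks` option."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("mypy", "hooks", fallback="")
    specs = {}  # type: Dict[str, str]
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, target = line.partition(":")
        specs[name.strip()] = target.strip()
    return specs


EXAMPLE = """
[mypy]
hooks =
    docstring_parser: mytools.docparse.parse
"""
print(read_hook_specs(EXAMPLE))
# {'docstring_parser': 'mytools.docparse.parse'}
```

This also addresses the testing problem raised above: a test fixture could supply a config file instead of a custom entry-point script.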

gvanrossum · 2017-04-12T17:01:55Z

Mind if we just close this? I'm no fan of keeping lingering old PRs open that have requested changes and merge conflicts. This one is now over half a year old and repeated promises to continue to work on it have not borne fruit. If you find the time to work on this again, just open a new PR, we'll look at it with fresh eyes!

gvanrossum reviewed Oct 11, 2016


gvanrossum requested changes Oct 17, 2016


ddfisher reviewed Oct 19, 2016


chadrik added 7 commits November 2, 2016 22:45

Rough draft of pluggable system for producing types from docstrings

0a47ddb

Revert changes to parse.py since it is deprecated.

89e2a12

Add basic system for registering hooks and use it for docstring parser.

d44291c

Address docparser review notes.

05c66cd

- simplify and improve registration of mypy.hooks - do not call cleandoc (that's up to the hook user) - doc parser hooks must return strings and not Type

Remove unused import

6f56a15

Remove useless cast.

6b0c77b

Use new parser to parse strings into types

775f575

chadrik force-pushed the docstrings branch from 87e654d to 775f575 November 3, 2016 05:47

get tests passing.

bbd5964

gvanrossum closed this Apr 12, 2017

chadrik mentioned this pull request Apr 23, 2017

Pluggable system for generating types from docstrings (revisited) #3225

Closed



