Pluggable system for generating types from docstrings #2240
Conversation
I'm not really excited about having an API based on monkey patching. However, if we explicitly warn about this being an experimental and temporary hack, this would not be very likely to cause too much trouble, unless Guido is philosophically against it :-) This is probably better than a command line argument, since those are more visible and we already have too many of them. Another alternative would be a semi-hidden config file option, but that would likely require dynamic loading of code, which might easily be somewhat tricky to get right.

Pluses of this approach: […]

Negatives: […]
I still think that parsing types out of docstrings is an ill-fated project, but if you insist, this patch is pretty minimal, and lays the responsibility squarely where it belongs (i.e. not the mypy developers).
I would rephrase it as using a hook rather than a monkey-patch though, and add an interface you have to use to register the hook.
Finally, if you do proceed with this, it needs unittests.
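One way to read that suggestion is an explicit registration interface instead of direct monkey-patching. A minimal sketch, assuming the string-based hook signature discussed later in this thread; the names here are hypothetical, not mypy's actual API:

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical hook signature: (docstring, line number) in,
# (arg-name -> type string, return type string) out.
DocstringParser = Callable[[str, int], Tuple[Optional[Dict[str, str]], Optional[str]]]

_docstring_parser = None  # type: Optional[DocstringParser]


def register_docstring_parser(parser: DocstringParser) -> None:
    """Install a third-party docstring parser; replaces any previous hook."""
    global _docstring_parser
    _docstring_parser = parser


def get_docstring_parser() -> Optional[DocstringParser]:
    """Return the registered hook, or None if none is installed."""
    return _docstring_parser
```

A third-party tool would call `register_docstring_parser(my_parser)` before invoking mypy, and mypy would consult `get_docstring_parser()` instead of reading a patched-over function.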
```python
    Returns a 2-tuple: dictionary of arg name to Type, and return Type.
    """
    return None, None
```
Please add the proper Optional to the return type (it's not enforced yet, but it will be in the future, and it's needed here).
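Concretely, since both elements can be None, the annotation would read roughly like this (a sketch, with `Type` standing in as a placeholder for mypy's internal class):

```python
from typing import Dict, Optional, Tuple


class Type:
    """Placeholder for mypy's internal Type class in this sketch."""


def parse_docstring(docstring: str,
                    line: int) -> Tuple[Optional[Dict[str, Type]], Optional[Type]]:
    """Returns a 2-tuple: dictionary of arg name to Type, and return Type.

    Both elements are Optional because the default implementation finds
    nothing and returns (None, None).
    """
    return None, None
```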
```diff
@@ -1,4 +1,5 @@
 from functools import wraps
+from inspect import cleandoc
```
I prefer `import inspect` here, so the code that uses it is clear about the origin of the `cleandoc()` function.
Will do. I noticed that nearly all functions in these modules were imported using `from x import y`, so I was just mimicking the prevailing style, but glad to change it.
```python
    Parse a docstring and return type representations. This function can
    be overridden by third-party tools which aim to add typing via docstrings.

    Returns a 2-tuple: dictionary of arg name to Type, and return Type.
```
I wonder if Type is the right thing to return here. Turning a string representing a type as found in the source code into a Type object is a fairly complicated process, and we already have code for that (the code that parses type comments). Maybe this should just return strings?
As a nit, I'd use the same convention for the return type as used in `__annotations__` and a few of inspect's APIs -- just use a key "return" since that can't be an argument name anyway. (OTOH maybe the structure used here is more compatible with mypy's internal conventions, so I don't insist on this.)
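Under that convention a hook returns a single mapping of plain strings, with the return type stored under the key "return" (which can never clash with a parameter name). A toy sketch, using rst-style `:type:`/`:rtype:` fields purely for illustration:

```python
from typing import Dict, Optional


def parse_docstring(docstring: str, line: int) -> Optional[Dict[str, str]]:
    """Toy parser for ':type name: T' and ':rtype: T' lines.

    Follows the __annotations__ convention: the return type is stored
    under the key "return", which is never a valid argument name.
    """
    result = {}  # type: Dict[str, str]
    for raw in docstring.splitlines():
        text = raw.strip()
        if text.startswith(':type '):
            name, sep, typ = text[len(':type '):].partition(':')
            if sep:
                result[name.strip()] = typ.strip()
        elif text.startswith(':rtype:'):
            result['return'] = text[len(':rtype:'):].strip()
    return result or None
```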
> I wonder if Type is the right thing to return here. Turning a string representing a type as found in the source code into a Type object is a fairly complicated process, and we already have code for that (the code that parses type comments).

I'm using `mypy.parsetype.parse_str_as_type` to handle the conversion within my little experimental project, so it's pretty straight-forward. Returning strings means that mypy will have to have more of an opinion about how to deal with error handling and such, but it would avoid the need for the `parse_docstring` function to take a line number (which is there explicitly for passing to `parse_str_as_type`). I'll look into whether I can return `Dict[str, str]` while still maintaining the current functionality in my project.
> As a nit, I'd use the same convention for the return type as used in `__annotations__` and a few of inspect's APIs -- just use a key "return" since that can't be an argument name anyway.
Yeah, I like that. It's a simpler structure to document.
```diff
@@ -5,6 +5,7 @@
 """

 import re
+from inspect import cleandoc
```
This version of the parser is doomed; we'll remove it as soon as we can use the fast parser on all platforms. Maybe you could skip changing this file and just state that your feature depends on `--fast-parser`?
```python
def make_callable(args: List[Argument], type_map: Dict[str, Type],
                  ret_type: Type) -> CallableType:
    if type_map is not None:
```
Why is this function in this file? It seems to be used only by parse.py so it really belongs there, unless you also plan to monkey-patch this.
There's a missing Optional in the signature. But it almost feels like the caller should check whether type_map is None and not even call this.
(Also, why is this only needed by parse.py and not by fastparse*.py?)
> Why is this function in this file?
I was trying to keep the docstring-related footprint in the parser modules to a minimum, and I thought it could also be useful to override, although in my personal experiments I haven't needed it.
> why is this only needed by parse.py and not by fastparse*.py?
Conceptually these docstring-annotations are alternatives to comment-annotations, so I was able to minimize changes to code by using the same structures that would have delivered comment-annotations to instead deliver docstring-annotations. From the perspective of the calling code it's the same thing. In parse.py comment-annotations are expected to provide a "kind" for each argument (and there's a bit of code to ensure that the kinds are compatible with those parsed from the actual function signature) whereas fastparse always uses the kind from the function signature. I guess the authors of parse.py found it convenient to use an existing object -- `CallableType` -- to hold the extra data.
If we ditch parse.py support then it's a moot point.
I see, it's because you're emulating the result of parse_type_comment() which in turn calls parse_signature() which returns a Callable.
So here's another exhortation to return strings and let mypy parse the strings. Strings are what you have in your docstring, and strings provide a much more stable API between an app (e.g. mypy) and an extension (your code for parsing docstrings) than Type objects. If we change the signature of Callable or we change the meaning of Type, your extension will break, and we can't make any promises that such internals won't change (not until we've had much more experience with extensions). But strings are pretty stable and you can always construct strings. So I'm saying you should just return a string that's got the same format as a type comment, i.e. `# type: (blah, blah) -> blah` (but without the `# type:` prefix).
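In other words, the hook hands back one string in type-comment syntax and lets mypy's existing comment parser do the rest. A toy illustration (the two-argument hook signature follows this PR; the rst-style fields and regexes are my own assumption):

```python
import re
from typing import Optional


def docstring_to_type_comment(docstring: str, line: int) -> Optional[str]:
    """Build a '(arg, ...) -> ret' string -- the body of a '# type:'
    comment without the '# type:' prefix -- from rst-style fields."""
    arg_types = re.findall(r':type \w+: *(.+)', docstring)
    ret = re.search(r':rtype: *(.+)', docstring)
    if ret is None:
        return None
    return '({}) -> {}'.format(', '.join(arg_types), ret.group(1).strip())
```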
```python
doc = cleandoc(doc.decode('unicode_escape'))
type_map, rtype = docstrings.parse_docstring(doc, n.lineno)
if type_map is not None:
    arg_types = [type_map.get(name) for name in arg_names]
```
I'd assert that this has the right length, otherwise you may get crazy crashes.
The length of `arg_types` comes indirectly, via `arg_names`, from the ast-parsed args, so it should always be correct:

```python
args = self.transform_args(n.args, n.lineno)
arg_kinds = [arg.kind for arg in args]
arg_names = [arg.variable.name() for arg in args]
```
Although this reminds me that I did want to have something that checks that all types in the type map were used exactly once.
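That cross-check could be a straightforward comparison of the docstring's keys against the signature's names. A sketch (helper name and message wording are mine):

```python
from typing import Dict, List


def check_type_map(arg_names: List[str], type_map: Dict[str, str]) -> List[str]:
    """Report docstring/signature mismatches: every type in the map must
    match a real argument (or 'return'), and every argument needs a type."""
    errors = []
    for name in type_map:
        if name != 'return' and name not in arg_names:
            errors.append('type given for unknown argument: %s' % name)
    for name in arg_names:
        if name not in type_map:
            errors.append('no type given for argument: %s' % name)
    return errors
```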
```python
cur = self.current()
if type is None and isinstance(cur, StrLit):
    type_map, ret_type = docstrings.parse_docstring(
        cleandoc(cur.parsed()), cur.line)
```
I just realized that you call cleandoc() in all parse_docstring() calls. So move that into parse_docstring().
I went out on a limb and made a general purpose […]
Do you need help debugging the CI failures?
```python
docstring_parser = hooks.get_docstring_parser()
if docstring_parser is not None:
    type_map = docstring_parser(inspect.cleandoc(docstring), line)
```
I still believe calling cleandoc() is up to the custom parser. They may have some other approach to parsing the contents of the docstring.
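That is, the raw, indented docstring is passed through untouched, and a hook that wants `cleandoc` semantics applies them itself. A sketch of such a hook (the `:rtype:` field handling is illustrative only):

```python
import inspect
from typing import Dict, Optional


def my_parser(docstring: str, line: int) -> Optional[Dict[str, str]]:
    # The hook, not mypy, decides how to normalize the raw docstring.
    doc = inspect.cleandoc(docstring)
    for text_line in doc.splitlines():
        # Only matches once cleandoc has stripped the common indentation.
        if text_line.startswith(':rtype:'):
            return {'return': text_line[len(':rtype:'):].strip()}
    return None
```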
```diff
@@ -15,6 +15,7 @@
 two in a typesafe way.
 """
 from functools import wraps
+from inspect import cleandoc
```
No longer used.
```python
def get_docstring_parser() -> Optional[docstring_parser_type]:
    return hooks.get('docstring_parser')
```
The get and set functions look an awful lot like Java-style accessor methods. If you want type-safety it's probably better to define a Hooks class whose instance variables are the known hooks.
To clarify, are you looking for a single registry for all hooks, like this?

```python
class Hooks:
    # The docstring_parser hook must take a docstring for a function [...etc...]
    docstring_parser = None  # type: Callable[[str], Optional[Dict[str, str]]]

    # another explanation...
    future_hook = None  # type: Callable[whatever]


registry = Hooks()
```

Where the end user would then override the attribute:

```python
import mypy.hooks

mypy.hooks.registry.docstring_parser = my_parser
```

If not, can you clarify a bit more?
```python
hooks = {}  # type: Dict[str, Callable]

docstring_parser_type = Callable[[str, int], Optional[Dict[str, Union[str, Type]]]]
```
A type alias like this should have a CapWords name. I would also prefer this to always return a string.
Yes, that's what I meant. Thinking more about it, you could also make these just module attributes. It's a little different from the original monkey-patching approach because it's a variable declared to be a Callable of a specific type, not just a function definition (to mypy these are very different). (In fact module attributes may be better because mypy sometimes confuses class variables with a Callable type with methods.)
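With module attributes the registry collapses to a couple of typed globals. Again a sketch; these names are not mypy's final API:

```python
# Hypothetical contents of mypy/hooks.py using plain module attributes.
from typing import Callable, Dict, Optional

# Takes (docstring, line) and returns a mapping of argument name
# (plus the key "return") to a PEP 484 type string, or None.
DocstringParserType = Callable[[str, int], Optional[Dict[str, str]]]

docstring_parser = None  # type: Optional[DocstringParserType]
```

An extension then assigns `mypy.hooks.docstring_parser = my_parser`, and mypy guards each use with `if hooks.docstring_parser is not None`.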
Addressed the latest round of notes and hopefully fixed the CI issues. Unittests are forthcoming.
The current issue with CI is with the stubs for `get_docstring`:

```python
def get_docstring(node: AST, clean: bool = ...) -> str: ...
```

The actual result is […]
@ddfisher can you look at the ast27 stub issue above?
Went and double-checked the […]
```diff
@@ -22,6 +22,8 @@
 )
 from mypy import defaults
 from mypy import experiments
+from mypy import hooks
+from mypy.parsetype import parse_str_as_type
```
This calls into the old parser, which we want to avoid doing from the fast parser, as it'll eventually replace the old one. You can use the fast parser to do this instead: see `TypeConverter` below.
@chadrik I'm slated to implement something similar, and I'm wondering if I can convince you to take a slightly different approach so we can join forces 😄. We've got big chunks of code using the rst format described here - https://www.jetbrains.com/help/pycharm/2016.1/type-hinting-in-pycharm.html#legacy. Our plan was to write a parser that would convert these docstring comments into PEP 484 compliant comments so they'll light up in all tools that parse PEP 484 types, not just mypy. This will of course require a big codemod, but it should be fairly mechanical. Would that approach work for you?
@rowillia the tool that I am writing can parse rst, google, and numpy formatted docstrings, extract the type annotations from them, and return them to mypy. Regardless of which markup/formatting is used for the docstrings, the types contained therein must be valid PEP484 strings.
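For a sense of scale, a bare-bones numpy-format extractor might look like the sketch below. This is my own toy, not the tool being described: it ignores optional/default markers and most of the real numpy docstring grammar, and only pulls out `name : type` pairs plus the first Returns line.

```python
import re
from typing import Dict, Optional


def parse_numpy_docstring(docstring: str, line: int) -> Optional[Dict[str, str]]:
    """Extract 'name : type' pairs from the Parameters section and the
    first type line of the Returns section of a numpy-style docstring."""
    result = {}  # type: Dict[str, str]
    params = re.search(r'Parameters\n-+\n(.*?)(\n\n|\nReturns|$)', docstring, re.S)
    if params:
        for match in re.finditer(r'^(\w+) *: *(.+)$', params.group(1), re.M):
            result[match.group(1)] = match.group(2).strip()
    ret = re.search(r'Returns\n-+\n *(?:\w+ *: *)?(.+)', docstring)
    if ret:
        result['return'] = ret.group(1).strip()
    return result or None
```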
After your converted files are saved, they should be fully compatible with the "plugin" that I'm writing. I would actually really love to use such a parser since we have a similar problem, except our docstrings are using numpy-formatting. It would be great if your parser could be extended, possibly by me, to add support for other formats. Just something to keep in mind when you're designing it. Oh, another thing to remember: your parser will need to insert […]
- simplify and improve registration of mypy.hooks
- do not call cleandoc (that's up to the hook user)
- doc parser hooks must return strings and not Type
New tests are coming soon.
In response to the discussion going on here, there's a fledgling "2to3 fixer" in the mypy repo under misc/fix_annotate.py. It currently just looks at the arguments and inserts enough […]
That looks very useful, thanks. In fact, it probably makes sense to connect this tool up to […]

It's unclear how this is supposed to be used. Is it a plugin for […]
Hey @chadrik, are you still planning to work on this? You promised new tests. (In response to your question about misc/fix_annotate.py, that's indeed a lib2to3 plugin. But I don't think the answer to this question should be needed to unblock this PR.)
Hi @gvanrossum, yes I'm still planning to do this. My free time has been devoured by offspring :)
I had a look at writing the tests today. I'd like to propose that we add a way to install a hook from the command-line and/or env var. It not only makes this feature easier to use, it also makes it far easier to test. In its current state, I can't use a .test fixture because the only way to install a hook is by writing your own entry-point script to do the monkey-patching. Suggested use would be: […]
Alternate flag name ideas: […]
That's a good point. Perhaps a config file option makes more sense though? (Or both, if you really think a command-line flag has a use case that's not covered by the config file.)
Yes, a config file option would be better. I'll put something together and get this rebased against master.
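Nothing below is mypy's actual config schema; it is only a sketch of how a hypothetical `docstring_parser` option in a mypy.ini-style `[mypy]` section could be resolved to a callable via dynamic import:

```python
import configparser
import importlib
from typing import Callable, Optional


def load_docstring_hook(ini_text: str) -> Optional[Callable]:
    """Read a hypothetical 'docstring_parser = package.module.function'
    option from the [mypy] section and import the named callable."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    dotted_path = cfg.get('mypy', 'docstring_parser', fallback=None)
    if dotted_path is None:
        return None
    module_name, _, func_name = dotted_path.rpartition('.')
    return getattr(importlib.import_module(module_name), func_name)
```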
Mind if we just close this? I'm no fan of keeping lingering old PRs open that have requested changes and merge conflicts. This one is now over half a year old, and repeated promises to continue to work on it have not borne fruit. If you find the time to work on this again, just open a new PR; we'll look at it with fresh eyes!
This is a rough draft of a system that would allow third-party tools to extend mypy with the ability to parse PEP484 type annotations within function docstrings.

The idea is that these changes would eventually be extensible via a plugin system (issue #1240), but for now, one can use them by monkey-patching `mypy.docstrings.parse_docstring` prior to calling `mypy.main.main` from within their own scripts.

Before I invest more time in this, I'm interested to know if the general idea is acceptable.

Related to issue #1840.