Added google style docstr support + numpy docstr fixes #796

Erotemic · 2016-11-07T01:04:55Z

The purpose of this PR is to address issue #504 properly and improve upon the original attempt submitted in PR #617. It took a bit more time than expected, but I think I have everything working and tested.

In addition to enhancing jedi with ability to and return values from google style docstrings, I also included several fixes for numpy style docstrings.

A summary of what is changed is as follows:

* [ENH] jedi can now extract type hints for parameters and return values from google style docstrings
* [ENH] added tests for new google style parsing
* [ENH] added doctest examples to several functions in jedi/evaluate/docstrings.py
* [FIX] numpy return type hints are now correctly parsed and used
* [FIX] jedi can correctly extracts type hints for alternative-type parameters from numpy docstrings
* [FIX] fixed an exception in python2.7 for numpy set-of-values parameters.

Here is a breif description of how the new google style parsing was implemented:

To add support for the google style docstrings I attempted to mirror what was happening for numpy style docstrings. I created a file called jedi/evaluate/docscrape_google.py to work in a simlar way to numpydoc.docscrape. The main work in this file is done by split_google_docblocks which separates the docstring into a list of sections represented as a list of tuples. Each tuple is a pair of section names and unparsed section lines. This split is made primarily using indentation information and a list of known google-style tags. Header aliases are taken into account so Arguments and Args section headers all resolve back to Args in the returned list.

The publicly facing functions are parse_google_args and parse_google_returns which get the appropriate Args / Returns / Yields section and then calls either parse_google_retblock or parse_google_argblock to grab a list of type hints represented as dictionaries. Back in jedi/evaluate/docstrings.py I use this new functionality to augment the behavior or find_return_types and _search_param_in_docstr. These functions will now try to use google style docstrings if parsing Epydoc/Sphinx docstrings fails.

When first looking into jedi/evaluate/docstrings.py I was having a hard time getting a feel for the information flow of the module. My style of coding is very integrated with an IPython terminal. I ended up creating several example doctests just as a byproduct of needing to define valid input arguments in IPython.

In regards to the numpy docstring changes:

When running tests in test/test_evaluate/test_docstring.py I noticed that test_numpydoc_alternative_types and test_numpydoc_docstring_set_of_values were failing. I tracked the issue down to _search_param_in_numpydocstr. One test was failing because support for alternative types was just not there, so I added that. The other test was failing because of a python2.7 issue where ast.literal_eval raises a ValueError: malformed string when trying to parse set literals. This feature is just not supported in python2.7. My fix for this was to replace curly brakets with square brakets in python2. I also ended up separating both of these functionalies into a new function called _expand_typestr so I could re-use functionality for the google docstring implementations.

Also, when in this file I also noticed that find_return_types was not making any use of numpy docs. I made a function _search_return_in_numpydocstr to fix this.

I added new test to test numpy return type hints in test_docstrings.py and ensured that the other numpy tests are no longer failing.

Overall I am happy with the current state of the PR and I feel like I'm ready to submit it for review.
Let me know if there is anything that doesn't make sense or if any changes need to be made. There are a few places that I would like to point out / would like feedback on:

In a few places I made modifications that don't actually change the behavior. The reason I did this was mostly due to debugging and needing to inspect how code was working as well as copy lines into IPython. In some of these cases I did not revert the changes. If either I or someone else needs to debug this code again it might be helpful to have those changes. If it would be better to revert back to the original state I can do this.
In parse_google_retblock there is a corner case that I'm not sure how to handle. Without making a grammar I can't figure out how to differentiate the case where only a type is specified with the case that only a description is specified. For now I'm just assuming that if this occurs the type is specified and if it is not a valid type I'm relying on this value to be thrown out further upstream. If anyone has a better way to do this / suggestions for what the grammar should look like I'm open to hearing it.
Should this change be added to the CHANGELOG.rst like I have done or is there a different way to do it?
I'll make other minor comments in the diff view where necessary

…e tests. More doctests. Google style return parser. Tests for follow args. Fixed text_type issue.

Erotemic · 2016-11-07T01:05:54Z

jedi/evaluate/docscrape_google.py

+            # retdict = {'type': type_part, 'desc': desc_part}
+            retdict = {'type': type_part}
+            retdict_list.append(retdict)
+    return retdict_list


I'm not 100% sure about the implementation of this function, but I'm also not sure how else to do it.

Erotemic · 2016-11-07T01:07:10Z

jedi/evaluate/docstrings.py

-    """)
-    if string is None:
-        return []
+    """))


This line was causing errors when code was not unicode in python2.7

Erotemic · 2016-11-07T01:08:15Z

jedi/evaluate/docstrings.py

-    return chain.from_iterable(_execute_array_values(evaluator, d) for d in definitions)
+    types_list = [_execute_array_values(evaluator, d) for d in definitions]
+    type_list = list(chain.from_iterable(types_list))
+    return type_list


this was changed to help with debugging.

Also please revert. I do the same thing a lot and then revert it after.

Erotemic · 2016-11-07T01:09:02Z

jedi/evaluate/docstrings.py

-            [p for param_str in _search_param_in_docstr(docstring, str(param.name))
-                for p in _evaluate_for_statement_string(evaluator, param_str, module)]
+            [p for string in _search_param_in_docstr(docstr, param_str)
+                for p in _evaluate_for_statement_string(evaluator, string, module)]


I changed names here to be consistent with the names used in the functions that were being called.

Erotemic · 2016-11-07T01:09:52Z

jedi/evaluate/docstrings.py

+        if not found:
+            # Check for numpy style return hint
+            found = _search_return_in_numpydocstr(docstr)
+        return found


previously numpy return types were never considered

Erotemic · 2016-11-07T01:10:02Z

jedi/evaluate/docstrings.py

        for p in DOCSTRING_RETURN_PATTERNS:
-            match = p.search(code)
+            match = p.search(docstr)


changed name for consistency

Erotemic · 2016-11-07T01:10:25Z

test/test_evaluate/test_docstring.py

 """

 from textwrap import dedent
 import jedi
 from ..helpers import unittest

 try:
-    import numpydoc
+    import numpydoc  # NOQA


suppressed a linting error

Erotemic · 2016-11-07T01:11:22Z

test/test_evaluate/test_docstring.py

+    cd $(python -c "import jedi; from os.path import dirname, abspath; print(dirname(dirname(abspath(jedi.__file__))))")
+    # Run the doctests in this module
+    tox -e py test/test_evaluate/test_docstring.py
+


Helpful command line to start running tests quicker.

I don't think you need to mention that here. What you can do is put something like this into the docs, but especially the second command is general knowledge (you can find that in every tox tutorial). It's even easier if you use pytest. The cd command is also not extremely useful, since most people tend to just go to that place (in my case $HOME/programming/jedi).

I agree with all your comments in this review except for the first part of this one (although I'll agree the cd is a bit cumbersome). Even though it might be easy to lookup a tox tutorial and figure out how to run tests and then determine how to transfer the commands learned in the tutorial to your own project, I think its immensely helpful to have a copy-pastable command line in a python source file that tells you how to run it.

I just spent some time reworking some of this PR into a new one. I don't use tox on a regular basis, so I forgot what the exact command was to run the tests. The fact that I came back here, found this note, and didn't have to context switch and wonder "how is this done again?", is a working example for why having a docstring in your test file (rather than something you have to go searching for in the docs) that tells you how to run it is a very worthwhile .

I disagree. This would mean that I would add such a string to every single test module. Generally I would also do testing with pytest and not tox, it's usually faster. just saying ;-)

Many people in the python community are familiar with pytest. It's not an obscure tool. I know that it might take some time to get used to it. I know that it takes some time to read certain documentation. However I think that's just how writing software works. It takes time. Doing things quickly mostly ends up catastrophic (checkout the first versions of Jedi).

On my personal projects, I typically do keep a string like that in every module. Usually its just a byproduct of me typing it once, I copy and paste it into the file I'm currently editing, and then it just lives there. I've found that it makes it easier to bring new people onto a project if they have a quick starting point to get the thing they want to test running. Of course py.test can do this, but I historically worked with junior coders, and py.test's syntax requires lots of typing and is not the most intuitive. I've also got an awesome <leader>a vim keybinding that copies the current line in my editor into the nearest terminal and executes it, so perhaps my opinion is just a function of my workflow.

Regardless, its your codebase and I respect your design choices. I've got a new PR up in #963 which shouldn't have any of this chaff in it.

What is complicated in pytest <path>? I cannot really imagine a simpler syntax.

I think it would be nice if you could specify a module name (given you've done something like pip install -e . without needing the path, similar to python's -m flag. I also feel like pytest path_to_module::funcname and pytest path_to_module::classname::funcnameis less intuitive than something like pytest <path_to_module> <funcname> and pytest <path_to_module> <classname>.<funcname>. Its nit-picky to be sure, but that's my feeling on it. On the other hand, it is nice that the :: means that a single test node is a single string (with no spaces).

I've never even used that. I just use filenames + -k which gives me substring matching and which is exactly what i want.

Erotemic · 2016-11-07T01:15:11Z

jedi/evaluate/docstrings.py

+    >>> from jedi.evaluate.docstrings import _search_param_in_docstr
+    >>> from jedi.evaluate.docstrings import _evaluate_for_statement_string
+    >>> from jedi.evaluate.docstrings import _execute_array_values
+    >>> from jedi.evaluate.docstrings import _execute_types_in_stmt


The extra imports make it easier to run doctests from IPython, but I can understand if the result is a bit too bulky to be kept in. Let me know if I should remove the extra imports.

I don't really get why you use them in the first place. I would just remove it all. doctests serve as documentation IMO and this is more confusing than anything.

Erotemic · 2016-11-07T01:19:15Z

jedi/evaluate/docstrings.py

-                    return list(types)
-                else:
-                    return [p_type]
+                return _expand_typestr(p_type)


Moved this logic to a new function so it can be used by google style docstrings as well.

Erotemic · 2016-11-07T01:20:13Z

jedi/evaluate/docstrings.py

@@ -1,11 +1,13 @@
 """
 Docstrings are another source of information for functions and classes.
 :mod:`jedi.evaluate.dynamic` tries to find all executions of functions, while
-the docstring parsing is much easier. There are two different types of
+the docstring parsing is much easier. There are three different types of


Actually, should this be four if we include numpy?

I think so.

Erotemic · 2016-11-07T01:20:52Z

jedi/evaluate/docstrings.py

+        # stuffed with content from a function call.
+        pseudo_cls.parent = module
+        type_list = _execute_types_in_stmt(evaluator, stmt)
+    return type_list


this was changed to help with debugging.

Please revert this. It's not more readable.

Erotemic · 2016-11-07T01:22:26Z

test/test_evaluate/test_docstring.py

+            z.''')
+        script = jedi.Script(s)
+        names = [c.name for c in script.completions()]
+        assert 'numerator' in names


Checks if google docstring return type inference is working

Erotemic · 2016-11-07T01:22:54Z

test/test_evaluate/test_docstring.py

+        assert 'isupper' in names
+        assert 'capitalize' in names
+        assert 'numerator' in names
+        assert 'append' in names


The above tests mirror existing numpy style tests

Erotemic · 2016-11-07T01:23:41Z

test/test_evaluate/test_docstring.py

+            z.''')
+        script = jedi.Script(s)
+        names = [c.name for c in script.completions()]
+        assert 'numerator' in names


Checks if numpy docstring return type inference is working (previously this was not implemented)

Erotemic · 2016-11-07T01:26:34Z

test/test_evaluate/test_docstring.py

+        evaluator = script._evaluator
+        types = docstrings.find_return_types(evaluator, func)
+        assert len(types) == 1
+        assert types[0].base.obj is builtins.int


Not sure if you want the jedi.evaluate.docstrings functions follow_param and find_return_types to be explicitly tested in this file, but it helped me a lot to have an example giving me an entry point into the code.

Erotemic · 2016-11-07T01:26:55Z

test/test_evaluate/test_docstring.py

+        evaluator = script._evaluator
+        types = docstrings.find_return_types(evaluator, func)
+        assert len(types) == 1
+        assert types[0].base.obj is builtins.int


Google style mirrors of tests for follow_param and find_return_types.

Erotemic · 2016-11-07T01:28:51Z

jedi/evaluate/docscrape_google.py

+        if type_ is not None:
+            pass
+        argdict_list.append(argdict)
+    return argdict_list


This is where each argline is parsed after it is known to belong to an Args block

Erotemic · 2016-11-07T01:29:31Z

jedi/evaluate/docscrape_google.py

+        key = tag_aliases.get(key, key)
+        groups.append((key, subblock))
+        line_offset += len(lines)
+    return groups


The above function does the main work of splitting a docstring into blocks based on google-style headers.

Erotemic · 2016-11-07T01:30:42Z

jedi/evaluate/docstrings.py

+    if found is not None:
+        return found
+
+    return []


Modified to add in call to _search_param_in_googledocstr

Erotemic · 2016-11-07T01:33:00Z

jedi/evaluate/docstrings.py

+            if m:
+                p_type = m.group(1)
+            found.extend(_expand_typestr(p_type))
+        return found


New functionality to parse for numpy return types. I had to work around what seems to be a bug in numpydoc where p_type is swapped with p_name when the return type if anonymous. This code should still work even once the bug is fixed.

Erotemic · 2016-11-07T01:34:45Z

jedi/evaluate/docstrings.py

+    # Otherwise just return the typestr wrapped in a list
+    else:
+        types = [p_type]
+    return types


This code was originally in _search_param_in_numpydocstr but I separated it out so I could re-use it. I also added the fix for the python2 ast.literal_eval problem here.

Erotemic · 2016-11-07T01:35:17Z

jedi/evaluate/docstrings.py

+            typestr = garg['type']
+            found = _expand_typestr(typestr)
+            break
+    return found


New code for getting google style param hints.

Erotemic · 2016-11-07T01:35:30Z

jedi/evaluate/docstrings.py

+    for retdict in google_rets:
+        p_type = retdict['type']
+        found.extend(_expand_typestr(p_type))
+    return list(set(found))


New code for getting google style return type hints.

davidhalter · 2016-11-11T14:34:09Z

This is a lot of code and comments. Thank you!

It might take a while until I have this one reviewed, just because of the sheer size (and lack of my own time).

bcolsen · 2017-02-14T05:32:16Z

I hope @Erotemic doesn't mind, but I updated some the docstrings.py code from this PR to merge against the current dev branch as PR: #868. I removed the google docstrings part due to my lack of experience but I hope it makes the PR easier to merge for @davidhalter. The google docstrings part can be easily applied as a new PR.

Erotemic · 2017-02-15T15:00:42Z

@bcolsen, Splitting this into separate PR into one for fixes and one for new features is probably best. Thanks for doing the work on that.

davidhalter · 2017-02-15T18:07:05Z

All: Sorry for taking so long on this one. It's not intentional, I'm just having troubles to find the time to look over this one. It's huge and I think some things could be smaller.

I'm having more time the next few weeks. I'll try to look into it.

bcolsen · 2017-02-15T22:02:09Z

I think that the docstrings.py file could be more modular so that every docstring type(e.g. sphinx, epydoc, numpy, or google) could be implemented as a "plugin" type structure. This might have some code overlap between docstring types but would be easily extendible.

However, I don't think this would be a smaller change.

davidhalter · 2017-02-19T14:38:29Z

Well, I think that plugin architecture wouldn't exactly help with the issue on hand here, that google docstrings just take quite a bit of code to parse them.

Erotemic · 2017-02-19T17:56:26Z

The google docstrings don't take that much code to parse. The docscrape_google.py file does all the parsing and is only 315 lines (about half of which is documentation and doctests).

Most of the work is either tests, fixes, or boilerplate to integrate the docscrape_google.py functionality with the way that sphinx and numpy do it. I needed to refactor a bit of code to make things reusable between all the different docstring parsers.

bcolsen · 2017-02-19T18:59:43Z

Currently PR #868(which implements @Erotemic changes to docstrings.py from this PR ) has -36 and +103 (although I should include the tests functions). Even most of these changes just take current docstrings.py code and place them in conditionals(meld shows this better than github's diff).

@davidhalter, Would it be easier for you to review this code if it's broken into smaller PR's like #868?

If so (And if @Erotemic agrees) this PR could be closed and a smaller patch could be made to implement google docstrings after #868 is accepted.

davidhalter

Please tell me ASAP if you have time in the next 1-4 weeks to actually do this. If you don't I'll merge #868 in the meantime and we can just reuse this branch for google docstrings. Depends on how much time/effort you put in.

Thanks a lot for the hard work. First of it all: I'm sorry to let you wait for such a long time. There were things that were more important for both Jedi and my real life. I'm sorry that it doesn't honor your contribution.

It's a lot of code and that has been my problem here. I didn't really have the time to properly review it. I have read a lot of the code now and it looks generally good. However, there's a few things I'd like you to improve (apart from the things I've commented on):

Some API's have changed, you need to rebase and change some code. It's not a crazy amount of changes, but a lot of things were just happy before and now it isn't.
Please remove at least the imports from the doctests if possible. I'm not a big fan of big docstrings, sorry :-)
Revert the things that I mentioned. I think if you want to make my life easier you should not have tried to mix features (numpydocs/google) with some refactorings. That's just too hard to review. Different pull requests for this would have been much easier. But now that we have it it's ok.
In google docstrings I would actually not prefer to have a strong idea of what blocks we have to parse etc. IMO there should be a function that returns a dictionary that looks like {"Args": "...", "foo": "bar\n", "Example": "..."}. From that on you can work with simple regular expressions (like the ones we already have for the other languages).
Please don't use google docstrings anywhere (except in tests).

davidhalter · 2017-04-30T18:17:06Z

jedi/evaluate/docstrings.py

+    >>> from jedi.evaluate.docstrings import _search_param_in_docstr
+    >>> from jedi.evaluate.docstrings import _evaluate_for_statement_string
+    >>> from jedi.evaluate.docstrings import _execute_array_values
+    >>> from jedi.evaluate.docstrings import _execute_types_in_stmt


I don't really get why you use them in the first place. I would just remove it all. doctests serve as documentation IMO and this is more confusing than anything.

davidhalter · 2017-04-30T18:18:45Z

jedi/evaluate/docstrings.py

+        # stuffed with content from a function call.
+        pseudo_cls.parent = module
+        type_list = _execute_types_in_stmt(evaluator, stmt)
+    return type_list


Please revert this. It's not more readable.

davidhalter · 2017-04-30T18:19:20Z

jedi/evaluate/docstrings.py

-    return chain.from_iterable(_execute_array_values(evaluator, d) for d in definitions)
+    types_list = [_execute_array_values(evaluator, d) for d in definitions]
+    type_list = list(chain.from_iterable(types_list))
+    return type_list


Also please revert. I do the same thing a lot and then revert it after.

davidhalter · 2017-04-30T18:23:24Z

jedi/evaluate/docstrings.py

        )
    func = param.parent_function
    module = param.get_parent_until()

-    types = eval_docstring(func.raw_doc)
+    docstr = func.raw_doc


raw_doc doesn't exist anymore. once you rebase you will see that. It's better now, because you can just use something like function_context.py__doc__().

davidhalter · 2017-04-30T18:28:59Z

jedi/evaluate/docstrings.py

+    return found
+
+
+def _search_return_in_gooogledocstr(docstr):


why three o's?

davidhalter · 2017-04-30T18:30:40Z

jedi/evaluate/docstrings.py

    Search `docstr` for type(-s) of `param_str`.

+    >>> from jedi.evaluate.docstrings import _search_param_in_docstr
+    >>> from jedi.evaluate.docstrings import _search_param_in_googledocstr
+    >>> from jedi.evaluate.docstrings import _search_param_in_numpydocstr


I don't think you need these imports here? You can just omit them, because we're in the same module, can't you? I might be wrong as well.

bcolsen · 2017-05-01T05:16:04Z

@Erotemic Just a reminder PR #868 is my rebase of your code to the new api. I hope it can help as a guide.

Also I think the option to go ahead with PR #868 first is still open, and then add Google docstings as a separate PR.

Erotemic · 2017-05-05T14:19:34Z

@bcolsen and @davidhalter I'm not going to have time in the next few weeks to work on this at all (I'm in the middle of finishing and defending my thesis). You can go ahead with merging #868. Once all of this is said and done, I'll try to come back and update any remaining code to add the features in a new smaller branch.

Just of note, working on this has been helpful to me outside of jedi, as it serves to execute my doctests in the minimal version of my utility library https://github.com/Erotemic/ubelt (/shameless self promotion)

davidhalter · 2017-09-09T18:30:47Z

I'm going to close this one, because #868 has been merged and quite a bit of the code here is therefore obsolete. Please open a new PR for google style docstrings.

Erotemic added 9 commits November 6, 2016 16:40

Initial commit of google style jedi parsing

21c9d07

fixed tox errors

8c4f6b7

renamed google docscrape

56e5e5b

Fixed issue with literal_eval numpy docs on python2. More google styl…

a236660

…e tests. More doctests. Google style return parser. Tests for follow args. Fixed text_type issue.

Fixed 2/3 doctest compatibility

31739ee

both google and numpy return hints should now work

14b0db3

Removed personalized code

b7eca50

Fixed helper commandline to use tox

17351b1

wip

99c0cfb

Erotemic commented Nov 7, 2016

View reviewed changes

forgot to remove unhelpful debugging line

f5413e0

Erotemic commented Nov 7, 2016

View reviewed changes

jedi/evaluate/docstrings.py

if found is not None:

return found

return []

Copy link

Author

Erotemic Nov 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified to add in call to _search_param_in_googledocstr

Erotemic commented Nov 7, 2016

View reviewed changes

Erotemic added 3 commits November 6, 2016 20:42

Fixed python2.6 dict comprehension issue

c604e31

Fixed another python 2.6 issue

24a2302

even more 26 fixes

a140121

bcolsen mentioned this pull request Dec 9, 2016

Patch Jedi to get completions for compiled objects in Numpy and Matplotlib spyder-ide/spyder#3808

Merged

bcolsen mentioned this pull request Dec 17, 2016

PR: Return types for CompiledObjects using docstrings.py #818

Closed

bcolsen mentioned this pull request Feb 14, 2017

Return types from numpydocs and compiled objects return types from docstrings #868

Merged

davidhalter requested changes Apr 30, 2017

View reviewed changes

davidhalter closed this Sep 9, 2017

Erotemic mentioned this pull request Sep 10, 2017

Added support for google-style docstrings #963

Closed

Added google style docstr support + numpy docstr fixes #796

Added google style docstr support + numpy docstr fixes #796

Conversation

Erotemic commented Nov 7, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Erotemic Nov 7, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidhalter commented Nov 11, 2016

bcolsen commented Feb 14, 2017

Erotemic commented Feb 15, 2017

davidhalter commented Feb 15, 2017 • edited Loading

bcolsen commented Feb 15, 2017

davidhalter commented Feb 19, 2017

Erotemic commented Feb 19, 2017

bcolsen commented Feb 19, 2017

davidhalter left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bcolsen commented May 1, 2017

Erotemic commented May 5, 2017

davidhalter commented Sep 9, 2017

Erotemic Nov 7, 2016 •

edited

Loading

davidhalter commented Feb 15, 2017 •

edited

Loading

davidhalter left a comment •

edited

Loading