Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added google style docstr support + numpy docstr fixes #796

Closed
wants to merge 13 commits into from

Conversation

Erotemic
Copy link

@Erotemic Erotemic commented Nov 7, 2016

The purpose of this PR is to address issue #504 properly and improve upon the original attempt submitted in PR #617. It took a bit more time than expected, but I think I have everything working and tested.

In addition to enhancing jedi with ability to and return values from google style docstrings, I also included several fixes for numpy style docstrings.

A summary of what is changed is as follows:

* [ENH] jedi can now extract type hints for parameters and return values from google style docstrings
* [ENH] added tests for new google style parsing
* [ENH] added doctest examples to several functions in jedi/evaluate/docstrings.py
* [FIX] numpy return type hints are now correctly parsed and used
* [FIX] jedi can correctly extracts type hints for alternative-type parameters from numpy docstrings
* [FIX] fixed an exception in python2.7 for numpy set-of-values parameters. 

Here is a breif description of how the new google style parsing was implemented:

To add support for the google style docstrings I attempted to mirror what was happening for numpy style docstrings. I created a file called jedi/evaluate/docscrape_google.py to work in a simlar way to numpydoc.docscrape. The main work in this file is done by split_google_docblocks which separates the docstring into a list of sections represented as a list of tuples. Each tuple is a pair of section names and unparsed section lines. This split is made primarily using indentation information and a list of known google-style tags. Header aliases are taken into account so Arguments and Args section headers all resolve back to Args in the returned list.

The publicly facing functions are parse_google_args and parse_google_returns which get the appropriate Args / Returns / Yields section and then calls either parse_google_retblock or parse_google_argblock to grab a list of type hints represented as dictionaries. Back in jedi/evaluate/docstrings.py I use this new functionality to augment the behavior or find_return_types and _search_param_in_docstr. These functions will now try to use google style docstrings if parsing Epydoc/Sphinx docstrings fails.

When first looking into jedi/evaluate/docstrings.py I was having a hard time getting a feel for the information flow of the module. My style of coding is very integrated with an IPython terminal. I ended up creating several example doctests just as a byproduct of needing to define valid input arguments in IPython.


In regards to the numpy docstring changes:

When running tests in test/test_evaluate/test_docstring.py I noticed that test_numpydoc_alternative_types and test_numpydoc_docstring_set_of_values were failing. I tracked the issue down to _search_param_in_numpydocstr. One test was failing because support for alternative types was just not there, so I added that. The other test was failing because of a python2.7 issue where ast.literal_eval raises a ValueError: malformed string when trying to parse set literals. This feature is just not supported in python2.7. My fix for this was to replace curly brakets with square brakets in python2. I also ended up separating both of these functionalies into a new function called _expand_typestr so I could re-use functionality for the google docstring implementations.

Also, when in this file I also noticed that find_return_types was not making any use of numpy docs. I made a function _search_return_in_numpydocstr to fix this.

I added new test to test numpy return type hints in test_docstrings.py and ensured that the other numpy tests are no longer failing.


Overall I am happy with the current state of the PR and I feel like I'm ready to submit it for review.
Let me know if there is anything that doesn't make sense or if any changes need to be made. There are a few places that I would like to point out / would like feedback on:

  • In a few places I made modifications that don't actually change the behavior. The reason I did this was mostly due to debugging and needing to inspect how code was working as well as copy lines into IPython. In some of these cases I did not revert the changes. If either I or someone else needs to debug this code again it might be helpful to have those changes. If it would be better to revert back to the original state I can do this.

  • In parse_google_retblock there is a corner case that I'm not sure how to handle. Without making a grammar I can't figure out how to differentiate the case where only a type is specified with the case that only a description is specified. For now I'm just assuming that if this occurs the type is specified and if it is not a valid type I'm relying on this value to be thrown out further upstream. If anyone has a better way to do this / suggestions for what the grammar should look like I'm open to hearing it.

  • Should this change be added to the CHANGELOG.rst like I have done or is there a different way to do it?

  • I'll make other minor comments in the diff view where necessary

# retdict = {'type': type_part, 'desc': desc_part}
retdict = {'type': type_part}
retdict_list.append(retdict)
return retdict_list
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure about the implementation of this function, but I'm also not sure how else to do it.

""")
if string is None:
return []
"""))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was causing errors when code was not unicode in python2.7

return chain.from_iterable(_execute_array_values(evaluator, d) for d in definitions)
types_list = [_execute_array_values(evaluator, d) for d in definitions]
type_list = list(chain.from_iterable(types_list))
return type_list
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was changed to help with debugging.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please revert. I do the same thing a lot and then revert it after.

[p for param_str in _search_param_in_docstr(docstring, str(param.name))
for p in _evaluate_for_statement_string(evaluator, param_str, module)]
[p for string in _search_param_in_docstr(docstr, param_str)
for p in _evaluate_for_statement_string(evaluator, string, module)]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed names here to be consistent with the names used in the functions that were being called.

if not found:
# Check for numpy style return hint
found = _search_return_in_numpydocstr(docstr)
return found
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously numpy return types were never considered

for p in DOCSTRING_RETURN_PATTERNS:
match = p.search(code)
match = p.search(docstr)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed name for consistency

"""

from textwrap import dedent
import jedi
from ..helpers import unittest

try:
import numpydoc
import numpydoc # NOQA
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suppressed a linting error

cd $(python -c "import jedi; from os.path import dirname, abspath; print(dirname(dirname(abspath(jedi.__file__))))")
# Run the doctests in this module
tox -e py test/test_evaluate/test_docstring.py

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helpful command line to start running tests quicker.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to mention that here. What you can do is put something like this into the docs, but especially the second command is general knowledge (you can find that in every tox tutorial). It's even easier if you use pytest. The cd command is also not extremely useful, since most people tend to just go to that place (in my case $HOME/programming/jedi).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with all your comments in this review except for the first part of this one (although I'll agree the cd is a bit cumbersome). Even though it might be easy to lookup a tox tutorial and figure out how to run tests and then determine how to transfer the commands learned in the tutorial to your own project, I think its immensely helpful to have a copy-pastable command line in a python source file that tells you how to run it.

I just spent some time reworking some of this PR into a new one. I don't use tox on a regular basis, so I forgot what the exact command was to run the tests. The fact that I came back here, found this note, and didn't have to context switch and wonder "how is this done again?", is a working example for why having a docstring in your test file (rather than something you have to go searching for in the docs) that tells you how to run it is a very worthwhile .

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. This would mean that I would add such a string to every single test module. Generally I would also do testing with pytest and not tox, it's usually faster. just saying ;-)

Many people in the python community are familiar with pytest. It's not an obscure tool. I know that it might take some time to get used to it. I know that it takes some time to read certain documentation. However I think that's just how writing software works. It takes time. Doing things quickly mostly ends up catastrophic (checkout the first versions of Jedi).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my personal projects, I typically do keep a string like that in every module. Usually its just a byproduct of me typing it once, I copy and paste it into the file I'm currently editing, and then it just lives there. I've found that it makes it easier to bring new people onto a project if they have a quick starting point to get the thing they want to test running. Of course py.test can do this, but I historically worked with junior coders, and py.test's syntax requires lots of typing and is not the most intuitive. I've also got an awesome <leader>a vim keybinding that copies the current line in my editor into the nearest terminal and executes it, so perhaps my opinion is just a function of my workflow.

Regardless, its your codebase and I respect your design choices. I've got a new PR up in #963 which shouldn't have any of this chaff in it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is complicated in pytest <path>? I cannot really imagine a simpler syntax.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice if you could specify a module name (given you've done something like pip install -e . without needing the path, similar to python's -m flag. I also feel like pytest path_to_module::funcname and pytest path_to_module::classname::funcnameis less intuitive than something like pytest <path_to_module> <funcname> and pytest <path_to_module> <classname>.<funcname>. Its nit-picky to be sure, but that's my feeling on it. On the other hand, it is nice that the :: means that a single test node is a single string (with no spaces).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never even used that. I just use filenames + -k which gives me substring matching and which is exactly what i want.

>>> from jedi.evaluate.docstrings import _search_param_in_docstr
>>> from jedi.evaluate.docstrings import _evaluate_for_statement_string
>>> from jedi.evaluate.docstrings import _execute_array_values
>>> from jedi.evaluate.docstrings import _execute_types_in_stmt
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra imports make it easier to run doctests from IPython, but I can understand if the result is a bit too bulky to be kept in. Let me know if I should remove the extra imports.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get why you use them in the first place. I would just remove it all. doctests serve as documentation IMO and this is more confusing than anything.

return list(types)
else:
return [p_type]
return _expand_typestr(p_type)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this logic to a new function so it can be used by google style docstrings as well.

@@ -1,11 +1,13 @@
"""
Docstrings are another source of information for functions and classes.
:mod:`jedi.evaluate.dynamic` tries to find all executions of functions, while
the docstring parsing is much easier. There are two different types of
the docstring parsing is much easier. There are three different types of
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, should this be four if we include numpy?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so.

# stuffed with content from a function call.
pseudo_cls.parent = module
type_list = _execute_types_in_stmt(evaluator, stmt)
return type_list
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was changed to help with debugging.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this. It's not more readable.

z.''')
script = jedi.Script(s)
names = [c.name for c in script.completions()]
assert 'numerator' in names
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checks if google docstring return type inference is working

assert 'isupper' in names
assert 'capitalize' in names
assert 'numerator' in names
assert 'append' in names
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above tests mirror existing numpy style tests

z.''')
script = jedi.Script(s)
names = [c.name for c in script.completions()]
assert 'numerator' in names
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checks if numpy docstring return type inference is working (previously this was not implemented)

evaluator = script._evaluator
types = docstrings.find_return_types(evaluator, func)
assert len(types) == 1
assert types[0].base.obj is builtins.int
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you want the jedi.evaluate.docstrings functions follow_param and find_return_types to be explicitly tested in this file, but it helped me a lot to have an example giving me an entry point into the code.

evaluator = script._evaluator
types = docstrings.find_return_types(evaluator, func)
assert len(types) == 1
assert types[0].base.obj is builtins.int
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Google style mirrors of tests for follow_param and find_return_types.

if type_ is not None:
pass
argdict_list.append(argdict)
return argdict_list
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where each argline is parsed after it is known to belong to an Args block

key = tag_aliases.get(key, key)
groups.append((key, subblock))
line_offset += len(lines)
return groups
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above function does the main work of splitting a docstring into blocks based on google-style headers.

if found is not None:
return found

return []
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified to add in call to _search_param_in_googledocstr

if m:
p_type = m.group(1)
found.extend(_expand_typestr(p_type))
return found
Copy link
Author

@Erotemic Erotemic Nov 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New functionality to parse for numpy return types. I had to work around what seems to be a bug in numpydoc where p_type is swapped with p_name when the return type if anonymous. This code should still work even once the bug is fixed.

# Otherwise just return the typestr wrapped in a list
else:
types = [p_type]
return types
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was originally in _search_param_in_numpydocstr but I separated it out so I could re-use it. I also added the fix for the python2 ast.literal_eval problem here.

typestr = garg['type']
found = _expand_typestr(typestr)
break
return found
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New code for getting google style param hints.

for retdict in google_rets:
p_type = retdict['type']
found.extend(_expand_typestr(p_type))
return list(set(found))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New code for getting google style return type hints.

@davidhalter
Copy link
Owner

This is a lot of code and comments. Thank you!

It might take a while until I have this one reviewed, just because of the sheer size (and lack of my own time).

@bcolsen
Copy link
Contributor

bcolsen commented Feb 14, 2017

I hope @Erotemic doesn't mind, but I updated some the docstrings.py code from this PR to merge against the current dev branch as PR: #868. I removed the google docstrings part due to my lack of experience but I hope it makes the PR easier to merge for @davidhalter. The google docstrings part can be easily applied as a new PR.

@Erotemic
Copy link
Author

@bcolsen, Splitting this into separate PR into one for fixes and one for new features is probably best. Thanks for doing the work on that.

@davidhalter
Copy link
Owner

davidhalter commented Feb 15, 2017

All: Sorry for taking so long on this one. It's not intentional, I'm just having troubles to find the time to look over this one. It's huge and I think some things could be smaller.

I'm having more time the next few weeks. I'll try to look into it.

@bcolsen
Copy link
Contributor

bcolsen commented Feb 15, 2017

I think that the docstrings.py file could be more modular so that every docstring type(e.g. sphinx, epydoc, numpy, or google) could be implemented as a "plugin" type structure. This might have some code overlap between docstring types but would be easily extendible.

However, I don't think this would be a smaller change.

@davidhalter
Copy link
Owner

Well, I think that plugin architecture wouldn't exactly help with the issue on hand here, that google docstrings just take quite a bit of code to parse them.

@Erotemic
Copy link
Author

The google docstrings don't take that much code to parse. The docscrape_google.py file does all the parsing and is only 315 lines (about half of which is documentation and doctests).

Most of the work is either tests, fixes, or boilerplate to integrate the docscrape_google.py functionality with the way that sphinx and numpy do it. I needed to refactor a bit of code to make things reusable between all the different docstring parsers.

@bcolsen
Copy link
Contributor

bcolsen commented Feb 19, 2017

Currently PR #868(which implements @Erotemic changes to docstrings.py from this PR ) has -36 and +103 (although I should include the tests functions). Even most of these changes just take current docstrings.py code and place them in conditionals(meld shows this better than github's diff).

@davidhalter, Would it be easier for you to review this code if it's broken into smaller PR's like #868?

If so (And if @Erotemic agrees) this PR could be closed and a smaller patch could be made to implement google docstrings after #868 is accepted.

Copy link
Owner

@davidhalter davidhalter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please tell me ASAP if you have time in the next 1-4 weeks to actually do this. If you don't I'll merge #868 in the meantime and we can just reuse this branch for google docstrings. Depends on how much time/effort you put in.


Thanks a lot for the hard work. First of it all: I'm sorry to let you wait for such a long time. There were things that were more important for both Jedi and my real life. I'm sorry that it doesn't honor your contribution.

It's a lot of code and that has been my problem here. I didn't really have the time to properly review it. I have read a lot of the code now and it looks generally good. However, there's a few things I'd like you to improve (apart from the things I've commented on):

  • Some API's have changed, you need to rebase and change some code. It's not a crazy amount of changes, but a lot of things were just happy before and now it isn't.
  • Please remove at least the imports from the doctests if possible. I'm not a big fan of big docstrings, sorry :-)
  • Revert the things that I mentioned. I think if you want to make my life easier you should not have tried to mix features (numpydocs/google) with some refactorings. That's just too hard to review. Different pull requests for this would have been much easier. But now that we have it it's ok.
  • In google docstrings I would actually not prefer to have a strong idea of what blocks we have to parse etc. IMO there should be a function that returns a dictionary that looks like {"Args": "...", "foo": "bar\n", "Example": "..."}. From that on you can work with simple regular expressions (like the ones we already have for the other languages).
  • Please don't use google docstrings anywhere (except in tests).

>>> from jedi.evaluate.docstrings import _search_param_in_docstr
>>> from jedi.evaluate.docstrings import _evaluate_for_statement_string
>>> from jedi.evaluate.docstrings import _execute_array_values
>>> from jedi.evaluate.docstrings import _execute_types_in_stmt
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get why you use them in the first place. I would just remove it all. doctests serve as documentation IMO and this is more confusing than anything.

# stuffed with content from a function call.
pseudo_cls.parent = module
type_list = _execute_types_in_stmt(evaluator, stmt)
return type_list
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this. It's not more readable.

return chain.from_iterable(_execute_array_values(evaluator, d) for d in definitions)
types_list = [_execute_array_values(evaluator, d) for d in definitions]
type_list = list(chain.from_iterable(types_list))
return type_list
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please revert. I do the same thing a lot and then revert it after.

)
func = param.parent_function
module = param.get_parent_until()

types = eval_docstring(func.raw_doc)
docstr = func.raw_doc
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_doc doesn't exist anymore. once you rebase you will see that. It's better now, because you can just use something like function_context.py__doc__().

return found


def _search_return_in_gooogledocstr(docstr):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why three o's?

Search `docstr` for type(-s) of `param_str`.

>>> from jedi.evaluate.docstrings import _search_param_in_docstr
>>> from jedi.evaluate.docstrings import _search_param_in_googledocstr
>>> from jedi.evaluate.docstrings import _search_param_in_numpydocstr
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need these imports here? You can just omit them, because we're in the same module, can't you? I might be wrong as well.

@bcolsen
Copy link
Contributor

bcolsen commented May 1, 2017

@Erotemic Just a reminder PR #868 is my rebase of your code to the new api. I hope it can help as a guide.

Also I think the option to go ahead with PR #868 first is still open, and then add Google docstings as a separate PR.

@Erotemic
Copy link
Author

Erotemic commented May 5, 2017

@bcolsen and @davidhalter I'm not going to have time in the next few weeks to work on this at all (I'm in the middle of finishing and defending my thesis). You can go ahead with merging #868. Once all of this is said and done, I'll try to come back and update any remaining code to add the features in a new smaller branch.

Just of note, working on this has been helpful to me outside of jedi, as it serves to execute my doctests in the minimal version of my utility library https://github.com/Erotemic/ubelt (/shameless self promotion)

@davidhalter
Copy link
Owner

I'm going to close this one, because #868 has been merged and quite a bit of the code here is therefore obsolete. Please open a new PR for google style docstrings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants