-
-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added google style docstr support + numpy docstr fixes #796
Conversation
…e tests. More doctests. Google style return parser. Tests for follow args. Fixed text_type issue.
# retdict = {'type': type_part, 'desc': desc_part} | ||
retdict = {'type': type_part} | ||
retdict_list.append(retdict) | ||
return retdict_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure about the implementation of this function, but I'm also not sure how else to do it.
""") | ||
if string is None: | ||
return [] | ||
""")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line was causing errors when code was not unicode in python2.7
return chain.from_iterable(_execute_array_values(evaluator, d) for d in definitions) | ||
types_list = [_execute_array_values(evaluator, d) for d in definitions] | ||
type_list = list(chain.from_iterable(types_list)) | ||
return type_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was changed to help with debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please revert. I do the same thing a lot and then revert it after.
[p for param_str in _search_param_in_docstr(docstring, str(param.name)) | ||
for p in _evaluate_for_statement_string(evaluator, param_str, module)] | ||
[p for string in _search_param_in_docstr(docstr, param_str) | ||
for p in _evaluate_for_statement_string(evaluator, string, module)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed names here to be consistent with the names used in the functions that were being called.
if not found: | ||
# Check for numpy style return hint | ||
found = _search_return_in_numpydocstr(docstr) | ||
return found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously numpy return types were never considered
for p in DOCSTRING_RETURN_PATTERNS: | ||
match = p.search(code) | ||
match = p.search(docstr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed name for consistency
""" | ||
|
||
from textwrap import dedent | ||
import jedi | ||
from ..helpers import unittest | ||
|
||
try: | ||
import numpydoc | ||
import numpydoc # NOQA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suppressed a linting error
cd $(python -c "import jedi; from os.path import dirname, abspath; print(dirname(dirname(abspath(jedi.__file__))))") | ||
# Run the doctests in this module | ||
tox -e py test/test_evaluate/test_docstring.py | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Helpful command line to start running tests quicker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to mention that here. What you can do is put something like this into the docs, but especially the second command is general knowledge (you can find that in every tox tutorial). It's even easier if you use pytest. The cd command is also not extremely useful, since most people tend to just go to that place (in my case $HOME/programming/jedi).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with all your comments in this review except for the first part of this one (although I'll agree the cd is a bit cumbersome). Even though it might be easy to lookup a tox tutorial and figure out how to run tests and then determine how to transfer the commands learned in the tutorial to your own project, I think its immensely helpful to have a copy-pastable command line in a python source file that tells you how to run it.
I just spent some time reworking some of this PR into a new one. I don't use tox on a regular basis, so I forgot what the exact command was to run the tests. The fact that I came back here, found this note, and didn't have to context switch and wonder "how is this done again?", is a working example for why having a docstring in your test file (rather than something you have to go searching for in the docs) that tells you how to run it is a very worthwhile .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. This would mean that I would add such a string to every single test module. Generally I would also do testing with pytest and not tox, it's usually faster. just saying ;-)
Many people in the python community are familiar with pytest. It's not an obscure tool. I know that it might take some time to get used to it. I know that it takes some time to read certain documentation. However I think that's just how writing software works. It takes time. Doing things quickly mostly ends up catastrophic (checkout the first versions of Jedi).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my personal projects, I typically do keep a string like that in every module. Usually its just a byproduct of me typing it once, I copy and paste it into the file I'm currently editing, and then it just lives there. I've found that it makes it easier to bring new people onto a project if they have a quick starting point to get the thing they want to test running. Of course py.test can do this, but I historically worked with junior coders, and py.test's syntax requires lots of typing and is not the most intuitive. I've also got an awesome <leader>a
vim keybinding that copies the current line in my editor into the nearest terminal and executes it, so perhaps my opinion is just a function of my workflow.
Regardless, its your codebase and I respect your design choices. I've got a new PR up in #963 which shouldn't have any of this chaff in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is complicated in pytest <path>
? I cannot really imagine a simpler syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice if you could specify a module name (given you've done something like pip install -e .
without needing the path, similar to python's -m
flag. I also feel like pytest path_to_module::funcname
and pytest path_to_module::classname::funcname
is less intuitive than something like pytest <path_to_module> <funcname>
and pytest <path_to_module> <classname>.<funcname>
. Its nit-picky to be sure, but that's my feeling on it. On the other hand, it is nice that the ::
means that a single test node is a single string (with no spaces).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never even used that. I just use filenames + -k
which gives me substring matching and which is exactly what i want.
>>> from jedi.evaluate.docstrings import _search_param_in_docstr | ||
>>> from jedi.evaluate.docstrings import _evaluate_for_statement_string | ||
>>> from jedi.evaluate.docstrings import _execute_array_values | ||
>>> from jedi.evaluate.docstrings import _execute_types_in_stmt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra imports make it easier to run doctests from IPython, but I can understand if the result is a bit too bulky to be kept in. Let me know if I should remove the extra imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get why you use them in the first place. I would just remove it all. doctests serve as documentation IMO and this is more confusing than anything.
return list(types) | ||
else: | ||
return [p_type] | ||
return _expand_typestr(p_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this logic to a new function so it can be used by google style docstrings as well.
@@ -1,11 +1,13 @@ | |||
""" | |||
Docstrings are another source of information for functions and classes. | |||
:mod:`jedi.evaluate.dynamic` tries to find all executions of functions, while | |||
the docstring parsing is much easier. There are two different types of | |||
the docstring parsing is much easier. There are three different types of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, should this be four if we include numpy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so.
# stuffed with content from a function call. | ||
pseudo_cls.parent = module | ||
type_list = _execute_types_in_stmt(evaluator, stmt) | ||
return type_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was changed to help with debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this. It's not more readable.
z.''') | ||
script = jedi.Script(s) | ||
names = [c.name for c in script.completions()] | ||
assert 'numerator' in names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checks if google docstring return type inference is working
assert 'isupper' in names | ||
assert 'capitalize' in names | ||
assert 'numerator' in names | ||
assert 'append' in names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above tests mirror existing numpy style tests
z.''') | ||
script = jedi.Script(s) | ||
names = [c.name for c in script.completions()] | ||
assert 'numerator' in names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checks if numpy docstring return type inference is working (previously this was not implemented)
evaluator = script._evaluator | ||
types = docstrings.find_return_types(evaluator, func) | ||
assert len(types) == 1 | ||
assert types[0].base.obj is builtins.int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you want the jedi.evaluate.docstrings
functions follow_param
and find_return_types
to be explicitly tested in this file, but it helped me a lot to have an example giving me an entry point into the code.
evaluator = script._evaluator | ||
types = docstrings.find_return_types(evaluator, func) | ||
assert len(types) == 1 | ||
assert types[0].base.obj is builtins.int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Google style mirrors of tests for follow_param
and find_return_types
.
if type_ is not None: | ||
pass | ||
argdict_list.append(argdict) | ||
return argdict_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where each argline is parsed after it is known to belong to an Args block
key = tag_aliases.get(key, key) | ||
groups.append((key, subblock)) | ||
line_offset += len(lines) | ||
return groups |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above function does the main work of splitting a docstring into blocks based on google-style headers.
if found is not None: | ||
return found | ||
|
||
return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to add in call to _search_param_in_googledocstr
if m: | ||
p_type = m.group(1) | ||
found.extend(_expand_typestr(p_type)) | ||
return found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New functionality to parse for numpy return types. I had to work around what seems to be a bug in numpydoc
where p_type
is swapped with p_name
when the return type if anonymous. This code should still work even once the bug is fixed.
# Otherwise just return the typestr wrapped in a list | ||
else: | ||
types = [p_type] | ||
return types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was originally in _search_param_in_numpydocstr
but I separated it out so I could re-use it. I also added the fix for the python2 ast.literal_eval
problem here.
typestr = garg['type'] | ||
found = _expand_typestr(typestr) | ||
break | ||
return found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New code for getting google style param hints.
for retdict in google_rets: | ||
p_type = retdict['type'] | ||
found.extend(_expand_typestr(p_type)) | ||
return list(set(found)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New code for getting google style return type hints.
This is a lot of code and comments. Thank you! It might take a while until I have this one reviewed, just because of the sheer size (and lack of my own time). |
I hope @Erotemic doesn't mind, but I updated some the docstrings.py code from this PR to merge against the current dev branch as PR: #868. I removed the google docstrings part due to my lack of experience but I hope it makes the PR easier to merge for @davidhalter. The google docstrings part can be easily applied as a new PR. |
@bcolsen, Splitting this into separate PR into one for fixes and one for new features is probably best. Thanks for doing the work on that. |
All: Sorry for taking so long on this one. It's not intentional, I'm just having troubles to find the time to look over this one. It's huge and I think some things could be smaller. I'm having more time the next few weeks. I'll try to look into it. |
I think that the docstrings.py file could be more modular so that every docstring type(e.g. sphinx, epydoc, numpy, or google) could be implemented as a "plugin" type structure. This might have some code overlap between docstring types but would be easily extendible. However, I don't think this would be a smaller change. |
Well, I think that plugin architecture wouldn't exactly help with the issue on hand here, that google docstrings just take quite a bit of code to parse them. |
The google docstrings don't take that much code to parse. The docscrape_google.py file does all the parsing and is only 315 lines (about half of which is documentation and doctests). Most of the work is either tests, fixes, or boilerplate to integrate the docscrape_google.py functionality with the way that sphinx and numpy do it. I needed to refactor a bit of code to make things reusable between all the different docstring parsers. |
Currently PR #868(which implements @Erotemic changes to docstrings.py from this PR ) has -36 and +103 (although I should include the tests functions). Even most of these changes just take current docstrings.py code and place them in conditionals(meld shows this better than github's diff). @davidhalter, Would it be easier for you to review this code if it's broken into smaller PR's like #868? If so (And if @Erotemic agrees) this PR could be closed and a smaller patch could be made to implement google docstrings after #868 is accepted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please tell me ASAP if you have time in the next 1-4 weeks to actually do this. If you don't I'll merge #868 in the meantime and we can just reuse this branch for google docstrings. Depends on how much time/effort you put in.
Thanks a lot for the hard work. First of it all: I'm sorry to let you wait for such a long time. There were things that were more important for both Jedi and my real life. I'm sorry that it doesn't honor your contribution.
It's a lot of code and that has been my problem here. I didn't really have the time to properly review it. I have read a lot of the code now and it looks generally good. However, there's a few things I'd like you to improve (apart from the things I've commented on):
- Some API's have changed, you need to rebase and change some code. It's not a crazy amount of changes, but a lot of things were just happy before and now it isn't.
- Please remove at least the imports from the doctests if possible. I'm not a big fan of big docstrings, sorry :-)
- Revert the things that I mentioned. I think if you want to make my life easier you should not have tried to mix features (numpydocs/google) with some refactorings. That's just too hard to review. Different pull requests for this would have been much easier. But now that we have it it's ok.
- In google docstrings I would actually not prefer to have a strong idea of what blocks we have to parse etc. IMO there should be a function that returns a dictionary that looks like
{"Args": "...", "foo": "bar\n", "Example": "..."}
. From that on you can work with simple regular expressions (like the ones we already have for the other languages). - Please don't use google docstrings anywhere (except in tests).
>>> from jedi.evaluate.docstrings import _search_param_in_docstr | ||
>>> from jedi.evaluate.docstrings import _evaluate_for_statement_string | ||
>>> from jedi.evaluate.docstrings import _execute_array_values | ||
>>> from jedi.evaluate.docstrings import _execute_types_in_stmt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get why you use them in the first place. I would just remove it all. doctests serve as documentation IMO and this is more confusing than anything.
# stuffed with content from a function call. | ||
pseudo_cls.parent = module | ||
type_list = _execute_types_in_stmt(evaluator, stmt) | ||
return type_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert this. It's not more readable.
return chain.from_iterable(_execute_array_values(evaluator, d) for d in definitions) | ||
types_list = [_execute_array_values(evaluator, d) for d in definitions] | ||
type_list = list(chain.from_iterable(types_list)) | ||
return type_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please revert. I do the same thing a lot and then revert it after.
) | ||
func = param.parent_function | ||
module = param.get_parent_until() | ||
|
||
types = eval_docstring(func.raw_doc) | ||
docstr = func.raw_doc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raw_doc doesn't exist anymore. once you rebase you will see that. It's better now, because you can just use something like function_context.py__doc__()
.
return found | ||
|
||
|
||
def _search_return_in_gooogledocstr(docstr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why three o's?
Search `docstr` for type(-s) of `param_str`. | ||
|
||
>>> from jedi.evaluate.docstrings import _search_param_in_docstr | ||
>>> from jedi.evaluate.docstrings import _search_param_in_googledocstr | ||
>>> from jedi.evaluate.docstrings import _search_param_in_numpydocstr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need these imports here? You can just omit them, because we're in the same module, can't you? I might be wrong as well.
@bcolsen and @davidhalter I'm not going to have time in the next few weeks to work on this at all (I'm in the middle of finishing and defending my thesis). You can go ahead with merging #868. Once all of this is said and done, I'll try to come back and update any remaining code to add the features in a new smaller branch. Just of note, working on this has been helpful to me outside of jedi, as it serves to execute my doctests in the minimal version of my utility library https://github.com/Erotemic/ubelt (/shameless self promotion) |
I'm going to close this one, because #868 has been merged and quite a bit of the code here is therefore obsolete. Please open a new PR for google style docstrings. |
The purpose of this PR is to address issue #504 properly and improve upon the original attempt submitted in PR #617. It took a bit more time than expected, but I think I have everything working and tested.
In addition to enhancing jedi with ability to and return values from google style docstrings, I also included several fixes for numpy style docstrings.
A summary of what is changed is as follows:
Here is a breif description of how the new google style parsing was implemented:
To add support for the google style docstrings I attempted to mirror what was happening for numpy style docstrings. I created a file called
jedi/evaluate/docscrape_google.py
to work in a simlar way tonumpydoc.docscrape
. The main work in this file is done bysplit_google_docblocks
which separates the docstring into a list of sections represented as a list of tuples. Each tuple is a pair of section names and unparsed section lines. This split is made primarily using indentation information and a list of known google-style tags. Header aliases are taken into account so Arguments and Args section headers all resolve back to Args in the returned list.The publicly facing functions are
parse_google_args
andparse_google_returns
which get the appropriate Args / Returns / Yields section and then calls eitherparse_google_retblock
orparse_google_argblock
to grab a list of type hints represented as dictionaries. Back injedi/evaluate/docstrings.py
I use this new functionality to augment the behavior orfind_return_types
and_search_param_in_docstr
. These functions will now try to use google style docstrings if parsing Epydoc/Sphinx docstrings fails.When first looking into
jedi/evaluate/docstrings.py
I was having a hard time getting a feel for the information flow of the module. My style of coding is very integrated with an IPython terminal. I ended up creating several example doctests just as a byproduct of needing to define valid input arguments in IPython.In regards to the numpy docstring changes:
When running tests in
test/test_evaluate/test_docstring.py
I noticed thattest_numpydoc_alternative_types
andtest_numpydoc_docstring_set_of_values
were failing. I tracked the issue down to_search_param_in_numpydocstr
. One test was failing because support for alternative types was just not there, so I added that. The other test was failing because of a python2.7 issue whereast.literal_eval
raises aValueError: malformed string
when trying to parse set literals. This feature is just not supported in python2.7. My fix for this was to replace curly brakets with square brakets in python2. I also ended up separating both of these functionalies into a new function called_expand_typestr
so I could re-use functionality for the google docstring implementations.Also, when in this file I also noticed that
find_return_types
was not making any use of numpy docs. I made a function_search_return_in_numpydocstr
to fix this.I added new test to test numpy return type hints in
test_docstrings.py
and ensured that the other numpy tests are no longer failing.Overall I am happy with the current state of the PR and I feel like I'm ready to submit it for review.
Let me know if there is anything that doesn't make sense or if any changes need to be made. There are a few places that I would like to point out / would like feedback on:
In a few places I made modifications that don't actually change the behavior. The reason I did this was mostly due to debugging and needing to inspect how code was working as well as copy lines into IPython. In some of these cases I did not revert the changes. If either I or someone else needs to debug this code again it might be helpful to have those changes. If it would be better to revert back to the original state I can do this.
In
parse_google_retblock
there is a corner case that I'm not sure how to handle. Without making a grammar I can't figure out how to differentiate the case where only a type is specified with the case that only a description is specified. For now I'm just assuming that if this occurs the type is specified and if it is not a valid type I'm relying on this value to be thrown out further upstream. If anyone has a better way to do this / suggestions for what the grammar should look like I'm open to hearing it.Should this change be added to the CHANGELOG.rst like I have done or is there a different way to do it?
I'll make other minor comments in the diff view where necessary