-
-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add formatters for numpydoc section ordering and name/type spacing #132
ENH: Add formatters for numpydoc section ordering and name/type spacing #132
Conversation
This comment has been minimized.
This comment has been minimized.
Codecov Report
@@ Coverage Diff @@
## main #132 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 20 21 +1
Lines 490 576 +86
=========================================
+ Hits 490 576 +86
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. I was wondering if we should detect that a docstring is in numpy style, but it can be handled in configuration. (Do not add the formatter if you don't have numpy style docstring)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have the time for a full review right now but this is highly appreciated!
That said, I would like to handle numpy
docstrings behind a --style=
flag. I think for future maintainability that will be much easier.
Since in reality projects can use different styles at the same time --style
should probably be an append
type with the default being ["pep257"]
. For the tests in this PR we would then use --style=numpy
.
I should have probably documented this somewhere in the issue. Sorry about that!
We could also first create the PR that adds --style
to keep the size of PRs limited. I might be able to do this myself but that might take 2/3 weeks.
@DWesl I have quickly hacked together a PR to add the --style
flag as I had some spare time. See #138. After that it should be trivial to add "numpy"
to the available choices for the option.
To add numpy to the options, yes, but actually having that option do something will take a bit more work. I think I could arrange for |
Yeah I haven't thought this through completely but I was thinking of perhaps adding a Does that sound like it would work? |
The straightforward way to do that involves having the default for the formatter options depend on the |
It turns out |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@DWesl Just want to say this is on my radar. I'm on holiday currently but I expect to get to this PR somewhere this week! 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first review of the code associated with the adding of the new style. The formatters themselves look relatively straightforward and I didn't see anything obvious just now.
I'm really happy you worked on this as this seems like a very nice addition. Let me know if there is anything I can do to help, I'd like to merge and release this ASAP 😄
This comment has been minimized.
This comment has been minimized.
2 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed all the stuff that was related to how formatters are being run and merged it in #145. That way we can focus on the numpy
stuff here. Sorry for the push to your branch, but that seemed to most effective way forward.
Let me know what you think!
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is becoming quite large. If you think of any other formatters please add them in another PR.
I think this might be my last comments 😄
docs/usage.rst
Outdated
[files ...] | ||
|
||
positional arguments: | ||
files The directory or files to format. | ||
|
||
options: | ||
optional arguments: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which interpreter version are you using locally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cygwin CPython 3.9; should I be using 3.8 or 3.7 for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I think it might actually be 3.10
that is giving different results here... 😓
Anyway, let's fix this at the end of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was hoping it might be because I did pip install -r requirements-test.txt
instead of pip install -U -r requirements-test.txt
, but adding the -U didn't seem to change anything
# Rejoin sections | ||
new_lines = [line for section in new_sections.values() for line in section] | ||
# Ensure the last line puts the quotes in the right spot | ||
# Enforces indented closing quotes on the last line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably shouldn't be handled by this formatter. Can we store whether the closing quotes were already on a new line and do this according to that?
We like to keep each formatter responsible for a single thing to provide optimal customisation for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, new test for that. Would it help if I made all the test input files symlinks of numpydoc_style.py
, with only the changes for that formatter reflected in the corresponding .py.out
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might work well in this case yeah!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whole bunch of symlinks created and .args
files expanded. I hope git knows how to set these up on other computers; I've had problems before.
) -> OrderedDict[str, list[str]]: | ||
"""Sort the numpydoc sections into the numpydoc order.""" | ||
new_sections = OrderedDict([sections.popitem(last=False)]) | ||
new_sections.update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we remove L36 and just return a sorted OrderedDict
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an OrderedDict.sort
method here. and the implementation here looks like the ordering is kept by a linked list rather than a list I can sort.
To merge line 36 with 37-44, I would need a consistent name for the summary/deprecation warning/extended summary section, which currently uses the summary as the name. I can look into that, as it would simplify other things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an
OrderedDict.sort
method here. and the implementation here looks like the ordering is kept by a linked list rather than a list I can sort.
👍
To merge line 36 with 37-44, I would need a consistent name for the summary/deprecation warning/extended summary section, which currently uses the summary as the name. I can look into that, as it would simplify other things.
Yeah, Summary
seems fine to me. We could add a comment somewhere that it also includes the other two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial section now named "Summary" unless the initial section is zero lines, in which case the initial section shares a name with the next section and gets disappeared by the dict
constructor.
new_sections = OrderedDict([]) | ||
first_section = True | ||
for section_name, section_lines in sections.items(): | ||
if first_section: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what if the first section isn't a summary? Could you add a test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it starts with
"""Parameters
----------
then there is no summary section; I think the current code would change this to
"""
Parameters
------------
To fix this further, I would need a consistent name for the summary/deprecation warning/extended summary section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
Although I think this change is good, it should probably also be its own formatter. But let's add that in a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I made it so that starting with a section header on the first line will work. Should I change a test to make sure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that would be good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Works" now, although interaction with the section ordering formatter can produce strange results:
""" Parameters
----------
...
Returns
-------
...
I think there's a formatter to strip the leading space before "Parameters" (may need two runs to finish); not sure about indenting Returns
properly.
new_sections = OrderedDict([sections.popitem(last=False)]) | ||
new_sections.update( | ||
OrderedDict( | ||
[ | ||
(sec_name, sections[sec_name]) | ||
for sec_name in sorted( | ||
sections.keys(), | ||
key=self.numpydoc_section_order.index, | ||
) | ||
] | ||
) | ||
) | ||
return new_sections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option, using OrderedDict
features and avoiding the KeyError
on weird sections:
new_sections = OrderedDict([sections.popitem(last=False)]) | |
new_sections.update( | |
OrderedDict( | |
[ | |
(sec_name, sections[sec_name]) | |
for sec_name in sorted( | |
sections.keys(), | |
key=self.numpydoc_section_order.index, | |
) | |
] | |
) | |
) | |
return new_sections | |
new_sections = sections.copy() | |
for sec_name in self.numpydoc_section_order: | |
try: | |
new_sections.move_to_end(sec_name) | |
except KeyError: | |
pass | |
for sec_name in new_sections.keys()[1:]: | |
if sec_name not in self.numpydoc_section_order: | |
new_sections.move_to_end(sec_name) | |
return new_sections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too bad there isn't a move_to_front
. But this seems to work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's move_to_end(key, last=False)
, which should do the same thing. My main reason for ordering it this way is now moot, so I should be able to iterate through reversed(self.numpydoc_section_order)
without a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of different things going on in this PR, but I feel like we're getting there 😄
|
||
# Everything before the first section header is in a single | ||
# summary/deprecation warning/extended summary section. This | ||
# ends up called "Summary". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# ends up called "Summary". | |
# ends up being called "Summary". |
) | ||
if section_hyphen_lines and section_hyphen_lines[0] == 1: | ||
# No summary/deprecation warning/extended summary section | ||
section_starts.pop(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can section_starts
be a set
to avoid this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need that list sorted and I'm not sure that set
s are ordered the way dict
s have been since 3.7 or so. This is the last time that variable is used, and dictionary semantics means most other problems are already dealt with; would you prefer I negated the condition and only included the other branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's only do the other branch! I didn't see it wasn't used anymore.
) -> OrderedDict[str, list[str]]: | ||
"""Ensure proper spacing between sections.""" | ||
new_sections = OrderedDict([]) | ||
for section_name, section_lines in sections.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we replace in-place here as well?
|
||
def sincos(theta): | ||
"""Returns | ||
------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem correct? Shouldn't it be a little longer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It matches the length of the word "Returns", and is shorter in the file by three characters, matching the three double quotes starting that line.
It might be clearer as
def sincos(theta):
"""\
Returns
-------
...
"""
(same number of hyphens)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, there isn't really a style guide for this I think, but imo it makes sense to add more -
here so that the second line covers both the quotes and Returns
.
|
||
|
||
def sincos(theta): | ||
""" Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A variant on textwrap.dedent
that ignored the indentation of the first line for determining common whitespace, which might work, then a similar variant of textwrap.indent
at the end.
I would need to test how both handle
"""\
A summary line, indented the same as the others for all tools.
Section Header
--------------
...
"""
Another option might be to replace the f"{quotes:s}{body:s}{quotes:s}"
reconstructing the new docstring with f"{quotes:s}\\\n{body:s}{quotes:s}"
if the first character of the body is whitespace, but that still leaves the question of what to do with that "Returns" at the left margin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just check which line we are inserting on and if it's 0
we remove any indent and if higher we add an indent if it is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's another option.
The numpydoc style guide doesn't specify whether it prefers first line of docstring on same line as quotes or different line, but has examples of both. The numpydoc validation tool suggests the
"""\
Summary.
Extended summary...
"""
form is preferred to the
"""Summary.
Extended summary...
"""
form, which doesn't seem to agree with the style guide. It also insists on extended summary and "See Also" sections, which I don't see always being necessary.
Supporting the first form is relatively straightforward; looping through the first lines of each section and doing line.lstrip()
on the first section or textwrap.indent(line)
for subsequent sections is also straightforward.
theta: float | ||
the angle at which to calculate the sine and cosine. | ||
|
||
Returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This as well.
This comment has been minimized.
This comment has been minimized.
# Everything before the first section header is in a single | ||
# summary/deprecation warning/extended summary section. This | ||
# ends up being called "Summary". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Everything before the first section header is in a single | |
# summary/deprecation warning/extended summary section. This | |
# ends up being called "Summary". |
Now that I look at it again, this no longer makes sense here.
section[0] = section[0].lstrip() | ||
elif not section[0][0].isspace(): | ||
section[0] = f"{' ' * indent_length:s}{section[0]:s}" | ||
first_section = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
section[0] = section[0].lstrip() | |
elif not section[0][0].isspace(): | |
section[0] = f"{' ' * indent_length:s}{section[0]:s}" | |
first_section = False | |
section[0] = section[0].lstrip() | |
first_section = False | |
elif not section[0][0].isspace(): | |
section[0] = f"{' ' * indent_length:s}{section[0]:s}" |
Saves some reassignments 😄
|
||
def sincos(theta): | ||
"""Returns | ||
------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, there isn't really a style guide for this I think, but imo it makes sense to add more -
here so that the second line covers both the quotes and Returns
.
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Daniël van Noord <[email protected]> Co-authored-by: Pierre Sassoulas <[email protected]>
This commit adds ``NumpydocNameColonTypeFormatter``, ``NumpydocSectionHyphenLengthFormatter``, ``NumpydocSectionOrderingFormatter`` and ``NumpydocSectionSpacingFormatter`` Co-authored-by: Daniël van Noord <[email protected]>
ad56aeb
to
03833db
Compare
According to the primer, this change has no effect on the checked open source code. 🤖🎉 |
@DWesl Thanks for all the work you put into this PR. I'm going to make a release immediately after this has been merged 😄 |
Related to #125; I got the bits I could think how to automate for the docstring style I actually use (numpydoc)
Should probably extend the tests for more types of docstrings and more sections within each docstring.
Ideally the section test would have docstrings on
and would test the ordering of some subset of these sections:
Formatters for:
x : float
notx:float
orx: float
)