ENH: Add formatters for numpydoc section ordering and name/type spacing #132

Merged: 2 commits merged on Aug 21, 2022

Contributor

@DWesl DWesl commented Jul 21, 2022

Related to #125. I implemented the parts I could work out how to automate for the docstring style I actually use (numpydoc).

Should probably extend the tests for more types of docstrings and more sections within each docstring.

Ideally the section test would have docstrings on

  • function
  • class
  • method
  • generator
  • module
  • constant

and would test the ordering of some subset of these sections:

  • Parameters
  • Returns
  • Raises
  • Examples
  • Yields
  • References

Formatters for:

  • name-colon parameter spacing (x : float not x:float or x: float)
  • Section ordering
  • Section spacing (blank line between sections)
  • Length of line of hyphens after section header (should be same length as section header)
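As an illustration of the first item, here is a minimal sketch of name-colon spacing normalization. The regular expression and the function name `normalize_name_colon_type` are assumptions made for this example, not the code from the PR, and the function is only meant to be applied to lines that are already known to be parameter-name lines (a description line containing a colon would be rewritten too).

```python
import re

# Hypothetical pattern: optional indent, a name (optionally prefixed with
# * or ** for *args / **kwargs), a colon with any spacing, then the type.
_NAME_COLON_TYPE = re.compile(r"^(\s*)(\*{0,2}\w+)\s*:\s*(\S.*)$")


def normalize_name_colon_type(line: str) -> str:
    """Rewrite ``x:float`` or ``x: float`` as ``x : float``."""
    match = _NAME_COLON_TYPE.match(line)
    if match is None:
        # Not a "name : type" line (for example a section header).
        return line
    indent, name, type_ = match.groups()
    return f"{indent}{name} : {type_}"
```

Lines without a colon, such as section headers, pass through unchanged.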

@codecov

codecov bot commented Jul 21, 2022

Codecov Report

Merging #132 (03833db) into main (edd53a0) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #132   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           20        21    +1     
  Lines          490       576   +86     
=========================================
+ Hits           490       576   +86     
Impacted Files                                         Coverage Δ
...tringformatter/_configuration/arguments_manager.py  100.00% <ø> (ø)
pydocstringformatter/_formatting/__init__.py           100.00% <100.00%> (ø)
pydocstringformatter/_formatting/base.py               100.00% <100.00%> (ø)
...stringformatter/_formatting/formatters_numpydoc.py  100.00% <100.00%> (ø)

@Pierre-Sassoulas Pierre-Sassoulas added the enhancement New feature or request label Jul 21, 2022
@Pierre-Sassoulas Pierre-Sassoulas added this to the 0.7.0 milestone Jul 21, 2022
Collaborator

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Looks pretty good. I was wondering if we should detect that a docstring is in numpy style, but it can be handled in configuration. (Do not add the formatter if you don't have numpy style docstring)

@DanielNoord DanielNoord self-requested a review July 21, 2022 19:13
Owner

@DanielNoord DanielNoord left a comment

I don't have the time for a full review right now but this is highly appreciated!

That said, I would like to handle numpy docstrings behind a --style= flag. I think for future maintainability that will be much easier.
Since in reality projects can use different styles at the same time --style should probably be an append type with the default being ["pep257"]. For the tests in this PR we would then use --style=numpy.
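A minimal sketch of what an append-type `--style` option could look like with `argparse`. The function name `build_parser` and the list of choices are assumptions for illustration; this is not the implementation that landed in #138. One documented `argparse` behaviour worth keeping in mind: with `action="append"` and a non-empty default, values from the command line are appended after the default elements rather than replacing them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a repeatable --style option (illustrative only)."""
    parser = argparse.ArgumentParser(prog="style-sketch")
    parser.add_argument(
        "--style",
        action="append",
        choices=["pep257", "numpy"],
        default=["pep257"],
        help="Docstring style(s) to format; may be given more than once.",
    )
    return parser


# No flag: only the default style is active.
default_styles = build_parser().parse_args([]).style
# With a flag: the value is appended to the default list.
numpy_styles = build_parser().parse_args(["--style=numpy"]).style
```

If the default should be replaced instead of extended, one option is to use `default=None` and fall back to `["pep257"]` after parsing.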

I should have probably documented this somewhere in the issue. Sorry about that!

We could also first create the PR that adds --style to keep the size of PRs limited. I might be able to do this myself but that might take 2/3 weeks.

@DWesl I have quickly hacked together a PR to add the --style flag as I had some spare time. See #138. After that it should be trivial to add "numpy" to the available choices for the option.

@DWesl
Contributor Author

DWesl commented Jul 28, 2022

To add numpy to the options, yes, but actually having that option do something will take a bit more work. I think I could arrange for "numpydoc" in run.config.style to imply both options I have here without terribly much trouble, but it sounded like you had a different idea for how the code should work.

@DanielNoord
Owner

Yeah I haven't thought this through completely but I was thinking of perhaps adding a style attribute to Formatter which is "pep257" by default but can also be "numpy" and then add formatters to the list of formatters to run based on the config.style attribute.

Does that sound like it would work?
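A rough sketch of that idea, under the assumption that each formatter class carries a single `style` string. The class names `StripWhitespace` and `SectionOrdering` and the helper `select_formatters` are hypothetical and exist only for this example.

```python
from typing import List


class Formatter:
    """Hypothetical base class; real formatters would implement treat()."""

    name = "base"
    style = "pep257"  # default style, as suggested in the comment above

    def treat(self, docstring: str) -> str:
        return docstring


class StripWhitespace(Formatter):
    name = "strip-whitespace"  # inherits style = "pep257"


class SectionOrdering(Formatter):
    name = "numpydoc-section-order"
    style = "numpy"


def select_formatters(
    formatters: List[Formatter], configured_styles: List[str]
) -> List[Formatter]:
    """Keep only the formatters whose style is enabled in the configuration."""
    return [fmt for fmt in formatters if fmt.style in configured_styles]
```

With `configured_styles == ["pep257"]` only the first formatter runs; adding `"numpy"` to the list enables the second one as well.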

@DWesl
Contributor Author

DWesl commented Jul 30, 2022

The straightforward way to do that involves having the default for the formatter options depend on the --style value, which I don't know how to do with argparse

@DWesl
Contributor Author

DWesl commented Jul 30, 2022

It turns out BooleanOptionalAction doesn't type-check the default, so I can use None as the default for checkers associated with a particular style, then loop through the formatter options after parsing the options and set any values that are still None (i.e., they were not set on the command line) based on the values in self.namespace.style.
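The approach described above can be sketched roughly as follows. The option names and the `STYLE_FOR_OPTION` mapping are assumptions for illustration, and `argparse.BooleanOptionalAction` requires Python 3.9 or newer.

```python
import argparse

# Hypothetical mapping: option destination -> style that switches it on.
STYLE_FOR_OPTION = {
    "numpydoc_section_order": "numpy",
    "strip_whitespaces": "pep257",
}


def parse_options(argv):
    """Parse argv, then fill in style-dependent defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--style", action="append", default=["pep257"])
    for dest in STYLE_FOR_OPTION:
        parser.add_argument(
            "--" + dest.replace("_", "-"),
            action=argparse.BooleanOptionalAction,
            default=None,  # None means "not set on the command line"
        )
    namespace = parser.parse_args(argv)
    # Any option still None was not given explicitly, so derive it
    # from the selected styles.
    for dest, style in STYLE_FOR_OPTION.items():
        if getattr(namespace, dest) is None:
            setattr(namespace, dest, style in namespace.style)
    return namespace
```

An explicit `--no-numpydoc-section-order` still wins over `--style numpy`, because only values left at `None` are filled in.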

@DanielNoord DanielNoord self-requested a review August 8, 2022 18:44
@DanielNoord
Owner

@DWesl Just want to say this is on my radar. I'm on holiday currently but I expect to get to this PR somewhere this week! 😄

Owner

@DanielNoord DanielNoord left a comment

I did a first review of the code associated with the adding of the new style. The formatters themselves look relatively straightforward and I didn't see anything obvious just now.

I'm really happy you worked on this as this seems like a very nice addition. Let me know if there is anything I can do to help, I'd like to merge and release this ASAP 😄

Owner

@DanielNoord DanielNoord left a comment

I have removed everything related to how formatters are run and merged it in #145. That way we can focus on the numpy stuff here. Sorry for the push to your branch, but that seemed the most effective way forward.

Let me know what you think!

Owner

@DanielNoord DanielNoord left a comment

This PR is becoming quite large. If you think of any other formatters please add them in another PR.

I think these might be my last comments 😄

docs/usage.rst Outdated
[files ...]

positional arguments:
files The directory or files to format.

- options:
+ optional arguments:
Owner

Which interpreter version are you using locally?

Contributor Author

Cygwin CPython 3.9; should I be using 3.8 or 3.7 for this?

Owner

No I think it might actually be 3.10 that is giving different results here... 😓

Anyway, let's fix this at the end of this PR.

Contributor Author

Was hoping it might be because I did pip install -r requirements-test.txt instead of pip install -U -r requirements-test.txt, but adding the -U didn't seem to change anything

# Rejoin sections
new_lines = [line for section in new_sections.values() for line in section]
# Ensure the last line puts the quotes in the right spot
# Enforces indented closing quotes on the last line
Owner

This probably shouldn't be handled by this formatter. Can we store whether the closing quotes were already on a new line and do this according to that?

We like to keep each formatter responsible for a single thing to provide optimal customisation for users.

Contributor Author

So, new test for that. Would it help if I made all the test input files symlinks of numpydoc_style.py, with only the changes for that formatter reflected in the corresponding .py.out file?

Owner

That might work well in this case yeah!

Contributor Author

@DWesl DWesl Aug 11, 2022

Whole bunch of symlinks created and .args files expanded. I hope git knows how to set these up on other computers; I've had problems before.

) -> OrderedDict[str, list[str]]:
"""Sort the numpydoc sections into the numpydoc order."""
new_sections = OrderedDict([sections.popitem(last=False)])
new_sections.update(
Owner

Can't we remove L36 and just return a sorted OrderedDict?

Contributor Author

I don't see an OrderedDict.sort method here, and the implementation here looks like the ordering is kept by a linked list rather than a list I can sort.

To merge line 36 with 37-44, I would need a consistent name for the summary/deprecation warning/extended summary section, which currently uses the summary as the name. I can look into that, as it would simplify other things.

Owner

I don't see an OrderedDict.sort method here, and the implementation here looks like the ordering is kept by a linked list rather than a list I can sort.

👍

To merge line 36 with 37-44, I would need a consistent name for the summary/deprecation warning/extended summary section, which currently uses the summary as the name. I can look into that, as it would simplify other things.

Yeah, Summary seems fine to me. We could add a comment somewhere that it also includes the other two.

Contributor Author

Initial section now named "Summary" unless the initial section is zero lines, in which case the initial section shares a name with the next section and gets disappeared by the dict constructor.
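The "disappearing" empty section relies on ordinary dict-constructor behaviour: when the same key appears twice, the later value replaces the earlier one while the key keeps its original position. A small sketch with made-up section data (the `pairs` list is illustrative, not taken from the PR):

```python
from collections import OrderedDict

# Hypothetical split of a docstring that starts directly with a section
# header: the zero-line initial section is named after the next section.
pairs = [
    ("Parameters", []),  # empty initial section
    ("Parameters", ["Parameters", "----------", "x : float"]),
    ("Returns", ["Returns", "-------", "y : float"]),
]

sections = OrderedDict(pairs)
section_names = list(sections)
```

Only two keys survive, and `sections["Parameters"]` holds the real section lines, so the empty placeholder is gone without any special-casing.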

new_sections = OrderedDict([])
first_section = True
for section_name, section_lines in sections.items():
if first_section:
Owner

But what if the first section isn't a summary? Could you add a test for this?

Contributor Author

If it starts with

    """Parameters
    ----------

then there is no summary section; I think the current code would change this to

    """
    Parameters
    ------------

To fix this further, I would need a consistent name for the summary/deprecation warning/extended summary section.

Owner

See above.

Although I think this change is good, it should probably also be its own formatter. But let's add that in a follow up PR.

Contributor Author

@DWesl DWesl Aug 11, 2022

I think I made it so that starting with a section header on the first line will work. Should I change a test to make sure?

Owner

Yes, I think that would be good!

Contributor Author

@DWesl DWesl Aug 11, 2022

"Works" now, although interaction with the section ordering formatter can produce strange results:

    """    Parameters
    ----------
    ...

Returns
    -------
    ...

I think there's a formatter to strip the leading space before "Parameters" (may need two runs to finish); not sure about indenting Returns properly.

Comment on lines 36 to 48

    new_sections = OrderedDict([sections.popitem(last=False)])
    new_sections.update(
        OrderedDict(
            [
                (sec_name, sections[sec_name])
                for sec_name in sorted(
                    sections.keys(),
                    key=self.numpydoc_section_order.index,
                )
            ]
        )
    )
    return new_sections

Contributor Author

Another option, using OrderedDict features and avoiding the KeyError on weird sections:

Suggested change

    - new_sections = OrderedDict([sections.popitem(last=False)])
    - new_sections.update(
    -     OrderedDict(
    -         [
    -             (sec_name, sections[sec_name])
    -             for sec_name in sorted(
    -                 sections.keys(),
    -                 key=self.numpydoc_section_order.index,
    -             )
    -         ]
    -     )
    - )
    - return new_sections
    + new_sections = sections.copy()
    + for sec_name in self.numpydoc_section_order:
    +     try:
    +         new_sections.move_to_end(sec_name)
    +     except KeyError:
    +         pass
    + # Copy the keys into a list first: a keys view cannot be sliced, and
    + # the dict must not be reordered while its view is being iterated.
    + for sec_name in list(new_sections.keys())[1:]:
    +     if sec_name not in self.numpydoc_section_order:
    +         new_sections.move_to_end(sec_name)
    + return new_sections

Owner

@DanielNoord DanielNoord Aug 11, 2022

Too bad there isn't a move_to_front. But this seems to work!

Contributor Author

@DWesl DWesl Aug 11, 2022

There's move_to_end(key, last=False), which should do the same thing. My main reason for ordering it this way is now moot, so I should be able to iterate through reversed(self.numpydoc_section_order) without a problem.
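A minimal sketch of the reversed-order variant using `OrderedDict.move_to_end(key, last=False)`. The constant `NUMPYDOC_SECTION_ORDER` here is an abbreviated, illustrative list and `order_sections` is a hypothetical helper, not the method from the PR. Sections that are not in the list keep their relative order and end up after the known ones.

```python
from collections import OrderedDict

# Abbreviated section order for illustration; the numpydoc style guide
# defines more sections than are listed here.
NUMPYDOC_SECTION_ORDER = [
    "Summary",
    "Parameters",
    "Returns",
    "Yields",
    "Raises",
    "See Also",
    "Notes",
    "References",
    "Examples",
]


def order_sections(sections):
    """Return a copy of ``sections`` with known sections in numpydoc order."""
    ordered = OrderedDict(sections)
    # Moving each known section to the front, last one first, leaves the
    # known sections at the front in their canonical order.
    for name in reversed(NUMPYDOC_SECTION_ORDER):
        if name in ordered:
            ordered.move_to_end(name, last=False)
    return ordered
```

Checking membership before calling `move_to_end` avoids the `KeyError` for sections that are absent from a given docstring.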

Owner

@DanielNoord DanielNoord left a comment

Lots of different things going on in this PR, but I feel like we're getting there 😄


# Everything before the first section header is in a single
# summary/deprecation warning/extended summary section. This
# ends up called "Summary".
Owner

Suggested change
- # ends up called "Summary".
+ # ends up being called "Summary".

)
if section_hyphen_lines and section_hyphen_lines[0] == 1:
# No summary/deprecation warning/extended summary section
section_starts.pop(0)
Owner

Can section_starts be a set to avoid this?

Contributor Author

I need that list sorted and I'm not sure that sets are ordered the way dicts have been since 3.7 or so. This is the last time that variable is used, and dictionary semantics means most other problems are already dealt with; would you prefer I negated the condition and only included the other branch?

Owner

Yeah, let's only do the other branch! I didn't see it wasn't used anymore.

) -> OrderedDict[str, list[str]]:
"""Ensure proper spacing between sections."""
new_sections = OrderedDict([])
for section_name, section_lines in sections.items():
Owner

Shall we replace in-place here as well?


def sincos(theta):
"""Returns
-------
Owner

This doesn't seem correct? Shouldn't it be a little longer?

Contributor Author

It matches the length of the word "Returns", and is shorter in the file by three characters, matching the three double quotes starting that line.

It might be clearer as

def sincos(theta):
    """\
    Returns
    -------
    ...
    """

(same number of hyphens)
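For reference, the rule being discussed (the underline has the same length as the section header text) can be sketched like this. The helper `fix_hyphen_lengths` is hypothetical and works on the docstring body only, without the surrounding triple quotes, which is why a header on the opening line still gets an underline as long as the word itself.

```python
def fix_hyphen_lengths(lines):
    """Make each all-hyphen line as long as the header line above it."""
    fixed = list(lines)
    for index in range(1, len(fixed)):
        underline = fixed[index].strip()
        header = fixed[index - 1].strip()
        if header and underline and set(underline) == {"-"}:
            line = fixed[index]
            # Keep whatever indentation the underline already had.
            indent = line[: len(line) - len(line.lstrip())]
            fixed[index] = indent + "-" * len(header)
    return fixed
```

Covering the opening quotes as well, as suggested in the reply, would mean adding three hyphens when the header sits on the first line of the docstring.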

Owner

I mean, there isn't really a style guide for this I think, but imo it makes sense to add more - here so that the second line covers both the quotes and Returns.



def sincos(theta):
""" Parameters
Owner

This should probably be fixed.

Contributor Author

One option is a variant of textwrap.dedent that ignores the indentation of the first line when determining the common whitespace, followed by a similar variant of textwrap.indent at the end.

I would need to test how both handle

    """\
    A summary line, indented the same as the others for all tools.

    Section Header
    --------------
    ...
    """

Another option might be to replace the f"{quotes:s}{body:s}{quotes:s}" reconstructing the new docstring with f"{quotes:s}\\\n{body:s}{quotes:s}" if the first character of the body is whitespace, but that still leaves the question of what to do with that "Returns" at the left margin.
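A sketch of the first idea, built on the standard `textwrap.dedent` and `textwrap.indent`. Both helper names are hypothetical, and stripping the leading whitespace of the first line is a choice made for this example rather than something settled in the thread.

```python
import textwrap


def dedent_ignoring_first_line(text: str) -> str:
    """Dedent ``text``, computing the common indent from lines 2 onwards."""
    first, newline, rest = text.partition("\n")
    return first.lstrip() + newline + textwrap.dedent(rest)


def indent_except_first_line(text: str, prefix: str) -> str:
    """Indent every non-blank line except the first with ``prefix``."""
    first, newline, rest = text.partition("\n")
    return first + newline + textwrap.indent(rest, prefix)


body = (
    "Summary.\n"
    "\n"
    "    Parameters\n"
    "    ----------\n"
    "    x : float\n"
    "        The input.\n"
)
dedented = dedent_ignoring_first_line(body)
restored = indent_except_first_line(dedented, "    ")
```

Here `restored` is identical to `body`, so formatters could operate on the dedented form and re-indent at the end.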

Owner

Can't we just check which line we are inserting on and if it's 0 we remove any indent and if higher we add an indent if it is missing?

Contributor Author

That's another option.

The numpydoc style guide doesn't specify whether it prefers the first line of the docstring on the same line as the quotes or on a different line, but has examples of both. The numpydoc validation tool suggests the

    """\
    Summary.

    Extended summary...
    """

form is preferred to the

    """Summary.

    Extended summary...
    """

form, which doesn't seem to agree with the style guide. It also insists on extended summary and "See Also" sections, which I don't see always being necessary.

Supporting the first form is relatively straightforward; looping through the first lines of each section and doing line.lstrip() on the first section or textwrap.indent(line) for subsequent sections is also straightforward.
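The loop described in the last paragraph could look roughly like this. `fix_section_header_indents` and its arguments are assumptions for the sketch. Note that `textwrap.indent` needs an explicit prefix argument, so plain string concatenation is used here instead.

```python
from collections import OrderedDict


def fix_section_header_indents(sections, indent_length):
    """Normalise the indentation of the first line of every section in place."""
    first_section = True
    for section in sections.values():
        if not section:
            continue
        if first_section:
            # The first section starts right after the opening quotes.
            section[0] = section[0].lstrip()
            first_section = False
        elif section[0] and not section[0][0].isspace():
            # Later section headers must be indented like the body.
            section[0] = " " * indent_length + section[0]
    return sections


example = OrderedDict(
    [
        ("Parameters", ["    Parameters", "    ----------"]),
        ("Returns", ["Returns", "    -------"]),
    ]
)
fix_section_header_indents(example, 4)
```

After the call, the first header has lost its leading spaces and the `Returns` header has gained four, which addresses the two oddities shown in the earlier example output.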

theta: float
the angle at which to calculate the sine and cosine.

Returns
Owner

This as well.

Comment on lines 244 to 246
# Everything before the first section header is in a single
# summary/deprecation warning/extended summary section. This
# ends up being called "Summary".
Owner

Suggested change
# Everything before the first section header is in a single
# summary/deprecation warning/extended summary section. This
# ends up being called "Summary".

Now that I look at it again, this no longer makes sense here.

Comment on lines 284 to 287
section[0] = section[0].lstrip()
elif not section[0][0].isspace():
section[0] = f"{' ' * indent_length:s}{section[0]:s}"
first_section = False
Owner

Suggested change

    -     section[0] = section[0].lstrip()
    - elif not section[0][0].isspace():
    -     section[0] = f"{' ' * indent_length:s}{section[0]:s}"
    - first_section = False
    +     section[0] = section[0].lstrip()
    +     first_section = False
    + elif not section[0][0].isspace():
    +     section[0] = f"{' ' * indent_length:s}{section[0]:s}"

Saves some reassignments 😄


Co-authored-by: Daniël van Noord <[email protected]>
Co-authored-by: Pierre Sassoulas <[email protected]>
This commit adds ``NumpydocNameColonTypeFormatter``,
``NumpydocSectionHyphenLengthFormatter``,
``NumpydocSectionOrderingFormatter`` and
``NumpydocSectionSpacingFormatter``

Co-authored-by: Daniël van Noord <[email protected]>
@github-actions
According to the primer, this change has no effect on the checked open source code. 🤖🎉

@DanielNoord DanielNoord enabled auto-merge (rebase) August 21, 2022 11:11
@DanielNoord DanielNoord merged commit 9e60bb9 into DanielNoord:main Aug 21, 2022
@DanielNoord
Owner

@DWesl Thanks for all the work you put into this PR. I'm going to make a release immediately after this has been merged 😄
