Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip Remove docstring substitutions #57683

Closed

Conversation

MarcoGorelli
Copy link
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@MarcoGorelli MarcoGorelli changed the title wip Remove docstring validator wip Remove docstring substitutions Feb 29, 2024
@Dr-Irv
Copy link
Contributor

Dr-Irv commented Feb 29, 2024

So it seems that the advantage of docstring substitutions is that you only document certain things in one place. Otherwise, you have to repeat doc text in multiple places. That could create a future maintenance burden, if you have to change the description of a parameter in multiple places.

So maybe the solution would be to write a special code that does the substitutions for the purpose of doing the code validation. Not sure how hard that would be.

@rhshadrach
Copy link
Member

So it seems that the advantage of docstring substitutions is that you only document certain things in one place.

There have been a number of times where I've noticed something wrong with the docs, went to go correct it only to see that I'd have to tease apart and redo a lot of the sharing. Not having the time (and many times, the desire) to do so, I either raised and issue or let it slip.

While I do agree that we may end up with inconsistencies, in my experience I think our docstrings would be a lot better if we made them easier to edit.

It is also possible to set up tests to check that e.g. numeric_only always has the same description by parsing docstrings. Not certain this is worth the effort, but it's at least possible to do.

@mroeschke
Copy link
Member

Awesome to see that automation took care of a lot of this! Big +1.

Maintaining docstrings as purely text is worth the tradeoff of maintaining complex docstring automation and sharing. It will also be more straightforward for new contributors as well.

If we want to maintain consistency, we can do that through unit testing instead (we also do this with the docstrings of Timestamp and NaT

@MarcoGorelli
Copy link
Member Author

There have been a number of times where I've noticed something wrong with the docs, went to go correct it only to see that I'd have to tease apart and redo a lot of the sharing. Not having the time (and many times, the desire) to do so, I either raised and issue or let it slip.

😄 same

A bit of copy-and-pasting for docs isn't too bad, and as you've both noted, it's possible to write unit tests to ensure that some docstrings stay in sync

If there's no objections, I'll set aside some time to make this happen then (but my first priority is to properly handle pyarrow dtypes in the interchange protocol)

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Mar 1, 2024

A bit of copy-and-pasting for docs isn't too bad, and as you've both noted, it's possible to write unit tests to ensure that some docstrings stay in sync

If you can do this, I'm all for it. My concern here is that most contributors won't realize that they have to change some docstring in 4 different files without some kind of unit test checking that.

@datapythonista
Copy link
Member

While reusing docstrings can surely make things easier given the amount of duplicated methods in pandas, I do agree that in the current state it's probably creating more trouble than the value it adds. For example, the make_doc function to generate docstrings for the reduction functions is surely quite annoying to deal with, see this PR: #57682

Seems from the comments here that in general we're happy to fully get rid of the duplication, but I don't think a final decision has been made. I see some discussion in different issues and PRs, mostly here, #57578 and #57710, but since the impact is quite big in term of changes to the code, what do you think about creating a specific issue to make a global decision (maybe even a PDEP), on whether we want to do reusing at all, in which well defined cases would it be acceptable if any, and how are we planning to move forward with the changes (maybe some automation like a script to try to identify all the docstrings being reused, and keep an inventory to work on)?

Maybe I'm too cautious, but feels like we may end up in a situation where we're not well aligned among the core dev team, and we have lots of PRs related to this, also mixed with other possibly conflicting docstring efforts.

@MarcoGorelli
Copy link
Member Author

now that the docstring validation has been noticeably sped up in CI, I don't think this is as important - closing then, thanks for comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants