Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: fix PR02 errors in docstrings #57111

Closed
jordan-d-murphy opened this issue Jan 28, 2024 · 36 comments
Closed

DOC: fix PR02 errors in docstrings #57111

jordan-d-murphy opened this issue Jan 28, 2024 · 36 comments
Labels
CI Continuous Integration Docs good first issue

Comments

@jordan-d-murphy
Copy link
Contributor

Pandas has a script for validating docstrings:

pandas/ci/code_checks.sh

Lines 72 to 194 in 9008ee5

MSG='Partially validate docstrings (PR02)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=PR02 --ignore_functions \
pandas.io.formats.style.Styler.to_excel\
pandas.CategoricalIndex.rename_categories\
pandas.CategoricalIndex.reorder_categories\
pandas.CategoricalIndex.add_categories\
pandas.CategoricalIndex.remove_categories\
pandas.CategoricalIndex.set_categories\
pandas.IntervalIndex.set_closed\
pandas.IntervalIndex.contains\
pandas.IntervalIndex.overlaps\
pandas.IntervalIndex.to_tuples\
pandas.DatetimeIndex.round\
pandas.DatetimeIndex.floor\
pandas.DatetimeIndex.ceil\
pandas.DatetimeIndex.month_name\
pandas.DatetimeIndex.day_name\
pandas.DatetimeIndex.to_period\
pandas.DatetimeIndex.std\
pandas.TimedeltaIndex.round\
pandas.TimedeltaIndex.floor\
pandas.TimedeltaIndex.ceil\
pandas.PeriodIndex\
pandas.PeriodIndex.strftime\
pandas.Series.clip\
pandas.Series.rename_axis\
pandas.Series.interpolate\
pandas.Series.dt.to_period\
pandas.Series.dt.tz_localize\
pandas.Series.dt.tz_convert\
pandas.Series.dt.strftime\
pandas.Series.dt.round\
pandas.Series.dt.floor\
pandas.Series.dt.ceil\
pandas.Series.dt.month_name\
pandas.Series.dt.day_name\
pandas.Series.str.wrap\
pandas.Series.cat.rename_categories\
pandas.Series.cat.reorder_categories\
pandas.Series.cat.add_categories\
pandas.Series.cat.remove_categories\
pandas.Series.cat.set_categories\
pandas.Series.plot\
pandas.Series.plot.bar\
pandas.Series.plot.barh\
pandas.Series.plot.line\
pandas.Series.plot.pie\
pandas.DataFrame.clip\
pandas.DataFrame.plot\
pandas.DataFrame.plot.bar\
pandas.DataFrame.plot.barh\
pandas.DataFrame.plot.line\
pandas.DataFrame.plot.pie\
pandas.tseries.offsets.DateOffset\
pandas.tseries.offsets.BusinessDay\
pandas.tseries.offsets.BDay\
pandas.tseries.offsets.BusinessHour\
pandas.tseries.offsets.CustomBusinessDay\
pandas.tseries.offsets.CDay\
pandas.tseries.offsets.CustomBusinessHour\
pandas.tseries.offsets.MonthEnd\
pandas.tseries.offsets.MonthBegin\
pandas.tseries.offsets.BusinessMonthEnd\
pandas.tseries.offsets.BMonthEnd\
pandas.tseries.offsets.BusinessMonthBegin\
pandas.tseries.offsets.BMonthBegin\
pandas.tseries.offsets.CustomBusinessMonthEnd\
pandas.tseries.offsets.CBMonthEnd\
pandas.tseries.offsets.CustomBusinessMonthBegin\
pandas.tseries.offsets.CBMonthBegin\
pandas.tseries.offsets.SemiMonthEnd\
pandas.tseries.offsets.SemiMonthBegin\
pandas.tseries.offsets.Week\
pandas.tseries.offsets.WeekOfMonth\
pandas.tseries.offsets.LastWeekOfMonth\
pandas.tseries.offsets.BQuarterEnd\
pandas.tseries.offsets.BQuarterBegin\
pandas.tseries.offsets.QuarterEnd\
pandas.tseries.offsets.QuarterBegin\
pandas.tseries.offsets.BYearEnd\
pandas.tseries.offsets.BYearBegin\
pandas.tseries.offsets.YearEnd\
pandas.tseries.offsets.YearBegin\
pandas.tseries.offsets.FY5253\
pandas.tseries.offsets.FY5253Quarter\
pandas.tseries.offsets.Easter\
pandas.tseries.offsets.Day\
pandas.tseries.offsets.Hour\
pandas.tseries.offsets.Minute\
pandas.tseries.offsets.Second\
pandas.tseries.offsets.Milli\
pandas.tseries.offsets.Micro\
pandas.tseries.offsets.Nano\
pandas.describe_option\
pandas.reset_option\
pandas.get_option\
pandas.set_option\
pandas.Timestamp.max\
pandas.Timestamp.min\
pandas.Timestamp.resolution\
pandas.Timedelta.max\
pandas.Timedelta.min\
pandas.Timedelta.resolution\
pandas.Interval\
pandas.Grouper\
pandas.core.groupby.SeriesGroupBy.apply\
pandas.core.groupby.SeriesGroupBy.transform\
pandas.core.groupby.DataFrameGroupBy.transform\
pandas.core.groupby.DataFrameGroupBy.nth\
pandas.core.groupby.DataFrameGroupBy.rolling\
pandas.core.groupby.SeriesGroupBy.idxmax\
pandas.core.groupby.SeriesGroupBy.idxmin\
pandas.core.groupby.SeriesGroupBy.nth\
pandas.core.groupby.SeriesGroupBy.rolling\
pandas.core.groupby.DataFrameGroupBy.hist\
pandas.core.groupby.DataFrameGroupBy.plot\
pandas.core.groupby.SeriesGroupBy.plot\
pandas.read_hdf\
pandas.HDFStore.append\
pandas.core.window.rolling.Rolling.quantile\
pandas.core.window.expanding.Expanding.quantile\
pandas.api.extensions.ExtensionArray.argsort # There should be no backslash in the final line, please keep this comment in the last ignored function
RET=$(($RET + $?)) ; echo $MSG "DONE"

Currently, some methods fail the PR02 check.

The task here is:

  • take 2-4 methods
  • run: scripts/validate_docstrings.py --format=actions --errors=PR02 method-name
  • check if validation docstrings passes for those methods, and if it’s necessary fix the docstrings according to whatever error is reported
  • remove those methods from code_checks.sh
  • commit, push, open pull request

Please don't comment take as multiple people can work on this issue. You also don't need to ask for permission to work on this, just comment on which methods are you going to work.

If you're new contributor, please check the contributing guide

thanks @natmokval @mroeschke for the inspiration for this issue

@jordan-d-murphy
Copy link
Contributor Author

I'll take

  • pandas.Series.interpolate
  • pandas.read_hdf
  • pandas.HDFStore.append

@jrmylow
Copy link
Contributor

jrmylow commented Jan 28, 2024

I'll get started with:

  • pandas.describe_option
  • pandas.reset_option
  • pandas.get_option
  • pandas.set_option

@jordan-d-murphy
Copy link
Contributor Author

I'll take:

  • pandas.core.groupby.SeriesGroupBy.idxmax
  • pandas.core.groupby.SeriesGroupBy.idxmin

@jrmylow
Copy link
Contributor

jrmylow commented Jan 30, 2024

Would it be helpful to add the good-first-issue label to this? Seems the previous iteration (for EX03?) had that

@jordan-d-murphy
Copy link
Contributor Author

I think so, but I can't add the label myself. Feel free to add if you have permissions 🙂

@manlattan
Copy link
Contributor

/take

@thiphan94 thiphan94 mentioned this issue Jan 31, 2024
5 tasks
@thiphan94
Copy link
Contributor

I worked on:

  • pandas.Series.clip
  • pandas.DataFrame.clip
  • pandas.api.extensions.ExtensionArray.argsort

@jordan-d-murphy
Copy link
Contributor Author

I'll take

  • pandas.core.groupby.SeriesGroupBy.transform
  • pandas.core.groupby.DataFrameGroupBy.transform

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 2, 2024

Note - a new method was added to the list in ci/code_checks.sh

pandas.core.groupby.DataFrameGroupBy.corrwith

See:

#57109
#57158

@jordan-d-murphy
Copy link
Contributor Author

I'll work on

  • pandas.core.window.rolling.Rolling.quantile
  • pandas.core.window.expanding.Expanding.quantile

@jordan-d-murphy
Copy link
Contributor Author

I got a fix for pandas.io.formats.style.Styler.to_excel

@nkasing
Copy link
Contributor

nkasing commented Feb 4, 2024

I'll take

  • pandas.Series.plot.bar / pandas.DataFrame.plot.bar
  • pandas.Series.plot.barh / pandas.DataFrame.plot.barh
  • pandas.Series.plot.line / pandas.DataFrame.plot
  • pandas.Series.plot.pie / pandas.DataFrame.plot.pie

@jordan-d-murphy
Copy link
Contributor Author

got a fix for pandas.Series.rename_axis

@nkasing
Copy link
Contributor

nkasing commented Feb 4, 2024

"Unknown parameters" arises from parameters in the docstring that do not appear in the function signatures.

Often, this happens because the parameter is buried in **kwargs. For example, in the docstring for bar and line plots, color is specified as a parameter, but the function signature does not include it.

In these cases, the only thing I can think of is to copy the default color value, adding it to the function signature (note this wouldn't be updated if matplotlib's defaults update) and then override it in the function if it is found in kwargs, but I really don't like this solution.

What would be a better method?

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 4, 2024

I've noticed this too, and I've been thinking about this a lot over the last week or so, but I'm not sure yet what the best approach would be. Whatever we end up choosing as the best approach, there are quite a few docstrings in the list that exhibit this behavior and could be resolved with the solution we decide on.

@jrmylow
Copy link
Contributor

jrmylow commented Feb 4, 2024

As a suggestion, I've implemented something that might work in: #57235, open to suggestions on whether this reads well to people

@jrmylow
Copy link
Contributor

jrmylow commented Feb 6, 2024

hi all, would like to get some opinions around the difficulty I'm having with the pd.tseries.offsets. Note in this case the problem is due to how the objects are generated, and not necessarily that the docstrings are bad.

Currently, the offsets are generated using cython. Cython unfortunately does not support inspect (nor does it plan to) [1]. As I understand it, the signatures generated don't even have parameters attached to them (despite having parameters in the wrong code).

The numpydoc test suite that runs uses inspect under the hood [2] and returns no parameters if the underlying object doesn't support introspection (e.g. a C object).

I suspect there are libraries out there that would allow introspection of c objects and their signatures, is it alright to just import one of them? Alternatively I can go and look at whether we can expose this via tinkering with the Cython flags but that's new territory for me.

[1] - https://cython.readthedocs.io/en/latest/src/userguide/limitations.html#inspect-support
[2] - https://github.com/numpy/numpydoc/blob/8e2d545131fc2a124dd01d749c385f827bb1072b/numpydoc/validate.py#L422

@jordan-d-murphy
Copy link
Contributor Author

I got a fix for pandas.core.groupby.SeriesGroupBy.apply

@jordan-d-murphy
Copy link
Contributor Author

I got a fix for pandas.core.groupby.DataFrameGroupBy.hist

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 9, 2024

I'll open a PR for pandas.Series.str.wrap

@Dxuian
Copy link

Dxuian commented Feb 10, 2024

Im new to open source , apologies in advance when i run the command to check which methods fail the PR02 check even the methods that other people have dealt with and merged show up ?
are these the default errors that show up ? or does that suggest some error in the PR02 validation checker ?

@jordan-d-murphy
Copy link
Contributor Author

@Dxuian - try running this with some of the ones that haven't been fixed yet
scripts/validate_docstrings.py --format=actions --errors=PR02 <method-name>
for example:
scripts/validate_docstrings.py --format=actions --errors=PR02 pandas.Series.dt.to_period

@Dxuian
Copy link

Dxuian commented Feb 10, 2024

@Dxuian - try running this with some of the ones that haven't been fixed yet scripts/validate_docstrings.py --format=actions --errors=PR02 <method-name> for example: scripts/validate_docstrings.py --format=actions --errors=PR02 pandas.Series.dt.to_period

the checks fail for pandas.Series.dt.to_periods as is expected , what im trying to say is that the checks also fail for pandas.read_hdf
dddddd
which was previously fixed and merged or does that happen because pandas.read_hdf is no longer in code_checks.sh
thank you in advance

@jordan-d-murphy
Copy link
Contributor Author

Ah I see, @Dxuian

Running scripts/validate_docstrings.py --format=actions --errors=PR02 pandas.read_hdf on my branch outputs the following,

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.read_hdf" correct. :)

which makes me think what might be going on is your branch might be out of sync with main.

I've been using these commands to keep my env and branches in sync with the latest updates. Can you try updating your branch with the latest changes from main?

Updating the development environment

git checkout main
git merge upstream/main
mamba activate pandas-dev
mamba env update -f environment.yml --prune

Creating a feature branch

git checkout main
git pull upstream main --ff-only
git checkout -b shiny-new-feature

Pre-commit

pre-commit install
pre-commit run --files
pre-commit run --from-ref=upstream/main --to-ref=HEAD --all-files
pre-commit run --hook-stage manual --all-files

Updating your pull request

git checkout shiny-new-feature
git fetch upstream
git merge upstream/main

@jordan-d-murphy
Copy link
Contributor Author

I got fixes for these:

pandas.core.groupby.DataFrameGroupBy.rolling
pandas.core.groupby.SeriesGroupBy.rolling

@Dxuian
Copy link

Dxuian commented Feb 10, 2024

Ah I see, @Dxuian

Running scripts/validate_docstrings.py --format=actions --errors=PR02 pandas.read_hdf on my branch outputs the following,

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.read_hdf" correct. :)

which makes me think what might be going on is your branch might be out of sync with main.

I've been using these commands to keep my env and branches in sync with the latest updates. Can you try updating your branch with the latest changes from main?

Updating the development environment

git checkout main git merge upstream/main mamba activate pandas-dev mamba env update -f environment.yml --prune

Creating a feature branch

git checkout main git pull upstream main --ff-only git checkout -b shiny-new-feature

Pre-commit

pre-commit install pre-commit run --files pre-commit run --from-ref=upstream/main --to-ref=HEAD --all-files pre-commit run --hook-stage manual --all-files

Updating your pull request

git checkout shiny-new-feature git fetch upstream git merge upstream/main

so i've been trying to wrestle what going on wrong since yesterday but i cant seem to pin it down even after i've pulled from the latest main branch i still get the same PR02 errors on issues that should've been fixed after checking out this issue page.
zzzz
I've reset all env tried different dev configs
I want to confirm has anyone who's not worked on the issue been able replicate this or is it just me ?
so far ive redownloaded docker containers , used the pip venv method even tried the mamba method but nothing seems to work. can someone confirm if
the issue still persists for commits that have been merged
image
thanks in advance

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 11, 2024

EDIT: this suggestion didn't end up panning out, I am going to close the PR.

@pmhatre1
Copy link
Contributor

pmhatre1 commented Mar 8, 2024

Got fixes for
pandas.DatetimeIndex.day_name
pandas.DatetimeIndex.month_name

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Mar 17, 2024

Hi all, if CI: speedup docstring check consecutive runs #57826 gets merged in, I might be reworking our approach here; this would look like closing the following issues:

DOC: fix GL08 errors in docstrings
DOC: fix PR01 errors in docstrings
DOC: fix PR07 errors in docstrings
DOC: fix SA01 errors in docstrings
DOC: fix RT03 errors in docstrings
DOC: fix PR02 errors in docstrings

And opening a new issue to address these based on the new approach.

tl;dr

the work can still be done, but probably under a new ticket once #57826 is merged in

@jordan-d-murphy
Copy link
Contributor Author

@jordan-d-murphy
Copy link
Contributor Author

Opened DOC: Enforce Numpy Docstring Validation (Parent Issue) #58063 as a parent issue for fixing docstrings based on the refactoring in code_checks.sh

Feel free to swing by and help out! 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Docs good first issue
Projects
None yet
10 participants