DOC: fix EX03 errors in docstrings #56804

Closed
Tracked by #56835
natmokval opened this issue Jan 9, 2024 · 66 comments · Fixed by #56863
Labels
Code Style Code style, linting, code_checks Docs good first issue

Comments

@natmokval
Contributor

natmokval commented Jan 9, 2024

pandas has a script for validating docstrings:

pandas/ci/code_checks.sh

Lines 72 to 172 in b7e2202

MSG='Partially validate docstrings (EX03)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX03 --ignore_functions \
pandas.Series.dt.day_name \
pandas.Series.str.len \
pandas.Series.cat.set_categories \
pandas.Series.plot.bar \
pandas.Series.plot.hist \
pandas.Series.plot.line \
pandas.Series.to_sql \
pandas.Series.to_latex \
pandas.errors.CategoricalConversionWarning \
pandas.errors.ChainedAssignmentError \
pandas.errors.ClosedFileError \
pandas.errors.DatabaseError \
pandas.errors.IndexingError \
pandas.errors.InvalidColumnName \
pandas.errors.NumExprClobberingError \
pandas.errors.PossibleDataLossError \
pandas.errors.PossiblePrecisionLoss \
pandas.errors.SettingWithCopyError \
pandas.errors.SettingWithCopyWarning \
pandas.errors.SpecificationError \
pandas.errors.UndefinedVariableError \
pandas.errors.ValueLabelTypeMismatch \
pandas.Timestamp.ceil \
pandas.Timestamp.floor \
pandas.Timestamp.round \
pandas.read_pickle \
pandas.ExcelWriter \
pandas.read_json \
pandas.io.json.build_table_schema \
pandas.DataFrame.to_latex \
pandas.io.formats.style.Styler.to_latex \
pandas.read_parquet \
pandas.DataFrame.to_sql \
pandas.read_stata \
pandas.core.resample.Resampler.pipe \
pandas.core.resample.Resampler.fillna \
pandas.core.resample.Resampler.interpolate \
pandas.plotting.scatter_matrix \
pandas.pivot \
pandas.merge_asof \
pandas.wide_to_long \
pandas.Index.rename \
pandas.Index.droplevel \
pandas.Index.isin \
pandas.CategoricalIndex.set_categories \
pandas.MultiIndex.names \
pandas.MultiIndex.droplevel \
pandas.IndexSlice \
pandas.DatetimeIndex.month_name \
pandas.DatetimeIndex.day_name \
pandas.core.window.rolling.Rolling.corr \
pandas.Grouper \
pandas.core.groupby.SeriesGroupBy.apply \
pandas.core.groupby.DataFrameGroupBy.apply \
pandas.core.groupby.SeriesGroupBy.transform \
pandas.core.groupby.SeriesGroupBy.pipe \
pandas.core.groupby.DataFrameGroupBy.pipe \
pandas.core.groupby.DataFrameGroupBy.describe \
pandas.core.groupby.DataFrameGroupBy.idxmax \
pandas.core.groupby.DataFrameGroupBy.idxmin \
pandas.core.groupby.DataFrameGroupBy.value_counts \
pandas.core.groupby.SeriesGroupBy.describe \
pandas.core.groupby.DataFrameGroupBy.boxplot \
pandas.core.groupby.DataFrameGroupBy.hist \
pandas.io.formats.style.Styler.map \
pandas.io.formats.style.Styler.apply_index \
pandas.io.formats.style.Styler.map_index \
pandas.io.formats.style.Styler.format \
pandas.io.formats.style.Styler.format_index \
pandas.io.formats.style.Styler.relabel_index \
pandas.io.formats.style.Styler.hide \
pandas.io.formats.style.Styler.set_td_classes \
pandas.io.formats.style.Styler.set_tooltips \
pandas.io.formats.style.Styler.set_uuid \
pandas.io.formats.style.Styler.pipe \
pandas.io.formats.style.Styler.highlight_between \
pandas.io.formats.style.Styler.highlight_quantile \
pandas.io.formats.style.Styler.background_gradient \
pandas.io.formats.style.Styler.text_gradient \
pandas.DataFrame.values \
pandas.DataFrame.loc \
pandas.DataFrame.iloc \
pandas.DataFrame.groupby \
pandas.DataFrame.describe \
pandas.DataFrame.skew \
pandas.DataFrame.var \
pandas.DataFrame.idxmax \
pandas.DataFrame.idxmin \
pandas.DataFrame.last \
pandas.DataFrame.pivot \
pandas.DataFrame.sort_values \
pandas.DataFrame.tz_convert \
pandas.DataFrame.tz_localize \
pandas.DataFrame.plot.bar \
pandas.DataFrame.plot.hexbin \
pandas.DataFrame.plot.hist \
pandas.DataFrame.plot.line \
pandas.DataFrame.hist \
RET=$(($RET + $?)) ; echo $MSG "DONE"

Currently, some methods fail the EX03 check.

The task here is:

  • take 2-4 methods
  • run: scripts/validate_docstrings.py --format=actions --errors=EX03 method-name
  • check whether docstring validation passes for those methods and, if necessary, fix the docstrings according to whatever error is reported
  • remove those methods from code_checks.sh
  • commit, push, open pull request
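
The validation step above could be scripted when checking several methods. A minimal sketch: build_command is a hypothetical helper (not part of pandas), and the script path and flags are copied from the steps above. It only prints the commands, which would need to be run from the repository root.

```python
import shlex
import sys

SCRIPT = "scripts/validate_docstrings.py"  # path as given in the steps above


def build_command(method_name):
    """Return the argv list for validating one method's docstring."""
    return [sys.executable, SCRIPT, "--format=actions", "--errors=EX03", method_name]


# Example selection; pick whichever 2-4 methods you are working on.
methods = ["pandas.Series.dt.day_name", "pandas.Series.str.len"]
for name in methods:
    # Each list can be passed to subprocess.run(...) to execute it.
    print(shlex.join(build_command(name)))
```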

Please don't comment "take", as multiple people can work on this issue. You also don't need to ask for permission to work on this; just comment with the methods you are going to work on.

If you're a new contributor, please check the contributing guide.

Thanks @MarcoGorelli for giving me the idea for this issue.

@roadrollerdafjorst
Contributor

I can take the first two methods.

  • pandas.Series.dt.day_name
  • pandas.Series.str.len

@luke396
Contributor

luke396 commented Jan 10, 2024

I will work on:
pandas.Series.cat.set_categories
pandas.Series.plot.bar
pandas.Series.plot.hist

@luke396
Contributor

luke396 commented Jan 10, 2024

Just to be sure, and to clarify for later contributors: EX03 covers any flake8 error reported for the example code in a docstring.

Reference #27977.

@svrashank
Contributor

Working on:

  • pandas.Series.plot.line
  • pandas.Series.to_sql

@Deekshita-S
Contributor

working on:

  • pandas.DataFrame.loc
  • pandas.DataFrame.iloc
  • pandas.DataFrame.describe

@asishm
Contributor

asishm commented Jan 10, 2024

I wrote a comment but accidentally deleted it 🤦‍♂️

tl;dr: if a method name is passed to validate_docstrings.py, the script ignores all the other arguments and reports every error for that method.

@lukasld

lukasld commented Jan 10, 2024

I'm new to this and may be overlooking something fundamental.
I wanted to take on some of these docstrings, but I ran into an issue and am not sure what I'm doing wrong.

After executing:

python3 validate_docstrings.py --format=actions --errors=EX03 pandas.errors.SpecificationError

I get a list of flake8 errors:

flake8 error: line 4, col 40: E261 at least two spaces before inline comment
...

However, after adding an extra space and saving the docstring for SpecificationError, in this case in

./pandas/errors/__init__.py

and rerunning validate_docstrings.py, the script returns the same errors, as if the change had no effect.

@lukasld

lukasld commented Jan 10, 2024

Otherwise I'd take:

  • pandas.errors.SettingWithCopyWarning
  • pandas.errors.SpecificationError
  • pandas.errors.UndefinedVariableError

@tiffanyxiao
Contributor

Working on:

  • pandas.errors.DatabaseError
  • pandas.errors.IndexingError
  • pandas.errors.InvalidColumnName

Thank you!

@svrashank
Contributor

For pandas.Series.plot.line, the script gives the following errors:

  • Unknown parameters {'color'}
  • flake8 error: line 4, col 4: E121 continuation line under-indented for hanging indent
  • flake8 error: line 6, col 4: E123 closing bracket does not match indentation of opening bracket's line

I am new to this. Can someone help me understand what 'line 4, col 4' refers to? I can't locate where it is pointing. Thanks in advance.

@jordan-d-murphy
Contributor

Hi all, I've opened a PR for the following

pandas.core.groupby.DataFrameGroupBy.describe
pandas.core.groupby.DataFrameGroupBy.idxmax
pandas.core.groupby.DataFrameGroupBy.idxmin
pandas.core.groupby.DataFrameGroupBy.value_counts

@jordan-d-murphy
Contributor

Hi all, I've opened a PR for the following

pandas.core.resample.Resampler.fillna
pandas.core.groupby.SeriesGroupBy.describe
pandas.DataFrame.last
pandas.DataFrame.plot.hist

@jordan-d-murphy
Contributor

Hi all, I've opened a PR for the following

pandas.DataFrame.idxmax
pandas.DataFrame.idxmin
pandas.DataFrame.pivot

@svrashank
Contributor

I am also new and am facing the same issue that @lukasld described above: even after making the changes and rerunning the validation script, the error output doesn't change.

@asishm
Contributor

asishm commented Jan 11, 2024

Explanation of what to look for:

EX03 covers errors in the example code blocks of a function's or method's documentation.

For pandas.errors.SpecificationError, the examples show:

Examples
--------
>>> df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
...                    'B': range(5),
...                    'C': range(5)})
>>> df.groupby('A').B.agg({'foo': 'count'}) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported

>>> df.groupby('A').agg({'B': {'foo': ['sum', 'max']}}) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported

>>> df.groupby('A').agg(['min', 'min']) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported

Line 4 here is the 4th line of the example code, which is >>> df.groupby('A').B.agg({'foo': 'count'}) # doctest: +SKIP

Line 6 is >>> df.groupby('A').agg({'B': {'foo': ['sum', 'max']}}) # doctest: +SKIP (the blank line between examples is not counted).
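
A rough way to reproduce this numbering, assuming the validator counts lines over the concatenated example code only (an assumption, but one that matches the line 4, 6, and 8 errors reported in this thread). The standard-library doctest parser is used here purely for illustration, and numbered_example_lines is a hypothetical helper:

```python
import doctest

DOCSTRING = r"""
Examples
--------
>>> df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
...                    'B': range(5),
...                    'C': range(5)})
>>> df.groupby('A').B.agg({'foo': 'count'}) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported

>>> df.groupby('A').agg({'B': {'foo': ['sum', 'max']}}) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported
"""


def numbered_example_lines(docstring):
    """Return (line_number, code) pairs for the example code only."""
    examples = doctest.DocTestParser().get_examples(docstring)
    # Each example's source has its ">>> " / "... " prompts stripped.
    source = "".join(example.source for example in examples)
    return list(enumerate(source.splitlines(), start=1))


for number, code in numbered_example_lines(DOCSTRING):
    print(f"{number}: {code}")
```

In this sketch, lines 4 and 6 are the two groupby calls. Because the prompts are stripped before numbering, the reported column also appears to be counted from the start of the code rather than from the >>> prompt.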

@natmokval
Contributor Author

@asishm what kind of flake8 errors did you get?

@asishm
Contributor

asishm commented Jan 11, 2024

        flake8 error: line 4, col 40: E261 at least two spaces before inline comment
        flake8 error: line 6, col 52: E261 at least two spaces before inline comment
        flake8 error: line 8, col 36: E261 at least two spaces before inline comment

There's also a non-flake8 error reported for the above: See Also section not found.

see #56804 (comment) and #56827 (comment) for details @natmokval

@jordan-d-murphy
Contributor

@asishm can you try adding a space before each of these # symbols? (screenshot)

@asishm
Contributor

asishm commented Jan 11, 2024

yeah that's the fix - sorry if it wasn't clear, it was more of an explanation for people that had trouble figuring out the lines affected.
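
For anyone unsure what E261 is asking for, here is a simplified stand-in for that rule written with the standard-library tokenizer. It is only a sketch for illustration, not flake8's actual implementation, and inline_comment_spacing_errors is a hypothetical helper:

```python
import io
import tokenize


def inline_comment_spacing_errors(source):
    """Return (line, col) pairs where an inline comment has < 2 spaces before it."""
    errors = []
    prev_end = (0, 0)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type == tokenize.COMMENT:
            follows_code = tok.line[: tok.start[1]].strip() != ""
            same_line = prev_end[0] == tok.start[0]
            if follows_code and same_line and tok.start[1] < prev_end[1] + 2:
                # 1-based column of the first character after the code.
                errors.append((prev_end[0], prev_end[1] + 1))
        prev_end = tok.end
    return errors


before = "df.groupby('A').B.agg({'foo': 'count'}) # doctest: +SKIP\n"
after = "df.groupby('A').B.agg({'foo': 'count'})  # doctest: +SKIP\n"
print(inline_comment_spacing_errors(before))
print(inline_comment_spacing_errors(after))
```

The first call prints [(1, 40)], the same column as in the "col 40" message above, and the second prints [] once the extra space is added.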

@jordan-d-murphy
Contributor

Okay! Makes sense. Hope the photo might help someone else then 🙂

@lukasld

lukasld commented Jan 11, 2024

Regarding the unchanged error output described above: maybe numpydoc or another library is caching the results? Not sure...
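
One possible cause, not confirmed anywhere in this thread, is that the interpreter picks up a different copy of pandas than the checkout being edited. A small sketch for checking where a module would be loaded from; module_origin is a hypothetical helper, and it does not import the module itself:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Compare this path with the checkout you are editing.
print(module_origin("pandas"))
```

If the printed path is outside your working tree, edits to the local docstrings would not be visible to the script.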

@jordan-d-murphy
Contributor

jordan-d-murphy commented Jan 11, 2024

I've fixed the following:

pandas.Series.to_latex
pandas.read_pickle
pandas.DataFrame.to_latex
pandas.core.resample.Resampler.pipe

@jordan-betterman

jordan-betterman commented Jan 22, 2024

I'll take:

pandas.errors.ValueLabelTypeMismatch
pandas.Timestamp.ceil
pandas.Timestamp.floor

@jordan-d-murphy
Contributor

@jordan-betterman it looks like pandas.errors.ValueLabelTypeMismatch was fixed in #56867 and pandas.Timestamp.ceil and pandas.Timestamp.floor were both fixed in #56879

@jordan-betterman

@jordan-d-murphy sounds good! Are there any others that need fixing?

@jordan-d-murphy
Contributor

yes, if you check https://github.com/pandas-dev/pandas/blob/main/ci/code_checks.sh on the main branch, it looks like these are still remaining:

    MSG='Partially validate docstrings (EX03)' ;  echo $MSG
    $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX03 --ignore_functions \
        pandas.Series.plot.line \
        pandas.Series.to_sql \
        pandas.read_json \
        pandas.DataFrame.to_sql # There should be no backslash in the final line, please keep this comment in the last ignored function
    RET=$(($RET + $?)) ; echo $MSG "DONE"

@jordan-betterman

@jordan-d-murphy great I'll take those!

@jordan-betterman

@jordan-d-murphy how do I update the validate_docstrings script with the changes I made?

@jordan-d-murphy
Contributor

@jordan-betterman when you are running it, you can use this format:
scripts/validate_docstrings.py --format=actions --errors=EX03 pandas.Series.plot.line

@jordan-d-murphy
Contributor

If you want to see some good examples, you can look at some of the PRs that have been merged, such as #56989 or #56878.

@jordan-betterman

@jordan-d-murphy I'm having the same issue that @lukasld described above: after editing the docstring and rerunning the validation script, the same errors are reported, as if the change had no effect.

@asishm
Contributor

asishm commented Jan 22, 2024

@jordan-betterman see if this comment helps - #56804 (comment)

@jordan-betterman

jordan-betterman commented Jan 22, 2024

@asishm I saw that comment and it was really helpful! I think the issue is that when I run the validate_docstrings script, it's not checking the most recent version of the docstrings.

These are the changes I made:

>>> df = pd.DataFrame({
...    'pig': [20, 18, 489, 675, 1776],
...    'horse': [4, 25, 281, 600, 1900],
... }, index=[1990, 1997, 2003, 2009, 2014])
>>> lines = df.plot.line()

This is what the script evaluated:

>>> df = pd.DataFrame({
...    'pig': [20, 18, 489, 675, 1776],
...    'horse': [4, 25, 281, 600, 1900]
...    }, index=[1990, 1997, 2003, 2009, 2014])
>>> lines = df.plot.line()

Is there something in the setup that I need to change for it to work? This is my first day working in the repo, so it could be something on my end!

@jordan-d-murphy
Contributor

@jordan-betterman can you post the script you're running and the output?

@jordan-betterman

script:
scripts/validate_docstrings.py --format=actions --errors=EX03 pandas.Series.plot.line

Output:

################################################################################
##################### Docstring (pandas.Series.plot.line)  #####################
################################################################################

Plot Series or DataFrame as lines.

This function is useful to plot lines using DataFrame's values
as coordinates.

Parameters
----------
x : label or position, optional
    Allows plotting of one column versus another. If not specified,
    the index of the DataFrame is used.
y : label or position, optional
    Allows plotting of one column versus another. If not specified,
    all numerical columns are used.
color : str, array-like, or dict, optional
    The color for each of the DataFrame's columns. Possible values are:

    - A single color string referred to by name, RGB or RGBA code,
        for instance 'red' or '#a98d19'.

    - A sequence of color strings referred to by name, RGB or RGBA
        code, which will be used for each column recursively. For
        instance ['green','yellow'] each column's line will be filled in
        green or yellow, alternatively. If there is only a single column to
        be plotted, then only the first color from the color list will be
        used.

    - A dict of the form {column name : color}, so that each column will be
        colored accordingly. For example, if your columns are called `a` and
        `b`, then passing {'a': 'green', 'b': 'red'} will color lines for
        column `a` in green and lines for column `b` in red.

**kwargs
    Additional keyword arguments are documented in
    :meth:`DataFrame.plot`.

Returns
-------
matplotlib.axes.Axes or np.ndarray of them
    An ndarray is returned with one :class:`matplotlib.axes.Axes`
    per column when ``subplots=True``.

        See Also
        --------
        matplotlib.pyplot.plot : Plot y versus x as lines and/or markers.

        Examples
        --------

        .. plot::
            :context: close-figs

            >>> s = pd.Series([1, 3, 2])
            >>> s.plot.line()  # doctest: +SKIP

        .. plot::
            :context: close-figs

            The following example shows the populations for some animals
            over the years.

            >>> df = pd.DataFrame({
            ...    'pig': [20, 18, 489, 675, 1776],
            ...    'horse': [4, 25, 281, 600, 1900]
            ...    }, index=[1990, 1997, 2003, 2009, 2014])
            >>> lines = df.plot.line()

        .. plot::
           :context: close-figs

           An example with subplots, so an array of axes is returned.

           >>> axes = df.plot.line(subplots=True)
           >>> type(axes)
           <class 'numpy.ndarray'>

        .. plot::
           :context: close-figs

           Let's repeat the same example, but specifying colors for
           each column (in this case, for each animal).

           >>> axes = df.plot.line(
           ...     subplots=True, color={"pig": "pink", "horse": "#742802"}
           ... )

        .. plot::
            :context: close-figs

            The following example shows the relationship between both
            populations.

            >>> lines = df.plot.line(x='pig', y='horse')

################################################################################
################################## Validation ##################################
################################################################################

3 Errors found for `pandas.Series.plot.line`:
	PR02	Unknown parameters {'color'}
	EX03	flake8 error: line 4, col 4: E121 continuation line under-indented for hanging indent
	EX03	flake8 error: line 6, col 4: E123 closing bracket does not match indentation of opening bracket's line

@jordan-d-murphy
Contributor

jordan-d-murphy commented Jan 22, 2024

hmmm okay, yes your approach seems correct, but when I ran this on the latest branch I'm seeing no EX03 errors for pandas.Series.plot.line

I've been using the following approach to set up my dev env and working branch before working on my PRs, which ensures my branch is up to date with the latest version of main.

can you try running these commands, and then try running your script again and see if it helps?


Updating the development environment

git checkout main
git merge upstream/main
mamba activate pandas-dev
mamba env update -f environment.yml --prune

Creating a feature branch

git checkout main
git pull upstream main --ff-only
git checkout -b shiny-new-feature (NOTE: shiny-new-feature should be your new working branch name)


After running the above commands, running the following script
scripts/validate_docstrings.py --format=actions --errors=EX03 pandas.Series.plot.line
results in output that ends with this:

################################################################################
################################## Validation ##################################
################################################################################

1 Errors found for `pandas.Series.plot.line`:
	PR02	Unknown parameters {'color'}
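
Since passing a single method name makes the script report every error code (as noted earlier in the thread), the remaining PR02 is outside the scope of this issue. One way to look only at EX03 is to filter the validation output. A minimal sketch; ex03_lines is a hypothetical helper, and the tab-separated layout is assumed from the output pasted in this thread:

```python
def ex03_lines(validation_output):
    """Keep only the lines whose error code is EX03."""
    kept = []
    for line in validation_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "EX03":
            kept.append(line.strip())
    return kept


OUTPUT = (
    "3 Errors found for `pandas.Series.plot.line`:\n"
    "\tPR02\tUnknown parameters {'color'}\n"
    "\tEX03\tflake8 error: line 4, col 4: E121 continuation line "
    "under-indented for hanging indent\n"
    "\tEX03\tflake8 error: line 6, col 4: E123 closing bracket does not "
    "match indentation of opening bracket's line\n"
)

for line in ex03_lines(OUTPUT):
    print(line)
```

An empty result would mean the method has no EX03 errors left, even if other codes such as PR02 are still listed.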

@jordan-d-murphy
Contributor

Also another note, once all the EX03 errors are cleared, the next step is to remove the function from the list in ci/code_checks.sh

@jordan-betterman

I tried all of that and it still didn't work. I'm going to stop working on it and find another issue to take on. Thanks for all the help!

@jordan-d-murphy
Contributor

Okay, sorry to hear it didn't work out. Thanks for giving it a shot!

@jordan-d-murphy
Contributor

I've opened a PR for the remaining 4 functions. I believe this will close this issue.

pandas.Series.plot.line
pandas.Series.to_sql
pandas.read_json
pandas.DataFrame.to_sql

@jordan-d-murphy
Contributor

@natmokval now that #57025 has been merged in, I believe we can close this issue. Please let me know if you see any additional work that needs to be done, I'd be happy to clean up any loose ends!

@natmokval
Contributor Author

@jordan-d-murphy, I agree, seems we fixed all flake8 errors. Thank you for working on this issue with intensity and helping other contributors. Now, we can close this issue.

@MarcoGorelli
Member

Awesome, thanks all!
