DOC: Enforce Numpy Docstring Validation | pandas.DataFrame #58065

Closed
jordan-d-murphy opened this issue Mar 29, 2024 · 34 comments · Fixed by #59262
Comments

@jordan-d-murphy
Contributor

jordan-d-murphy commented Mar 29, 2024

DOC: Enforce Numpy Docstring Validation (Parent Issue) #58063

Pandas has a script for validating docstrings in code_checks.sh. Currently, some methods fail some of these checks.

pandas.DataFrame

pandas/ci/code_checks.sh

Lines 82 to 134 in c468028

-i "pandas.DataFrame.__dataframe__ SA01" \
-i "pandas.DataFrame.__iter__ SA01" \
-i "pandas.DataFrame.assign SA01" \
-i "pandas.DataFrame.at_time PR01" \
-i "pandas.DataFrame.axes SA01" \
-i "pandas.DataFrame.backfill PR01,SA01" \
-i "pandas.DataFrame.bfill SA01" \
-i "pandas.DataFrame.columns SA01" \
-i "pandas.DataFrame.copy SA01" \
-i "pandas.DataFrame.droplevel SA01" \
-i "pandas.DataFrame.dtypes SA01" \
-i "pandas.DataFrame.ffill SA01" \
-i "pandas.DataFrame.first_valid_index SA01" \
-i "pandas.DataFrame.get SA01" \
-i "pandas.DataFrame.hist RT03" \
-i "pandas.DataFrame.infer_objects RT03" \
-i "pandas.DataFrame.keys SA01" \
-i "pandas.DataFrame.kurt RT03,SA01" \
-i "pandas.DataFrame.kurtosis RT03,SA01" \
-i "pandas.DataFrame.last_valid_index SA01" \
-i "pandas.DataFrame.mask RT03" \
-i "pandas.DataFrame.max RT03" \
-i "pandas.DataFrame.mean RT03,SA01" \
-i "pandas.DataFrame.median RT03,SA01" \
-i "pandas.DataFrame.min RT03" \
-i "pandas.DataFrame.pad PR01,SA01" \
-i "pandas.DataFrame.plot PR02,SA01" \
-i "pandas.DataFrame.pop SA01" \
-i "pandas.DataFrame.prod RT03" \
-i "pandas.DataFrame.product RT03" \
-i "pandas.DataFrame.reorder_levels SA01" \
-i "pandas.DataFrame.sem PR01,RT03,SA01" \
-i "pandas.DataFrame.skew RT03,SA01" \
-i "pandas.DataFrame.sparse PR01,SA01" \
-i "pandas.DataFrame.sparse.density SA01" \
-i "pandas.DataFrame.sparse.from_spmatrix SA01" \
-i "pandas.DataFrame.sparse.to_coo SA01" \
-i "pandas.DataFrame.sparse.to_dense SA01" \
-i "pandas.DataFrame.std PR01,RT03,SA01" \
-i "pandas.DataFrame.sum RT03" \
-i "pandas.DataFrame.swapaxes PR01,SA01" \
-i "pandas.DataFrame.swaplevel SA01" \
-i "pandas.DataFrame.to_feather SA01" \
-i "pandas.DataFrame.to_markdown SA01" \
-i "pandas.DataFrame.to_parquet RT03" \
-i "pandas.DataFrame.to_period SA01" \
-i "pandas.DataFrame.to_timestamp SA01" \
-i "pandas.DataFrame.tz_convert SA01" \
-i "pandas.DataFrame.tz_localize SA01" \
-i "pandas.DataFrame.unstack RT03" \
-i "pandas.DataFrame.value_counts RT03" \
-i "pandas.DataFrame.var PR01,RT03,SA01" \
-i "pandas.DataFrame.where RT03" \
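
Each entry above pairs a fully qualified name with a comma-separated list of error codes to ignore. As a rough sketch, the entries could be read into a dictionary like this. `parse_ignore_entries` and the regular expression are hypothetical helpers written for illustration; they are not part of pandas or of code_checks.sh.

```python
import re

# Hypothetical pattern (not from pandas): matches lines such as
#   -i "pandas.DataFrame.sem PR01,RT03,SA01" \
IGNORE_RE = re.compile(r'-i\s+"(?P<name>\S+)\s+(?P<codes>[A-Z0-9,]+)"')


def parse_ignore_entries(text):
    """Map each fully qualified name to the set of error codes ignored for it."""
    entries = {}
    for match in IGNORE_RE.finditer(text):
        entries[match["name"]] = set(match["codes"].split(","))
    return entries


sample = r'''
-i "pandas.DataFrame.at_time PR01" \
-i "pandas.DataFrame.sem PR01,RT03,SA01" \
-i "pandas.DataFrame.where RT03" \
'''

entries = parse_ignore_entries(sample)
for name, codes in sorted(entries.items()):
    print(name, sorted(codes))
```

A mapping like this makes it easy to see which methods still carry a given code, for example all entries that include SA01.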

The task is:

  1. take 1-5 methods

  2. run: scripts/validate_docstrings.py --format=actions <method-name>

example command: scripts/validate_docstrings.py --format=actions pandas.Categorical.__array__
example output:

################################################################################
################################## Validation ##################################
################################################################################

2 Errors found for `pandas.Categorical.__array__`:
	ES01	No extended summary found
	SA01	See Also section not found
  3. check whether docstring validation passes for those methods and, if necessary, fix the docstrings according to the reported errors. Note: we've chosen to ignore ES01 errors; these are not required to be fixed.

  4. remove those methods from code_checks.sh if all errors are cleared and the docstring is correct; otherwise, remove only the specific error that was fixed from the list of errors for that method.

  5. commit, push, open pull request
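
For orientation, here is a minimal sketch of a numpydoc-style docstring that contains the sections the most common codes in the list ask for: documented parameters (PR01), a described return value (RT03), and a See Also section (SA01). `scale` is a hypothetical function invented for illustration, not pandas code, and the sketch makes no claim about passing every check in validate_docstrings.py.

```python
def scale(values, factor=2):
    """
    Multiply each value by a constant factor.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float, default 2
        Multiplier applied to every element.

    Returns
    -------
    list of float
        New list with each element multiplied by ``factor``.

    See Also
    --------
    sum : Add up the items of an iterable.

    Examples
    --------
    >>> scale([1, 2, 3], factor=3)
    [3, 6, 9]
    """
    return [value * factor for value in values]
```

In a real fix, the See Also entries would point at related pandas methods rather than at a builtin.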

Please don't comment take, as multiple people can work on this issue. You also don't need to ask for permission to work on this; just comment with the methods you are going to work on :)

If you're a new contributor, please check the contributing guide

thanks @datapythonista for the inspiration for this issue!

@YashpalAhlawat
Contributor

opened a fix for pandas.DataFrame.where

@Aloqeely
Member

Going to remove pandas.DataFrame.swapaxes, pandas.DataFrame.pad and pandas.DataFrame.backfill which are all deprecated

@bergnerjonas
Contributor

bergnerjonas commented Mar 31, 2024

Will work on pandas.DataFrame.unstack, pandas.DataFrame.value_counts and pandas.DataFrame.tz_localize

@bergnerjonas
Contributor

Continuing with pandas.DataFrame.to_period, pandas.DataFrame.to_timestamp, and pandas.DataFrame.tz_convert

mroeschke pushed a commit that referenced this issue Apr 1, 2024
…tamp, pandas.DataFrame.tz_convert #58065 (#58101)

* Add See Also section to to_period method.

* Add See Also section to to_timestamp method.

* Add See Also section to tz_convert method.

* fix typo in to_timestamp

* Fix formatting of See Also

* Remove fixed methods from code check ignore list.

* Fix formatting

* Revert accidentally included docstring changes.

* fix merge issues

* Fix pre commit hooks

* Fix line break issues.
@shriyakalakata
Contributor

Working on pandas.DataFrame.assign, pandas.DataFrame.bfill, and pandas.DataFrame.ffill

@shriyakalakata
Contributor

shriyakalakata commented Apr 17, 2024

Will work on pandas.DataFrame.get and pandas.DataFrame.dtypes

@shriyakalakata
Contributor

shriyakalakata commented Apr 17, 2024

Will work on pandas.DataFrame.copy, pandas.DataFrame.first_valid_index, pandas.DataFrame.last_valid_index, and pandas.DataFrame.keys

@gboeker
Contributor

gboeker commented Apr 18, 2024

Working on pandas.DataFrame.sparse, pandas.DataFrame.sparse.density, pandas.DataFrame.sparse.from_spmatrix, pandas.DataFrame.sparse.to_coo, and pandas.DataFrame.sparse.to_dense

@KeiOshima
Contributor

KeiOshima commented Apr 19, 2024

Working on:

pandas.DataFrame.columns
pandas.DataFrame.pop

@KeiOshima
Contributor

KeiOshima commented Apr 21, 2024

working on:

pandas.DataFrame.to_feather 

@gboeker
Contributor

gboeker commented Apr 21, 2024

working on

pandas.DataFrame.mean
pandas.DataFrame.median
pandas.DataFrame.plot
pandas.DataFrame.pop

@gboeker
Contributor

gboeker commented Apr 21, 2024

working on

pandas.DataFrame.__iter__
pandas.DataFrame.columns
pandas.DataFrame.droplevel

@Brett-Dixon
Contributor

I will check out:

    -i "pandas.DataFrame.max RT03" \
    -i "pandas.DataFrame.mean RT03,SA01" \
    -i "pandas.DataFrame.median RT03,SA01" \
    -i "pandas.DataFrame.min RT03" \

@anishfish2
Contributor

anishfish2 commented May 27, 2024

I'll check out:

        -i "pandas.DataFrame.plot PR02,SA01" \

@mroeschke, @Aloqeely is there anything else I need to add?

@enesyesil

Hey, is this issue open? I would like to contribute as a beginner. Thank you

@Aloqeely
Member

Yes it's still open. Good luck!

@shriyase

Hey, I'd like to contribute to this issue if it's still open

@Aloqeely
Member

Yes! You can see ci/code_checks.sh for all the docstrings that need to be fixed.

@shriyase

I will work on:

    -i "pandas.DataFrame.value_counts RT03" \
    -i "pandas.DataFrame.var PR01,RT03,SA01" \
    -i "pandas.DataFrame.where RT03" \
    -i "pandas.DataFrame.backfill PR01,SA01" \
    -i "pandas.DataFrame.bfill SA01" \

@shriyase

Is it only the methods from the original issue post that need to be checked, or any methods in ci/code_checks.sh? Also, when I check a few of the methods, I get additional errors that aren't listed in the documentation. Do I add those error codes too?

@Aloqeely
Member

You can fix any method in that file.

> Also, when I check a few of the methods, I get additional errors that aren't listed in the documentation.

Not sure what you mean by additional errors.

@CollinClifford

Does the validate_docstrings.py validation not work for anyone else? I'm getting an error on line 217 stating that there are not enough values to unpack (expected 4, got 1) for all of the docstrings I've checked.

@Aloqeely
Member

It works for me. Can you post the command you ran and the error message?

@deekapila
Contributor

take

@ktseng4096
Contributor

I'll work on pandas.Series.to_dict and pandas.Series.to_frame

@Aditya060
Contributor

> Does the validate_docstrings.py validation not work for anyone else? I'm getting an error on line 217 stating that there are not enough values to unpack (expected 4, got 1) for all of the docstrings I've checked.

Try to install flake8 in the virtual environment that you are working in. That fixed the issue for me.
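
A quick way to confirm whether flake8 is visible to the interpreter that runs the script is sketched below. It assumes, as the suggestion above implies, that a missing flake8 is what triggers the unpacking error; `module_available` is a hypothetical helper, not part of pandas.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


if module_available("flake8"):
    print("flake8 is installed in this environment")
else:
    # Use the same interpreter for pip so the package lands in this environment.
    print(f"flake8 not found; try: {sys.executable} -m pip install flake8")
```

Running it with the same Python that runs validate_docstrings.py avoids checking the wrong virtual environment.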

@deekapila
Contributor

Hi, I have a question. Is this issue only for pandas.DataFrame.xxx methods, or can we use it to fix others as well? If there are no errors for a pandas.DataFrame.xxx method, should we remove it from code_checks.sh? E.g. pandas.DataFrame.copy says 'Docstring for "pandas.DataFrame.copy" correct. :)'
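
The task description in the original post covers this case: an entry is removed from code_checks.sh once all of its errors are cleared, and otherwise only the fixed codes are removed. A minimal sketch of that rule follows; `drop_fixed_codes` is a hypothetical helper for illustration and does not exist in the pandas repository.

```python
def drop_fixed_codes(entry, fixed):
    """
    Return the ignore entry with `fixed` codes removed, or None if none remain.

    `entry` is the quoted part of an ignore line, for example
    "pandas.DataFrame.sem PR01,RT03,SA01".
    """
    name, codes = entry.split(" ", 1)
    remaining = [code for code in codes.split(",") if code not in fixed]
    if not remaining:
        return None  # every error cleared: delete the whole line
    return f"{name} {','.join(remaining)}"


# Only SA01 was fixed, so the other codes stay on the line.
print(drop_fixed_codes("pandas.DataFrame.sem PR01,RT03,SA01", {"SA01"}))
# The single listed code was fixed, so the line should be deleted.
print(drop_fixed_codes("pandas.DataFrame.copy SA01", {"SA01"}))
```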
