Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: replace dropna=False option with na_sentinel=None in factorize #35852

Merged

Conversation

charlesdong1991
Copy link
Member

@charlesdong1991 charlesdong1991 commented Aug 22, 2020

@pep8speaks
Copy link

pep8speaks commented Aug 22, 2020

Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-02 14:12:23 UTC

@charlesdong1991 charlesdong1991 changed the title DOC: Add dropna documentation for pd.factorize DOC: Add dropna documentation for pd.factorize and dropna keyword in factorize method Aug 22, 2020
@charlesdong1991 charlesdong1991 changed the title DOC: Add dropna documentation for pd.factorize and dropna keyword in factorize method DOC: Add dropna documentation for pd.factorize and accept dropna keyword in Series.factorize Aug 22, 2020
@charlesdong1991 charlesdong1991 added Docs Testing pandas testing functions or related to the test suite labels Aug 22, 2020
@charlesdong1991 charlesdong1991 changed the title DOC: na_sentinel keyword in pd.factorize supports None to include NaNs in the unique of values DOC: na_sentinel keyword in pd.factorize supports None to include NaNs in the unique of values and remove dropna keyword Aug 28, 2020
@jorisvandenbossche jorisvandenbossche changed the title DOC: na_sentinel keyword in pd.factorize supports None to include NaNs in the unique of values and remove dropna keyword API: replace dropna=False option with na_sentinel=None in factorize Aug 28, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment, otherwise lgtm. ping on green.

If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.

>>> values = np.array([1, 2, 1, np.nan])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the same example above this as well but w/o setting na_sentinel

values, it can be achieved by setting ``na_sentinel=None``.

>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
>>> codes, uniques = pd.factorize(values) # default
>>> codes, uniques = pd.factorize(values) # default na_sentinel=-1

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can push this change and merge on green would be great.

array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally put a blank line as othewise this is very hard to read

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@jreback
Copy link
Contributor

jreback commented Sep 2, 2020

@charlesdong1991 can ignore that 38_locale failure

@charlesdong1991
Copy link
Member Author

thanks @jreback Will ignore and merge it!

@charlesdong1991 charlesdong1991 merged commit 19c3d40 into pandas-dev:master Sep 2, 2020
@charlesdong1991
Copy link
Member Author

@simonjayhawkins btw, i think should this be backported, right?

@simonjayhawkins
Copy link
Member

@meeseeksdev backport 1.1.x

@simonjayhawkins
Copy link
Member

@simonjayhawkins btw, i think should this be backported, right?

yep. thanks. meeseeks... is only very occasionally picking these up now.

@simonjayhawkins
Copy link
Member

i've got a branch going for automating the release process, but in the first instance am using it for release readiness, i'm planning to add a check shortly that the whatsnew on the backport and master are the same. so this should make sure that any PRs that change the whatsnew aren't missed.

https://github.com/simonjayhawkins/pandas-release/actions?query=workflow%3A%22Manual+Release%22

charlesdong1991 added a commit that referenced this pull request Sep 2, 2020
@jorisvandenbossche
Copy link
Member

i'm planning to add a check shortly that the whatsnew on the backport and master are the same. so this should make sure that any PRs that change the whatsnew aren't missed.

That would be cool!

@simonjayhawkins simonjayhawkins mentioned this pull request Sep 7, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
…andas-dev#35852)

* remove \n from docstring

* fix issue 17038

* revert change

* revert change

* add dropna doc for factorize

* rephrase the doc

* flake8

* fixup

* use NaN

* add dropna in series.factorize

* black

* add test

* linting

* linting

* doct

* fix black

* fixup

* fix doctest

* add whatsnew

* linting

* fix test

* try one time

* hide dropna and use na_sentinel=None

* update whatsnew

* rename test function

* remove dropna from factorize

* update doc

* docstring

* update doc

* add comment

* code change on review

* update doc

* code change on review

* minor move in whatsnew

* add default example

* doc

* one more try

* explicit doc

* add space
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Docs Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: document dropna kwarg of pd.factorize
6 participants