API: replace dropna=False option with na_sentinel=None in factorize #35852

Merged
Changes from 1 commit
Commits
48 commits
7e461a1
remove \n from docstring
charlesdong1991 Dec 3, 2018
1314059
fix conflicts
charlesdong1991 Jan 19, 2019
8bcb313
Merge remote-tracking branch 'upstream/master'
charlesdong1991 Jul 30, 2019
24c3ede
Merge remote-tracking branch 'upstream/master'
charlesdong1991 Jan 14, 2020
dea38f2
fix issue 17038
charlesdong1991 Jan 14, 2020
cd9e7ac
revert change
charlesdong1991 Jan 14, 2020
e5e912b
revert change
charlesdong1991 Jan 14, 2020
045a76f
Merge remote-tracking branch 'upstream/master'
charlesdong1991 Apr 6, 2020
a61367b
Merge remote-tracking branch 'upstream/master' into add_dropna_factorize
charlesdong1991 Aug 22, 2020
2c451a9
add dropna doc for factorize
charlesdong1991 Aug 22, 2020
31ffca7
rephrase the doc
charlesdong1991 Aug 22, 2020
32d029d
flake8
charlesdong1991 Aug 22, 2020
ba93eb6
fixup
charlesdong1991 Aug 22, 2020
2330908
use NaN
charlesdong1991 Aug 22, 2020
b9850a2
add dropna in series.factorize
charlesdong1991 Aug 22, 2020
3a18c65
black
charlesdong1991 Aug 22, 2020
9d7f1e6
add test
charlesdong1991 Aug 22, 2020
97fd2e6
linting
charlesdong1991 Aug 22, 2020
68527ef
linting
charlesdong1991 Aug 22, 2020
817905c
doct
charlesdong1991 Aug 22, 2020
364aeae
fix black
charlesdong1991 Aug 22, 2020
7cd0cce
fixup
charlesdong1991 Aug 22, 2020
2368223
fix doctest
charlesdong1991 Aug 22, 2020
8ca0652
add whatsnew
charlesdong1991 Aug 22, 2020
b452513
linting
charlesdong1991 Aug 22, 2020
344c072
fix test
charlesdong1991 Aug 22, 2020
1a5c358
try one time
charlesdong1991 Aug 22, 2020
81a0a7e
Merge remote-tracking branch 'upstream/master' into add_dropna_factorize
charlesdong1991 Aug 22, 2020
4607953
hide dropna and use na_sentinel=None
charlesdong1991 Aug 27, 2020
a7d4abd
Merge remote-tracking branch 'upstream/master' into add_dropna_factorize
charlesdong1991 Aug 27, 2020
3ef1459
update whatsnew
charlesdong1991 Aug 27, 2020
fca7300
rename test function
charlesdong1991 Aug 27, 2020
f0a6556
remove dropna from factorize
charlesdong1991 Aug 27, 2020
c81e79e
update doc
charlesdong1991 Aug 27, 2020
37ca034
docstring
charlesdong1991 Aug 27, 2020
4f0f226
update doc
charlesdong1991 Aug 27, 2020
5fcabe7
add comment
charlesdong1991 Aug 27, 2020
b7cd915
code change on review
charlesdong1991 Aug 28, 2020
8a2a1f7
update doc
charlesdong1991 Aug 28, 2020
e0c7342
code change on review
charlesdong1991 Aug 28, 2020
0480d9f
minor move in whatsnew
charlesdong1991 Aug 28, 2020
5c87cd1
add default example
charlesdong1991 Sep 1, 2020
b70e595
Merge remote-tracking branch 'upstream/master' into add_dropna_factorize
charlesdong1991 Sep 1, 2020
076fc10
doc
charlesdong1991 Sep 1, 2020
c945457
one more try
charlesdong1991 Sep 1, 2020
e6c7434
explicit doc
charlesdong1991 Sep 1, 2020
7533ab0
Merge remote-tracking branch 'upstream/master' into add_dropna_factorize
charlesdong1991 Sep 2, 2020
bf8641a
add space
charlesdong1991 Sep 2, 2020
hide dropna and use na_sentinel=None
charlesdong1991 committed Aug 27, 2020
commit 460795312b40f1e336a08f517972ca14dbd26e87
10 changes: 7 additions & 3 deletions pandas/core/algorithms.py
@@ -522,6 +522,12 @@ def _factorize_array(
             Hint to the hashtable sizer.
         """
     ),
+    dropna=dedent(
+        """\
+        dropna : bool, default True
+            Drop the NaN from the uniques of the values.
+        """
+    ),
 )
 def factorize(
     values,
@@ -543,9 +549,7 @@ def factorize(
     {values}{sort}
     na_sentinel : int, default -1
         Value to mark "not found".
-    {size_hint}
-    dropna : bool, default True
-        Drop the NaN from the uniques of the values.
+    {size_hint}{dropna}

     Returns
     -------
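The hunk above moves the `dropna` entry out of the shared docstring and into a named fragment, so that a caller such as the method in `pandas/core/base.py` can pass `dropna=""` and hide the parameter from its own documentation. The following is a minimal sketch of that templating pattern, not the decorator pandas actually uses: `with_doc`, `DOC_TEMPLATE`, `factorize_function`, and `factorize_method` are hypothetical names chosen for illustration, and the template text is abbreviated.

```python
from textwrap import dedent
from typing import Callable

# Shared template. The placeholder names follow the diff; the wording is shortened.
DOC_TEMPLATE = dedent(
    """\
    Encode values as integer codes.

    Parameters
    ----------
    na_sentinel : int, default -1
        Value to mark "not found".
    {size_hint}{dropna}
    Returns
    -------
    codes, uniques
    """
)

# Each fragment ends with a newline, so an empty fragment leaves no gap behind.
SIZE_HINT = dedent(
    """\
    size_hint : int, optional
        Hint to the hashtable sizer.
    """
)

DROPNA = dedent(
    """\
    dropna : bool, default True
        Drop the NaN from the uniques of the values.
    """
)


def with_doc(**fragments: str) -> Callable:
    """Hypothetical decorator: fill DOC_TEMPLATE and attach it as __doc__."""

    def decorator(func: Callable) -> Callable:
        func.__doc__ = DOC_TEMPLATE.format(**fragments)
        return func

    return decorator


@with_doc(size_hint=SIZE_HINT, dropna=DROPNA)
def factorize_function(values, na_sentinel=-1, size_hint=None, dropna=True):
    ...


@with_doc(size_hint="", dropna="")
def factorize_method(self, na_sentinel=-1):
    ...
```

Placing `{size_hint}{dropna}` on one line means that when both fragments are empty the rendered docstring keeps a single blank line before `Returns`, with no leftover parameter entry.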
18 changes: 17 additions & 1 deletion pandas/core/base.py
@@ -1390,15 +1390,31 @@ def memory_usage(self, deep=False):
         values="",
         order="",
         size_hint="",
+        dropna="",
         sort=textwrap.dedent(
             """\
             sort : bool, default False
                 Sort `uniques` and shuffle `codes` to maintain the
                 relationship.
             """
         ),
+        na_sentinel=textwrap.dedent(
+            """\
+            na_sentinel : int or None, default -1
+                Value to mark "not found". If None, will not drop the NaN
+                from the uniques of the values.
+            """
+        ),
     )
-    def factorize(self, sort=False, na_sentinel=-1, dropna=True):
+    def factorize(self, sort: bool = False, na_sentinel: Optional[int] = -1):
+
+        # GH35667, na_sentinel=-1 and dropna to keep backward compatibility of
+        # algorithm.factorize so as not to break.
+        if na_sentinel is None:
+            na_sentinel = -1
+            dropna = False
+        else:
+            dropna = True
         return algorithms.factorize(
             self, sort=sort, na_sentinel=na_sentinel, dropna=dropna
         )
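The branch above translates the public `na_sentinel` argument into the `(na_sentinel, dropna)` pair that `algorithms.factorize` still expects. A rough pure-Python sketch of that mapping follows; it does not use pandas, and `resolve_na_args` and `toy_factorize` are hypothetical helpers written only to illustrate the behaviour. Appending NaN as the last unique is an assumption of this sketch, chosen to agree with the expected values in the test for this commit.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def resolve_na_args(na_sentinel: Optional[int]) -> Tuple[int, bool]:
    """Map the public argument onto the legacy (na_sentinel, dropna) pair."""
    if na_sentinel is None:
        # None means "keep NaN as a unique"; -1 is only used internally.
        return -1, False
    return na_sentinel, True


def toy_factorize(
    values: Sequence[float], na_sentinel: Optional[int] = -1
) -> Tuple[List[int], List[float]]:
    """Encode values as integer codes (illustrative only, floats only)."""
    sentinel, dropna = resolve_na_args(na_sentinel)
    positions: Dict[float, int] = {}
    uniques: List[float] = []
    codes: List[int] = []
    for value in values:
        if math.isnan(value):
            # Mark missing values with the sentinel on the first pass.
            codes.append(sentinel)
            continue
        if value not in positions:
            positions[value] = len(uniques)
            uniques.append(value)
        codes.append(positions[value])
    if not dropna and sentinel in codes:
        # Assumption of this sketch: NaN becomes the last unique.
        uniques.append(math.nan)
        nan_code = len(uniques) - 1
        codes = [nan_code if code == sentinel else code for code in codes]
    return codes, uniques


if __name__ == "__main__":
    data = [1.0, 2.0, 1.0, math.nan]
    print(toy_factorize(data))                    # NaN is coded with the sentinel
    print(toy_factorize(data, na_sentinel=None))  # NaN gets its own code
```

Keeping the translation in one small function leaves the lower-level signature untouched, which is the backward-compatibility concern the GH35667 comment refers to.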
15 changes: 5 additions & 10 deletions pandas/tests/base/test_factorize.py
@@ -28,19 +28,14 @@ def test_factorize(index_or_series_obj, sort):
     tm.assert_index_equal(result_uniques, expected_uniques)


-@pytest.mark.parametrize("dropna", [True, False])
-def test_factorize_dropna(dropna):
+def test_factorize_dropna():
     # GH35667
     values = np.array([1, 2, 1, np.nan])
     ser = pd.Series(values)
-    codes, uniques = ser.factorize(dropna=dropna)
-
-    if dropna:
-        expected_codes = np.array([0, 1, 0, -1], dtype="int64")
-        expected_uniques = pd.Index([1.0, 2.0])
-    else:
-        expected_codes = np.array([0, 1, 0, 2], dtype="int64")
-        expected_uniques = pd.Index([1.0, 2.0, np.nan])
+    codes, uniques = ser.factorize(na_sentinel=None)
+
+    expected_codes = np.array([0, 1, 0, 2], dtype="int64")
+    expected_uniques = pd.Index([1.0, 2.0, np.nan])

     tm.assert_numpy_array_equal(codes, expected_codes)
     tm.assert_index_equal(uniques, expected_uniques)