Adjust Series.mode to match pandas Series.mode #1995
Codecov Report
@@ Coverage Diff @@
## master #1995 +/- ##
==========================================
- Coverage 94.52% 94.47% -0.05%
==========================================
Files 50 50
Lines 10952 10859 -93
==========================================
- Hits 10352 10259 -93
Misses 600 600
pser = pd.Series([0, 0, 1, 1, 1, np.nan, np.nan, np.nan])
kser = ks.from_pandas(pser)
self.assert_eq(kser.mode(), pser.mode())
self.assert_eq(kser.mode(False).sort_values().values, pser.mode(False).sort_values().values)
Seems like pandas < 0.24 doesn't support any parameter for Series.mode()
Maybe we can separate the test.
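One way to separate the test is to gate the `dropna` assertion on the pandas version. The sketch below uses only the standard library; the helper names `version_tuple` and `supports_mode_dropna` are hypothetical and not part of pandas or Koalas, and the 0.24 threshold is the one stated in the comment above.

```python
import re


def version_tuple(version: str) -> tuple:
    """Return the leading (major, minor) numbers of a version string."""
    numbers = re.findall(r"\d+", version)
    major = int(numbers[0])
    # Treat a missing minor component as 0, e.g. "1" -> (1, 0).
    minor = int(numbers[1]) if len(numbers) > 1 else 0
    return (major, minor)


def supports_mode_dropna(pandas_version: str) -> bool:
    """Hypothetical helper: True if Series.mode accepts `dropna` (pandas >= 0.24)."""
    return version_tuple(pandas_version) >= (0, 24)
```

In a test, the `dropna=False` assertion would then run only when `supports_mode_dropna(pd.__version__)` is true, while the plain `mode()` assertion runs unconditionally.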
Thank you! Modified.
Otherwise, LGTM.
if LooseVersion(pd.__version__) >= LooseVersion("0.24"):
    # The `dropna` argument is added in pandas 0.24.
    self.assert_eq(
        kser.mode(False).sort_values().values, pser.mode(False).sort_values().values
Why do we use .values?
The order of elements in the result series is different, so the index differs after sorting. For example:
>>> pser = pd.Series([0, 0, 1, 1, 1, np.nan, np.nan, np.nan])
>>> kser = ks.from_pandas(pser)
>>> pser.mode(False).sort_values()
0 1.0
1 NaN
dtype: float64
>>> kser.mode(False).sort_values()
1 1.0
0 NaN
dtype: float64
Is this difference acceptable?
I see, shall we reset_index() then?
kser.mode(False).sort_values().reset_index()
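A minimal pandas-only sketch of this idea follows. It does not use Koalas: the two input series are hand-built stand-ins for results that hold the same modes in a different order, and `normalized` is a hypothetical helper name. Note that in pandas, calling `reset_index()` on a Series without `drop=True` returns a DataFrame that keeps the old index as a column, so the sketch passes `drop=True` to discard it.

```python
import numpy as np
import pandas as pd


def normalized(ser: pd.Series) -> pd.Series:
    """Sort by value, then replace the index with 0..n-1 so order no longer matters."""
    return ser.sort_values().reset_index(drop=True)


# Stand-ins for two mode results with the same values in a different order.
pandas_like = pd.Series([1.0, np.nan])
koalas_like = pd.Series([np.nan, 1.0])

left = normalized(pandas_like)
right = normalized(koalas_like)

# Values and index now line up, so a full Series comparison can be used
# instead of comparing only `.values`.
pd.testing.assert_series_equal(left, right)
```

Comparing the normalized series checks the dtype and index as well as the values, which comparing `.values` alone does not.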
Btw, shall we use the named argument to make the argument clear?
kser.mode(dropna=False)
Sounds good! Modified!
if LooseVersion(pd.__version__) >= LooseVersion("0.24"):
    # The `dropna` argument is added in pandas 0.24.
    self.assert_eq(
        kser.mode(False).sort_values().values, pser.mode(False).sort_values().values
ditto.
Modified.
Thanks! Merging.
Currently, Koalas Series.mode preserves the name of the Series in the result, whereas pandas Series.mode doesn't.
In addition, unit tests are added.