Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust Series.mode to match pandas Series.mode #1995

Merged
merged 3 commits into from
Jan 7, 2021

Conversation

xinrong-meng
Copy link
Contributor

@xinrong-meng xinrong-meng commented Jan 5, 2021

Currently, Series.mode reserves the name of Series in the result, whereas pandas Series.mode doesn't:

>>> kser1
x    1
y    2
Name: z, dtype: int64
>>> kser1.mode()
0    1
1    2
Name: z, dtype: int64. # Reserve name
>>> pser1 = kser1.to_pandas()
>>> pser1.mode()
0    1
1    2
dtype: int64. # Not reserve name

In addition, unit tests are added.

@codecov-io
Copy link

codecov-io commented Jan 6, 2021

Codecov Report

Merging #1995 (101c114) into master (ae5c8d8) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1995      +/-   ##
==========================================
- Coverage   94.52%   94.47%   -0.05%     
==========================================
  Files          50       50              
  Lines       10952    10859      -93     
==========================================
- Hits        10352    10259      -93     
  Misses        600      600              
Impacted Files Coverage Δ
databricks/koalas/series.py 96.74% <100.00%> (-0.01%) ⬇️
databricks/koalas/__init__.py 85.93% <0.00%> (-4.69%) ⬇️
databricks/koalas/testing/utils.py 79.34% <0.00%> (-1.37%) ⬇️
...ricks/koalas/tests/plot/test_series_plot_plotly.py 96.22% <0.00%> (-0.39%) ⬇️
databricks/koalas/generic.py 92.48% <0.00%> (-0.14%) ⬇️
databricks/koalas/plot/matplotlib.py 93.23% <0.00%> (-0.12%) ⬇️
databricks/koalas/mlflow.py 95.12% <0.00%> (-0.12%) ⬇️
databricks/koalas/indexing.py 92.53% <0.00%> (-0.10%) ⬇️
databricks/koalas/groupby.py 91.52% <0.00%> (-0.07%) ⬇️
databricks/koalas/plot/core.py 91.66% <0.00%> (-0.06%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae5c8d8...101c114. Read the comment docs.

pser = pd.Series([0, 0, 1, 1, 1, np.nan, np.nan, np.nan])
kser = ks.from_pandas(pser)
self.assert_eq(kser.mode(), pser.mode())
self.assert_eq(kser.mode(False).sort_values().values, pser.mode(False).sort_values().values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like pandas < 0.24 doesn't support any parameter for Series.mode()

Maybe we can separate the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Modified.

Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, LGTM.

if LooseVersion(pd.__version__) >= LooseVersion("0.24"):
# The `dropna` argument is added in pandas 0.24.
self.assert_eq(
kser.mode(False).sort_values().values, pser.mode(False).sort_values().values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use .values?

Copy link
Contributor Author

@xinrong-meng xinrong-meng Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of elements in the result series is different, for example

>>> pser = pd.Series([0, 0, 1, 1, 1, np.nan, np.nan, np.nan])
>>> kser = ks.from_pandas(pser)
>>> pser.mode(False).sort_values()
0    1.0
1    NaN
dtype: float64
>>> kser.mode(False).sort_values()
1    1.0
0    NaN
dtype: float64

Is this difference acceptable?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, shall we reset_index() then?

kser.mode(False).sort_values().reset_index()

Copy link
Collaborator

@ueshin ueshin Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, shall we use the named argument to make the argument clear?

kser.mode(dropna=False)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Modified!

if LooseVersion(pd.__version__) >= LooseVersion("0.24"):
# The `dropna` argument is added in pandas 0.24.
self.assert_eq(
kser.mode(False).sort_values().values, pser.mode(False).sort_values().values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified.

@ueshin
Copy link
Collaborator

ueshin commented Jan 7, 2021

Thanks! merging.

@ueshin ueshin merged commit ddbdc9a into databricks:master Jan 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants