Fix fillna not to change index values. #1241

Merged: 3 commits, Jan 30, 2020

ueshin (Collaborator) commented Jan 29, 2020

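DataFrame.fillna() should not change the index values. pandas fills only the data columns and leaves NaN in the index levels untouched, even when the fill dict names those levels: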

>>> import pandas as pd
>>> import numpy as np
>>> pdf = pd.DataFrame({'x': [np.nan, 2, 3, 4, np.nan, 6],
...                     'y': [1, 2, np.nan, 4, np.nan, np.nan],
...                     'z': [1, 2, 3, 4, np.nan, np.nan]}).set_index(['x', 'y'])
>>>
>>> pdf.fillna(-1)
           z
x   y
NaN 1.0  1.0
2.0 2.0  2.0
3.0 NaN  3.0
4.0 4.0  4.0
NaN NaN -1.0
6.0 NaN -1.0
>>> pdf.fillna({'x': -1, 'y': -2, 'z': -5})
           z
x   y
NaN 1.0  1.0
2.0 2.0  2.0
3.0 NaN  3.0
4.0 4.0  4.0
NaN NaN -5.0
6.0 NaN -5.0

whereas Koalas fills the NaN values in the index levels as well:

>>> import databricks.koalas as ks
>>> ks.from_pandas(pdf).fillna(-1)
             z
x    y
-1.0  1.0  1.0
 2.0  2.0  2.0
 3.0 -1.0  3.0
 4.0  4.0  4.0
-1.0 -1.0 -1.0
 6.0 -1.0 -1.0
>>> ks.from_pandas(pdf).fillna({'x': -1, 'y': -2, 'z': -5})
             z
x    y
-1.0  1.0  1.0
 2.0  2.0  2.0
 3.0 -2.0  3.0
 4.0  4.0  4.0
-1.0 -2.0 -5.0
 6.0 -2.0 -5.0
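
The expected behavior can be pinned down with a small pandas-only check. This is a minimal sketch, not the test added in this PR: `index_nan_mask` is a hypothetical helper introduced here for illustration, and the same assertions would presumably be what one wants to hold for the Koalas result after the fix.

```python
import numpy as np
import pandas as pd

# Same frame as in the description: 'x' and 'y' become index levels.
pdf = pd.DataFrame({'x': [np.nan, 2, 3, 4, np.nan, 6],
                    'y': [1, 2, np.nan, 4, np.nan, np.nan],
                    'z': [1, 2, 3, 4, np.nan, np.nan]}).set_index(['x', 'y'])


def index_nan_mask(df):
    """Hypothetical helper: for each index level, which entries are NaN."""
    return {name: df.index.get_level_values(name).isna().tolist()
            for name in df.index.names}


before = index_nan_mask(pdf)
filled_scalar = pdf.fillna(-1)
filled_dict = pdf.fillna({'x': -1, 'y': -2, 'z': -5})

# The NaN entries in the index survive both calls ...
assert index_nan_mask(filled_scalar) == before
assert index_nan_mask(filled_dict) == before
# ... while the data column 'z' is filled.
assert filled_scalar['z'].tolist() == [1.0, 2.0, 3.0, 4.0, -1.0, -1.0]
assert filled_dict['z'].tolist() == [1.0, 2.0, 3.0, 4.0, -5.0, -5.0]
```

Comparing NaN masks per level avoids relying on NaN equality when checking that the index is unchanged.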

codecov-io commented Jan 29, 2020

Codecov Report

Merging #1241 into master will decrease coverage by 1.27%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1241      +/-   ##
==========================================
- Coverage   95.05%   93.78%   -1.28%     
==========================================
  Files          35       35              
  Lines        7218     7219       +1     
==========================================
- Hits         6861     6770      -91     
- Misses        357      449      +92

Impacted Files                                   Coverage Δ
databricks/koalas/frame.py                       96.84% <100%> (+0.11%) ⬆️
databricks/koalas/usage_logging/__init__.py      24.32% <0%> (-72.98%) ⬇️
databricks/koalas/usage_logging/usage_logger.py  50% <0%> (-50%) ⬇️
databricks/koalas/__init__.py                    78.72% <0%> (-6.39%) ⬇️
databricks/conftest.py                           94% <0%> (-4%) ⬇️
databricks/koalas/plot.py                        94.28% <0%> (+0.95%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update aca3d87...74b4a0e.

itholic (Contributor) commented Jan 30, 2020

LGTM 👍

ueshin (Collaborator, Author) commented Jan 30, 2020

Thanks! I'll merge this for now, since I found another issue related to fillna and will submit a separate PR for it soon.
Please feel free to leave comments, if any.

ueshin merged commit 4bdfe2d into databricks:master on Jan 30, 2020
ueshin deleted the fillna branch on January 30, 2020 17:54