Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Series.div when divide by zero #1412

Merged
merged 10 commits into from
Apr 16, 2020
Merged

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Apr 9, 2020

Resolves #1411

>>> pser
0    100.0
1      NaN
2   -300.0
3      NaN
4    500.0
5   -700.0
Name: Koalas, dtype: float64

>>> pser.div(0)
0    inf
1    NaN
2   -inf
3    NaN
4    inf
5   -inf
Name: Koalas, dtype: float64

>>> ks.from_pandas(pser).div(0)
0    inf
1    NaN
2   -inf
3    NaN
4    inf
5   -inf
Name: Koalas, dtype: float64

@@ -184,8 +184,14 @@ def __sub__(self, other):
return _column_op(spark.Column.__sub__)(self, other)

__mul__ = _column_op(spark.Column.__mul__)
__div__ = _numpy_column_op(spark.Column.__div__)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: In Python 3.x, __div__ and __rdiv__ no more supported. Using __truediv__ and __rtruediv__instead.

@itholic itholic changed the title Fix Series.div when divide by zero [WIP] Fix Series.div when divide by zero Apr 9, 2020
databricks/koalas/base.py Show resolved Hide resolved
databricks/koalas/base.py Outdated Show resolved Hide resolved
rectangle 0 36
circle 0.0 36.0
triangle 0.0 18.0
rectangle 0.0 36.0
Copy link
Contributor Author

@itholic itholic Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For floordiv and rfloordiv, the result will always be a float since we cannot predict the result type of each column which is determined before executes the job by Spark.

Copy link
Contributor Author

@itholic itholic Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, since there is a possibility that the result can be a Infinity which is float, the result always be a float.

Someone can give an opinion about this?

@itholic itholic changed the title [WIP] Fix Series.div when divide by zero Fix Series.div when divide by zero Apr 10, 2020
@codecov-io
Copy link

codecov-io commented Apr 10, 2020

Codecov Report

Merging #1412 into master will decrease coverage by 2.29%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1412      +/-   ##
==========================================
- Coverage   95.13%   92.84%   -2.30%     
==========================================
  Files          34       34              
  Lines        7958     7966       +8     
==========================================
- Hits         7571     7396     -175     
- Misses        387      570     +183     
Impacted Files Coverage Δ
databricks/koalas/frame.py 93.39% <ø> (-3.23%) ⬇️
databricks/koalas/base.py 97.64% <100.00%> (+0.07%) ⬆️
databricks/koalas/usage_logging/__init__.py 24.32% <0.00%> (-70.28%) ⬇️
databricks/koalas/usage_logging/usage_logger.py 50.00% <0.00%> (-50.00%) ⬇️
databricks/koalas/__init__.py 73.07% <0.00%> (-19.24%) ⬇️
databricks/conftest.py 88.67% <0.00%> (-7.55%) ⬇️
databricks/koalas/utils.py 93.90% <0.00%> (-2.54%) ⬇️
databricks/koalas/namespace.py 85.00% <0.00%> (-1.58%) ⬇️
databricks/koalas/plot.py 93.33% <0.00%> (-0.96%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd500d2...b0fd81f. Read the comment docs.

Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Could you fix the test failure?

@itholic
Copy link
Contributor Author

itholic commented Apr 15, 2020

I think I ran into some problem while I'm trying to reproduce the failure.

Our test is failing in Python 3.5 with PyArrow 0.16.0 like the below,

스크린샷 2020-04-16 오전 7 40 10

After setting my test environment to equivalent to the GitHub Actions (Python 3.5, PySpark 2.3.4, pandas 0.23.4, PyArrow 0.16.0), It raises exception like the below when I import the Koalas package.

>>> import databricks.koalas as ks
WARNING:root:Found pyspark version "2.3.4" installed. pyspark>=2.4.0 is recommended.
ImportError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../koalas/databricks/koalas/__init__.py", line 46, in <module>
    import pyarrow
  File "/usr/local/anaconda3/envs/koalas3.5/lib/python3.5/site-packages/pyarrow/__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 40, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import

Could somebody help to solve this??

@itholic
Copy link
Contributor Author

itholic commented Apr 15, 2020

Oh, It was problem with version of numpy.

now It works after changing numpy version from 1.15.2 to 1.17.0


>>> df.rfloordiv(10)
>>> df.rfloordiv(10) # doctest: +SKIP
Copy link
Contributor Author

@itholic itholic Apr 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is moved to tests/test_dataframe.py::DataFrameTest::test_rfloordiv because this behaviour can be different depends on the version of pandas.

  • pandas < 0.24.0
>>> df.rfloordiv(10)
             angles  degrees
circle          inf      0.0
triangle   3.000000      0.0
rectangle  2.000000      0.0
  • pandas >= 0.24.0
>>> df.rfloordiv(10)
           angles  degrees
circle        inf      0.0
triangle      3.0      0.0
rectangle     2.0      0.0

Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, LGTM.

databricks/koalas/tests/test_dataframe.py Outdated Show resolved Hide resolved
databricks/koalas/tests/test_series.py Outdated Show resolved Hide resolved
@ueshin
Copy link
Collaborator

ueshin commented Apr 16, 2020

Thanks! merging

@ueshin ueshin merged commit bf1bb4e into databricks:master Apr 16, 2020
@itholic itholic deleted the fix_div_zero branch April 21, 2020 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incompatible behavior with pandas for Series.div() when divide by zero.
3 participants