Fix Series.div when divide by zero #1412

itholic · 2020-04-09T03:45:47Z

Resolves #1411

>>> pser
0    100.0
1      NaN
2   -300.0
3      NaN
4    500.0
5   -700.0
Name: Koalas, dtype: float64

>>> pser.div(0)
0    inf
1    NaN
2   -inf
3    NaN
4    inf
5   -inf
Name: Koalas, dtype: float64

>>> ks.from_pandas(pser).div(0)
0    inf
1    NaN
2   -inf
3    NaN
4    inf
5   -inf
Name: Koalas, dtype: float64
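
For reference, the expected results above can be sketched in plain Python. This is only an illustration of the semantics, not the code in this PR: `div_like_pandas` is a hypothetical helper, and it assumes the divisor is never negative zero.

```python
import math


def div_like_pandas(value, divisor):
    """Hypothetical helper: float division that yields inf/-inf/NaN for a zero divisor."""
    if math.isnan(value) or math.isnan(divisor):
        return math.nan
    if divisor == 0:
        if value == 0:
            return math.nan  # 0 / 0 is undefined
        # Assumption: the divisor is +0, so the sign comes from the dividend only.
        return math.copysign(math.inf, value)
    return value / divisor


values = [100.0, math.nan, -300.0, math.nan, 500.0, -700.0]
print([div_like_pandas(v, 0) for v in values])
# [inf, nan, -inf, nan, inf, -inf]
```

Plain Python raises ZeroDivisionError for `100.0 / 0`, which is why the zero divisor has to be handled explicitly in a sketch like this.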

itholic · 2020-04-09T03:53:32Z

databricks/koalas/base.py

@@ -184,8 +184,14 @@ def __sub__(self, other):
            return _column_op(spark.Column.__sub__)(self, other)

    __mul__ = _column_op(spark.Column.__mul__)
-    __div__ = _numpy_column_op(spark.Column.__div__)


FYI: In Python 3.x, __div__ and __rdiv__ are no longer supported. Use __truediv__ and __rtruediv__ instead.
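
A small standalone illustration of that point (the `Wrapper` class is hypothetical and not part of Koalas): in Python 3 the `/` operator dispatches to `__truediv__` and `__rtruediv__`, and built-in numbers no longer define `__div__` at all.

```python
class Wrapper:
    """Hypothetical numeric wrapper, used only to show operator dispatch."""

    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        # Called for `wrapper / other`.
        return Wrapper(self.value / other)

    def __rtruediv__(self, other):
        # Called for `other / wrapper` when `other` cannot handle the division.
        return Wrapper(other / self.value)


w = Wrapper(8.0)
print((w / 2).value)            # 4.0
print((2 / w).value)            # 0.25
print(hasattr(1.0, "__div__"))  # False
```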

itholic · 2020-04-10T03:01:16Z

databricks/koalas/frame.py

-rectangle       0       36
+circle        0.0     36.0
+triangle      0.0     18.0
+rectangle     0.0     36.0


For floordiv and rfloordiv, the result will always be a float, because Spark determines the result type of each column before the job executes, so we cannot choose the type based on the computed values.

In short, since the result can be Infinity, which is a float, the result is always a float.

Does anyone have an opinion on this?
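
To illustrate the type argument in plain Python (a sketch only; `floordiv_as_float` is a hypothetical helper, not the Spark expression used in this PR): once division by zero may yield Infinity, an integer result type can no longer represent every possible value, so the helper always returns a float.

```python
import math


def floordiv_as_float(value, divisor):
    """Hypothetical helper: floor division whose result is always a float."""
    if math.isnan(value) or math.isnan(divisor):
        return math.nan
    if divisor == 0:
        if value == 0:
            return math.nan
        # Assumption: the divisor is +0, so the sign comes from the dividend only.
        return math.copysign(math.inf, value)
    return float(value) // float(divisor)


column = [0, 18, 36]
print([floordiv_as_float(v, 10) for v in column])  # [0.0, 1.0, 3.0]
print([floordiv_as_float(v, 0) for v in column])   # [nan, inf, inf]

# Infinity has no integer representation, so an integer column could not hold it.
try:
    int(math.inf)
except OverflowError as exc:
    print("cannot store inf as int:", exc)
```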

codecov-io · 2020-04-10T03:37:37Z

Codecov Report

Merging #1412 into master will decrease coverage by 2.29%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1412      +/-   ##
==========================================
- Coverage   95.13%   92.84%   -2.30%     
==========================================
  Files          34       34              
  Lines        7958     7966       +8     
==========================================
- Hits         7571     7396     -175     
- Misses        387      570     +183

Impacted Files	Coverage Δ
databricks/koalas/frame.py	`93.39% <ø> (-3.23%)`	⬇️
databricks/koalas/base.py	`97.64% <100.00%> (+0.07%)`	⬆️
databricks/koalas/usage_logging/__init__.py	`24.32% <0.00%> (-70.28%)`	⬇️
databricks/koalas/usage_logging/usage_logger.py	`50.00% <0.00%> (-50.00%)`	⬇️
databricks/koalas/__init__.py	`73.07% <0.00%> (-19.24%)`	⬇️
databricks/conftest.py	`88.67% <0.00%> (-7.55%)`	⬇️
databricks/koalas/utils.py	`93.90% <0.00%> (-2.54%)`	⬇️
databricks/koalas/namespace.py	`85.00% <0.00%> (-1.58%)`	⬇️
databricks/koalas/plot.py	`93.33% <0.00%> (-0.96%)`	⬇️
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cd500d2...b0fd81f.

ueshin

LGTM.
Could you fix the test failure?

itholic · 2020-04-15T23:06:50Z

I ran into a problem while trying to reproduce the failure.

Our test is failing on Python 3.5 with PyArrow 0.16.0.

After setting up my test environment to match GitHub Actions (Python 3.5, PySpark 2.3.4, pandas 0.23.4, PyArrow 0.16.0), importing the Koalas package raises the exception below.

>>> import databricks.koalas as ks
WARNING:root:Found pyspark version "2.3.4" installed. pyspark>=2.4.0 is recommended.
ImportError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../koalas/databricks/koalas/__init__.py", line 46, in <module>
    import pyarrow
  File "/usr/local/anaconda3/envs/koalas3.5/lib/python3.5/site-packages/pyarrow/__init__.py", line 49, in <module>
    from pyarrow.lib import cpu_count, set_cpu_count
  File "pyarrow/lib.pyx", line 40, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import

Could somebody help me solve this?

itholic · 2020-04-15T23:14:47Z

Oh, it was a problem with the numpy version.

It works now after changing the numpy version from 1.15.2 to 1.17.0.
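
When comparing a local environment against CI, a quick way to list the relevant versions is sketched below. `installed_versions` is a hypothetical helper, and it assumes Python 3.8+ because it relies on `importlib.metadata`, so it would not run on the Python 3.5 environment described above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["numpy", "pyarrow", "pandas", "pyspark"]))
```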

itholic · 2020-04-16T00:43:48Z

databricks/koalas/frame.py


->>> df.rfloordiv(10)
+>>> df.rfloordiv(10)  # doctest: +SKIP


This test was moved to tests/test_dataframe.py::DataFrameTest::test_rfloordiv because the behaviour differs depending on the pandas version.

pandas < 0.24.0

>>> df.rfloordiv(10)
             angles  degrees
circle          inf      0.0
triangle   3.000000      0.0
rectangle  2.000000      0.0

pandas >= 0.24.0

>>> df.rfloordiv(10)
           angles  degrees
circle        inf      0.0
triangle      3.0      0.0
rectangle     2.0      0.0
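
The two outputs differ only in how the floats are printed, which breaks a doctest but not a value-based test. A minimal stdlib-only sketch of that difference, using the `angles` column as printed above (the list names are just for illustration):

```python
# The "angles" column as printed by each pandas version.
printed_old = ["inf", "3.000000", "2.000000"]  # pandas < 0.24.0
printed_new = ["inf", "3.0", "2.0"]            # pandas >= 0.24.0

# A doctest compares text, so the two versions do not match...
print(printed_old == printed_new)  # False

# ...while comparing the parsed values gives the same result on both.
values_old = [float(text) for text in printed_old]
values_new = [float(text) for text in printed_new]
print(values_old == values_new)  # True
```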

ueshin

Otherwise, LGTM.

databricks/koalas/tests/test_dataframe.py

databricks/koalas/tests/test_series.py

ueshin · 2020-04-16T23:48:29Z

Thanks! merging

Fix Series.div when divide by zero

006f694

itholic commented Apr 9, 2020

itholic changed the title ~~Fix Series.div when divide by zero~~ [WIP] Fix Series.div when divide by zero Apr 9, 2020

ueshin reviewed Apr 9, 2020

Modify the tests for matching floordiv

3a559cf

itholic commented Apr 10, 2020

Fix tests for pandas < 1.0.0

93fc548

itholic changed the title ~~[WIP] Fix Series.div when divide by zero~~ Fix Series.div when divide by zero Apr 10, 2020

remove unnecessary F.lit

bce4a7a

ueshin reviewed Apr 15, 2020

itholic added 2 commits April 16, 2020 09:37

Fix build failure in pandas<0.24.0

dd9b480

Resolve conflicts

dbc99ec

itholic commented Apr 16, 2020

Fix tests

b0fd81f

itholic mentioned this pull request Apr 16, 2020

Add unique function to SeriesGroupBy. #1426

Merged

itholic added 2 commits April 16, 2020 11:52

Merge branch 'master' of https://github.com/databricks/koalas into fi…

8260686

…x_div_zero

fix type

82e856c

ueshin approved these changes Apr 16, 2020

ks -> pd in several tests

0032eb4

ueshin merged commit bf1bb4e into databricks:master Apr 16, 2020

itholic deleted the fix_div_zero branch April 21, 2020 07:40
