Fix Series.div when divide by zero #1412
@@ -184,8 +184,14 @@ def __sub__(self, other):
        return _column_op(spark.Column.__sub__)(self, other)

    __mul__ = _column_op(spark.Column.__mul__)
    __div__ = _numpy_column_op(spark.Column.__div__)
FYI: In Python 3.x, __div__ and __rdiv__ are no longer supported. Use __truediv__ and __rtruediv__ instead.
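To illustrate the point about Python 3, here is a minimal sketch using two hypothetical classes, `Meters` and `LegacyMeters` (neither is part of Koalas). It shows that the `/` operator dispatches to `__truediv__` and `__rtruediv__`, and that a class defining only `__div__` does not support `/` at all.

```python
class Meters:
    """Hypothetical wrapper type, used only to show operator dispatch."""

    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        # Called for `Meters(...) / other` in Python 3.
        return Meters(self.value / other)

    def __rtruediv__(self, other):
        # Called for `other / Meters(...)` when `other` cannot handle it.
        return Meters(other / self.value)


class LegacyMeters:
    """Defines only the Python 2 hook, which Python 3 ignores."""

    def __init__(self, value):
        self.value = value

    def __div__(self, other):
        return LegacyMeters(self.value / other)


half = Meters(10) / 4        # uses __truediv__
inverse = 20 / Meters(8)     # uses __rtruediv__

try:
    LegacyMeters(10) / 4
    legacy_supported = True
except TypeError:
    # Python 3 raises TypeError because __truediv__ is missing.
    legacy_supported = False
```

The same applies to any operator mapping such as the `__div__ = ...` assignment in the diff: under Python 3 only the `__truediv__` name is looked up.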
rectangle 0 36
circle 0.0 36.0
triangle 0.0 18.0
rectangle 0.0 36.0
For floordiv and rfloordiv, the result will always be a float, since we cannot predict the result type of each column: Spark determines it before executing the job.
In short, since the result can be Infinity, which is a float, the result will always be a float.
Can someone give an opinion on this?
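The reasoning can be sketched in plain Python. The helper below, `floordiv_like_pandas`, is a hypothetical name and is not the Koalas implementation. It assumes the pandas-style convention for division by zero (a positive numerator gives inf, a negative one gives -inf, and 0 // 0 gives NaN) and always returns a float so that every possible result fits one type.

```python
import math


def floordiv_like_pandas(numerator, denominator):
    """Floor-divide two numbers, always returning a float.

    Assumption: division by zero follows a pandas-style convention
    (positive / 0 -> inf, negative / 0 -> -inf, 0 / 0 -> nan).
    """
    if denominator == 0:
        if numerator == 0 or math.isnan(numerator):
            return math.nan
        return math.inf if numerator > 0 else -math.inf
    # Cast so that integer inputs also produce a float result.
    return float(numerator // denominator)


# Mirrors 10 // angles for angles of 0, 3 and 4.
results = [floordiv_like_pandas(10, d) for d in (0, 3, 4)]
```

Because inf has no integer representation, fixing the column type to float up front avoids needing to know whether any zero divisor exists before the job runs.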
Codecov Report
@@            Coverage Diff             @@
##           master    #1412      +/-   ##
==========================================
- Coverage   95.13%   92.84%   -2.30%
==========================================
  Files          34       34
  Lines        7958     7966       +8
==========================================
- Hits         7571     7396     -175
- Misses        387      570     +183
LGTM.
Could you fix the test failure?
I ran into a problem while trying to reproduce the failure. After setting up my test environment to match the failing GitHub Actions job, importing Koalas raises an error:
>>> import databricks.koalas as ks
WARNING:root:Found pyspark version "2.3.4" installed. pyspark>=2.4.0 is recommended.
ImportError: No module named 'numpy.core._multiarray_umath'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../koalas/databricks/koalas/__init__.py", line 46, in <module>
import pyarrow
File "/usr/local/anaconda3/envs/koalas3.5/lib/python3.5/site-packages/pyarrow/__init__.py", line 49, in <module>
from pyarrow.lib import cpu_count, set_cpu_count
File "pyarrow/lib.pyx", line 40, in init pyarrow.lib
ImportError: numpy.core.multiarray failed to import
Could somebody help solve this?
Oh, it was a problem with the numpy version. It works now after changing it.
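When debugging this kind of import error, it can help to list the installed versions of the packages involved. The snippet below is a minimal sketch, not part of the PR: `installed_versions` is a hypothetical helper, and it relies on `importlib.metadata`, which is only available on Python 3.8 and later (newer than the Python 3.5 environment in the traceback).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package names below are the ones named in the traceback and warning.
report = installed_versions(["numpy", "pyarrow", "pyspark"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing this output between the local environment and the CI job makes a mismatched numpy or pyarrow build easier to spot.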
>>> df.rfloordiv(10)
>>> df.rfloordiv(10)  # doctest: +SKIP
This test has been moved to tests/test_dataframe.py::DataFrameTest::test_rfloordiv
because the behaviour can differ depending on the pandas version.
- pandas < 0.24.0
>>> df.rfloordiv(10)
             angles  degrees
circle          inf      0.0
triangle   3.000000      0.0
rectangle  2.000000      0.0
- pandas >= 0.24.0
>>> df.rfloordiv(10)
           angles  degrees
circle        inf      0.0
triangle      3.0      0.0
rectangle     2.0      0.0
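A version-dependent expectation like this one can be handled in a unit test by branching on the pandas version. The sketch below is illustrative only and is not the test added in this PR: `version_tuple` and `expected_rfloordiv_repr` are hypothetical helpers, and the version parsing is deliberately simplified (pre-release suffixes such as `rc1` are cut off). A real test would more likely use a dedicated version parser such as `packaging.version.Version`.

```python
def version_tuple(version):
    """Turn '0.23.4' into (0, 23, 4); a non-numeric part ends the parse."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def expected_rfloordiv_repr(pandas_version):
    """Pick the expected 'angles' column text for the given pandas version."""
    if version_tuple(pandas_version) < (0, 24, 0):
        return ["inf", "3.000000", "2.000000"]
    return ["inf", "3.0", "2.0"]


old_style = expected_rfloordiv_repr("0.23.4")
new_style = expected_rfloordiv_repr("0.24.0")
```

Keeping the branch inside a test rather than a doctest is what allows the docstring example to stay fixed while still being marked with `+SKIP`.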
Otherwise, LGTM.
Thanks! Merging.
Resolves #1411