Implement Series.where #922

itholic · 2019-10-13T00:21:03Z

This PR implements Series.where, mirroring pandas Series.where (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html).

>>> s1 = ks.Series([0, 1, 2, 3, 4])
>>> s2 = ks.Series([100, 200, 300, 400, 500])
>>> s1.where(s1 > 0)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
Name: 0, dtype: float64


>>> s1.where(s1 > 1, 10)
0    10
1    10
2     2
3     3
4     4
Name: 0, dtype: int64

>>> s1.where(s1 > 1, s1 + 50)
0    50
1    51
2     2
3     3
4     4
Name: 0, dtype: int64


>>> s1.where(s1 > 1, s2)
0    100
1    200
2      2
3      3
4      4
Name: 0, dtype: int64

codecov-io · 2019-10-13T00:50:39Z

Codecov Report

Merging #922 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #922      +/-   ##
==========================================
+ Coverage   94.52%   94.53%   +<.01%     
==========================================
  Files          34       34              
  Lines        6465     6476      +11     
==========================================
+ Hits         6111     6122      +11     
  Misses        354      354

Impacted Files	Coverage Δ
databricks/koalas/missing/series.py	`100% <ø> (ø)`
databricks/koalas/series.py	`96.15% <100%> (+0.05%)`
databricks/koalas/internal.py	`96.38% <0%> (ø)`
databricks/koalas/namespace.py	`86.83% <0%> (ø)`
databricks/koalas/frame.py	`96.02% <0%> (ø)`
databricks/koalas/indexes.py	`96.44% <0%> (+0.02%)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c8dcb64...b620849.

ueshin

Shall we add more tests in test_series to check various patterns? e.g.,

>>> s1 = pd.Series([0, 1, 2, 3, 4])
>>> s2 = pd.Series([100, 200, 300, 400, 500])

>>> s1.where(s2 > 100)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64

and negative cases?
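One way to read this suggestion: the snippet below uses plain pandas to pin down reference results that a Koalas test could compare against. The particular cases (a condition taken from another Series, all-False and all-True conditions, and a scalar other) are assumptions about what "various patterns" and "negative cases" might cover, not the tests that were actually added in this PR.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3, 4])
s2 = pd.Series([100, 200, 300, 400, 500])

# Condition computed from a different Series (the case quoted above).
# Position 0 fails the condition, so it becomes NaN and the dtype is float64.
from_other = s1.where(s2 > 100)

# Condition that is False everywhere: every value is replaced.
all_false = s1.where(s1 > 10)             # replaced by the default NaN
all_false_scalar = s1.where(s1 > 10, -1)  # replaced by a scalar

# Condition that is True everywhere: the original values are kept.
all_true = s1.where(s1 >= 0)

print(from_other)
print(all_false_scalar)
```

In a Koalas test, each of these pandas results would be compared with the result of the same call on the equivalent ks.Series.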

HyukjinKwon · 2019-10-21T03:59:08Z

databricks/koalas/series.py

@@ -3409,6 +3409,80 @@ def replace(self, to_replace=None, value=None, regex=False) -> 'Series':

        return self._with_new_scol(current)

+    def where(self, cond, other=np.nan):


@itholic, it seems pandas shares the same implementation internally. After this PR is merged, can you move this into the _Frame class and implement DataFrame.where as well?

itholic · Oct 22, 2019

Okay, I'll work on that right after this PR is merged.

HyukjinKwon · 2019-10-21T03:59:27Z

Seems fine to me otherwise.

softagram-bot · 2019-10-25T02:31:42Z

Softagram Impact Report for pull/922 (head commit: `b620849`)

HyukjinKwon · 2019-12-03T04:10:03Z

databricks/koalas/series.py

+        # |                4|  4|            true|              500|
+        # +-----------------+---+----------------+-----------------+
+        data_col_name = self._internal.column_name_for(self._internal.column_index[0])
+        index_column = self._internal.index_columns[0]


@itholic, I think this doesn't support multi-level indexes. Could you fix this, please?

There can be multiple index_columns, so we cannot use only the first one.
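To make the concern concrete, here is a small pandas sketch of the behaviour a multi-level index has to preserve. The index values and level names are made up for illustration, and this is plain pandas rather than the Koalas code under review.

```python
import pandas as pd

# Hypothetical two-level index; the level names are illustrative only.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)],
    names=["outer", "inner"],
)
s = pd.Series([0, 1, 2, 3], index=idx)
other = pd.Series([100, 200, 300, 400], index=idx)

# pandas matches rows on the full (outer, inner) key.
result = s.where(s > 1, other)
print(result)

# The first level alone does not identify a row: "a" and "b" each occur twice,
# so matching on that level only would pair rows ambiguously.
first_level_unique = idx.get_level_values("outer").is_unique
print(first_level_unique)
```

An implementation that matches rows on only the first index column would have no way to tell ("a", 1) from ("a", 2), which is why all index columns need to take part.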

HyukjinKwon · 2019-12-03T04:12:19Z

databricks/koalas/tests/test_series.py

+        set_option("compute.ops_on_diff_frames", True)
+
+    @classmethod
+    def tearDownClass(cls):


@itholic, please disable this. compute.ops_on_diff_frames is disabled by default because it is expensive. We should move these test cases into OpsOnDiffFramesEnabledTest.

HyukjinKwon · 2019-12-03T04:13:16Z

databricks/koalas/tests/test_series.py

@@ -742,6 +753,23 @@ def test_duplicates(self):
        self.assert_eq(pser.drop_duplicates().sort_values(),
                       kser.drop_duplicates().sort_values())

+    def test_where(self):
+        pser1 = pd.Series([0, 1, 2, 3, 4], name=0)


Can you add a test for when compute.ops_on_diff_frames is off? I think we can still use a scalar value, such as an int, for other.

Implement Series.where

00910c7

Enable other as Series

39aec52

ueshin reviewed Oct 18, 2019

HyukjinKwon reviewed Oct 21, 2019

itholic added 6 commits October 23, 2019 15:24

tests/test_series.py

ab6f747

change logic to use temp col & add some tests

373ccda

Resolve conflicts

658683b

Resolve conflicts

1acf285

Fix missing

bed954e

Remove xs from missing

b620849

HyukjinKwon merged commit 709b928 into databricks:master Oct 28, 2019

HyukjinKwon approved these changes Oct 28, 2019

itholic deleted the s_where branch November 6, 2019 05:32

HyukjinKwon reviewed Dec 3, 2019
