Add Series.all and Series.any #359

HyukjinKwon · 2019-05-20T06:45:25Z

No description provided.

HyukjinKwon · 2019-05-20T07:20:30Z

databricks/koalas/series.py

+        # Here we check if the count of `True`s is at least one in order to mimic `any`.
+        return sdf.select(
+            (F.count(F.when(col.cast('boolean'), 1).otherwise(None)) >= 1)
+        ).collect()[0][0]


@hvanhovell, do you remember the discussion about rewriting `any` and `every` on the Apache Spark side? (IIRC you suggested a max/min approach, but the gap was `None` handling.) Can you quickly check whether this rewriting makes sense to you?

Here I need to mimic both of pandas' methods by rewriting expressions, since `any` and `every` only exist in Spark's master branch.

I can't entirely remember that comment. I did find an e-mail thread about this, and there I state the opposite: `ANY(col)` can be safely rewritten into `MAX(col)`; you may need to add a `coalesce` if we need to return false for an empty dataset.
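The `MAX` rewrite suggested above can be sketched as a plain-Python model of SQL aggregate semantics (not Spark code, so it runs without a cluster). It assumes `None` stands in for SQL NULL, that `MAX`/`MIN` skip NULLs and return NULL when nothing is left, and that `COALESCE` returns its first non-NULL argument. The `MIN`-based `every` is the symmetric counterpart, and the empty-input defaults (`False` for `any`, `True` for `every`) are chosen to match pandas, where an empty Series gives `any() == False` and `all() == True`. All helper names are hypothetical.

```python
from typing import Iterable, Optional


def sql_max(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Model of SQL MAX: ignores NULLs (None); NULL if nothing is left."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def sql_min(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Model of SQL MIN: ignores NULLs (None); NULL if nothing is left."""
    non_null = [v for v in values if v is not None]
    return min(non_null) if non_null else None


def coalesce(*args: Optional[bool]) -> Optional[bool]:
    """Model of SQL COALESCE: first non-NULL argument, else NULL."""
    for arg in args:
        if arg is not None:
            return arg
    return None


def any_via_max(values: Iterable[Optional[bool]]) -> Optional[bool]:
    # ANY(col) == MAX(col) because True > False; the coalesce turns the NULL
    # produced by an empty (or all-NULL) input into False.
    return coalesce(sql_max(values), False)


def every_via_min(values: Iterable[Optional[bool]]) -> Optional[bool]:
    # EVERY(col) == MIN(col) by the same argument; an empty input yields True.
    return coalesce(sql_min(values), True)
```

Without the `coalesce`, `sql_max([])` returns `None`, which is exactly the empty-dataset case mentioned in the reply.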

codecov-io · 2019-05-20T07:26:59Z

Codecov Report

Merging #359 into master will decrease coverage by 0.03%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master     #359      +/-   ##
==========================================
- Coverage    94.4%   94.36%   -0.04%     
==========================================
  Files          36       36              
  Lines        3646     3656      +10     
==========================================
+ Hits         3442     3450       +8     
- Misses        204      206       +2

Impacted Files	Coverage Δ
databricks/koalas/missing/series.py	`100% <ø> (ø)`	⬆️
databricks/koalas/series.py	`92.7% <83.33%> (-0.36%)`	⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3cffb28...84d3c59.

HyukjinKwon · 2019-05-20T07:33:38Z

All tests passed

HyukjinKwon · 2019-05-20T09:45:11Z

databricks/koalas/series.py

+        # Here we check if the count of `True`s is at least one in order to mimic `any`.
+        return sdf.select(
+            (F.count(F.when(col.cast('boolean'), 1).otherwise(None)) >= 1)
+        ).collect()[0][0]


BTW, `count(col)` ignores nulls.
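Why the `count`-based rewrite works can be sketched with another plain-Python model (not Spark code; helper names are hypothetical). It assumes Spark's semantics that `F.when(cond, 1)` with no `.otherwise(...)` yields NULL whenever the condition is false or NULL, and that `count` skips NULLs. The count therefore equals the number of `True` values, and checking it against zero mimics `any`.

```python
from typing import List, Optional


def when_true_then_one(value: Optional[bool]) -> Optional[int]:
    # Model of F.when(col, 1) with no otherwise(): a condition that is
    # False or NULL falls through to NULL, so otherwise(None) is redundant.
    return 1 if value is True else None


def sql_count(values: List[Optional[int]]) -> int:
    # Model of SQL COUNT(expr): counts only non-NULL values.
    return sum(1 for v in values if v is not None)


def any_via_count(values: List[Optional[bool]]) -> bool:
    # "Is the number of True values non-zero?", i.e. count(...) > 0.
    return sql_count([when_true_then_one(v) for v in values]) > 0
```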

HyukjinKwon · 2019-05-21T04:28:28Z

Let me get this in. The output is as expected anyway.

hvanhovell · 2019-05-21T07:58:13Z

databricks/koalas/series.py

+        # Note that we're ignoring `None`s here for now.
+        # Here we check if the count of `True`s is at least one in order to mimic `any`.
+        return sdf.select(
+            (F.count(F.when(col.cast('boolean'), 1).otherwise(None)) >= 1)


Two nits: `otherwise(None)` should be implied, and if you are doing an is-not-empty check, why not check `> 0`?

Yes, currently `None` is simply ignored (that's what pandas does by default; see the `skipna` argument). And yes, `> 0` looks better. Will fix it soon, thanks :D.
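The `all` side can follow the same count-and-compare pattern while skipping `None`s, which is how pandas behaves with its default `skipna=True`: count the values that are explicitly `False` and check that the count is zero. In SQL, `NOT NULL` is NULL, so NULL rows never satisfy a condition like `~col` and drop out of the count. This is a plain-Python model with hypothetical helper names, not necessarily what the PR ended up using.

```python
from typing import List, Optional


def sql_not(value: Optional[bool]) -> Optional[bool]:
    # SQL three-valued NOT: NOT NULL is NULL.
    return None if value is None else not value


def all_via_count(values: List[Optional[bool]]) -> bool:
    # Count rows where NOT col is true, i.e. explicit False values.
    # NULLs give NOT NULL = NULL and are not counted (skipna-like).
    false_count = sum(1 for v in values if sql_not(v) is True)
    # `all` holds when no explicit False was found.
    return false_count == 0
```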

Also, let me give the min/max approach a shot too.

HyukjinKwon added 2 commits May 20, 2019 15:44

Add Series.all and Series.any

74a8328

Rewrite expressions to mimic all and any

84d3c59


HyukjinKwon merged commit 47ad632 into databricks:master May 21, 2019

hvanhovell reviewed May 21, 2019

HyukjinKwon deleted the add-all-any branch September 11, 2020 07:53
