Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Series.all and Series.any #359

Merged
merged 2 commits into from
May 21, 2019
Merged

Conversation

HyukjinKwon
Copy link
Member

No description provided.

# Here we check if the count of `True`s is more than one in order to mimic `any`.
return sdf.select(
(F.count(F.when(col.cast('boolean'), 1).otherwise(None)) >= 1)
).collect()[0][0]
Copy link
Member Author

@HyukjinKwon HyukjinKwon May 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell do you remember this discussion about rewriting for any and every at Apache Spark side? (IIRC you suggested max/min approach but the hole was None handling). Can you quickly check if this rewriting makes sense to you?

Here I need to mimic both here to mimic Pandas' with rewriting since any and every only exist in the Spark's master.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't entirely remember that comment. I did found an e-mail thread about this, and there I state the opposite: ANY(col) can be safely rewritten into MAX(col) you may need to add a coalesce if we need to return false for an empty dataset.

@codecov-io
Copy link

codecov-io commented May 20, 2019

Codecov Report

Merging #359 into master will decrease coverage by 0.03%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #359      +/-   ##
==========================================
- Coverage    94.4%   94.36%   -0.04%     
==========================================
  Files          36       36              
  Lines        3646     3656      +10     
==========================================
+ Hits         3442     3450       +8     
- Misses        204      206       +2
Impacted Files Coverage Δ
databricks/koalas/missing/series.py 100% <ø> (ø) ⬆️
databricks/koalas/series.py 92.7% <83.33%> (-0.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3cffb28...84d3c59. Read the comment docs.

@HyukjinKwon
Copy link
Member Author

All tests passed

# Here we check if the count of `True`s is more than one in order to mimic `any`.
return sdf.select(
(F.count(F.when(col.cast('boolean'), 1).otherwise(None)) >= 1)
).collect()[0][0]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, count(col) ignores null

@HyukjinKwon HyukjinKwon merged commit 47ad632 into databricks:master May 21, 2019
@HyukjinKwon
Copy link
Member Author

Let me get this in. The output is as expected anyway.

# Note that we're ignoring `None`s here for now.
# Here we check if the count of `True`s is more than one in order to mimic `any`.
return sdf.select(
(F.count(F.when(col.cast('boolean'), 1).otherwise(None)) >= 1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nits: otherwise(None) should be implied, and if you do a is not empty check why not check > 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, currently None is being simply ignored (that's what Pandas does by default - skipna argument). Yes .. > 0 looks better. Will fix it soon. thanks :D.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also let me give a shot with min/max ones too

@HyukjinKwon HyukjinKwon deleted the add-all-any branch September 11, 2020 07:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants