Support VARIANCE and STD aggregation in rolling op
#8809
Per discussion above, the current rolling aggregation returns null under the following corner cases (assume
When |
FYI: In Spark, there is an option allowing you to specify whether to output a null or a NaN when divide-by-zero: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala#L67 |
@ttnghia Is it a global switch or specific to std/var aggregation? |
It's just specific to std/var aggregation. I'm not sure if this option is exposed to the user though. @revans2 do you have any idea? |
It is a global config in newer versions of Spark |
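To make the null-versus-NaN question above concrete, here is a minimal pure-Python sketch of a trailing-window variance. It is not the libcudf or Spark implementation: the function name `rolling_variance` and the flag `nan_on_divide_by_zero` are hypothetical, and the rule that an all-null window always yields null is an assumption made for illustration.

```python
import math
from typing import List, Optional, Sequence


def rolling_variance(
    values: Sequence[Optional[float]],
    window: int,
    ddof: int = 1,
    nan_on_divide_by_zero: bool = False,  # hypothetical switch, for illustration only
) -> List[Optional[float]]:
    """Trailing-window variance; None stands in for a null row."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        # Nulls inside the window are skipped, so they do not count as observations.
        valid = [v for v in values[start : i + 1] if v is not None]
        denom = len(valid) - ddof
        if not valid:
            # Assumption: a window with no valid observations is always null.
            out.append(None)
        elif denom <= 0:
            # Divide-by-zero case: emit null or NaN depending on the flag.
            out.append(math.nan if nan_on_divide_by_zero else None)
        else:
            mean = sum(valid) / len(valid)
            out.append(sum((v - mean) ** 2 for v in valid) / denom)
    return out


print(rolling_variance([1.0, 2.0, 4.0], window=2))
print(rolling_variance([1.0, 2.0, 4.0], window=2, nan_on_divide_by_zero=True))
```

With `ddof=1`, the first window holds a single observation, so its denominator is zero and the flag alone decides whether that row comes out as null or NaN; every other row is unaffected.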
Co-authored-by: Christopher Harris <[email protected]>
@revans2 Thanks for the info. As mentioned several times by Bobby, for var/std case, doing a |
In defense of @isVoid's PR, the oldest rolling-window code was this way for a bit. I've tried moving to using more explicit (manually verifiable) values for the more recent feature additions. It will be worthwhile to dismantle the old tests and write explicit ones for them. As @davidwendt, @isVoid, etc. have noted, we should probably do that in a follow-up. |
👍
Regarding the change in behaviour since my last review (i.e. specifically when |
LGTM
Co-authored-by: David Wendt <[email protected]>
rerun tests |
@gpucibot merge |
Part 1 of #8695
This PR adds support for STD and VARIANCE rolling aggregations in libcudf.

Implementation notes: