Add Std dev samp for windowing [databricks] #3869
Should we have it disabled by default for now? We should link the cuDF issue if there is one.
Is this something you are planning to handle in the plugin? Or is that going to be handled under the hood by cuDF?
Force-pushed from f9ca4fb to 005ba4e
Signed-off-by: Raza Jafri <[email protected]>
Force-pushed from 005ba4e to 1d94635
I kept it as a draft PR for this specific reason.
Yes, I have updated the PR to handle the Spark-specific output in the plugin.
Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Raza Jafri <[email protected]>
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Signed-off-by: Raza Jafri <[email protected]>
build
Signed-off-by: Raza Jafri <[email protected]>
These changes were causing the test_lead_lag_for_structs_with_arrays test to fail. I am not sure why, but it seems to happen only when int_gen is nested in the array_gen. Signed-off-by: Raza Jafri <[email protected]>
build
@revans2 can you take another look?
Just one minor nit
- override lazy val evaluateExpression: Expression = {
+ override val evaluateExpression: Expression = {
nit: Can we add the lazy back in?
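For readers following this nit, here is a minimal sketch of one general reason `lazy` can matter on an overridden member in Scala: a strict `val` in a subclass is only assigned after the superclass constructor has run, so anything the superclass computes from it during construction sees `null`. The names `Base`, `StrictImpl`, and `LazyImpl` are hypothetical and not taken from the plugin, and this is not claimed to be the reviewer's actual motivation.

```scala
// Hypothetical classes for illustration only; not from spark-rapids.
abstract class Base {
  def expr: String
  // Reads `expr` while the superclass constructor is still running.
  val described: String = s"expr=$expr"
}

class StrictImpl extends Base {
  // Strict val: the field is assigned only after Base's constructor finishes.
  override val expr: String = "stddev_samp"
}

class LazyImpl extends Base {
  // Lazy val: computed on first access, even during Base's constructor.
  override lazy val expr: String = "stddev_samp"
}

object LazyValDemo {
  def main(args: Array[String]): Unit = {
    println(new StrictImpl().described) // prints "expr=null"
    println(new LazyImpl().described)   // prints "expr=stddev_samp"
  }
}
```

A `lazy val` also defers the cost of building the value until something actually reads it.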
Signed-off-by: Raza Jafri <[email protected]>
build
ExprChecks.groupByOnly(
  TypeSig.DOUBLE, TypeSig.DOUBLE,
  Seq(ParamCheck("input", TypeSig.DOUBLE,
    TypeSig.DOUBLE))).asInstanceOf[ExprChecksImpl].contexts
  ++
  ExprChecks.windowOnly(
Should this be just aggNotReduction?
I didn't know about that ExprCheck. I think that's what we need. @revans2, do you see anything wrong with using aggNotReduction here?
aggNotReduction should do the same thing, just be cleaner.
Signed-off-by: Raza Jafri <[email protected]>
build
@@ -1204,8 +1204,27 @@ case class GpuStddevPop(child: Expression, nullOnDivideByZero: Boolean)
  override def prettyName: String = "stddev_pop"
}

case class WindowStddevSamp(
Are we going to support WindowStddevPop too?
Not in this PR
Why not? The std family can be supported all at once with very little added code. Otherwise, new customer requests may force us to come back, reread the code, and add more code and tests, which would take significantly more time.
IMO adding all of stddev_pop/samp and var_pop/samp at once is the best approach.
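To illustrate the "very little code" point, here is a rough sketch of how all four results can be derived from one shared set of running moments (count, mean, and sum of squared deviations). `Moments` and its methods are hypothetical names, not plugin or cuDF APIs. The handling of the single-row case is an assumption modelled on the `nullOnDivideByZero` flag visible in the diff, with `None` standing in for SQL null.

```scala
// Hypothetical sketch; Moments is not a spark-rapids or cuDF class.
final case class Moments(n: Double, mean: Double, m2: Double) {
  // Welford's online update: fold one more value into the running moments.
  def add(x: Double): Moments = {
    val n1 = n + 1.0
    val delta = x - mean
    val mean1 = mean + delta / n1
    Moments(n1, mean1, m2 + delta * (x - mean1))
  }

  // Assumed semantics: None models SQL null, otherwise NaN is returned.
  private def divByZero(nullOnDivideByZero: Boolean): Option[Double] =
    if (nullOnDivideByZero) None else Some(Double.NaN)

  def varPop: Option[Double] =
    if (n == 0.0) None else Some(m2 / n)

  def varSamp(nullOnDivideByZero: Boolean): Option[Double] =
    if (n == 0.0) None
    else if (n == 1.0) divByZero(nullOnDivideByZero)
    else Some(m2 / (n - 1.0))

  // The stddev variants are just the square roots of the variances.
  def stddevPop: Option[Double] = varPop.map(math.sqrt)

  def stddevSamp(nullOnDivideByZero: Boolean): Option[Double] =
    varSamp(nullOnDivideByZero).map(math.sqrt)
}

object Moments {
  val empty: Moments = Moments(0.0, 0.0, 0.0)
  def of(xs: Seq[Double]): Moments = xs.foldLeft(empty)(_ add _)
}
```

For example, `Moments.of(Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)).stddevPop` evaluates to approximately 2.0, and the sample variants differ only in dividing by `n - 1` and in the single-row case.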
It might be very easy to do but that doesn't mean that we have to bundle them together. This PR is specific to a client's request and is scheduled for this release.
This PR adds windowing support for stddev. It's built on top of @revans2's changes for adding support for GpuReplaceWindowFunction.
CudfWindowStddev class to avoid stack overflow when replacing StddevSamp
window_function_test.py
depends on rapidsai/cudf#9527
closes #3814
Signed-off-by: Raza Jafri [email protected]