[SPARK-22362][SQL] Add unit test for Window Aggregate Functions #20046
    val e = intercept[AnalysisException](
      df.select(
        $"key",
        count("invalid").over(
over() would be enough, as partitionBy and orderBy are ignored anyway.
Thanks, I will remove the unnecessary parts.
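For readers following this thread, the simplified check might look roughly like the sketch below. It is a standalone illustration, not the code from the PR: the object name and the local SparkSession setup are assumptions, and it uses scala.util.Try where the suite uses intercept. It only shows that referencing a column that does not exist is rejected during analysis even with an empty over().

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.functions.count
import scala.util.Try

// Hypothetical standalone sketch; the PR itself asserts this with intercept[...] in a QueryTest suite.
object InvalidColumnOverSketch {
  def failsInAnalysis(spark: SparkSession): Boolean = {
    import spark.implicits._
    val df = Seq((1, "1"), (2, "1"), (2, "2")).toDF("key", "value")
    // "invalid" is not a column of df, so select is expected to fail while the plan is analyzed.
    Try(df.select($"key", count("invalid").over()))
      .failed.toOption.exists(_.isInstanceOf[AnalysisException])
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    try println(failsInAnalysis(spark)) finally spark.stop()
  }
}
```

No partitionBy or orderBy is needed here because the failure comes from resolving the column name, before any window specification matters.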
You could have tests attempting to feed aggregation functions with columns containing wrong datatypes (e.g. avg on String)
Jenkins, ok to test
Test build #85276 has finished for PR 20046 at commit
Test build #85310 has finished for PR 20046 at commit
Test build #85311 has finished for PR 20046 at commit
Force-pushed from 4dd08dc to 03d6159
Test build #85599 has finished for PR 20046 at commit
Test build #85601 has finished for PR 20046 at commit
Test build #85610 has finished for PR 20046 at commit
Test build #86352 has finished for PR 20046 at commit
retest this please
Test build #86362 has finished for PR 20046 at commit
@@ -86,6 +93,429 @@ class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext {
    assert(e.message.contains("requires window to be ordered"))
  }

  test("aggregation and rows between") {
    val df = Seq((1, "1"), (2, "1"), (2, "2"), (1, "1"), (2, "2")).toDF("key", "value")
We should also include null data.
This test was removed and re-added as a result of a merge conflict. I have cleaned it up now.
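As an illustration of the null-data request in this thread, a window aggregate over a nullable column could be exercised along the lines of the sketch below. The object name, the sample data, and the choice of count and sum are assumptions made for the example, not taken from the PR. It relies on count(column) skipping nulls and on sum returning null when every input in the partition is null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, sum}

// Hypothetical sketch: per-partition count and sum over a column that contains nulls.
object NullWindowSketch {
  def run(spark: SparkSession): Map[String, (Long, Option[Long])] = {
    import spark.implicits._
    // Option[Int] makes the "key" column nullable; None becomes a SQL NULL.
    val df = Seq[(Option[Int], String)](
      (Some(1), "a"), (None, "a"), (Some(3), "a"), (None, "b")
    ).toDF("key", "value")
    val w = Window.partitionBy($"value")
    df.select($"value", count($"key").over(w).as("cnt"), sum($"key").over(w).as("total"))
      .distinct()
      .collect()
      .map { r =>
        val total = if (r.isNullAt(2)) None else Some(r.getLong(2))
        r.getString(0) -> ((r.getLong(1), total))
      }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    // Partition "a" has two non-null keys (1 and 3); partition "b" has none.
    try println(run(spark)) finally spark.stop()
  }
}
```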
We should also cover the SQL interface; you can find some examples in
@jiangxb1987 how does your request to cover the SQL interface relate to SPARK-23160?
I have already extended sql/core/src/test/resources/sql-tests/inputs/window.sql with the missing window aggregate functions, but if you would like, I can move that to a different PR.
Test build #86474 has finished for PR 20046 at commit
PySpark failure must be unrelated as only unit tests are added.
Test build #86472 has finished for PR 20046 at commit
Test build #86476 has finished for PR 20046 at commit
@jiangxb1987 your review comments have been applied. Is there anything else I should work on for this PR?
gentle ping @gatorsmile @hvanhovell
Test build #89066 has finished for PR 20046 at commit
ping @hvanhovell @cloud-fan
A few comments. Looks good overall.
@@ -33,6 +35,11 @@ import org.apache.spark.unsafe.types.CalendarInterval
class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext {
  import testImplicits._

  private def sortWrappedArrayInRow(d: DataFrame) = d.map {
    case Row(key: String, unsorted: mutable.WrappedArray[String]) =>
Let's not pattern match against mutable.WrappedArray and use Seq instead. mutable.WrappedArray is pretty much an implementation detail, and pattern matching against it is brittle.
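The brittleness mentioned here can be shown without Spark. The sketch below is a plain-Scala illustration with hypothetical names (SeqMatchSketch, sortCell); it matches on the general scala.collection.Seq interface, so it accepts an array-backed wrapper as well as any other sequence implementation, whereas a match on mutable.WrappedArray would only accept the former.

```scala
// Hypothetical Spark-free sketch: match on the Seq interface, not on a concrete wrapper class.
object SeqMatchSketch {
  def sortCell(cell: Any): Seq[String] = cell match {
    // Any sequence implementation is accepted, whatever class backs it at runtime.
    case values: scala.collection.Seq[_] => values.map(_.toString).sorted.toList
    case other => throw new IllegalArgumentException(s"expected a sequence, got: $other")
  }

  def main(args: Array[String]): Unit = {
    // The implicit array wrapper yields an array-backed Seq (WrappedArray in Scala 2.12).
    val wrapped: scala.collection.Seq[String] = Array("b", "a", "c")
    println(sortCell(wrapped))        // List(a, b, c)
    println(sortCell(List("z", "y"))) // List(y, z)
  }
}
```

If the runtime class of the array column ever changes, the Seq-based match keeps working while a WrappedArray-based match would fall through with a MatchError.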
@@ -33,6 +35,11 @@ import org.apache.spark.unsafe.types.CalendarInterval
class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext {
  import testImplicits._

  private def sortWrappedArrayInRow(d: DataFrame) = d.map {
You can also just use the array_sort function. That is probably a lot cheaper.
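A rough sketch of sorting the collected array inside the query, instead of mapping over rows, is given below. It makes several assumptions: the arrays come from collect_set over a window, the object name and data are invented for the example, and it uses sort_array because array_sort is only available in newer Spark releases. For non-null strings in ascending order the two should give the same result.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_set, sort_array}

// Hypothetical sketch: let Spark sort the collected values so results are deterministic.
object SortArraySketch {
  def run(spark: SparkSession): Map[String, Seq[String]] = {
    import spark.implicits._
    val df = Seq(("a", "z"), ("a", "x"), ("a", "y"), ("b", "q")).toDF("key", "value")
    val w = Window.partitionBy($"key")
    // collect_set has no guaranteed element order, so the array is sorted before comparison.
    df.select($"key", sort_array(collect_set($"value").over(w)).as("values"))
      .distinct()
      .collect()
      .map(r => r.getString(0) -> r.getSeq[String](1).toList)
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Sorting in the query also avoids pattern matching on the array's runtime class altogether.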
Test build #89100 has finished for PR 20046 at commit
gentle reminder @hvanhovell
@hvanhovell I would like to ask you to take another quick look at these changes
LGTM - merging to master. Thanks!
What changes were proposed in this pull request?
Improves the test coverage of window functions, focusing on missing tests for window aggregate functions. No new UDAF test is added, as that has been tested already.
How was this patch tested?
Only new tests were added, automated tests were executed.