Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22361][SQL][TEST] Add unit test for Window Frames #20019

Closed
wants to merge 6 commits into from

Conversation

gaborgsomogyi
Copy link
Contributor

What changes were proposed in this pull request?

There are already quite a few integration tests using window frames, but the unit tests coverage is not ideal.

In this PR the already existing tests are reorganized, extended and where gaps found additional cases added.

How was this patch tested?

Automated: Pass the Jenkins.

@gaborgsomogyi
Copy link
Contributor Author

@gatorsmile
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Dec 20, 2017

Test build #85153 has finished for PR 20019 at commit 87e61ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DataFrameWindowFramesSuite extends QueryTest with SharedSQLContext

Copy link
Contributor

@smurakozi smurakozi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test checking that rangeBetween throws an AnalysisException if it gets a non-numeric column as input, as demonstrated by
Seq("a").toDF("k").select(min('k).over(Window.orderBy('k).rangeBetween(-1, 1))).show

Add a test checking that rangeBetween throws an AnalysisException if it gets a column ordered by multiple order by expressions, as demonstrated by
Seq((1, 2)).toDF("k1", "k2").select(min('k1).over(Window.orderBy('k1, 'k2).rangeBetween(-1, 1))).show

Add tests checking that the above mentioned rules are ignored if range is not bounded, as demonstrated by
Seq((1, 2)).toDF("k1", "k2").select(min('k1).over(Window.orderBy('k1, 'k2).rangeBetween(Window.unboundedPreceding, Window.unboundedFollowing))).show

and
Seq("a").toDF("k").select(min('k).over(Window.orderBy('k).rangeBetween(Window.unboundedPreceding, Window.unboundedFollowing))).show

@gaborgsomogyi
Copy link
Contributor Author

@smurakozi nice catch, added them. Additionally found a nit.

@SparkQA
Copy link

SparkQA commented Dec 20, 2017

Test build #85190 has finished for PR 20019 at commit a2fa042.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the effort! Only some minor issues.
It would also be great to improve the test for window frames parser, which currently is in ExpressionParserSuite.test("window function expressions").

assert(e.message.contains("Boundary end is not a valid integer: 2147483648"))
}

test("range between should accept at most one ORDER BY expression when unbounded") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"unbounded" -> "bounded"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

"window specification with multiple order by expressions"))
}

test("range between should accept numeric values only when bounded") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a TypeCoercion issue, I'm not sure whether we should cover it here or TypeCoercionSuite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bit enhanced this area but not yet moved. What do you think, would it be good idea to move this test there? This is as window related as type coercion.

@jiangxb1987
Copy link
Contributor

also cc @gatorsmile whom I believe should be interested to look at this.

@gaborgsomogyi
Copy link
Contributor Author

ExpressionParserSuite window related part is a bit reorganized and enhanced.

@SparkQA
Copy link

SparkQA commented Dec 21, 2017

Test build #85267 has finished for PR 20019 at commit fa20e73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -263,21 +263,60 @@ class ExpressionParserSuite extends PlanTest {
"sum(product + 1) over (partition by ((product / 2) + 1) order by 2)",
WindowExpression('sum.function('product + 1),
WindowSpecDefinition(Seq('product / 2 + 1), Seq(Literal(2).asc), UnspecifiedFrame)))
}

test("range/rows window function expressions") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about window function expressions on various window frames?

Copy link
Contributor Author

@gaborgsomogyi gaborgsomogyi Jan 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just back from holiday. What do you mean exactly here? Range/rows between test with most of the combinations are covered. If you could give an example it would be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiangxb1987 gentle ping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late response. Just update the test case name.

val frameTypes = Seq(("rows", RowFrame), ("range", RangeFrame))
val boundaries = Seq(
("10 preceding", -Literal(10), CurrentRow),
("unbounded preceding", UnboundedPreceding, CurrentRow),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: could you add comments on each group of boundaries to explain what they are trying to cover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change is on the way...

@SparkQA
Copy link

SparkQA commented Jan 2, 2018

Test build #85592 has finished for PR 20019 at commit 677a4d6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 8, 2018

Test build #85803 has finished for PR 20019 at commit d1e2454.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jan 16, 2018

Test build #86191 has finished for PR 20019 at commit d1e2454.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Copy link
Contributor

LGTM

@gatorsmile
Copy link
Member

Thanks! Merged to master/2.3

asfgit pushed a commit that referenced this pull request Jan 17, 2018
## What changes were proposed in this pull request?

There are already quite a few integration tests using window frames, but the unit tests coverage is not ideal.

In this PR the already existing tests are reorganized, extended and where gaps found additional cases added.

## How was this patch tested?

Automated: Pass the Jenkins.

Author: Gabor Somogyi <[email protected]>

Closes #20019 from gaborgsomogyi/SPARK-22361.

(cherry picked from commit a9b845e)
Signed-off-by: gatorsmile <[email protected]>
@asfgit asfgit closed this in a9b845e Jan 17, 2018
@gaborgsomogyi
Copy link
Contributor Author

@gatorsmile @jiangxb1987 @smurakozi Thanks for the help!

@gaborgsomogyi gaborgsomogyi deleted the SPARK-22361 branch January 17, 2018 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants