-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-2180: support HAVING clauses in Hive queries #1136
Conversation
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
Any idea why the having test from Hive is not runnable? |
@rxin, I'm not 100% sure but I think it's a problem with local map/reduce (the stack trace isn't too informative, but it's the same as the one for tests that are blacklisted due to missing local map/reduce). I have another commit to push here (adding a semantic exception when |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15925/ |
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
Thanks, @willb. There is at least one problem I found. - I think you'd need to add a cast to the having expression. Otherwise try run the following: In Hive this returns nothing, but in Spark SQL with this patch it throws a runtime exception failing to cast integer to boolean. |
To be more specific, I think you can always add a cast that cast the having expression to boolean, and then we have SimplifyCasts in the optimizer that would remove unnecessary casts. |
Thanks for the catch, @rxin! I'll make the change and add tests for it. |
So I've added a cast in cases in which non-boolean expressions are supplied to having expressions. It appears that (Checking the type of a variable during parse doesn't work, so I wind up with a different exception in examples like the one you posted. I'll either need to fix the behavior of |
That's definitely a bug - I will take a look at it later. |
Thanks! I'm happy to put together a preliminary patch as well, but probably won't be able to take a look until tomorrow morning. |
I found the issue and fixed it. Will push out a pull request soon. If you can just add the boolean cast (always add it - no need to check if the type is already boolean since once I fix the bug, the extra cast on boolean value will be removed), that'd be great. |
Here's the patch: #1144 |
BTW I really want this to go into 1.0.1, which will probably have a release candidate soon. So if you have a chance to rebase your PR and add the cast, please do. Thanks a lot, @willb! |
This is a simple test for HAVING clauses.
Merged build triggered. |
Merged build started. |
Thanks for the quick review and patch, @rxin! |
Merged build finished. All automated tests passed. |
All automated tests passed. |
I tried |
I'm going to merge this in master & branch-1.0. I will create a separate ticket to track progress on HAVING. Basically there are two things missing:
|
This PR extends Spark's HiveQL support to handle HAVING clauses in aggregations. The HAVING test from the Hive compatibility suite doesn't appear to be runnable from within Spark, so I added a simple comparable test to `HiveQuerySuite`. Author: William Benton <[email protected]> Closes #1136 from willb/SPARK-2180 and squashes the following commits: 3bbaf26 [William Benton] Added casts to HAVING expressions 83f1340 [William Benton] scalastyle fixes 18387f1 [William Benton] Add test for HAVING without GROUP BY b880bef [William Benton] Added semantic error for HAVING without GROUP BY 942428e [William Benton] Added test coverage for SPARK-2180. 56084cc [William Benton] Add support for HAVING clauses in Hive queries. (cherry picked from commit 171ebb3) Signed-off-by: Reynold Xin <[email protected]>
@rxin, re: the former, seems like most implementations signal this as an error. |
There are databases that support that, and it seems to me a very simple change (actually just removing the check code you added is probably enough). |
BTW two follow up tickets created: https://issues.apache.org/jira/browse/SPARK-2225 https://issues.apache.org/jira/browse/SPARK-2226 Let me know if you'd like to work on them. |
OK, I wasn't sure if strict Hive compatibility was the goal. I'm happy to take these tickets. Thanks again! |
I actually did 2225 already. I will assign 2226 to you. Thanks! |
This PR extends Spark's HiveQL support to handle HAVING clauses in aggregations. The HAVING test from the Hive compatibility suite doesn't appear to be runnable from within Spark, so I added a simple comparable test to `HiveQuerySuite`. Author: William Benton <[email protected]> Closes apache#1136 from willb/SPARK-2180 and squashes the following commits: 3bbaf26 [William Benton] Added casts to HAVING expressions 83f1340 [William Benton] scalastyle fixes 18387f1 [William Benton] Add test for HAVING without GROUP BY b880bef [William Benton] Added semantic error for HAVING without GROUP BY 942428e [William Benton] Added test coverage for SPARK-2180. 56084cc [William Benton] Add support for HAVING clauses in Hive queries.
This PR extends Spark's HiveQL support to handle HAVING clauses in aggregations. The HAVING test from the Hive compatibility suite doesn't appear to be runnable from within Spark, so I added a simple comparable test to `HiveQuerySuite`. Author: William Benton <[email protected]> Closes apache#1136 from willb/SPARK-2180 and squashes the following commits: 3bbaf26 [William Benton] Added casts to HAVING expressions 83f1340 [William Benton] scalastyle fixes 18387f1 [William Benton] Add test for HAVING without GROUP BY b880bef [William Benton] Added semantic error for HAVING without GROUP BY 942428e [William Benton] Added test coverage for SPARK-2180. 56084cc [William Benton] Add support for HAVING clauses in Hive queries.
…ute URI: ${system:user.name%7D (apache#1136) Co-authored-by: Egor Krivokon <>
This PR extends Spark's HiveQL support to handle HAVING clauses in aggregations. The HAVING test from the Hive compatibility suite doesn't appear to be runnable from within Spark, so I added a simple comparable test to
HiveQuerySuite
.