[SPARK-5817] [SQL] Fix bug of udtf with column names #4602
Test build #27472 has started for PR 4602 at commit
Test build #27472 has finished for PR 4602 at commit
Test FAILed.
Test build #27473 has started for PR 4602 at commit
@@ -101,6 +101,7 @@ case class Alias(child: Expression, name: String)
  extends NamedExpression with trees.UnaryNode[Expression] {

  override type EvaluatedType = Any
  override lazy val resolved = childrenResolved && !child.isInstanceOf[Generator]
why this change?
`Alias(Generator)` is not like a normal expression: it will be transformed into `Generate(Generator, alias)`.
Can you add a comment to this effect?
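For readers following this thread, the rule under discussion can be sketched with a toy expression tree. This is a minimal illustration, not Catalyst's actual code: the `AliasSketch` object and the simplified `Expression`, `Literal` and `Explode` definitions are assumptions made up for the example, and only the shape of the `resolved` check mirrors the diff above.

```scala
object AliasSketch {
  // Toy stand-ins for Catalyst's expression classes (hypothetical, simplified).
  trait Expression {
    def children: Seq[Expression]
    // By default an expression is resolved once all of its children are.
    def resolved: Boolean = children.forall(_.resolved)
  }

  // Marker for table-generating functions such as explode.
  trait Generator extends Expression

  case class Literal(value: Any) extends Expression {
    def children: Seq[Expression] = Nil
  }

  case class Explode(child: Expression) extends Generator {
    def children: Seq[Expression] = Seq(child)
  }

  case class Alias(child: Expression, name: String) extends Expression {
    def children: Seq[Expression] = Seq(child)
    // An alias over a generator stays unresolved, so an analyzer rule gets the
    // chance to rewrite it into Generate(generator, alias) rather than
    // treating it as an ordinary named column.
    override def resolved: Boolean =
      children.forall(_.resolved) && !child.isInstanceOf[Generator]
  }
}
```

With these definitions, `Alias(Literal(1), "a")` counts as resolved, while `Alias(Explode(...), "d")` does not, even though the generator itself is resolved.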
Test build #27473 has finished for PR 4602 at commit
Test PASSed.
@@ -137,6 +137,11 @@ class Analyzer(catalog: Catalog,
  failAnalysis(
    s"unresolved operator ${operator.simpleString}")

  case p @ Project(exprs, _) if exprs.length > 1 && exprs.collect {
perhaps exprs.find(_.isInstanceOf[Generator]).isDefined
e.g. Project(Alias(Generator1, name), Alias(Generator2, name2))
Oh, it's a bug in my code, thanks for finding this. :)
Test build #27499 has started for PR 4602 at commit
Test build #27499 has finished for PR 4602 at commit
Test PASSed.
@marmbrus any more comments on this?
I tried the following
9656e51 to f6907d2
Thank you @yhuai, I've updated the description and rebased the code.
Test build #27617 has started for PR 4602 at commit
retest this please.
Test build #27620 has started for PR 4602 at commit
Test build #27617 has finished for PR 4602 at commit
Test PASSed.
Test build #27620 has finished for PR 4602 at commit
Test PASSed.
@chenghao-intel After another look at the code, I think it may be better to remove aliases from the
The
@yhuai please ignore my previous comment. I was thinking about some other possibilities.
@@ -144,6 +144,12 @@ class Analyzer(catalog: Catalog,
  failAnalysis(
    s"unresolved operator ${operator.simpleString}")

  case p @ Project(exprs, _) if exprs.length > 1 && exprs.flatMap(_.collect {
Pull `containsMultipleGenerators` out into a function.
case p @ Project(exprs, _) if containsMultipleGenerators(exprs) =>
  failAnalysis(
    s"""Only a single table generating function is allowed in a SELECT clause, found:
       | ${exprs.map(_.prettyString).mkString(",")}""".stripMargin)
Do we have a test for this error?
Yes, I added it in the unit test; see `HiveQuerySuite.scala`.
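As a rough illustration of the helper requested in this thread, the check could look something like the sketch below. The diff shown here is truncated after `exprs.flatMap(_.collect {`, so the pattern body and the final comparison are assumptions about the intent (reject a projection that holds more than one generator), and the `GeneratorCheckSketch` object with its toy `Expression` tree is hypothetical rather than Catalyst's real API.

```scala
object GeneratorCheckSketch {
  // Toy expression tree; hypothetical stand-ins, not Catalyst's real classes.
  trait Expression {
    def children: Seq[Expression]
    // Collect, in pre-order, every node of this tree that the partial
    // function is defined for.
    def collect[B](pf: PartialFunction[Expression, B]): Seq[B] =
      pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
  }

  trait Generator extends Expression

  case class Literal(value: Any) extends Expression {
    def children: Seq[Expression] = Nil
  }

  case class Explode(child: Expression) extends Generator {
    def children: Seq[Expression] = Seq(child)
  }

  case class Alias(child: Expression, name: String) extends Expression {
    def children: Seq[Expression] = Seq(child)
  }

  // Assumed intent: true when more than one generator appears anywhere in
  // the projection list, including generators wrapped in an Alias.
  def containsMultipleGenerators(exprs: Seq[Expression]): Boolean =
    exprs.flatMap(_.collect { case g: Generator => g }).length > 1
}
```

Searching each expression tree instead of testing only the top-level node is what lets the check see cases like `Project(Alias(Generator1, name), Alias(Generator2, name2))` raised earlier in the review.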
eb8178c to d2e8b43
Test build #30468 has started for PR 4602 at commit
Test build #30468 has finished for PR 4602 at commit
Test FAILed.
Test build #30489 has started for PR 4602 at commit
Test build #30490 has started for PR 4602 at commit
Test build #30491 has started for PR 4602 at commit
Test build #30493 has started for PR 4602 at commit
Test build #30489 has finished for PR 4602 at commit
Test FAILed.
Test build #30490 has finished for PR 4602 at commit
Test FAILed.
Test build #30491 has finished for PR 4602 at commit
Test FAILed.
Test build #30493 has finished for PR 4602 at commit
Test PASSed.
Thanks, merged to master.
There is a bug when running a query like:

```sql
select d from (select explode(array(1,1)) d from src limit 1) t
```

It throws an exception like:

```
org.apache.spark.sql.AnalysisException: cannot resolve 'd' given input columns _c0; line 1 pos 7
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:48)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:45)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:249)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:103)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:117)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:116)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
```

Fixing the bug requires refactoring the UDTF code. The major changes are:

* Simplifying UDTF development: a UDTF no longer manages its output attribute names; instead, `logical.Generate` handles that.
* The UDTF is asked for its output schema (data types) during logical plan analysis.

Author: Cheng Hao <[email protected]>

Closes apache#4602 from chenghao-intel/explode_bug and squashes the following commits:

c2a5132 [Cheng Hao] add back resolved for Alias
556e982 [Cheng Hao] revert the unncessary change
002c361 [Cheng Hao] change the rule of resolved for Generate
04ae500 [Cheng Hao] add qualifier only for generator output
5ee5d2c [Cheng Hao] prepend the new qualifier
d2e8b43 [Cheng Hao] Update the code as feedback
ca5e7f4 [Cheng Hao] shrink the commits
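The division of responsibility described in the commit message, where the generator reports only data types and `Generate` assigns the names, might be pictured roughly as follows. This is a hedged sketch under simplifying assumptions: the `GenerateSketch` object, the string-valued data types, the `elementTypes` method and the `_c0`-style fallback names are all illustrative choices, not the signatures used in Spark.

```scala
object GenerateSketch {
  // Hypothetical, simplified model: data types are plain strings here.
  trait Generator {
    // The generator reports only the data types of its output columns.
    def elementTypes: Seq[String]
  }

  case class Explode(elementType: String) extends Generator {
    def elementTypes: Seq[String] = Seq(elementType)
  }

  case class Attribute(name: String, dataType: String)

  case class Generate(generator: Generator, outputNames: Seq[String]) {
    // Names come from the query's aliases when present; otherwise this sketch
    // falls back to positional names (_c0, _c1, ...), which is an assumption.
    def output: Seq[Attribute] =
      generator.elementTypes.zipWithIndex.map { case (dataType, i) =>
        Attribute(outputNames.lift(i).getOrElse(s"_c$i"), dataType)
      }
  }
}
```

In this model, `Generate(Explode("int"), Seq("d")).output` exposes a column named `d`, which is the name the outer `select d` in the failing query needs to resolve against.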