-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21238][SQL] allow nested SQL execution #18450
Conversation
@@ -26,22 +26,9 @@ import org.apache.spark.sql.SparkSession | |||
class SQLExecutionSuite extends SparkFunSuite { | |||
|
|||
test("concurrent query execution (SPARK-10548)") { | |||
// Try to reproduce the issue with the old SparkContext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now we allow nested execution, so we can't reproduce this bug anymore.
Test build #78783 has finished for PR 18450 at commit
|
I think it is good that this would no longer throw exceptions at runtime. Is the purpose of not allowing nested executions to minimize the queries shown in the UI? If that's the only purpose then I agree that just eliminating the empty ones is a good strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one suggestion. Otherwise, LGTM
@@ -314,6 +314,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging { | |||
if (executionUIData.isFailed) { | |||
failedExecutions += executionUIData | |||
trimExecutionsIfNecessary(failedExecutions) | |||
} else if (executionUIData.jobs.isEmpty && executionUIData.accumulatorMetrics.isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to not do this. Otherwise, such query appears in running queries abut disappear after it finishes on UI and it seems pretty weird.
Test build #78853 has finished for PR 18450 at commit
|
thanks for the review, merging to master! |
## What changes were proposed in this pull request? This is kind of another follow-up for apache#18064 . In apache#18064 , we wrap every SQL command with SQL execution, which makes nested SQL execution very likely to happen. apache#18419 trid to improve it a little bit, by introduing `SQLExecition.ignoreNestedExecutionId`. However, this is not friendly to data source developers, they may need to update their code to use this `ignoreNestedExecutionId` API. This PR proposes a new solution, to just allow nested execution. The downside is that, we may have multiple executions for one query. We can improve this by updating the data organization in SQLListener, to have 1-n mapping from query to execution, instead of 1-1 mapping. This can be done in a follow-up. ## How was this patch tested? existing tests. Author: Wenchen Fan <[email protected]> Closes apache#18450 from cloud-fan/execution-id.
…hema ## What changes were proposed in this pull request? In apache#18064, we allowed `RunnableCommand` to have children in order to fix some UI issues. Then we made `InsertIntoXXX` commands take the input `query` as a child, when we do the actual writing, we just pass the physical plan to the writer(`FileFormatWriter.write`). However this is problematic. In Spark SQL, optimizer and planner are allowed to change the schema names a little bit. e.g. `ColumnPruning` rule will remove no-op `Project`s, like `Project("A", Scan("a"))`, and thus change the output schema from "<A: int>" to `<a: int>`. When it comes to writing, especially for self-description data format like parquet, we may write the wrong schema to the file and cause null values at the read path. Fortunately, in apache#18450 , we decided to allow nested execution and one query can map to multiple executions in the UI. This releases the major restriction in apache#18604 , and now we don't have to take the input `query` as child of `InsertIntoXXX` commands. So the fix is simple, this PR partially revert apache#18064 and make `InsertIntoXXX` commands leaf nodes again. ## How was this patch tested? new regression test Author: Wenchen Fan <[email protected]> Closes apache#19474 from cloud-fan/bug.
What changes were proposed in this pull request?
This is kind of another follow-up for #18064 .
In #18064 , we wrap every SQL command with SQL execution, which makes nested SQL execution very likely to happen. #18419 trid to improve it a little bit, by introduing
SQLExecition.ignoreNestedExecutionId
. However, this is not friendly to data source developers, they may need to update their code to use thisignoreNestedExecutionId
API.This PR proposes a new solution, to just allow nested execution. The downside is that, we may have multiple executions for one query. We can improve this by updating the data organization in SQLListener, to have 1-n mapping from query to execution, instead of 1-1 mapping. This can be done in a follow-up.
How was this patch tested?
existing tests.