
[SPARK-21238][SQL] allow nested SQL execution #18450

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Contributor

@cloud-fan cloud-fan commented Jun 28, 2017

What changes were proposed in this pull request?

This is kind of another follow-up for #18064 .

In #18064, we wrapped every SQL command with a SQL execution, which makes nested SQL execution very likely to happen. #18419 tried to improve this a little by introducing `SQLExecution.ignoreNestedExecutionId`. However, this is not friendly to data source developers, who may need to update their code to use the `ignoreNestedExecutionId` API.

This PR proposes a new solution: just allow nested execution. The downside is that we may have multiple executions for one query. We can improve this by updating the data organization in `SQLListener` to have a 1-n mapping from query to execution instead of a 1-1 mapping. This can be done in a follow-up.
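The bookkeeping change can be sketched with a tiny thread-local model. Note the names below (`ExecutionIdModel`, `withNewExecutionId`'s return type) are illustrative only; the real logic lives in `org.apache.spark.sql.execution.SQLExecution` and differs in detail:

```scala
// Simplified, self-contained model of execution-id bookkeeping.
// Not Spark's actual implementation -- just the shape of the idea.
object ExecutionIdModel {
  // Tracks the execution id of the query currently running on this thread.
  private val currentId = new ThreadLocal[Option[Long]] {
    override def initialValue(): Option[Long] = None
  }
  private var nextId = 0L

  // With nesting allowed, a nested call simply gets a fresh execution id
  // (so one query may map to several executions) instead of throwing or
  // requiring callers to opt out via an ignore-nested-execution API.
  def withNewExecutionId[T](body: => T): (Long, T) = {
    val id = synchronized { nextId += 1; nextId }
    val saved = currentId.get()
    currentId.set(Some(id))
    try (id, body) finally currentId.set(saved)
  }
}
```

Under this scheme a nested `withNewExecutionId` call just produces a second execution id for the same query, which is exactly the 1-n mapping the follow-up would surface in the UI.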

How was this patch tested?

Existing tests.

@cloud-fan
Contributor Author

cc @rdblue @zsxwing

@@ -26,22 +26,9 @@ import org.apache.spark.sql.SparkSession
class SQLExecutionSuite extends SparkFunSuite {

test("concurrent query execution (SPARK-10548)") {
// Try to reproduce the issue with the old SparkContext
Contributor Author


Now that we allow nested execution, we can't reproduce this bug anymore.

@SparkQA

SparkQA commented Jun 28, 2017

Test build #78783 has finished for PR 18450 at commit d47433e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Contributor

rdblue commented Jun 28, 2017

I think it is good that this would no longer throw exceptions at runtime. Is the purpose of not allowing nested executions to minimize the queries shown in the UI? If that's the only purpose then I agree that just eliminating the empty ones is a good strategy.

Member

@zsxwing zsxwing left a comment


Just one suggestion. Otherwise, LGTM

@@ -314,6 +314,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging {
if (executionUIData.isFailed) {
failedExecutions += executionUIData
trimExecutionsIfNecessary(failedExecutions)
} else if (executionUIData.jobs.isEmpty && executionUIData.accumulatorMetrics.isEmpty) {
Member

@zsxwing zsxwing Jun 28, 2017


I'd prefer not to do this. Otherwise, such a query appears in the running queries but disappears from the UI after it finishes, which seems pretty weird.

@SparkQA

SparkQA commented Jun 29, 2017

Test build #78853 has finished for PR 18450 at commit f8e9901.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

thanks for the review, merging to master!

@asfgit asfgit closed this in 9f6b3e6 Jun 29, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
Author: Wenchen Fan <[email protected]>

Closes apache#18450 from cloud-fan/execution-id.
ghost pushed a commit to dbtsai/spark that referenced this pull request Oct 12, 2017
…hema

## What changes were proposed in this pull request?

In apache#18064, we allowed `RunnableCommand` to have children in order to fix some UI issues. Then we made the `InsertIntoXXX` commands take the input `query` as a child; when we do the actual writing, we just pass the physical plan to the writer (`FileFormatWriter.write`).

However, this is problematic. In Spark SQL, the optimizer and planner are allowed to change schema names a little. E.g. the `ColumnPruning` rule will remove no-op `Project`s, like `Project("A", Scan("a"))`, and thus change the output schema from `<A: int>` to `<a: int>`. When it comes to writing, especially for a self-describing data format like Parquet, we may write the wrong schema to the file and cause null values at the read path.

Fortunately, in apache#18450, we decided to allow nested execution, so one query can map to multiple executions in the UI. This lifts the major restriction in apache#18604, and now we don't have to take the input `query` as a child of the `InsertIntoXXX` commands.

So the fix is simple: this PR partially reverts apache#18064 and makes the `InsertIntoXXX` commands leaf nodes again.
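The schema-name hazard described above can be illustrated with a minimal, self-contained sketch. The `Plan`, `Scan`, and `Project` types below are hypothetical stand-ins, not Spark's actual classes; they only model how a `ColumnPruning`-style rule can drop a rename-only `Project` and silently lose the requested capitalization:

```scala
// Miniature model of the issue (illustrative types, not Spark's API).
sealed trait Plan { def output: Seq[String] }
case class Scan(output: Seq[String]) extends Plan
case class Project(output: Seq[String], child: Plan) extends Plan

// A ColumnPruning-style rule: a Project that selects the same columns
// (compared case-insensitively) is treated as a no-op and removed --
// taking the caller's capitalization with it.
def prune(plan: Plan): Plan = plan match {
  case Project(out, child)
      if out.map(_.toLowerCase) == child.output.map(_.toLowerCase) => child
  case other => other
}
```

Here `prune(Project(Seq("A"), Scan(Seq("a"))))` yields a plan whose output is `a`, not the requested `A`; if that optimized schema is what gets written to a self-describing file while readers expect `A`, the read path finds no matching column and produces nulls.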

## How was this patch tested?

new regression test

Author: Wenchen Fan <[email protected]>

Closes apache#19474 from cloud-fan/bug.