[SPARK-30842][SQL] Adjust abstraction structure for join operators #27595

Closed
wants to merge 5 commits into from

@Eric5553 Eric5553 commented Feb 15, 2020

What changes were proposed in this pull request?

Currently the physical join operators are not well abstracted, even though they share a lot of common logic. A common trait makes pattern matching easier and gives future join-related changes a single place to live. This is a follow-up PR based on the comment at #27509 (comment).

This PR refines the following aspects:

  1. Refine the structure of all physical join operators
  2. Add the missing joinType field to the CartesianProductExec operator
  3. Refine the code related to EXPLAIN FORMATTED
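
As a rough illustration of why a shared trait helps with pattern matching, here is a minimal, self-contained sketch. It does not use Spark's real classes: `PlanNode`, `BaseJoin`, `SortMergeJoin`, `CartesianProduct`, `Scan`, and `describeJoin` are all simplified, hypothetical stand-ins, and giving the cartesian product a fixed `Inner` join type is an assumption made for the example.

```scala
object JoinSketch {
  // Simplified stand-ins; none of these are Spark's real classes.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType

  trait PlanNode { def nodeName: String }

  // Common surface that every join operator in this sketch exposes.
  trait BaseJoin extends PlanNode {
    def joinType: JoinType
    def condition: Option[String]
    def leftKeys: Seq[String]
    def rightKeys: Seq[String]
  }

  case class SortMergeJoin(
      leftKeys: Seq[String],
      rightKeys: Seq[String],
      joinType: JoinType,
      condition: Option[String]) extends BaseJoin {
    val nodeName = "SortMergeJoin"
  }

  // Assumption for this sketch: a cartesian product has no keys and is an inner join.
  case class CartesianProduct(condition: Option[String]) extends BaseJoin {
    val nodeName = "CartesianProduct"
    val joinType: JoinType = Inner
    val leftKeys: Seq[String] = Nil
    val rightKeys: Seq[String] = Nil
  }

  case class Scan(table: String) extends PlanNode {
    val nodeName = "Scan"
  }

  // One case covers every join operator instead of one case per operator class.
  def describeJoin(node: PlanNode): Option[String] = node match {
    case j: BaseJoin => Some(s"${j.nodeName} ${j.joinType}")
    case _ => None
  }
}
```

With a common trait, a caller that only cares about "is this a join, and of what type" needs a single `case j: BaseJoin` branch, and a new join operator is picked up without touching the match.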

The EXPLAIN FORMATTED changes are:

  1. Converge all join operators' verboseStringWithOperatorId implementations into BaseJoinExec. The join condition is displayed, and join keys are displayed only if they are not empty.
  2. Item 1 adds the join condition to BroadcastNestedLoopJoinExec.
  3. Item 1 does NOT affect CartesianProductExec, SortMergeJoin, and HashJoins, since they already had their own override implementations.
  4. Converge all join operators' simpleStringWithNodeId into BaseJoinExec, which adds JoinType to the one-line description of CartesianProductExec.
  5. Override simpleStringWithNodeId in BroadcastNestedLoopJoinExec to show BuildSide, which was previously done only for HashJoins.
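
The one-line description changes in the last two items can be sketched roughly as follows. This is a simplified model, not Spark's code: the `BaseJoin` trait, the string-typed `joinType`, the `id` field, and the exact output format are assumptions chosen for illustration.

```scala
object ExplainSketch {
  sealed trait BuildSide
  case object BuildLeft extends BuildSide
  case object BuildRight extends BuildSide

  trait BaseJoin {
    def id: Int
    def nodeName: String
    def joinType: String
    // Shared one-line description: node name, join type, operator id.
    def simpleStringWithNodeId: String = s"$nodeName $joinType ($id)"
  }

  // Inherits the shared description, so the join type shows up without extra code.
  case class CartesianProduct(id: Int) extends BaseJoin {
    val nodeName = "CartesianProduct"
    val joinType = "Inner"
  }

  // Overrides the description to also show which side is built.
  case class BroadcastNestedLoopJoin(id: Int, joinType: String, buildSide: BuildSide)
      extends BaseJoin {
    val nodeName = "BroadcastNestedLoopJoin"
    override def simpleStringWithNodeId: String =
      s"$nodeName $joinType $buildSide ($id)"
  }
}
```

Under these assumptions, `ExplainSketch.CartesianProduct(3).simpleStringWithNodeId` yields `CartesianProduct Inner (3)`, while the nested-loop join adds its build side between the join type and the id.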

Why are the changes needed?

To make the join operator code consistent with other operators and easier to extend in the future.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

SparkQA commented Feb 15, 2020

Test build #118481 has finished for PR 27595 at commit 3ded69e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait BaseJoinExec extends BinaryExecNode
  • trait HashJoin extends BaseJoinExec

SparkQA commented Feb 17, 2020

Test build #118589 has finished for PR 27595 at commit 4397ce8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -286,7 +286,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {

   def createCartesianProduct() = {
     if (joinType.isInstanceOf[InnerLike]) {
-      Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), condition)))
+      Some(Seq(joins.CartesianProductExec(
+        planLater(left), planLater(right), condition)))

Contributor

unnecessary change.

@@ -367,7 +368,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {

   def createCartesianProduct() = {
     if (joinType.isInstanceOf[InnerLike]) {
-      Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), condition)))
+      Some(Seq(joins.CartesianProductExec(
+        planLater(left), planLater(right), condition)))

Contributor

unnecessary change.

SparkQA commented Feb 21, 2020

Test build #118746 has finished for PR 27595 at commit 620c70d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 21, 2020

Test build #118802 has finished for PR 27595 at commit 8604c67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else "None"
s"""
|(${ExplainUtils.getOpId(this)}) $nodeName ${ExplainUtils.getCodegenId(this)}
|${ExplainUtils.generateFieldString("Join condition", joinCondStr)}

Contributor

can we include join keys here? Then we can remove the verboseStringWithOperatorId methods in join sub-classes.

Contributor

And we should print nothing if join keys are empty, instead of [].

Contributor Author

Sure, will update. Thanks!
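
The behavior requested in this review thread (always show the join condition, show join keys only when there are any) could look roughly like the sketch below. It is a standalone approximation: `fieldLine` and `verboseString` are hypothetical helpers, not the `ExplainUtils` methods from the diff, and the output layout is an assumption.

```scala
object VerboseSketch {
  // Hypothetical helper: one "name: [values]" line, or nothing when there are no values.
  def fieldLine(name: String, values: Seq[String]): Option[String] =
    if (values.isEmpty) None
    else {
      val rendered = values.mkString("[", ", ", "]")
      Some(s"$name: $rendered")
    }

  def verboseString(
      opId: Int,
      nodeName: String,
      leftKeys: Seq[String],
      rightKeys: Seq[String],
      condition: Option[String]): String = {
    // The condition line is always present; "None" marks a missing condition.
    val condStr = condition.getOrElse("None")
    val lines = Seq(
      Some(s"($opId) $nodeName"),
      fieldLine("Left keys", leftKeys),
      fieldLine("Right keys", rightKeys),
      Some(s"Join condition: $condStr")
    ).flatten // drops the key lines that rendered to None
    lines.mkString("\n")
  }
}
```

Returning `Option[String]` from the helper keeps the "print nothing instead of []" rule in one place, so operators without keys, such as a nested loop join, need no special case.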

SparkQA commented Feb 24, 2020

Test build #118879 has finished for PR 27595 at commit 78a1bc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM. @Eric5553 can you list the EXPLAIN output changes introduced by this PR? Thanks!

@Eric5553
Contributor Author

Eric5553 commented Feb 25, 2020

@cloud-fan Sure
The EXPLAIN FORMATTED changes are:

  1. Converge all join operators' verboseStringWithOperatorId implementations into BaseJoinExec. The join condition is displayed, and join keys are displayed only if they are not empty.
  2. Item 1 adds the join condition to BroadcastNestedLoopJoinExec.
  3. Item 1 does NOT affect CartesianProductExec, SortMergeJoin, and HashJoins, since they already had their own override implementations.
  4. Converge all join operators' simpleStringWithNodeId into BaseJoinExec, which adds JoinType to the one-line description of CartesianProductExec.
  5. Override simpleStringWithNodeId in BroadcastNestedLoopJoinExec to show BuildSide, which was previously done only for HashJoins.

Also updated in PR description.

@cloud-fan
Contributor

@Eric5553 can you fix the conflicts? thanks!

@Eric5553
Contributor Author

retest this please

SparkQA commented Feb 27, 2020

Test build #119036 has finished for PR 27595 at commit 17af74f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 27, 2020

Test build #119038 has finished for PR 27595 at commit 17af74f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Eric5553
Contributor Author

@cloud-fan Sure, the conflicts have been resolved. :-)

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in eba2076 Feb 28, 2020
@Eric5553 Eric5553 deleted the RefineJoin branch March 13, 2020 06:51
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
Closes apache#27595 from Eric5553/RefineJoin.

Authored-by: Eric Wu
Signed-off-by: Wenchen Fan