-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection #29074
Conversation
leftKeys, rightKeys, UnknownPartitioning(0), rightPartitioning)) | ||
case (_, HashPartitioning(rightExpressions, _)) => | ||
reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, rightExpressions, rightKeys) | ||
.orElse(reorderJoinKeysRecursively( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be also implemented by looking at left partitioning first then move to the right partitionoing:
(leftPartitioning, rightPartitioning) match {
case (HashPartitioning(leftExpressions, _), _) =>
reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, leftExpressions, leftKeys)
.orElse(reorderJoinKeysRecursively(
leftKeys, rightKeys, UnknownPartitioning(0), rightPartitioning))
case (PartitioningCollection(partitionings), _) =>
partitionings.foreach { p =>
reorderJoinKeysRecursively(leftKeys, rightKeys, p, rightPartitioning).map { k =>
return Some(k)
}
}
reorderJoinKeysRecursively(leftKeys, rightKeys, UnknownPartitioning(0), rightPartitioning)
case (_, HashPartitioning(rightExpressions, _)) =>
reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, rightExpressions, rightKeys)
case (_, PartitioningCollection(partitionings)) =>
partitionings.foreach { p =>
reorderJoinKeysRecursively(leftKeys, rightKeys, leftPartitioning, p).map { k =>
return Some(k)
}
}
None
case _ =>
None
}
However, I chose this way so that the behavior remains the same. If you have leftPartitioning = PartitioningCollection
and rightPartitioning = HashPartitioning
, it will match the rightPartitioning
first, which is the existing behavior.
Test build #125704 has finished for PR 29074 at commit
|
retest this please |
Test build #125722 has finished for PR 29074 at commit
|
cc: @maropu @cloud-fan |
also cc: @viirya |
Test build #125890 has finished for PR 29074 at commit
|
Test build #125917 has finished for PR 29074 at commit
|
Gentle ping @cloud-fan / @maropu / @viirya |
Test build #126928 has finished for PR 29074 at commit
|
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
Outdated
Show resolved
Hide resolved
Test build #127164 has finished for PR 29074 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay except for the existing minor comments. @cloud-fan @viirya
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala
Outdated
Show resolved
Hide resolved
Test build #127213 has finished for PR 29074 at commit
|
retest this please |
Test build #127775 has finished for PR 29074 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @imback82 for adding this. This is important for complex bucketed-related queries to avoid shuffle. LGTM, just a nit.
sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala
Outdated
Show resolved
Hide resolved
Test build #127850 has finished for PR 29074 at commit
|
Gentle ping. @cloud-fan, do you think this PR can move forward? Thanks in advance! |
retest this please |
Test build #128455 has finished for PR 29074 at commit
|
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #129183 has finished for PR 29074 at commit
|
case (Some(PartitioningCollection(partitionings)), _) => | ||
partitionings.foreach { p => | ||
reorderJoinKeysRecursively(leftKeys, rightKeys, Some(p), rightPartitioning).map { k => | ||
return Some(k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
partitionings.foldLeft(None) { (res, p) =>
res.orElse(reorderJoinKeysRecursively...)
}.getOrElse(reorderJoinKeysRecursively(leftKeys, rightKeys, None, rightPartitioning))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated.
ShuffleExchangeExec(HashPartitioning(rightPartitioningExpressions, _), _, _), _), _) => | ||
assert(leftKeys !== smjExec1.leftKeys) | ||
assert(rightKeys !== smjExec1.rightKeys) | ||
assert(leftKeys === leftPartitionings.head.asInstanceOf[HashPartitioning].expressions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we simply check leftKeys === Seq(exprA, exprB)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I simplified the checks in this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except some minor comments
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #129483 has finished for PR 29074 at commit
|
res.orElse(reorderJoinKeysRecursively(leftKeys, rightKeys, Some(p), rightPartitioning)) | ||
}.orElse(reorderJoinKeysRecursively(leftKeys, rightKeys, None, rightPartitioning)) | ||
case (_, Some(PartitioningCollection(partitionings))) => | ||
partitionings.foreach { p => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do the same refactor here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
case SortMergeJoinExec(leftKeys, rightKeys, _, _, | ||
SortExec(_, _, DummySparkPlan(_, _, _: PartitioningCollection, _, _), _), | ||
SortExec(_, _, ShuffleExchangeExec(_: HashPartitioning, _, _), _), _) => | ||
assert(leftKeys !== smjExec1.leftKeys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this check needed? We already check leftKeys === Seq(exprA, exprB)
and it's obvious that leftKeys !== smjExec1.leftKeys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks!
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #129523 has finished for PR 29074 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
This PR proposes to improve
EnsureRquirement.reorderJoinKeys
to handle the following scenarios:HashPartitioning
, consider the right-sideHashPartitioning
.PartitioningCollection
, which may containHashPartitioning
Why are the changes needed?
HashPartitioning
or the right-sideHashPartitioning
. This means that if both sides areHashPartitioning
, it will try to match only the left side.The following will not consider the right-side
HashPartitioning
:PartitioningCollection
:Does this PR introduce any user-facing change?
Yes, now from the above examples, the shuffle/sort nodes pointed by
This can be removed
are now removed:How was this patch tested?
Added tests.