-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable #29079
Conversation
@maropu, @cloud-fan @gatorsmile @sameeragarwal Could you help check this PR? Thanks. |
Could you show us performance numbers in the PR description, first? I think we need to check the trade-off between #parallelism and shuffle I/O. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing to SortMergeJoin, I think to coalesce bucketed tables on HashJoin has higher risk to fail due to OOM. Although this limits only to coalesce on stream side, we build hash map for each bucket on other side and it also sounds to OOM easily.
This feature is disabled by a config by default, so it may be okay. But we should be careful not to enable it by default later.
@viirya, thanks for comment. I agree this feature should be selectively enabled, but sorry I don't see OOM has anything to do with this feature. You are saying OOM is an issue for shuffled hash join on bucketed table, which I agree. This feature is coalescing on stream side (not touch build side at all), so I don't think it's adding any more risk for OOM on build side. As sort merge join is by default preferred over shuffled hash join, so when users enable shuffled hash join by config explicitly, they should already pay attention to OOM problem. Am I miss anything? Thanks. |
Assume you are joining two tables with 512 and 256 buckets. Without coalescing table, two tables might be shuffled to 1024 or more partitions. Building hash map is okay. When coalescing table, now you build hash map on each bucket. Each bucket now has much more data than shuffling case. It sounds more likely to OOM. |
@viirya, I see your point for coalescing reduces parallelism to cause more OOM on build side. I agree this can happen. All in all, this is a disable-by-default feature, and user can selectively enable it depending on their table size. But I think it's worth to have as it indeed helped our users in production for using shuffled hash join on bucketed tables. Re OOM issue in shuffled hash join - I think we can add a fallback mechanism when building hash map and fall back to sort merge join if the size of hash map being too big to OOM (i.e. rethink https://issues.apache.org/jira/browse/SPARK-21505), we have been running this feature in production for years, and it works well. |
@maropu, update PR description for one test query in TPCDS (with modification). Thanks. |
Test build #125736 has finished for PR 29079 at commit
|
cc @cloud-fan |
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
d0e12d6
to
43a59b9
Compare
Test build #125889 has finished for PR 29079 at commit
|
Test build #125903 has finished for PR 29079 at commit
|
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
Outdated
Show resolved
Hide resolved
The PR itself looks okay (Note that this is disabled by default and we can control this behaviour by a new separate config.) @cloud-fan @viirya @imback82 |
retest this please |
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
...ore/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
Outdated
Show resolved
Hide resolved
...ore/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
Outdated
Show resolved
Hide resolved
Test build #125955 has finished for PR 29079 at commit
|
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
Outdated
Show resolved
Hide resolved
...ore/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
Show resolved
Hide resolved
...ore/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
Outdated
Show resolved
Hide resolved
Test build #126032 has finished for PR 29079 at commit
|
retest this please |
Test build #126126 has finished for PR 29079 at commit
|
@cloud-fan - all comments are addressed, wondering is there any other things needed for this PR? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay to me and I'll leave this to the others: @cloud-fan @viirya
It's too late today. I will take another look tomorrow if this is not merged yet. |
Good night~, @viirya |
...ore/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
Outdated
Show resolved
Hide resolved
...ore/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
Outdated
Show resolved
Hide resolved
...ore/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoinSuite.scala
Show resolved
Hide resolved
LGTM except for some comments in the test |
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Test build #126263 has finished for PR 29079 at commit
|
Test build #126266 has finished for PR 29079 at commit
|
Thanks, all! Merged to master. |
Thank you all @maropu, @cloud-fan, @viirya and @imback82 for review! |
What changes were proposed in this pull request?
Based on a follow up comment in #28123, where we can coalesce buckets for shuffled hash join as well. The note here is we only coalesce the buckets from shuffled hash join stream side (i.e. the side not building hash map), so we don't need to worry about OOM when coalescing multiple buckets in one task for building hash map.
Refactor existing physical plan rule
CoalesceBucketsInSortMergeJoin
toCoalesceBucketsInJoin
, for covering shuffled hash join as well.Refactor existing unit test
CoalesceBucketsInSortMergeJoinSuite
toCoalesceBucketsInJoinSuite
, for covering shuffled hash join as well.Why are the changes needed?
Avoid shuffle for joining different bucketed tables, is also useful for shuffled hash join. In production, we are seeing users to use shuffled hash join to join bucketed tables (set
spark.sql.join.preferSortMergeJoin
=false, to avoid sort), and this can help avoid shuffle if number of buckets are not same.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added unit tests in
CoalesceBucketsInJoinSuite
for verifying shuffled hash join physical plan.Performance number per request from @maropu
I was looking at TPCDS per suggestion from @maropu. But I found most of queries from TPCDS are doing aggregate, and only several ones are doing join. None of input tables are bucketed. So I took the approach to test a modified version of
TPCDS q93
asAnd make
store_sales
andstore_returns
to be bucketed tables.Physical query plan without coalesce:
Physical query plan with coalesce:
Run time improvement as 50% of wall clock time: