-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-17556] [CORE] [SQL] Executor side broadcast for broadcast joins #15240
Conversation
Test build #65907 has finished for PR 15240 at commit
|
Test build #65909 has finished for PR 15240 at commit
|
Test build #65913 has finished for PR 15240 at commit
|
Test build #65916 has finished for PR 15240 at commit
|
Test build #65914 has finished for PR 15240 at commit
|
Test build #65918 has finished for PR 15240 at commit
|
Test build #65925 has finished for PR 15240 at commit
|
Test build #66030 has finished for PR 15240 at commit
|
Test build #66088 has finished for PR 15240 at commit
|
Test build #66095 has finished for PR 15240 at commit
|
/cc @rxin can you help review this? |
Test build #70745 has finished for PR 15240 at commit
|
Test build #70804 has started for PR 15240 at commit |
Test build #70807 has finished for PR 15240 at commit
|
Test build #70818 has finished for PR 15240 at commit
|
retest this please |
This reverts commit 76dfc20.
Test build #70867 has finished for PR 15240 at commit
|
Are you still working on this? @scwf |
@jiangxb1987 there's another solution to this JIRA at #15178. We take different approaches. You can also take a look. |
What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-17556
Design doc :
https://issues.apache.org/jira/secure/attachment/12830668/executor%20broadcast.pdf
Added two api for RDD to perform executor side broadcast and apply it on sql's broadcast join.
[1].
def broadcast[U: ClassTag](f: Iterator[T] => U): Broadcast[U]
User only need pass a function to describe how to translate all the element of the rdd to the value they want to broadcast
[2].
def broadcast[U: ClassTag](transFunc: TransFunc[T, U]): Broadcast[U]
This is only used in spark sql(spark internal), TransFunc is a interface to describe how to translate all the element of the rdd to a single value.
TransFunc is inherited by BroadcastMode in spark sql.
When construct broadcast, firstly it write blocks to block manager from executor and then create
Broadcast
from driver(not write blocks)How was this patch tested?
added unit test and manual test.