-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6028][Core]A new RPC implemetation based on the network module #6457
Conversation
Test build #33660 has finished for PR 6457 at commit
|
Test build #33748 has finished for PR 6457 at commit
|
Test build #33783 has finished for PR 6457 at commit
|
retest this please |
Test build #33850 has finished for PR 6457 at commit
|
Test build #33852 has finished for PR 6457 at commit
|
Test build #33854 has finished for PR 6457 at commit
|
Test build #34367 timed out for PR 6457 at commit |
Test build #34450 has finished for PR 6457 at commit
|
Test build #35687 has finished for PR 6457 at commit
|
@zsxwing can you bring this up to date? I'd like to review this this week. |
Test build #38094 has finished for PR 6457 at commit
|
retest this please |
Test build #38146 has finished for PR 6457 at commit
|
Test build #68 has finished for PR 6457 at commit
|
def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = { | ||
val addr = new NettyRpcAddress(nettyEnv.address.host, nettyEnv.address.port, name) | ||
val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv) | ||
nameToEndpoint.put(name, new RpcEndpointPair(endpoint, endpointRef)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check whether name
is already being used? Just in case.
|
||
private val clientConnectionExecutor = ThreadUtils.newDaemonCachedThreadPool( | ||
"netty-rpc-connection", | ||
conf.getInt("spark.rpc.connect.threads", 256)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit excessive, but then it kinda sucks that TransportClientFactory.createClient
is blocking. Perhaps we should write that down as a future enhancement.
Left some minor feedback. After those are taken care of, this LGTM. @rxin did you mean to also take a look at this? |
Will take a look later. But given it is early in the release cycle & size of the patch, I'm also ok with merging it first and do post-hoc reviews & feedback addressing later. |
Test build #42806 has finished for PR 6457 at commit
|
|
||
override def onSuccess(response: Array[Byte]): Unit = { | ||
val reply = deserialize[AskResponse](response) | ||
if (reply.reply != null && reply.reply.isInstanceOf[RpcFailure]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: null check is redundant.
LGTM; the duplicate event issue is very minor and doesn't happen with the default config (where connections per peer is 1), so not super critical to fix now. |
Test build #42872 has finished for PR 6457 at commit
|
@zsxwing did you intend to leave "netty" as the default rpc implementation? Just double checking. Let me know and I'll merge this. |
Yes. Actually it's better to use |
I'm going to merge this and review it post-hoc. |
@zsxwing @rxin I saw this test failure on Jenkins master. If this happens frequently, we may need to revert this patch. I created https://issues.apache.org/jira/browse/SPARK-10799 for the issue.
|
Also this might be related: org.apache.spark.deploy.StandaloneDynamicAllocationSuite (https://issues.apache.org/jira/browse/SPARK-10800).
|
I have already sent #8905 to fix this test |
Actually, the second is worse, which broke most recent master builds. |
@zsxwing I reverted this patch due to master build failures. Please fix the tests and re-submit a PR. Thanks! |
@mengxr I saw some race conditions in StandaloneDynamicAllocationSuite after reviewing it. This patch just exposed them. I'm thinking about how to fix them. It's better to remerge this patch after fixing |
Just realized we don't need to revert it. The new RPC can be disabled... |
…#8905 This PR just reverted 02144d6 to remerge #6457 and also included the commits in #8905. Author: zsxwing <[email protected]> Closes #8944 from zsxwing/SPARK-6028.
Design doc: https://docs.google.com/document/d/1CF5G6rGVQMKSyV_QKo4D2M-x6rxz5x1Ew7aK3Uq6u8c/edit?usp=sharing