[SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem #3785

YanTangZhai · 2014-12-24T06:20:40Z

Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem

SparkQA · 2014-12-24T06:22:29Z

Test build #24765 has started for PR 3785 at commit 9ca6541.

This patch merges cleanly.

SparkQA · 2014-12-24T07:46:27Z

Test build #24765 has finished for PR 3785 at commit 9ca6541.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-12-24T07:46:31Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24765/

JoshRosen · 2014-12-29T19:23:08Z

On the surface, this seems like an okay change, but I wonder whether the retry logic could have unexpected consequences. Let me try to reason it out:

askTracker is only called with GetMapOutputStatuses.
The master actor handles this message by calling getSerializedMapOutputStatuses. This method never throws exceptions: if a shuffle is missing, it just stores an empty array and serializes it.
It's possible that the serialized map statuses could exceed the Akka frame size (although this is extremely unlikely, and perhaps impossible with the new output status compression techniques). In that case, the master would throw an exception and fail to send a reply back to the asker. With this patch, we'd then perform a bunch of retries for an operation that will ultimately fail, so we'd take longer to detect the failure.

In the common cases, though, this seems fine, even if the map output statuses are missing (since it won't introduce a bunch of futile retries). Therefore, I think we should pull this in; I don't know if this fixes an actual bug, but it seems like it could make things more robust.
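
To make the trade-off above concrete, here is a minimal, self-contained sketch of the retry behavior being discussed. It is not the actual AkkaUtils.askWithReply code: `RetrySketch`, `askWithRetry`, and the `ask` function are hypothetical stand-ins, and no Akka actors are involved. It only illustrates that a transient failure is absorbed by a later attempt, while a request that can never succeed uses up every attempt before the caller sees the error.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a retrying ask; NOT Spark's AkkaUtils implementation.
object RetrySketch {

  /** Calls `ask` up to `maxAttempts` times, sleeping `retryIntervalMs`
    * between failed attempts, and rethrows once all attempts are used. */
  def askWithRetry[T](maxAttempts: Int, retryIntervalMs: Long)(ask: () => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")

    @tailrec
    def loop(attempt: Int): T = Try(ask()) match {
      case Success(reply) =>
        reply
      case Failure(_) if attempt < maxAttempts =>
        // Transient problems (for example, a timed-out reply) get another chance.
        Thread.sleep(retryIntervalMs)
        loop(attempt + 1)
      case Failure(e) =>
        // Permanent problems are only reported after every attempt has failed.
        throw new RuntimeException(s"Failed after $attempt attempts", e)
    }

    loop(1)
  }
}
```

With `maxAttempts = 3`, an `ask` that fails twice and then succeeds returns its reply on the third call. An `ask` that always throws (the oversized-reply case) is also called three times, which is the extra detection latency mentioned above.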

JoshRosen · 2014-12-29T19:30:58Z

Alright, I'm going to merge this into master (1.3.0). Thanks!

YanTangZhai and others added 11 commits August 6, 2014 21:07

Merge pull request #1 from apache/master

cdef539

update

Merge pull request #3 from apache/master

cbcba66

Update

Merge pull request #6 from apache/master

8a00106

Update

Merge pull request #7 from apache/master

03b62b0

Update

Merge pull request #8 from apache/master

76d4027

update

Merge pull request #9 from apache/master

d26d982

Update

Merge pull request #10 from apache/master

e249846

Update

Merge pull request #11 from apache/master

6e643f8

Update

Merge pull request #12 from apache/master

718afeb

update

Merge pull request #15 from apache/master

e4c2c0a

update

[SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.…

9ca6541

…askTracker to reduce the chance of the communicating problem

asfgit closed this in 815de54 Dec 29, 2014
