[SPARK-15262] Synchronize block manager / scheduler executor state #13055

Closed
wants to merge 1 commit into from


andrewor14
Contributor

@andrewor14 andrewor14 commented May 11, 2016

What changes were proposed in this pull request?

If an executor is still alive even after the scheduler has removed its metadata, we may receive a heartbeat from that executor and tell its block manager to reregister itself. If that happens, the block manager master will know about the executor, but the scheduler will not.

That is a dangerous situation, because when the executor does get disconnected later, the scheduler will not ask the block manager to also remove metadata for that executor. Later, when we try to clean up an RDD or a broadcast variable, we may try to send a message to that executor, triggering an exception.
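
To make the sequence above concrete, here is a minimal toy model of the desynchronization. It is only a sketch: `ToyDriver`, its two sets, and every method name are hypothetical stand-ins for the scheduler's and the block manager master's bookkeeping, not Spark's actual classes or APIs, and it models the problem rather than the change in this patch.

```python
class ToyDriver:
    """Hypothetical stand-in for driver-side bookkeeping; not Spark code."""

    def __init__(self):
        # Executors the scheduler currently has metadata for.
        self.scheduler_executors = set()
        # Executors the block manager master currently has metadata for.
        self.block_manager_executors = set()

    def register(self, executor_id):
        # A normal registration makes the executor known to both sides.
        self.scheduler_executors.add(executor_id)
        self.block_manager_executors.add(executor_id)

    def scheduler_remove(self, executor_id):
        # Assumption of this sketch: the block manager master is only told
        # to drop an executor when the scheduler itself still knows it.
        if executor_id in self.scheduler_executors:
            self.scheduler_executors.discard(executor_id)
            self.block_manager_executors.discard(executor_id)

    def heartbeat(self, executor_id):
        # A heartbeat from an executor the block manager master has forgotten
        # leads to re-registration of its block manager only.
        if executor_id not in self.block_manager_executors:
            self.block_manager_executors.add(executor_id)

    def cleanup_targets(self):
        # RDD or broadcast cleanup would message every executor listed here.
        return sorted(self.block_manager_executors)


def simulate_stale_registration():
    driver = ToyDriver()
    driver.register("exec-1")
    driver.scheduler_remove("exec-1")  # scheduler drops the executor's metadata
    driver.heartbeat("exec-1")         # executor is still alive and heartbeats
    driver.scheduler_remove("exec-1")  # later disconnect: scheduler has nothing to remove
    # Executors known to the block manager master but not to the scheduler.
    return driver.block_manager_executors - driver.scheduler_executors


stale = simulate_stale_registration()
```

In this model `stale` ends up containing `"exec-1"`: the block manager master still lists an executor that is gone, so a later cleanup would try to message it.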

How was this patch tested?

Jenkins.

@andrewor14
Contributor Author

@zsxwing

@SparkQA

SparkQA commented May 11, 2016

Test build #58392 has finished for PR 13055 at commit c03767f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented May 11, 2016

LGTM. Merging to master, 2.0 and 1.6.

asfgit pushed a commit that referenced this pull request May 11, 2016
Author: Andrew Or <[email protected]>

Closes #13055 from andrewor14/block-manager-remove.

(cherry picked from commit 40a949a)
Signed-off-by: Shixiong Zhu <[email protected]>
@asfgit asfgit closed this in 40a949a May 11, 2016
@andrewor14 andrewor14 deleted the block-manager-remove branch May 11, 2016 20:47
zzcclp pushed a commit to zzcclp/spark that referenced this pull request May 12, 2016
Author: Andrew Or <[email protected]>

Closes apache#13055 from andrewor14/block-manager-remove.

(cherry picked from commit 40a949a)
Signed-off-by: Shixiong Zhu <[email protected]>
(cherry picked from commit e2a43d0)