[SPARK-15262] Synchronize block manager / scheduler executor state
## What changes were proposed in this pull request?

If an executor is still alive even after the scheduler has removed its metadata, we may receive a heartbeat from that executor and tell its block manager to reregister itself. If that happens, the block manager master will know about the executor, but the scheduler will not.

That is a dangerous situation: when the executor is eventually disconnected, the scheduler will not ask the block manager master to also remove its metadata for that executor. Later, when we try to clean up an RDD or a broadcast variable, we may try to send a message to that dead executor, triggering an exception.
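To make the sequence of events concrete, here is a minimal, single-threaded Scala sketch of the two registries drifting apart. Every name in it (`Driver`, `heartbeat`, `executorDied`, `cleanupBroadcast`, `runScenario`) is hypothetical and is not Spark's real API. The model also makes two simplifying assumptions: a heartbeat from an unknown executor directly re-adds it to the block manager master's registry, and the normal (`case Some`) removal path also unregisters the executor from the block manager master. The `patched` flag toggles the behavior added by this commit in the `case None` branch.

```scala
import scala.collection.mutable

// Toy model of SPARK-15262. All classes and methods here are hypothetical
// simplifications, not Spark's real scheduler or RPC APIs.
object BlockManagerStateSketch {

  final class Driver(patched: Boolean) {
    val schedulerExecutors = mutable.Set[String]() // scheduler-side metadata
    val blockManagerMaster = mutable.Set[String]() // block manager master registry
    val aliveExecutors = mutable.Set[String]()     // processes that are really running

    def registerExecutor(id: String): Unit = {
      schedulerExecutors += id
      blockManagerMaster += id
      aliveExecutors += id
    }

    // Mirrors the `match` in removeExecutor from the diff below.
    def removeExecutor(id: String): Unit =
      if (schedulerExecutors.remove(id)) {
        // `case Some`: assumed to also unregister from the block manager master.
        blockManagerMaster -= id
      } else if (patched) {
        // `case None` after this commit: remove from the block manager master anyway.
        blockManagerMaster -= id
      }

    // Assumption: a heartbeat from an executor the block manager master does
    // not know makes its block manager reregister. The scheduler is untouched.
    def heartbeat(id: String): Unit =
      if (!blockManagerMaster.contains(id)) blockManagerMaster += id

    def executorDied(id: String): Unit = aliveExecutors -= id

    // Cleanup messages every executor known to the block manager master;
    // messaging a dead executor fails.
    def cleanupBroadcast(): Unit =
      blockManagerMaster.foreach { id =>
        if (!aliveExecutors.contains(id))
          throw new IllegalStateException(s"Cannot message dead executor $id")
      }
  }

  /** Right(()) if cleanup succeeds, Left(error message) if it fails. */
  def runScenario(patched: Boolean): Either[String, Unit] = {
    val driver = new Driver(patched)
    driver.registerExecutor("exec-1")
    driver.removeExecutor("exec-1") // 1. scheduler drops metadata; process still alive
    driver.heartbeat("exec-1")      // 2. late heartbeat: block manager reregisters
    driver.executorDied("exec-1")   // 3. executor really goes away...
    driver.removeExecutor("exec-1") //    ...and removal hits the `case None` branch
    try { driver.cleanupBroadcast(); Right(()) } // 4. later broadcast cleanup
    catch { case e: IllegalStateException => Left(e.getMessage) }
  }

  def main(args: Array[String]): Unit = {
    println(s"unpatched: ${runScenario(patched = false)}")
    println(s"patched:   ${runScenario(patched = true)}")
  }
}
```

Without the extra removal in the `case None` branch, the block manager master keeps a stale entry and the later cleanup fails; with it, both registries end up empty and cleanup has nothing stale to contact.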

## How was this patch tested?

Jenkins.

Author: Andrew Or

Closes apache#13055 from andrewor14/block-manager-remove.

(cherry picked from commit 40a949a)
Signed-off-by: Shixiong Zhu
(cherry picked from commit e2a43d0)
Andrew Or authored and zzcclp committed May 12, 2016
1 parent 20e5755 commit 7acd3e9
Showing 1 changed file with 8 additions and 1 deletion.
```diff
@@ -275,7 +275,14 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
           scheduler.executorLost(executorId, if (killed) ExecutorKilled else reason)
           listenerBus.post(
             SparkListenerExecutorRemoved(System.currentTimeMillis(), executorId, reason.toString))
-        case None => logInfo(s"Asked to remove non-existent executor $executorId")
+        case None =>
+          // SPARK-15262: If an executor is still alive even after the scheduler has removed
+          // its metadata, we may receive a heartbeat from that executor and tell its block
+          // manager to reregister itself. If that happens, the block manager master will know
+          // about the executor, but the scheduler will not. Therefore, we should remove the
+          // executor from the block manager when we hit this case.
+          scheduler.sc.env.blockManager.master.removeExecutor(executorId)
+          logInfo(s"Asked to remove non-existent executor $executorId")
       }
     }
 
```
