-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20843][Core]Add a config to set driver terminate timeout #18126
Conversation
Looks ok. I wonder if keeping the old behavior by default wouldn't be better, to avoid surprising users who upgrade and run into this. |
This is the behavior in 2.1.0, if we change the default value to I'm inclined to keep it as 2.1.0. |
Ah, I thought the original change was in 2.2 only. Looks good then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one thought on the property name but it is fine as is. In hind-sight, 10s is pretty short for a driver timeout... thanks for putting up a quick fix
@@ -57,7 +57,8 @@ private[deploy] class DriverRunner( | |||
@volatile private[worker] var finalException: Option[Exception] = None | |||
|
|||
// Timeout to wait for when trying to terminate a driver. | |||
private val DRIVER_TERMINATE_TIMEOUT_MS = 10 * 1000 | |||
private val DRIVER_TERMINATE_TIMEOUT_MS = | |||
conf.getTimeAsMs("spark.worker.driverTerminateTimeout", "10s") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering if maybe adding something to the property to be clear that this is for a driver with deploy mode cluster only? Although it is prefixed with worker
so maybe that is good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spark.worker
means this is only for Spark workers, so I think it should be obvious. Do you have a better config name?
This is usually not a problem. If worker is trying to kill a driver, it often means the driver is unhealthy or being killed by the user intentionally. 10 seconds to allow shutdown hooks cleaning up resources such as deleting temp files is usually enough. Given that replying on shutdown hooks to persist data is not common, and we have a configuration for special cases, I think it's fine. |
No, I couldn't think of anything else without it being too long winded. I
agree that the `worker` prefix gives enough meaning, plus whomever uses
this should already know the context that it's intended for.
…On Fri, May 26, 2017 at 4:37 PM, Shixiong Zhu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
<#18126 (comment)>:
> @@ -57,7 +57,8 @@ private[deploy] class DriverRunner(
@volatile private[worker] var finalException: Option[Exception] = None
// Timeout to wait for when trying to terminate a driver.
- private val DRIVER_TERMINATE_TIMEOUT_MS = 10 * 1000
+ private val DRIVER_TERMINATE_TIMEOUT_MS =
+ conf.getTimeAsMs("spark.worker.driverTerminateTimeout", "10s")
spark.worker means this is only for Spark workers, so I think it should
be obvious. Do you have a better config name?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#18126 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AEUwdQJ6gVTbxvCqyzvN8y5l0OtfzwOEks5r92IhgaJpZM4NoEr0>
.
|
Test build #77438 has finished for PR 18126 at commit
|
Thanks! Merging to master, 2.2 and 2.1. |
## What changes were proposed in this pull request? Add a `worker` configuration to set how long to wait before forcibly killing driver. ## How was this patch tested? Jenkins Author: Shixiong Zhu <[email protected]> Closes #18126 from zsxwing/SPARK-20843. (cherry picked from commit 6c1dbd6) Signed-off-by: Shixiong Zhu <[email protected]>
## What changes were proposed in this pull request? Add a `worker` configuration to set how long to wait before forcibly killing driver. ## How was this patch tested? Jenkins Author: Shixiong Zhu <[email protected]> Closes #18126 from zsxwing/SPARK-20843. (cherry picked from commit 6c1dbd6) Signed-off-by: Shixiong Zhu <[email protected]>
What changes were proposed in this pull request?
Add a
worker
configuration to set how long to wait before forcibly killing driver.How was this patch tested?
Jenkins