[ML] Cap graceful shutdown time #76342

Merged
merged 1 commit into from
Aug 11, 2021

droberts195
Contributor

The node shutdown work done in #75188 did not impose any
upper bound on the time allowed for ML jobs to shut down
gracefully.

This change imposes a cap of 10 minutes on shutdown time.
In reality closing ML jobs shouldn't take this long, but
we don't want ML to stall the shutdown process forever
due to a bug, and require user intervention to recover.
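
To illustrate the idea, here is a minimal sketch of a time-capped "safe to shut down" check. It is not the code from this PR: the class, method, and constant names are hypothetical, and the open-job count stands in for whatever cluster state the real check inspects. Only the 10 minute cap is taken from the description above.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

final class GracefulShutdownCap {
    // The 10 minute value comes from the PR description; the constant name is hypothetical.
    static final Duration MAX_GRACEFUL_SHUTDOWN_TIME = Duration.ofMinutes(10);

    /**
     * Returns true when the node may be shut down: either the cap has been
     * exceeded since shutdown started, or no jobs remain open on the node.
     * A null shutdownStartTime means the start time is unknown, so the cap
     * cannot apply and only the open-job count decides.
     */
    static boolean isSafeToShutDown(int openJobCount, Instant shutdownStartTime, Clock clock) {
        if (shutdownStartTime != null) {
            Duration elapsed = Duration.between(shutdownStartTime, clock.instant());
            if (elapsed.compareTo(MAX_GRACEFUL_SHUTDOWN_TIME) > 0) {
                // Waited too long: stop blocking shutdown even though jobs are still open.
                return true;
            }
        }
        return openJobCount == 0;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2021-08-11T11:00:00Z");
        Clock fiveMinutesIn = Clock.fixed(start.plus(Duration.ofMinutes(5)), ZoneOffset.UTC);
        Clock elevenMinutesIn = Clock.fixed(start.plus(Duration.ofMinutes(11)), ZoneOffset.UTC);

        System.out.println(isSafeToShutDown(2, start, fiveMinutesIn));   // false: jobs open, cap not reached
        System.out.println(isSafeToShutDown(2, start, elevenMinutesIn)); // true: cap exceeded
    }
}
```

Passing a `Clock` rather than reading the system time directly keeps the cap testable with fixed instants, as the `main` method shows.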

elasticmachine added the Team:ML label on Aug 11, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@droberts195
Contributor Author

>non-issue as it's a tweak to unreleased functionality.

droberts195 merged commit 8f5e457 into elastic:master on Aug 11, 2021
droberts195 deleted the cap_graceful_shutdown_time branch on August 11, 2021 at 11:44
elasticsearchmachine pushed a commit that referenced this pull request Aug 11, 2021
Backport of #76342
Labels
:ml (Machine learning), >non-issue, Team:ML (Meta label for the ML team), v7.15.0, v8.0.0-alpha2
4 participants