Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] indicate overall deployment failure if all node routes are failed #88378

Merged

Conversation

benwtrent
Copy link
Member

If all node routes are failed, we should indicate that the whole deployment is failed through its assignment state.

The failures could be due to multiple reasons, so for detailed information, the individual node routing reasons should be investigated.

@benwtrent benwtrent added >enhancement :ml Machine learning test-full-bwc Trigger full BWC version matrix tests v8.4.0 labels Jul 8, 2022
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jul 8, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -519,7 +519,15 @@ public void writeTo(StreamOutput out) throws IOException {
out.writeOptionalVInt(queueCapacity);
out.writeInstant(startTime);
out.writeList(nodeStats);
out.writeOptionalEnum(state);
if (out.getVersion().onOrAfter(Version.V_8_4_0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the code could be structure in a way that makes it more explicit that the reason for these checks is that the FAILED state cannot be streamed to an < 8.4 node.

if (AssignmentState.FAILED.equals(state) && out.getVersion().before(Version.V_8_4_0)) {
   out.writeOptionalEnum(AssignmentState.STARTING);
} else {
    out.writeOptionalEnum(state);
}

@benwtrent
Copy link
Member Author

@elasticmachine update branch

@benwtrent
Copy link
Member Author

@elasticmachine update branch

@benwtrent benwtrent merged commit 0a94091 into elastic:master Jul 13, 2022
@benwtrent benwtrent deleted the feature/ml-add-new-failed-deployment-state branch July 13, 2022 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :ml Machine learning Team:ML Meta label for the ML team test-full-bwc Trigger full BWC version matrix tests v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants