-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Adds basic notifications for trained model deployments #88214
[ML] Adds basic notifications for trained model deployments #88214
Conversation
For specific models: - deployment started - deployment stopped System notifications when rebalance occurrs with reasons: - model deployment started - model deployment stopped - nodes changed
Pinging @elastic/ml-core (Team:ML) |
...re/src/main/java/org/elasticsearch/xpack/core/common/notifications/AbstractAuditMessage.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/notifications/SystemAuditor.java
Show resolved
Hide resolved
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Will you add system audit messages for node changes?
Sorry I missed that the message was already there
|
||
assertNotificationsContain(modelId1, "Started deployment", "Stopped deployment"); | ||
assertNotificationsContain(modelId2, "Started deployment", "Stopped deployment"); | ||
assertSystemNotificationsContain("Rebalanced trained model allocations because [model deployment started]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there also be a Rebalanced trained model allocations because [model deployment STOPPED]
message can you test for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because the remaining model is satisfied when the other gets stopped. We only proceed with the rebalance if there are unsatisfied models after a deployment is stopped.
@@ -414,7 +415,12 @@ public ClusterState execute(ClusterState currentState) { | |||
|
|||
if (areClusterStatesCompatibleForRebalance(clusterState, currentState)) { | |||
isUpdated = true; | |||
return update(currentState, rebalancedMetadata); | |||
ClusterState updatedState = update(currentState, rebalancedMetadata); | |||
if (TrainedModelAssignmentMetadata.fromState(currentState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the update()
method return a Tuple<Boolean, ClusterState>
where the boolean indicates a change. It saves extracting the metadata from cluster state again and the equals()
check
ClusterState updatedState = update(currentState, rebalancedMetadata); | ||
if (TrainedModelAssignmentMetadata.fromState(currentState) | ||
.equals(TrainedModelAssignmentMetadata.fromState(updatedState)) == false) { | ||
systemAuditor.info(Messages.getMessage(Messages.INFERENCE_DEPLOYMENT_REBALANCED, reason)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to be safe and not do the auditing inside the ClusterStateUpdateTask::execute
when it could be done in clusterStateProcessed
using the ML utility thread pool.
The auditor can potentially do a lot of work if it is the first time a message is audited
Line 122 in 898d849
protected void indexDoc(ToXContent toXContent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I have pushed a commit that addresses this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2
For specific models:
System notifications when rebalance occurrs with reasons: