[ML] Adds basic notifications for trained model deployments #88214

dimitris-athanasiou · 2022-06-30T15:01:19Z

For specific models:

- deployment started
- deployment stopped

System notifications when a rebalance occurs, with reasons:

- model deployment started
- model deployment stopped
- nodes changed

elasticmachine · 2022-06-30T15:01:27Z

Pinging @elastic/ml-core (Team:ML)

...re/src/main/java/org/elasticsearch/xpack/core/common/notifications/AbstractAuditMessage.java

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/notifications/SystemAuditor.java

dimitris-athanasiou · 2022-07-04T12:11:04Z

@elasticmachine update branch

davidkyle

LGTM

~~Will you add system audit messages for node changes?~~

Sorry, I missed that the message was already there.

davidkyle · 2022-07-04T13:37:56Z

...-node-tests/src/javaRestTest/java/org/elasticsearch/xpack/ml/integration/PyTorchModelIT.java

+
+        assertNotificationsContain(modelId1, "Started deployment", "Stopped deployment");
+        assertNotificationsContain(modelId2, "Started deployment", "Stopped deployment");
+        assertSystemNotificationsContain("Rebalanced trained model allocations because [model deployment started]");


Shouldn't there also be a "Rebalanced trained model allocations because [model deployment STOPPED]" message? Can you test for that?

No, because the remaining model is satisfied when the other gets stopped. We only proceed with the rebalance if there are unsatisfied models after a deployment is stopped.
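The condition described in this reply can be sketched roughly as follows. This is only an illustration: `DeploymentSketch`, `StopRebalanceSketch`, and the allocation counts are hypothetical stand-ins, and it assumes "satisfied" means a deployment has received every allocation it requested.

```java
import java.util.List;

// Hypothetical stand-in for a deployment: how many allocations it wants and how many it has.
final class DeploymentSketch {
    final String modelId;
    final int requestedAllocations;
    final int assignedAllocations;

    DeploymentSketch(String modelId, int requestedAllocations, int assignedAllocations) {
        this.modelId = modelId;
        this.requestedAllocations = requestedAllocations;
        this.assignedAllocations = assignedAllocations;
    }

    // Assumption: "satisfied" means every requested allocation has been assigned.
    boolean isSatisfied() {
        return assignedAllocations >= requestedAllocations;
    }
}

final class StopRebalanceSketch {
    // After a stop, rebalance (and therefore audit) only if some remaining deployment is unsatisfied.
    static boolean shouldRebalanceAfterStop(List<DeploymentSketch> remaining) {
        return remaining.stream().anyMatch(d -> d.isSatisfied() == false);
    }

    public static void main(String[] args) {
        // The test scenario: one model is stopped and the remaining one already has all it asked for.
        List<DeploymentSketch> remaining = List.of(new DeploymentSketch("model-2", 1, 1));
        System.out.println(shouldRebalanceAfterStop(remaining)); // false, so no "stopped" rebalance message
    }
}
```

Under that assumption, stopping one of two fully allocated deployments never reaches the rebalance step, which is why the test only expects the "model deployment started" reason.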

davidkyle · 2022-07-04T14:00:26Z

...va/org/elasticsearch/xpack/ml/inference/assignment/TrainedModelAssignmentClusterService.java

@@ -414,7 +415,12 @@ public ClusterState execute(ClusterState currentState) {

                    if (areClusterStatesCompatibleForRebalance(clusterState, currentState)) {
                        isUpdated = true;
-                        return update(currentState, rebalancedMetadata);
+                        ClusterState updatedState = update(currentState, rebalancedMetadata);
+                        if (TrainedModelAssignmentMetadata.fromState(currentState)


Can the update() method return a Tuple<Boolean, ClusterState> where the boolean indicates a change? That would save extracting the metadata from the cluster state again and the equals() check.
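A minimal sketch of that suggestion, using stand-in types rather than the real `ClusterState`, `TrainedModelAssignmentMetadata`, and Elasticsearch `Tuple` classes. `SketchState`, `UpdateResult`, and `RebalanceUpdateSketch` are hypothetical names introduced only for illustration.

```java
import java.util.Map;

// Hypothetical stand-in for the cluster state: just a map of model id -> assigned node.
final class SketchState {
    final Map<String, String> assignments;

    SketchState(Map<String, String> assignments) {
        this.assignments = Map.copyOf(assignments);
    }
}

// Hypothetical result pair; the real code would presumably use a Tuple<Boolean, ClusterState>.
final class UpdateResult {
    final boolean changed;
    final SketchState state;

    UpdateResult(boolean changed, SketchState state) {
        this.changed = changed;
        this.state = state;
    }
}

final class RebalanceUpdateSketch {
    // Compares once while building the new state, so the caller does not
    // have to re-extract the metadata and call equals() a second time.
    static UpdateResult update(SketchState current, Map<String, String> rebalanced) {
        if (current.assignments.equals(rebalanced)) {
            return new UpdateResult(false, current);
        }
        return new UpdateResult(true, new SketchState(rebalanced));
    }

    public static void main(String[] args) {
        SketchState current = new SketchState(Map.of("model-1", "node-a"));
        System.out.println(update(current, Map.of("model-1", "node-a")).changed); // false
        System.out.println(update(current, Map.of("model-1", "node-b")).changed); // true
    }
}
```

The caller can then decide whether to audit from the boolean alone.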

davidkyle · 2022-07-04T14:07:30Z

...va/org/elasticsearch/xpack/ml/inference/assignment/TrainedModelAssignmentClusterService.java

+                        ClusterState updatedState = update(currentState, rebalancedMetadata);
+                        if (TrainedModelAssignmentMetadata.fromState(currentState)
+                            .equals(TrainedModelAssignmentMetadata.fromState(updatedState)) == false) {
+                            systemAuditor.info(Messages.getMessage(Messages.INFERENCE_DEPLOYMENT_REBALANCED, reason));


It's better to be safe and not do the auditing inside ClusterStateUpdateTask::execute when it could be done in clusterStateProcessed using the ML utility thread pool.

The auditor can potentially do a lot of work the first time a message is audited:

elasticsearch/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/common/notifications/AbstractAuditor.java

Line 122 in 898d849

protected void indexDoc(ToXContent toXContent) {

Good point. I have pushed a commit that addresses this.
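The shape of that change can be sketched as below. This is an illustration under stated assumptions, not the pushed commit: `RebalanceTaskSketch` is a hypothetical stand-in for the cluster state update task, a plain `java.util.concurrent.Executor` stands in for the ML utility thread pool, a list stands in for the system auditor, and strings stand in for cluster states.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a cluster state update task; not the Elasticsearch class.
final class RebalanceTaskSketch {
    private final Executor utilityPool;   // stands in for the ML utility thread pool
    private final List<String> auditLog;  // stands in for the system auditor
    private final String reason;
    private boolean changed;

    RebalanceTaskSketch(Executor utilityPool, List<String> auditLog, String reason) {
        this.utilityPool = utilityPool;
        this.auditLog = auditLog;
        this.reason = reason;
    }

    // Simulates execute(): compute the new state and remember whether it changed. No auditing here.
    String execute(String currentState, String rebalancedState) {
        changed = currentState.equals(rebalancedState) == false;
        return rebalancedState;
    }

    // Simulates clusterStateProcessed(): hand the potentially slow audit to another thread.
    void clusterStateProcessed() {
        if (changed) {
            utilityPool.execute(
                () -> auditLog.add("Rebalanced trained model allocations because [" + reason + "]")
            );
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> auditLog = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        RebalanceTaskSketch task = new RebalanceTaskSketch(pool, auditLog, "model deployment started");
        task.execute("assignments-v1", "assignments-v2");
        task.clusterStateProcessed();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(auditLog);
    }
}
```

Keeping `execute` free of I/O means the update thread only records whether anything changed, and the audit write happens after the new state has been applied.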

davidkyle

LGTM2

[ML] Adds basic notifications for trained model deployments

9e9675c

dimitris-athanasiou added the >non-issue, :ml, and v8.4.0 labels Jun 30, 2022

elasticmachine added the Team:ML label Jun 30, 2022

benwtrent self-requested a review June 30, 2022 15:57

benwtrent reviewed Jun 30, 2022

...re/src/main/java/org/elasticsearch/xpack/core/common/notifications/AbstractAuditMessage.java (outdated)

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/notifications/SystemAuditor.java

Address review feedback

bd55fb5

Merge branch 'master' into audit-messages-for-trained-model-deployments

7b3fc2b

davidkyle approved these changes Jul 4, 2022

davidkyle reviewed Jul 4, 2022

Auditor shouldn't run in cluster state task execute

fea8518

davidkyle approved these changes Jul 4, 2022

dimitris-athanasiou merged commit 7e9a6fe into elastic:master Jul 5, 2022

dimitris-athanasiou deleted the audit-messages-for-trained-model-deployments branch July 5, 2022 06:58
