
Fix ml autoscaling for zero allocations #114982

Merged: jan-elastic merged 4 commits into main from fix-ml-autoscaling-for-zero-allocations on Oct 17, 2024

Conversation

jan-elastic (Contributor) commented:

Fixes: #114930

@jan-elastic requested a review from davidkyle on October 17, 2024 09:38
@jan-elastic added labels on Oct 17, 2024: :ml (Machine learning), Team:ML (Meta label for the ML team), >non-issue, v8.16.0, v9.0.0, auto-backport (Automatically create backport pull requests when merged)
elasticsearchmachine (Collaborator) commented:

Pinging @elastic/ml-core (Team:ML)

if (assignment.getNodeRoutingTable().isEmpty() == false
    && assignment.getNodeRoutingTable().values().stream().allMatch(r -> r.getState().consumesMemory() == false)) {
    // Ignore states that don't consume memory, for example when all allocations are failed or stopped.
    // An empty node routing table would vacuously satisfy the allMatch condition above, hence the
    // isEmpty() check; that case needs to be handled by the next branch instead.
    continue;
}

if (assignment.getNodeRoutingTable().isEmpty() == false) {
jan-elastic (Contributor, Author) commented on Oct 17, 2024:

Everything below this point in the file is just indentation changes.

If you want to review it, I'd recommend Settings -> Hide whitespace.
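For context, the check quoted above skips model assignments whose allocations are all in states that no longer hold memory. Below is a minimal, self-contained sketch of that pattern, not the actual Elasticsearch classes: the class and record names, and the choice of which RoutingState values count as memory-consuming (only STOPPED and FAILED treated as non-consuming, inferred from the comment in the diff), are assumptions for illustration.

import java.util.List;
import java.util.Map;

public class ZeroAllocationSkipSketch {

    // Hypothetical stand-in for the routing states; which ones "consume memory" is an
    // assumption inferred from the diff comment ("all allocations are failed or stopped").
    enum RoutingState {
        STARTING, STARTED, STOPPING, STOPPED, FAILED;

        boolean consumesMemory() {
            return this != STOPPED && this != FAILED;
        }
    }

    // Hypothetical stand-in for a trained model assignment and its per-node routing table.
    record Assignment(String deploymentId, Map<String, RoutingState> nodeRoutingTable) {}

    static void planMemory(List<Assignment> assignments) {
        for (Assignment assignment : assignments) {
            if (assignment.nodeRoutingTable().isEmpty() == false
                && assignment.nodeRoutingTable().values().stream().allMatch(s -> s.consumesMemory() == false)) {
                // Every allocation is failed or stopped, so the deployment holds no memory: skip it.
                // allMatch(...) is vacuously true for an empty table, hence the isEmpty() guard above.
                continue;
            }
            if (assignment.nodeRoutingTable().isEmpty() == false) {
                System.out.println(assignment.deploymentId() + ": count memory on its assigned nodes");
            } else {
                System.out.println(assignment.deploymentId() + ": not assigned yet, reserve memory for it");
            }
        }
    }

    public static void main(String[] args) {
        planMemory(List.of(
            new Assignment("all-failed", Map.of("node-1", RoutingState.FAILED)),   // skipped
            new Assignment("running", Map.of("node-1", RoutingState.STARTED)),     // counted
            new Assignment("unassigned", Map.of())                                 // reserved
        ));
    }
}

Running this prints one line per deployment; only the deployment whose allocations are all failed is skipped entirely.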

@jan-elastic force-pushed the fix-ml-autoscaling-for-zero-allocations branch from 9714a73 to d4d069d on October 17, 2024 09:46
@@ -623,6 +623,9 @@ public String getDeploymentId() {
     * @return the estimated memory (in bytes) required for the model deployment to run
     */
    public long estimateMemoryUsageBytes() {
        if (numberOfAllocations == 0) {
            return 0;
        }
A member (reviewer) commented on this line:

This method is on TaskParams: StartTrainedModelDeploymentAction.TaskParams.estimateMemoryUsageBytes().

There is another public method, StartTrainedModelDeploymentAction.estimateMemoryUsageBytes(), on line 792 that also needs this check.

If StartTrainedModelDeploymentAction.estimateMemoryUsageBytes() can return 0, then line 635, + (cacheSize.getBytes() - modelBytes);, needs a Math.max(0, ...) to ensure the return value is non-negative.

jan-elastic (Contributor, Author) replied:

Fixed StartTrainedModelDeploymentAction.estimateMemoryUsageBytes.

I don't think the Math.max is necessary: StartTrainedModelDeploymentAction.estimateMemoryUsageBytes returns 0 only if the number of allocations is 0, and in that case TaskParams.estimateMemoryUsageBytes already returns 0.
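To make the discussion above concrete, here is a simplified sketch with made-up fields, signatures, and memory formulas rather than the real StartTrainedModelDeploymentAction API. Only the zero-allocation early return reflects what the PR adds; everything else is illustrative.

public class MemoryEstimateSketch {

    // Hypothetical stand-in for StartTrainedModelDeploymentAction.estimateMemoryUsageBytes(...);
    // the real method takes different parameters. The zero-allocation short-circuit is the point.
    static long estimateMemoryUsageBytes(long modelBytes, long perAllocationBytes, int numberOfAllocations) {
        if (numberOfAllocations == 0) {
            return 0;
        }
        return modelBytes + perAllocationBytes * numberOfAllocations;
    }

    // Hypothetical stand-in for TaskParams.estimateMemoryUsageBytes(), which adds cache overhead.
    static long taskParamsEstimateMemoryUsageBytes(long modelBytes, long perAllocationBytes, int numberOfAllocations, long cacheSizeBytes) {
        if (numberOfAllocations == 0) {
            return 0; // the new early return: the cache adjustment below is never reached with a 0 base
        }
        long base = estimateMemoryUsageBytes(modelBytes, perAllocationBytes, numberOfAllocations);
        // In this sketch, base already includes modelBytes, so swapping modelBytes for cacheSizeBytes
        // keeps the result non-negative; a defensive Math.max(0, ...) would also work but is not required.
        return base + (cacheSizeBytes - modelBytes);
    }

    public static void main(String[] args) {
        // Zero allocations: both estimates are 0, the case the linked autoscaling issue is about.
        System.out.println(taskParamsEstimateMemoryUsageBytes(100, 10, 0, 40));  // 0
        // One allocation with a small cache: (100 + 10) + (40 - 100) = 50, still non-negative.
        System.out.println(taskParamsEstimateMemoryUsageBytes(100, 10, 1, 40));  // 50
    }
}

The zero-allocation case returns before the cacheSize adjustment is evaluated, which is why the clamp suggested above ends up unnecessary.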

davidkyle (Member) left a comment:

LGTM

@jan-elastic merged commit 12062cb into main on Oct 17, 2024 (17 checks passed).
@jan-elastic deleted the fix-ml-autoscaling-for-zero-allocations branch on October 17, 2024 11:56.
elasticsearchmachine (Collaborator) commented:

💔 Backport failed

The backport operation could not be completed due to the following error: an unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to backport manually by running: backport --upstream elastic/elasticsearch --pr 114982

Commits referencing this pull request, each carrying the same commit message:

* jan-elastic added a commit, Oct 17, 2024
* elasticsearchmachine pushed a commit, Oct 17, 2024
* jan-elastic added a commit, Oct 18, 2024
* elasticsearchmachine pushed a commit, Oct 18, 2024
* georgewallace pushed a commit to georgewallace/elasticsearch, Oct 25, 2024
* jfreden pushed a commit to jfreden/elasticsearch, Nov 4, 2024

Commit message:

* Fix estimated memory usage for a model with zero allocations.
* Ignore number of threads of models with zero allocations in autoscaling decisions.
* Add some long overdue comments.
* Another estimateMemoryUsageBytes fix
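The second bullet in the commit message above, ignoring the thread count of models with zero allocations, matters because a deployment scaled to zero should not demand processors in the autoscaling calculation and block the ML tier from scaling down. The following is a minimal sketch of that idea only; the record, method names, and sizing rules are invented and do not mirror the real Elasticsearch autoscaling deciders.

import java.util.List;

public class ProcessorDemandSketch {

    // Hypothetical deployment descriptor; field names are invented for illustration.
    record Deployment(String id, int numberOfAllocations, int threadsPerAllocation) {}

    // Total processors the ML tier needs across deployments. A deployment scaled to zero
    // allocations contributes nothing.
    static int totalRequiredProcessors(List<Deployment> deployments) {
        return deployments.stream()
            .mapToInt(d -> d.numberOfAllocations() * d.threadsPerAllocation())
            .sum();
    }

    // Smallest node size (in processors) able to host the widest allocation. Without the
    // filter, a model scaled to zero allocations would still pin the node size to its
    // threadsPerAllocation and could keep the tier from scaling down to zero.
    static int minProcessorsPerNode(List<Deployment> deployments) {
        return deployments.stream()
            .filter(d -> d.numberOfAllocations() > 0) // ignore thread counts of zero-allocation models
            .mapToInt(Deployment::threadsPerAllocation)
            .max()
            .orElse(0);
    }

    public static void main(String[] args) {
        List<Deployment> deployments = List.of(
            new Deployment("scaled-to-zero", 0, 8),
            new Deployment("active", 2, 2)
        );
        System.out.println(totalRequiredProcessors(deployments)); // 4
        System.out.println(minProcessorsPerNode(deployments));    // 2, not 8
    }
}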
Labels: auto-backport (Automatically create backport pull requests when merged), backport pending, :ml (Machine learning), >non-issue, Team:ML (Meta label for the ML team), v8.16.0, v9.0.0

Successfully merging this pull request may close these issues:

* [ML] ML nodes autoscaling not down to 0 in stateful and serverless (#114930)

3 participants