
Fix ml autoscaling for zero allocations #114982

Merged: jan-elastic merged 4 commits into main from fix-ml-autoscaling-for-zero-allocations on Oct 17, 2024

Conversation

jan-elastic (Contributor) commented:

Fixes: #114930

@jan-elastic requested a review from davidkyle on October 17, 2024 09:38
@jan-elastic added labels on Oct 17, 2024: :ml (Machine learning), Team:ML (Meta label for the ML team), >non-issue, v8.16.0, v9.0.0, auto-backport (Automatically create backport pull requests when merged)
elasticsearchmachine (Collaborator) commented:

Pinging @elastic/ml-core (Team:ML)

if (assignment.getNodeRoutingTable().isEmpty() == false
    && assignment.getNodeRoutingTable().values().stream().allMatch(r -> r.getState().consumesMemory() == false)) {
    // Ignore states that don't consume memory, for example when all allocations are failed or stopped.
    // An empty node routing table would vacuously satisfy the allMatch condition above, hence the
    // isEmpty() check; that case needs to be handled by the next branch instead.
    continue;
}

if (assignment.getNodeRoutingTable().isEmpty() == false) {
jan-elastic (Contributor, Author) commented on Oct 17, 2024:

Everything below this point in the file is just indentation changes.

If you want to review it, I'd recommend Settings -> Hide whitespace.
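For context, the check quoted above skips model assignments whose allocations are all in states that no longer hold memory. Below is a minimal, self-contained sketch of that pattern, not the actual Elasticsearch classes: the class and record names, and the choice of which RoutingState values count as memory-consuming (only STOPPED and FAILED treated as non-consuming, inferred from the comment in the diff), are assumptions for illustration.

import java.util.List;
import java.util.Map;

public class ZeroAllocationSkipSketch {

    // Hypothetical stand-in for the routing states; which ones "consume memory" is an
    // assumption inferred from the diff comment ("all allocations are failed or stopped").
    enum RoutingState {
        STARTING, STARTED, STOPPING, STOPPED, FAILED;

        boolean consumesMemory() {
            return this != STOPPED && this != FAILED;
        }
    }

    // Hypothetical stand-in for a trained model assignment and its per-node routing table.
    record Assignment(String deploymentId, Map<String, RoutingState> nodeRoutingTable) {}

    static void planMemory(List<Assignment> assignments) {
        for (Assignment assignment : assignments) {
            if (assignment.nodeRoutingTable().isEmpty() == false
                && assignment.nodeRoutingTable().values().stream().allMatch(s -> s.consumesMemory() == false)) {
                // Every allocation is failed or stopped, so the deployment holds no memory: skip it.
                // allMatch(...) is vacuously true for an empty table, hence the isEmpty() guard above.
                continue;
            }
            if (assignment.nodeRoutingTable().isEmpty() == false) {
                System.out.println(assignment.deploymentId() + ": count memory on its assigned nodes");
            } else {
                System.out.println(assignment.deploymentId() + ": not assigned yet, reserve memory for it");
            }
        }
    }

    public static void main(String[] args) {
        planMemory(List.of(
            new Assignment("all-failed", Map.of("node-1", RoutingState.FAILED)),   // skipped
            new Assignment("running", Map.of("node-1", RoutingState.STARTED)),     // counted
            new Assignment("unassigned", Map.of())                                 // reserved
        ));
    }
}

Running this prints one line per deployment; only the deployment whose allocations are all failed is skipped entirely.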

@jan-elastic force-pushed the fix-ml-autoscaling-for-zero-allocations branch from 9714a73 to d4d069d on October 17, 2024 09:46
@@ -623,6 +623,9 @@ public String getDeploymentId() {
     * @return the estimated memory (in bytes) required for the model deployment to run
     */
    public long estimateMemoryUsageBytes() {
        if (numberOfAllocations == 0) {
            return 0;
        }
A member (reviewer) commented on this line:

This method is on TaskParams: StartTrainedModelDeploymentAction.TaskParams.estimateMemoryUsageBytes().

There is another public method, StartTrainedModelDeploymentAction.estimateMemoryUsageBytes(), on line 792 that also needs this check.

If StartTrainedModelDeploymentAction.estimateMemoryUsageBytes() can return 0, then line 635, + (cacheSize.getBytes() - modelBytes);, needs a Math.max(0, ...) to ensure the return value is non-negative.

jan-elastic (Contributor, Author) replied:

Fixed StartTrainedModelDeploymentAction.estimateMemoryUsageBytes.

I don't think the Math.max is necessary: StartTrainedModelDeploymentAction.estimateMemoryUsageBytes returns 0 only if the number of allocations is 0, and in that case TaskParams.estimateMemoryUsageBytes already returns 0.
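To make the discussion above concrete, here is a simplified sketch with made-up fields, signatures, and memory formulas rather than the real StartTrainedModelDeploymentAction API. Only the zero-allocation early return reflects what the PR adds; everything else is illustrative.

public class MemoryEstimateSketch {

    // Hypothetical stand-in for StartTrainedModelDeploymentAction.estimateMemoryUsageBytes(...);
    // the real method takes different parameters. The zero-allocation short-circuit is the point.
    static long estimateMemoryUsageBytes(long modelBytes, long perAllocationBytes, int numberOfAllocations) {
        if (numberOfAllocations == 0) {
            return 0;
        }
        return modelBytes + perAllocationBytes * numberOfAllocations;
    }

    // Hypothetical stand-in for TaskParams.estimateMemoryUsageBytes(), which adds cache overhead.
    static long taskParamsEstimateMemoryUsageBytes(long modelBytes, long perAllocationBytes, int numberOfAllocations, long cacheSizeBytes) {
        if (numberOfAllocations == 0) {
            return 0; // the new early return: the cache adjustment below is never reached with a 0 base
        }
        long base = estimateMemoryUsageBytes(modelBytes, perAllocationBytes, numberOfAllocations);
        // In this sketch, base already includes modelBytes, so swapping modelBytes for cacheSizeBytes
        // keeps the result non-negative; a defensive Math.max(0, ...) would also work but is not required.
        return base + (cacheSizeBytes - modelBytes);
    }

    public static void main(String[] args) {
        // Zero allocations: both estimates are 0, the case the linked autoscaling issue is about.
        System.out.println(taskParamsEstimateMemoryUsageBytes(100, 10, 0, 40));  // 0
        // One allocation with a small cache: (100 + 10) + (40 - 100) = 50, still non-negative.
        System.out.println(taskParamsEstimateMemoryUsageBytes(100, 10, 1, 40));  // 50
    }
}

The zero-allocation case returns before the cacheSize adjustment is evaluated, which is why the clamp suggested above ends up unnecessary.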

davidkyle (Member) left a comment:

LGTM

@jan-elastic merged commit 12062cb into main on Oct 17, 2024 (17 checks passed).
@jan-elastic deleted the fix-ml-autoscaling-for-zero-allocations branch on October 17, 2024 11:56.
elasticsearchmachine (Collaborator) commented:

💔 Backport failed

The backport operation could not be completed due to the following error: an unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to backport manually by running: backport --upstream elastic/elasticsearch --pr 114982

Commits referencing this pull request, each carrying the same commit message:

* jan-elastic added a commit, Oct 17, 2024
* elasticsearchmachine pushed a commit, Oct 17, 2024
* jan-elastic added a commit, Oct 18, 2024
* elasticsearchmachine pushed a commit, Oct 18, 2024
* georgewallace pushed a commit to georgewallace/elasticsearch, Oct 25, 2024
* jfreden pushed a commit to jfreden/elasticsearch, Nov 4, 2024

Commit message:

* Fix estimated memory usage for a model with zero allocations.
* Ignore number of threads of models with zero allocations in autoscaling decisions.
* Add some long overdue comments.
* Another estimateMemoryUsageBytes fix
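The second bullet in the commit message above, ignoring the thread count of models with zero allocations, matters because a deployment scaled to zero should not demand processors in the autoscaling calculation and block the ML tier from scaling down. The following is a minimal sketch of that idea only; the record, method names, and sizing rules are invented and do not mirror the real Elasticsearch autoscaling deciders.

import java.util.List;

public class ProcessorDemandSketch {

    // Hypothetical deployment descriptor; field names are invented for illustration.
    record Deployment(String id, int numberOfAllocations, int threadsPerAllocation) {}

    // Total processors the ML tier needs across deployments. A deployment scaled to zero
    // allocations contributes nothing.
    static int totalRequiredProcessors(List<Deployment> deployments) {
        return deployments.stream()
            .mapToInt(d -> d.numberOfAllocations() * d.threadsPerAllocation())
            .sum();
    }

    // Smallest node size (in processors) able to host the widest allocation. Without the
    // filter, a model scaled to zero allocations would still pin the node size to its
    // threadsPerAllocation and could keep the tier from scaling down to zero.
    static int minProcessorsPerNode(List<Deployment> deployments) {
        return deployments.stream()
            .filter(d -> d.numberOfAllocations() > 0) // ignore thread counts of zero-allocation models
            .mapToInt(Deployment::threadsPerAllocation)
            .max()
            .orElse(0);
    }

    public static void main(String[] args) {
        List<Deployment> deployments = List.of(
            new Deployment("scaled-to-zero", 0, 8),
            new Deployment("active", 2, 2)
        );
        System.out.println(totalRequiredProcessors(deployments)); // 4
        System.out.println(minProcessorsPerNode(deployments));    // 2, not 8
    }
}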
Labels: auto-backport (Automatically create backport pull requests when merged), backport pending, :ml (Machine learning), >non-issue, Team:ML (Meta label for the ML team), v8.16.0, v9.0.0

Successfully merging this pull request may close these issues:

* [ML] ML nodes autoscaling not down to 0 in stateful and serverless (#114930)

3 participants