
[ML] ML nodes autoscaling not down to 0 in stateful and serverless #114930

Closed
wwang500 opened this issue Oct 16, 2024 · 10 comments · Fixed by #114982, #115082 or #115189
Labels
>bug, :ml (Machine learning), Team:ML (Meta label for the ML team)

Comments

@wwang500 commented Oct 16, 2024

Environment

  • Stateful cloud (gcp-us-west2)
  • Serverless QA

build

  "build": {
    "hash": "8ccfb227c2131c859033f409ee37a87023fada62",
    "date": "2024-10-16T05:50:43.944345200Z"
  }

Steps to reproduce

  1. Deploy a serverless or stateful cluster. For a stateful cluster, make sure ML autoscaling is ON.
  2. Create an inference endpoint with adaptive allocations ON:
PUT _inference/sparse_embedding/elser-endpoint
{
  "service": "elser",
  "service_settings": {"num_threads": 4, "adaptive_allocations": {"enabled": true}}
}
  3. Wait a few minutes for the scale-up event: an ML node becomes available and is allocated to that inference endpoint. You can confirm this by running GET _ml/trained_models/elser-endpoint/_stats.
  4. Run inference; you can follow the steps in this tutorial: https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search-semantic-text.html
  5. After that, wait at least 15 minutes for the allocation to scale down to 0:
{
  "count": 1,
  "trained_model_stats": [
    {
      "model_id": ".elser_model_2_linux-x86_64",
      "model_size_stats": {
        "model_size_bytes": 274756282,
        "required_native_memory_bytes": 2101346304
      },
      "pipeline_count": 1,
      "ingest": {
        "total": {
          "count": 0,
          "time_in_millis": 0,
          "current": 0,
          "failed": 0
        },
        "pipelines": {
          ".kibana-elastic-ai-assistant-ingest-pipeline-knowledge-base": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "ingested_as_first_pipeline_in_bytes": 0,
            "produced_as_first_pipeline_in_bytes": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          }
        }
      },
      "inference_stats": {
        "failure_count": 0,
        "inference_count": 0,
        "cache_miss_count": 0,
        "missing_all_fields_count": 0,
        "timestamp": 1729097542245
      },
      "deployment_stats": {
        "deployment_id": "elser-endpoint",
        "model_id": ".elser_model_2_linux-x86_64",
        "threads_per_allocation": 4,
        "number_of_allocations": 0,
        "adaptive_allocations": {
          "enabled": true
        },
        "queue_capacity": 1024,
        "state": "started",
        "allocation_status": {
          "allocation_count": 0,
          "target_allocation_count": 0,
          "state": "fully_allocated"
        },
        "cache_size": "262mb",
        "priority": "normal",
        "start_time": 1729044099355,
        "peak_throughput_per_minute": 0,
        "nodes": []
      }
    }
  ]
}
  6. After the allocation scales down to 0, ML node autoscaling (down to 0) should happen in ~1 hour.

Observed:

After waiting for hours, ML node autoscaling (down to 0) did not happen.

  • for stateful, GET /_autoscaling/capacity/ returns:
"ml": {
      "required_capacity": {
        "node": {
          "memory": 0,
          "processors": 4
        },
        "total": {
          "memory": 0,
          "processors": 0
        }
      },
      "current_capacity": {
        "node": {
          "storage": 0,
          "memory": 8585740288,
          "processors": 4
        },
        "total": {
          "storage": 0,
          "memory": 17171480576,
          "processors": 8
        }
      },
      "current_nodes": [
        {
          "name": "instance-0000000003"
        },
        {
          "name": "instance-0000000004"
        }
      ],
      "deciders": {
        "ml": {
          "required_capacity": {
            "node": {
              "memory": 0,
              "processors": 4
            },
            "total": {
              "memory": 0,
              "processors": 0
            }
          },
          "reason_summary": "[memory_decider] Requesting scale down as tier and/or node size could be smaller; [processor_decider] requesting scale down as tier and/or node size could be smaller",
          "reason_details": {
            "waiting_analytics_jobs": [],
            "waiting_anomaly_jobs": [],
            "waiting_models": [],
            "configuration": {},
            "perceived_current_capacity": {
              "node": {
                "memory": 8585740288,
                "processors": 4
              },
              "total": {
                "memory": 17171480576,
                "processors": 8
              }
            },
            "reason": "[memory_decider] Requesting scale down as tier and/or node size could be smaller; [processor_decider] requesting scale down as tier and/or node size could be smaller"
          }
        }
      }
    }
  • for serverless, GET /_internal/serverless/autoscaling returns:
"ml": {
    "metrics": {
      "nodes": {
        "value": 1,
        "quality": "exact"
      },
      "node_memory_in_bytes": {
        "value": 34359738368,
        "quality": "exact"
      },
      "model_memory_in_bytes": {
        "value": 0,
        "quality": "exact"
      },
      "min_nodes": {
        "value": 0,
        "quality": "exact"
      },
      "extra_single_node_model_memory_in_bytes": {
        "value": 2101346304,
        "quality": "exact"
      },
      "extra_single_node_processors": {
        "value": 0,
        "quality": "exact"
      },
      "extra_model_memory_in_bytes": {
        "value": 2101346304,
        "quality": "exact"
      },
      "extra_processors": {
        "value": 0,
        "quality": "exact"
      },
      "remove_node_memory_in_bytes": {
        "value": 0,
        "quality": "exact"
      },
      "per_node_memory_overhead_in_bytes": {
        "value": 31457280,
        "quality": "exact"
      }
    }
  }
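
For reference, a minimal Console sketch (assuming access to Kibana Dev Tools) for checking which ML nodes are still present while waiting for the scale-down; in the node.role column the letter "l" stands for machine learning:

GET _cat/nodes?v&h=name,node.role&s=name
# ML-capable nodes show an "l" in node.role; after a full scale-down no such node should remain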
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@jan-elastic (Contributor)

Regarding stateful, the response says:

          "required_capacity": {
            (...),
            "total": {
              "memory": 0,
              "processors": 0
            }
          },
          "reason_summary": "[memory_decider] Requesting scale down as tier and/or node size could be smaller; [processor_decider] requesting scale down as tier and/or node size could be smaller",

So it looks like the /_autoscaling/capacity endpoint gives a correct result (namely, nothing is needed and a scale-down is requested).

Who is consuming this result? I think the elasticsearch-autoscaler. It looks like something is wrong there.

@jan-elastic (Contributor)

BTW, I don't fully understand:

      "required_capacity": {
        "node": {
          "memory": 0,
          "processors": 4
        },
        "total": {
          "memory": 0,
          "processors": 0
        }
      }

If we need a total of 0 processors and 0 memory, why does each node (of the 0) have 4 processors?

@jan-elastic (Contributor)

I don't really know the serverless API, but

      "min_nodes": {
        "value": 0,
        "quality": "exact"
      },

sounds like the autoscaler should scale down to 0 nodes.

@wwang500 (Author) commented Oct 17, 2024

I have verified this in a stateful cluster.
Build hash:

"build": {
    "hash": "979710150c133840ef0852edaa4aee02c144fdb2",
    "date": "2024-10-17T13:24:45.133712420Z"
  }

GitHub commit: https://github.com/elastic/elasticsearch/commits/979710150c133840ef0852edaa4aee02c144fdb2, which should include the fix.

Unfortunately, the problem is still there after an hour's wait. The decider now says: "[memory_decider] Passing currently perceived capacity as there are running analytics and anomaly jobs or deployed models, but their assignment explanations are unexpected or their memory usage estimates are inaccurate."

 "ml": {
      "current_capacity": {
        "node": {
          "storage": 0,
          "memory": 8585740288,
          "processors": 4
        },
        "total": {
          "storage": 0,
          "memory": 17171480576,
          "processors": 8
        }
      },
      "current_nodes": [
        {
          "name": "instance-0000000003"
        },
        {
          "name": "instance-0000000004"
        }
      ],
      "deciders": {
        "ml": {
          "reason_summary": "[memory_decider] Passing currently perceived capacity as there are running analytics and anomaly jobs or deployed models, but their assignment explanations are unexpected or their memory usage estimates are inaccurate.",
          "reason_details": {
            "waiting_analytics_jobs": [],
            "waiting_anomaly_jobs": [],
            "waiting_models": [],
            "configuration": {},
            "perceived_current_capacity": {
              "node": {
                "memory": 8585740288,
                "processors": 4
              },
              "total": {
                "memory": 17171480576,
                "processors": 8
              }
            },
            "reason": "[memory_decider] Passing currently perceived capacity as there are running analytics and anomaly jobs or deployed models, but their assignment explanations are unexpected or their memory usage estimates are inaccurate."
          }
        }
      }
    }

@jan-elastic

@jan-elastic (Contributor)

Thanks, investigating...

Have you also tried serverless? The autoscaling code for that is different.

@jan-elastic (Contributor)

This should fix stateful: #115082
Hopefully, serverless already works.

@wwang500 (Author)

> Hopefully, serverless already works.

I can confirm serverless works. Our serverless QA environment had a rollout last night; the current commit is

  "build": {
    "hash": "d3fceaddefcc32c71321768d05f268bce2374634",
    "date": "2024-10-17T17:17:20.047405199Z"
  },

https://github.com/elastic/elasticsearch/commits/d3fceaddefcc32c71321768d05f268bce2374634

It includes the fix.

I tried the following steps:

  • Create an endpoint:
PUT _inference/sparse_embedding/elser-endpoint
{
  "service": "elser", 
  "service_settings": {"num_threads": 4, "adaptive_allocations": {"enabled": true}}
}
  • Wait for the ML node count and the allocation count to reach 1, using GET _ml/trained_models/elser-endpoint/_stats

  • Run inference on a semantic_text field


PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "description": { 
        "type": "semantic_text", 
        "inference_id": "elser-endpoint" 
      }
    }
  }
}

POST /semantic-embeddings/_doc
{
    "description": "Bisected north to south by the ... }
}

  • Wait for >15 minutes and make sure the allocation count goes from 1 to 0 (see the sketch after this list)

  • Make sure the ML node is still there
    (screenshot)

  • After another wait of around 10 minutes, the ML node disappeared
    (screenshot)
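
For reference, a minimal sketch (assuming the same endpoint name as above) to check just the allocation count without paging through the full stats response; filter_path trims the output:

GET _ml/trained_models/elser-endpoint/_stats?filter_path=trained_model_stats.deployment_stats.number_of_allocations
# expect number_of_allocations to drop from 1 to 0 once adaptive allocations scale down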

@wwang500 (Author)

Reopening this: after #115082, scaling up is now broken (classic stateful environment).

After

PUT _inference/sparse_embedding/elser-endpoint
{
  "service": "elser", 
  "service_settings": {"num_threads": 4, "adaptive_allocations": {"enabled": true}}
}

The model stats show:

        "state": "starting",
        "reason": "No ML nodes exist in the cluster",

but GET _autoscaling/capacity shows:

"ml": {
      "required_capacity": {
        "node": {
          "storage": 0,
          "memory": 0,
          "processors": 0
        },
        "total": {
          "storage": 0,
          "memory": 0,
          "processors": 0
        }
      },
      "current_capacity": {
        "node": {
          "storage": 0,
          "memory": 0,
          "processors": 0
        },
        "total": {
          "storage": 0,
          "memory": 0,
          "processors": 0
        }
      },
      "current_nodes": [],
      "deciders": {
        "ml": {
          "required_capacity": {
            "node": {
              "storage": 0,
              "memory": 0,
              "processors": 0
            },
            "total": {
              "storage": 0,
              "memory": 0,
              "processors": 0
            }
          },
          "reason_summary": "Passing currently perceived capacity as no scaling changes are necessary",
          "reason_details": {
            "waiting_analytics_jobs": [],
            "waiting_anomaly_jobs": [],
            "waiting_models": [
              "elser-endpoint"
            ],
            "configuration": {},
            "perceived_current_capacity": {
              "node": {
                "memory": 0,
                "processors": 0
              },
              "total": {
                "memory": 0,
                "processors": 0
              }
            },
            "reason": "Passing currently perceived capacity as no scaling changes are necessary"
          }
        }
      }
    }

This is not right, because no ML node scale-up event will be triggered.
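
For reference, a minimal sketch (assuming the same endpoint name) to confirm the deployment is stuck waiting for an ML node rather than failing:

GET _ml/trained_models/elser-endpoint/_stats?filter_path=trained_model_stats.deployment_stats.state,trained_model_stats.deployment_stats.reason
# expect "state": "starting" with "reason": "No ML nodes exist in the cluster" while no scale-up is triggered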

wwang500 reopened this Oct 21, 2024
@jan-elastic (Contributor)

Next fix: #115189
