
[ML] ML nodes autoscaling not down to 0 in stateful and serverless #114930

Closed
wwang500 opened this issue Oct 16, 2024 · 10 comments · Fixed by #114982, #115082 or #115189
Labels
>bug, :ml (Machine learning), Team:ML (Meta label for the ML team)

Comments

@wwang500 commented Oct 16, 2024

Environment

  • Stateful cloud (gcp-us-west2)
  • Serverless QA

build

  "build": {
    "hash": "8ccfb227c2131c859033f409ee37a87023fada62",
    "date": "2024-10-16T05:50:43.944345200Z"
  }

Steps to reproduce

  1. Deploy a serverless or stateful cluster. For a stateful cluster, make sure ML autoscaling is ON.
  2. Create an inference endpoint with adaptive allocations ON:
PUT _inference/sparse_embedding/elser-endpoint
{
  "service": "elser",
  "service_settings": {"num_threads": 4, "adaptive_allocations": {"enabled": true}}
}
  3. Wait a few minutes for the scale-up event: an ML node becomes available and is allocated to that inference endpoint. You can confirm this by running GET _ml/trained_models/elser-endpoint/_stats.
  4. Run inference; you can follow the steps in this tutorial: https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search-semantic-text.html
  5. After that, wait at least 15 minutes for the allocation to scale down to 0:
{
  "count": 1,
  "trained_model_stats": [
    {
      "model_id": ".elser_model_2_linux-x86_64",
      "model_size_stats": {
        "model_size_bytes": 274756282,
        "required_native_memory_bytes": 2101346304
      },
      "pipeline_count": 1,
      "ingest": {
        "total": {
          "count": 0,
          "time_in_millis": 0,
          "current": 0,
          "failed": 0
        },
        "pipelines": {
          ".kibana-elastic-ai-assistant-ingest-pipeline-knowledge-base": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "ingested_as_first_pipeline_in_bytes": 0,
            "produced_as_first_pipeline_in_bytes": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          }
        }
      },
      "inference_stats": {
        "failure_count": 0,
        "inference_count": 0,
        "cache_miss_count": 0,
        "missing_all_fields_count": 0,
        "timestamp": 1729097542245
      },
      "deployment_stats": {
        "deployment_id": "elser-endpoint",
        "model_id": ".elser_model_2_linux-x86_64",
        "threads_per_allocation": 4,
        "number_of_allocations": 0,
        "adaptive_allocations": {
          "enabled": true
        },
        "queue_capacity": 1024,
        "state": "started",
        "allocation_status": {
          "allocation_count": 0,
          "target_allocation_count": 0,
          "state": "fully_allocated"
        },
        "cache_size": "262mb",
        "priority": "normal",
        "start_time": 1729044099355,
        "peak_throughput_per_minute": 0,
        "nodes": []
      }
    }
  ]
}
  6. After the allocation scales down to 0, ML node autoscaling (down to 0) should happen in ~1 hour.

Observed:

After waiting for hours, ML node autoscaling (down to 0) did not happen.

  • for stateful, GET /_autoscaling/capacity/ returns:
"ml": {
      "required_capacity": {
        "node": {
          "memory": 0,
          "processors": 4
        },
        "total": {
          "memory": 0,
          "processors": 0
        }
      },
      "current_capacity": {
        "node": {
          "storage": 0,
          "memory": 8585740288,
          "processors": 4
        },
        "total": {
          "storage": 0,
          "memory": 17171480576,
          "processors": 8
        }
      },
      "current_nodes": [
        {
          "name": "instance-0000000003"
        },
        {
          "name": "instance-0000000004"
        }
      ],
      "deciders": {
        "ml": {
          "required_capacity": {
            "node": {
              "memory": 0,
              "processors": 4
            },
            "total": {
              "memory": 0,
              "processors": 0
            }
          },
          "reason_summary": "[memory_decider] Requesting scale down as tier and/or node size could be smaller; [processor_decider] requesting scale down as tier and/or node size could be smaller",
          "reason_details": {
            "waiting_analytics_jobs": [],
            "waiting_anomaly_jobs": [],
            "waiting_models": [],
            "configuration": {},
            "perceived_current_capacity": {
              "node": {
                "memory": 8585740288,
                "processors": 4
              },
              "total": {
                "memory": 17171480576,
                "processors": 8
              }
            },
            "reason": "[memory_decider] Requesting scale down as tier and/or node size could be smaller; [processor_decider] requesting scale down as tier and/or node size could be smaller"
          }
        }
      }
    }
  • for serverless, GET /_internal/serverless/autoscaling returns:
"ml": {
    "metrics": {
      "nodes": {
        "value": 1,
        "quality": "exact"
      },
      "node_memory_in_bytes": {
        "value": 34359738368,
        "quality": "exact"
      },
      "model_memory_in_bytes": {
        "value": 0,
        "quality": "exact"
      },
      "min_nodes": {
        "value": 0,
        "quality": "exact"
      },
      "extra_single_node_model_memory_in_bytes": {
        "value": 2101346304,
        "quality": "exact"
      },
      "extra_single_node_processors": {
        "value": 0,
        "quality": "exact"
      },
      "extra_model_memory_in_bytes": {
        "value": 2101346304,
        "quality": "exact"
      },
      "extra_processors": {
        "value": 0,
        "quality": "exact"
      },
      "remove_node_memory_in_bytes": {
        "value": 0,
        "quality": "exact"
      },
      "per_node_memory_overhead_in_bytes": {
        "value": 31457280,
        "quality": "exact"
      }
    }
  }
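
For reference, a minimal Console sketch (assuming access to Kibana Dev Tools) for checking which ML nodes are still present while waiting for the scale-down; in the node.role column the letter "l" stands for machine learning:

GET _cat/nodes?v&h=name,node.role&s=name
# ML-capable nodes show an "l" in node.role; after a full scale-down no such node should remain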
@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@jan-elastic (Contributor)

Regarding stateful, the response says:

          "required_capacity": {
            (...),
            "total": {
              "memory": 0,
              "processors": 0
            }
          },
          "reason_summary": "[memory_decider] Requesting scale down as tier and/or node size could be smaller; [processor_decider] requesting scale down as tier and/or node size could be smaller",

So it looks like the /_autoscaling/capacity endpoint gives a correct result (namely, nothing is needed and a scale-down is requested).

Who is consuming this result? I think the elasticsearch-autoscaler. It looks like something is wrong there.

@jan-elastic (Contributor)

BTW, I don't fully understand:

      "required_capacity": {
        "node": {
          "memory": 0,
          "processors": 4
        },
        "total": {
          "memory": 0,
          "processors": 0
        }
      }

If we need a total of 0 processors and 0 memory, why does each node (of the 0) have 4 processors?

@jan-elastic (Contributor)

I don't really know the serverless API, but

      "min_nodes": {
        "value": 0,
        "quality": "exact"
      },

sounds like the autoscaler should scale down to 0 nodes.

@wwang500 (Author) commented Oct 17, 2024

I have verified this in a stateful cluster.
Build hash:

"build": {
    "hash": "979710150c133840ef0852edaa4aee02c144fdb2",
    "date": "2024-10-17T13:24:45.133712420Z"
  }

GitHub commit: https://github.com/elastic/elasticsearch/commits/979710150c133840ef0852edaa4aee02c144fdb2, which should include the fix.

Unfortunately, the problem is still there after an hour's wait. The decider now says: "[memory_decider] Passing currently perceived capacity as there are running analytics and anomaly jobs or deployed models, but their assignment explanations are unexpected or their memory usage estimates are inaccurate."

 "ml": {
      "current_capacity": {
        "node": {
          "storage": 0,
          "memory": 8585740288,
          "processors": 4
        },
        "total": {
          "storage": 0,
          "memory": 17171480576,
          "processors": 8
        }
      },
      "current_nodes": [
        {
          "name": "instance-0000000003"
        },
        {
          "name": "instance-0000000004"
        }
      ],
      "deciders": {
        "ml": {
          "reason_summary": "[memory_decider] Passing currently perceived capacity as there are running analytics and anomaly jobs or deployed models, but their assignment explanations are unexpected or their memory usage estimates are inaccurate.",
          "reason_details": {
            "waiting_analytics_jobs": [],
            "waiting_anomaly_jobs": [],
            "waiting_models": [],
            "configuration": {},
            "perceived_current_capacity": {
              "node": {
                "memory": 8585740288,
                "processors": 4
              },
              "total": {
                "memory": 17171480576,
                "processors": 8
              }
            },
            "reason": "[memory_decider] Passing currently perceived capacity as there are running analytics and anomaly jobs or deployed models, but their assignment explanations are unexpected or their memory usage estimates are inaccurate."
          }
        }
      }
    }

@jan-elastic

@jan-elastic (Contributor)

Thanks, investigating...

Have you also tried serverless? The autoscaling code for that is different.

@jan-elastic (Contributor)

This should fix stateful: #115082
Hopefully, serverless already works.

@wwang500 (Author)

> Hopefully, serverless already works.

I can confirm serverless works. Our serverless QA environment had a rollout last night; the current commit is

  "build": {
    "hash": "d3fceaddefcc32c71321768d05f268bce2374634",
    "date": "2024-10-17T17:17:20.047405199Z"
  },

https://github.com/elastic/elasticsearch/commits/d3fceaddefcc32c71321768d05f268bce2374634

It includes the fix.

I tried the following steps:

  • Create an endpoint:
PUT _inference/sparse_embedding/elser-endpoint
{
  "service": "elser", 
  "service_settings": {"num_threads": 4, "adaptive_allocations": {"enabled": true}}
}
  • Wait for the ML node count and the allocation count to reach 1, using GET _ml/trained_models/elser-endpoint/_stats

  • Run inference on a semantic_text field


PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "description": { 
        "type": "semantic_text", 
        "inference_id": "elser-endpoint" 
      }
    }
  }
}

POST /semantic-embeddings/_doc
{
    "description": "Bisected north to south by the ... }
}

  • Wait for >15 minutes and make sure the allocation count goes from 1 to 0 (see the sketch after this list)

  • Make sure the ML node is still there
    (screenshot)

  • After another wait of around 10 minutes, the ML node disappeared
    (screenshot)
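
For reference, a minimal sketch (assuming the same endpoint name as above) to check just the allocation count without paging through the full stats response; filter_path trims the output:

GET _ml/trained_models/elser-endpoint/_stats?filter_path=trained_model_stats.deployment_stats.number_of_allocations
# expect number_of_allocations to drop from 1 to 0 once adaptive allocations scale down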

@wwang500 (Author)

Reopening this: after #115082, scaling up is now broken (classic stateful environment).

After

PUT _inference/sparse_embedding/elser-endpoint
{
  "service": "elser", 
  "service_settings": {"num_threads": 4, "adaptive_allocations": {"enabled": true}}
}

The model stats show:

        "state": "starting",
        "reason": "No ML nodes exist in the cluster",

but GET _autoscaling/capacity shows:

"ml": {
      "required_capacity": {
        "node": {
          "storage": 0,
          "memory": 0,
          "processors": 0
        },
        "total": {
          "storage": 0,
          "memory": 0,
          "processors": 0
        }
      },
      "current_capacity": {
        "node": {
          "storage": 0,
          "memory": 0,
          "processors": 0
        },
        "total": {
          "storage": 0,
          "memory": 0,
          "processors": 0
        }
      },
      "current_nodes": [],
      "deciders": {
        "ml": {
          "required_capacity": {
            "node": {
              "storage": 0,
              "memory": 0,
              "processors": 0
            },
            "total": {
              "storage": 0,
              "memory": 0,
              "processors": 0
            }
          },
          "reason_summary": "Passing currently perceived capacity as no scaling changes are necessary",
          "reason_details": {
            "waiting_analytics_jobs": [],
            "waiting_anomaly_jobs": [],
            "waiting_models": [
              "elser-endpoint"
            ],
            "configuration": {},
            "perceived_current_capacity": {
              "node": {
                "memory": 0,
                "processors": 0
              },
              "total": {
                "memory": 0,
                "processors": 0
              }
            },
            "reason": "Passing currently perceived capacity as no scaling changes are necessary"
          }
        }
      }
    }

This is not right, because no ML node scale-up event will be triggered.
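
For reference, a minimal sketch (assuming the same endpoint name) to confirm the deployment is stuck waiting for an ML node rather than failing:

GET _ml/trained_models/elser-endpoint/_stats?filter_path=trained_model_stats.deployment_stats.state,trained_model_stats.deployment_stats.reason
# expect "state": "starting" with "reason": "No ML nodes exist in the cluster" while no scale-up is triggered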

wwang500 reopened this Oct 21, 2024
@jan-elastic (Contributor)

Next fix: #115189
