
Error running multimodel endpoints in sagemaker #1911

Open · Najib-Haq opened this issue May 15, 2024 · 1 comment
Labels: bug (Something isn't working)


@Najib-Haq

Description

Multi-model endpoint deployment in SageMaker through DJL Serving is supposed to be supported; here is the related AWS page and associated tutorial. I ran the code in the demo as-is. The endpoint gets created, but invoking it fails with an error.
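
For reference, this is roughly the deployment flow from the demo (a minimal sketch using the sagemaker SDK's MultiDataModel; the bucket, image URI, and instance type below are placeholders, not the demo's exact values):

import sagemaker
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# All *.tar.gz artifacts under this prefix become invocable target models.
mme = MultiDataModel(
    name="lmi-model",
    model_data_prefix="s3://my-bucket/lmi-models/",  # placeholder S3 prefix
    image_uri="<djl-serving-dlc-image-uri>",         # placeholder DLC image
    role=role,
    sagemaker_session=session,
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",                   # placeholder instance type
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# Each request is routed to one artifact under the prefix via TargetModel.
print(predictor.predict({"prompt": "Large model inference is"},
                        target_model="opt-350m.tar.gz"))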

Expected Behavior

Successful invocation of the endpoint based on the target model name in the multi-model scenario.

Error Message

---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 print(predictor.predict( {"prompt": "Large model inference is"}, target_model="opt-350m.tar.gz"))
      2 print(predictor.predict({"prompt": "Large model inference is"}, target_model="bloomz-560m.tar.gz"))
      3 print(predictor.predict({"prompt": "Large model inference is"}, target_model="gpt-neo-125m.tar.gz"))

File /opt/conda/lib/python3.10/site-packages/sagemaker/base_predictor.py:212, in Predictor.predict(self, data, initial_args, target_model, target_variant, inference_id, custom_attributes, component_name)
    209 if inference_component_name:
    210     request_args["InferenceComponentName"] = inference_component_name
--> 212 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    213 return self._handle_response(response)

File /opt/conda/lib/python3.10/site-packages/botocore/client.py:565, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    561     raise TypeError(
    562         f"{py_operation_name}() only accepts keyword arguments."
    563     )
    564 # The "self" in this scope is referring to the BaseClient.
--> 565 return self._make_api_call(operation_name, kwargs)

File /opt/conda/lib/python3.10/site-packages/botocore/client.py:1021, in BaseClient._make_api_call(self, operation_name, api_params)
   1017     error_code = error_info.get("QueryErrorCode") or error_info.get(
   1018         "Code"
   1019     )
   1020     error_class = self.exceptions.from_code(error_code)
-> 1021     raise error_class(parsed_response, operation_name)
   1022 else:
   1023     return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (404) from primary with message "{
  "code": 404,
  "type": "ModelNotFoundException",
  "message": "Failed to detect engine of the model: /opt/ml/models/5ce62ebae83b18f9141573e424631f2e/model/temp"
}
". See https://eu-central-1.console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/aws/sagemaker/Endpoints/lmi-model-2024-05-15-10-10-58-392 in account 662744937784 for more information.

How to Reproduce?

Execute the demo code given here.

Steps to reproduce


  1. Run the code at https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/Multi-Model-Inference-Demo.ipynb.

What have you tried to solve it?

  1. Tried using models stored in S3 buckets instead; this gave the same error. These models can be deployed successfully in their own single-model endpoints through DJL Serving, but not in the multi-model scenario (the CloudWatch logs from the error can be pulled as sketched below).
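
A small helper for pulling the CloudWatch logs that the error message points to (a sketch assuming the default SageMaker endpoint log group naming; the group name is the one from the error above):

import boto3

logs = boto3.client("logs", region_name="eu-central-1")
group = "/aws/sagemaker/Endpoints/lmi-model-2024-05-15-10-10-58-392"

# Read the most recently active stream in the endpoint's log group.
streams = logs.describe_log_streams(
    logGroupName=group, orderBy="LastEventTime", descending=True
)
for stream in streams["logStreams"][:1]:
    events = logs.get_log_events(
        logGroupName=group, logStreamName=stream["logStreamName"]
    )
    for event in events["events"]:
        print(event["message"])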
Najib-Haq added the bug label on May 15, 2024
@sindhuvahinis (Contributor)

@Najib-Haq Which version of the DLC image did you use?
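
One way to look this up from the deployed endpoint itself is via the boto3 SageMaker client (a sketch; the endpoint name is the one from the error above):

import boto3

sm = boto3.client("sagemaker", region_name="eu-central-1")

# Walk endpoint -> endpoint config -> model to find the container image.
endpoint = sm.describe_endpoint(
    EndpointName="lmi-model-2024-05-15-10-10-58-392"
)
config = sm.describe_endpoint_config(
    EndpointConfigName=endpoint["EndpointConfigName"]
)
model_name = config["ProductionVariants"][0]["ModelName"]
model = sm.describe_model(ModelName=model_name)
print(model["PrimaryContainer"]["Image"])  # the DLC image URI in use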
