[Bug]: Accelerator enablement in kserve is not working #2244

lucferbux · 2023-11-29T16:34:44Z

Is there an existing issue for this?

I have searched the existing issues

Deploy type

OpenDataHub core version (eg. v1.6.0)

Version

2.5.0

Current Behavior

We currently assign GPU resources to every container in a ServingRuntime spec for KServe. This has been fine for ModelMesh, but for KServe it means we request more resources than the cluster might have.

Expected Behavior

The outcome will be the following:

For modelmesh

Keep the same flow: InferenceServices and ServingRuntimes are created as they are right now.

For kserve

Stop assigning tolerations and GPU resources to the containers of the serving runtime: add a conditional so they are only added for modelmesh.
Add that logic to the InferenceService instead:
- Add tolerations in spec.predictor.tolerations, such as this example
- Add the GPU resources in the spec.predictor.model.resources section
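The split described above could be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the types are simplified stand-ins for the real resources, and `Accelerator`, `withAccelerator`, `assembleServingRuntime`, `assembleInferenceService`, and the `isModelMesh` flag are hypothetical names introduced here.

```typescript
// Hypothetical, simplified shapes; the real ServingRuntime and
// InferenceService resources carry many more fields.
type Toleration = { key: string; operator?: string; effect?: string };
type Resources = {
  limits?: Record<string, string | number>;
  requests?: Record<string, string | number>;
};
type Container = { name: string; resources?: Resources };

type ServingRuntime = {
  spec: { containers: Container[]; tolerations?: Toleration[] };
};
type InferenceService = {
  spec: {
    predictor: {
      tolerations?: Toleration[];
      model: { resources?: Resources };
    };
  };
};

// Assumed accelerator description: a resource name, a count, and tolerations.
type Accelerator = { identifier: string; count: number; tolerations: Toleration[] };

// Merge the accelerator count into a resources block without mutating the input.
const withAccelerator = (resources: Resources | undefined, acc: Accelerator): Resources => ({
  ...resources,
  limits: { ...resources?.limits, [acc.identifier]: acc.count },
  requests: { ...resources?.requests, [acc.identifier]: acc.count },
});

// ModelMesh keeps the existing flow: tolerations and GPU resources go on the
// serving runtime. For KServe the runtime is returned unchanged.
function assembleServingRuntime(
  base: ServingRuntime,
  acc: Accelerator,
  isModelMesh: boolean,
): ServingRuntime {
  if (!isModelMesh) {
    return base;
  }
  return {
    spec: {
      ...base.spec,
      tolerations: [...(base.spec.tolerations ?? []), ...acc.tolerations],
      containers: base.spec.containers.map((c) => ({
        ...c,
        resources: withAccelerator(c.resources, acc),
      })),
    },
  };
}

// KServe: tolerations go in spec.predictor.tolerations and GPU resources in
// spec.predictor.model.resources. For ModelMesh the service is returned unchanged.
function assembleInferenceService(
  base: InferenceService,
  acc: Accelerator,
  isModelMesh: boolean,
): InferenceService {
  if (isModelMesh) {
    return base;
  }
  const predictor = base.spec.predictor;
  return {
    spec: {
      ...base.spec,
      predictor: {
        ...predictor,
        tolerations: [...(predictor.tolerations ?? []), ...acc.tolerations],
        model: {
          ...predictor.model,
          resources: withAccelerator(predictor.model.resources, acc),
        },
      },
    },
  };
}
```

With this shape, a KServe deployment requests the selected accelerator count once, on the predictor model, rather than once per container in the runtime.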

Steps To Reproduce

Create a new project with an accelerator
Deploy a kserve model
Select the maximum number of accelerator nodes

Workaround (if any)

No response

What browsers are you seeing the problem on?

No response

Anything else

No response


lucferbux added the kind/bug, untriaged, priority/blocker, and priority/normal labels Nov 29, 2023

lucferbux self-assigned this Nov 29, 2023

github-project-automation bot added this to ODH Dashboard Planning Nov 29, 2023

github-project-automation bot moved this to Untriaged in ODH Dashboard Planning Nov 29, 2023

andrewballantyne moved this from Untriaged to Dev In progress in ODH Dashboard Planning Nov 29, 2023

andrewballantyne added the rhods-2.5 label Nov 29, 2023

This was referenced Dec 1, 2023

Modify Accelerator support for kserve #2261

Merged

Model Serving Enhancements #1312

Closed

openshift-merge-bot bot closed this as completed in #2261 Dec 1, 2023

github-project-automation bot moved this from Dev In progress to Done in ODH Dashboard Planning Dec 1, 2023

dpanshug removed the untriaged label Dec 4, 2023

andrewballantyne added the feature/model-serving label and removed the priority/normal label Dec 4, 2023
