
Modify Accelerator support for kserve #2261

Merged: 1 commit merged into opendatahub-io:main on Dec 1, 2023

Conversation

lucferbux
Contributor

Description

Closes #2244

How Has This Been Tested?

Prerequisites

Enable accelerator support in your cluster. You can add this AcceleratorProfile to your cluster:

apiVersion: dashboard.opendatahub.io/v1
kind: AcceleratorProfile
metadata:
  name: migrated-gpu
  namespace: redhat-ods-applications
spec:
  displayName: Nvidia GPU
  enabled: true
  identifier: nvidia.com/gpu
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu
      operator: Exists

KServe Resource Creation

  1. Deploy a new KServe model, selecting the accelerator and a number of nodes
  2. Check the ServingRuntime spec. It should not contain tolerations or the nvidia.com/gpu requests/limits
  3. Check the InferenceService spec. It should contain tolerations under spec.predictor.tolerations and the gpu requests/limits under spec.predictor.model.resources, as in the sketch below
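
For reference, a rough sketch of what the resulting InferenceService could look like when the accelerator above is selected (the model name and the single-GPU request/limit are illustrative, not taken from this PR):

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model   # illustrative name
spec:
  predictor:
    tolerations:
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists
    model:
      resources:
        requests:
          nvidia.com/gpu: '1'
        limits:
          nvidia.com/gpu: '1'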

KServe Resource Editing

  1. Edit the deployed model and remove the accelerator
  2. Check the ServingRuntime spec. It should stay the same except that the opendatahub.io/accelerator-name label is removed
  3. Check the InferenceService spec. It should no longer contain tolerations under spec.predictor.tolerations or the gpu requests/limits under spec.predictor.model.resources (see the sketch below)
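
For comparison, a minimal sketch of the predictor section after the accelerator has been removed; the tolerations block and the nvidia.com/gpu entries should be gone, and only the regular resources (values illustrative) remain:

spec:
  predictor:
    model:
      resources:
        requests:
          cpu: '1'       # illustrative values
          memory: 2Gi
        limits:
          cpu: '2'
          memory: 4Gi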

ModelMesh Resource Creation

  1. Create a new ModelMesh model server and select an accelerator
  2. Check the ServingRuntime spec. It should contain tolerations and gpu resources (see the sketch after this list)
  3. Deploy a model
  4. Check the InferenceService spec. It shouldn't contain any gpu resources
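
For reference, a rough sketch of a ModelMesh ServingRuntime carrying the accelerator settings (the runtime name, container name, image, and single-GPU request/limit are illustrative):

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-model-server   # illustrative name
spec:
  tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu
      operator: Exists
  containers:
    - name: ovms                       # container name depends on the chosen runtime
      image: example-runtime-image     # illustrative
      resources:
        requests:
          nvidia.com/gpu: '1'
        limits:
          nvidia.com/gpu: '1'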

Test Impact

Covered all paths with unit testing

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Commits have been squashed into descriptive, self-contained units of work (e.g. 'WIP' and 'Implements feedback' style messages have been removed)
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit tests & storybook for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change (find relevant UX in the SMEs section).

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

I haven't tested these changes on a proper GPU cluster; we should do that ASAP.

Member

@andrewballantyne andrewballantyne left a comment


Looking good, one question...

frontend/src/k8sTypes.ts (review thread resolved)
Member

@andrewballantyne andrewballantyne left a comment


Sounds good... type question was resolved. LGTM.

Contributor

openshift-ci bot commented Dec 1, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewballantyne, Xaenalt


@openshift-ci openshift-ci bot added the approved label Dec 1, 2023
@openshift-merge-bot openshift-merge-bot bot merged commit f5f08a0 into opendatahub-io:main Dec 1, 2023
6 checks passed
@lucferbux
Contributor Author

I updated the types of InferenceService and ServingRuntime since there was a mismatch with the CRDs, and handled the exceptions accordingly.

Successfully merging this pull request may close these issues.

[Bug]: Accelerator enablement in kserve is not working