
To deploy multiple Model Serving Runtime in NERC RHOAI #856

Open
Milstein opened this issue Dec 6, 2024 · 4 comments
Labels
openshift This issue pertains to NERC OpenShift

Comments


Milstein commented Dec 6, 2024

Can we check and install all (or most) of these in our OCP Test and Prod setups:

https://github.com/rh-aiservices-bu/llm-on-openshift/tree/main?tab=readme-ov-file#inference-servers

How to configure them is explained here.
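For reference, adding one of those inference servers to RHOAI generally comes down to creating a `ServingRuntime` custom resource in the Data Science Project. A minimal sketch is below; the runtime name, display name, image, and port are placeholders, not values taken from the linked repo:

```yaml
# Hypothetical ServingRuntime; name, image, and port are placeholders.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: example-llm-runtime            # assumed name
  annotations:
    openshift.io/display-name: Example LLM Runtime
spec:
  supportedModelFormats:
    - name: pytorch
      autoSelect: true
  containers:
    - name: kserve-container
      image: quay.io/example/llm-server:latest   # placeholder image
      ports:
        - containerPort: 8080
          protocol: TCP
  multiModel: false
```

Once applied (or added as a serving-runtime template in the RHOAI dashboard), the runtime should appear as an option when creating a model server in that project.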

@Milstein Milstein added the openshift This issue pertains to NERC OpenShift label Dec 6, 2024

Milstein commented Dec 6, 2024

Our current Model Server offers only the "OpenVINO" serving runtime:

[Screenshot: only the OpenVINO serving runtime is listed]

Milstein changed the title from "To deploy multiple Model Serving Runtimes on NERC RHOAI" to "To deploy multiple Model Serving Runtime in NERC RHOAI" Dec 6, 2024
IsaiahStapleton commented

@Milstein We do not have only the OpenVINO serving runtime. For some reason, though, only that one shows up as an option when using a Data Science Project created by kube:admin. These are all of the serving runtimes that ship by default with RHOAI, except for LlamaCPP, which I added manually:

[Screenshot: the full list of default RHOAI serving runtimes, plus LlamaCPP]

IsaiahStapleton commented

Also, our test cluster currently has an issue with KServe: when trying to deploy a model, the KServe webhook call fails.

```
ai-performance-profiling 6s Warning InternalError inferenceservice/granite fails to reconcile predictor: Internal error occurred: failed calling webhook "webhook.serving.knative.dev": failed to call webhook: Post "https://webhook.knative-serving.svc:443/defaulting?timeout=10s": context deadline
```

However, when inspecting the knative-serving namespace, there does not appear to be anything wrong with the webhook pod, so I'm not sure what the issue is. I was granted access to the Albany cluster, where I set up RHOAI from scratch for model serving; I have not run into any issues with model serving there and currently have a granite model deployed.
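For what it's worth, the usual first checks for that symptom are whether the knative-serving webhook service has ready endpoints and whether the webhook configuration the error names actually points at it. A sketch, assuming `oc` access to the affected cluster (label selector and resource names are the knative-serving defaults, not verified against this cluster):

```shell
# Check the webhook pod and its service endpoints in knative-serving
oc get pods -n knative-serving -l app=webhook
oc get endpoints webhook -n knative-serving

# Inspect the webhook configuration named in the error
oc get mutatingwebhookconfiguration webhook.serving.knative.dev -o yaml

# A context-deadline error on the webhook URL, with a healthy pod, often
# points at a NetworkPolicy or service-mesh rule blocking traffic from
# the API server to the webhook service rather than at the pod itself.
```

If the endpoints list is empty despite a Running pod, the service selector or readiness probe is the next thing to look at.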
