-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: NVIDIA NIM on EKS Pattern #565
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we deploy this using existing Triton + vLLM stack? From NIM docs, it seems like NIM can detect TensorRT-LLM or vLLM profiles for optimizations. Also can we include an observability section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @askulkarni2 . I had a conversation with @hustshawn and explained the necessary changes to reuse the existing blueprint, as mentioned in the issue here.
1/ Use the existing NVIDIA Triton blueprint.
2/ Create a new file called nvidia-nim.tf
to add the EFS resources and NIM Helm chart, with a variable called enable_nvidia_nim=true
.
3/ Add a new variable called enable_nvidia_triton_server=true
as default, so it can be disabled when NIM is enabled.
4/ Add labels and tolerations to the NVIDIA NIM Helm chart values.
5/ Show observability dashboards using Grafana in the website docs.
6/ Create a python client that reads prompts from a file and generates responses for hosted NIM model.
7/ Create a comparison between NVIDIA Triton with vLLM and NVIDIA NIM using the same model, and publish this as part of the website docs.
Sure, that makes sense to me. I will make changes accordingly and update you once it's ready for your review again. |
Hi @vara-bonthu , in terms of the comparison, there are some limitations here,
|
…3.md and terraform doc
…nt output for actual execution time
Hi @vara-bonthu , please review again. Thanks. |
Thanks for updating the PR @hustshawn !I will test your PR and update accordingly. @askulkarni2 @ratnopamc please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of minor comments for the PR
@@ -140,6 +145,20 @@ module "eks_blueprints_addons" { | |||
], | |||
} | |||
|
|||
helm_releases = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
…script and docs; updated the nim-client as well
@vara-bonthu updated based on our discussion and latest comment. Please review again. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the updates 🙌🏼
@askulkarni2 please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hustshawn thanks for the PR. Could you please review the comments and address?
…addon, and updated nim-llm doc for NVIDIA potential license warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
What does this PR do?
🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.
Motivation
This PR is to address #560
More
website/docs
orwebsite/blog
section for this featurepre-commit run -a
with this PR. Link for installing pre-commit locallyFor Moderators
Additional Notes