
feat: NVIDIA NIM on EKS Pattern #565

Merged: 17 commits into awslabs:main, Jul 11, 2024

Conversation

hustshawn (Contributor)

What does this PR do?

🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.

Motivation

This PR is to address #560

More

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

@hustshawn changed the title from "NVIDIA NIM on EKS Pattern" to "feat: NVIDIA NIM on EKS Pattern" on Jun 30, 2024
@askulkarni2 (Collaborator) left a comment

Can we deploy this using the existing Triton + vLLM stack? From the NIM docs, it seems NIM can detect TensorRT-LLM or vLLM profiles for optimizations. Also, can we include an observability section?
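
On the observability point, one way to wire it up is through the addons module's kube-prometheus-stack. A minimal sketch, assuming the aws-ia/eks-blueprints-addons module already used in this repository (the inputs shown are illustrative, not the blueprint's exact configuration):

```hcl
module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "~> 1.2"

  # ...cluster name, endpoint, version, and OIDC provider inputs go here...

  # Deploys Prometheus and Grafana, which NIM/Triton dashboards can build on.
  enable_kube_prometheus_stack = true
}
```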

@vara-bonthu (Collaborator) left a comment

+1 to @askulkarni2. I had a conversation with @hustshawn and explained the necessary changes to reuse the existing blueprint, as mentioned in the issue here.

1/ Use the existing NVIDIA Triton blueprint.
2/ Create a new file called nvidia-nim.tf that adds the EFS resources and the NIM Helm chart, gated by a new variable enable_nvidia_nim=true (see the sketch after this list).
3/ Add a new variable enable_nvidia_triton_server=true as the default, so the Triton server can be disabled when NIM is enabled.
4/ Add labels and tolerations to the NVIDIA NIM Helm chart values.
5/ Show observability dashboards using Grafana in the website docs.
6/ Create a Python client that reads prompts from a file and generates responses from the hosted NIM model.
7/ Create a comparison between NVIDIA Triton with vLLM and NVIDIA NIM using the same model, and publish this as part of the website docs.
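
To make items 2/ through 4/ concrete, here is a minimal sketch of what nvidia-nim.tf could look like. The variable names follow the list above, but the EFS resource, chart name, and values keys are illustrative assumptions rather than the merged implementation:

```hcl
variable "enable_nvidia_nim" {
  description = "Deploy the NVIDIA NIM Helm chart and its supporting EFS resources"
  type        = bool
  default     = false
}

variable "enable_nvidia_triton_server" {
  description = "Deploy the NVIDIA Triton server; typically disabled when NIM is enabled"
  type        = bool
  default     = true
}

# EFS file system assumed to back the NIM model cache.
resource "aws_efs_file_system" "nim_model_cache" {
  count     = var.enable_nvidia_nim ? 1 : 0
  encrypted = true

  tags = {
    Name = "nvidia-nim-model-cache"
  }
}

# NIM Helm release pinned to GPU nodes via nodeSelector and tolerations
# (chart name and values keys are placeholders; consult the NIM Helm chart docs).
resource "helm_release" "nvidia_nim" {
  count            = var.enable_nvidia_nim ? 1 : 0
  name             = "nvidia-nim"
  namespace        = "nim"
  create_namespace = true
  chart            = "nim-llm"

  values = [yamlencode({
    nodeSelector = { "nvidia.com/gpu.present" = "true" }
    tolerations = [{
      key      = "nvidia.com/gpu"
      operator = "Exists"
      effect   = "NoSchedule"
    }]
  })]
}
```

Gating both the EFS resources and the Helm release on the same enable_nvidia_nim flag keeps the NIM pieces fully optional without touching the Triton path.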

@hustshawn (Contributor, Author)

Sure, that makes sense to me. I will make changes accordingly and update you once it's ready for your review again.

@vara-bonthu added the "gen-ai pattern" label (Distributed Training and Inference Patterns for Various Generative AI Large Language Models (LLMs)) on Jul 3, 2024
@hustshawn (Contributor, Author)

7/ Create a comparison between NVIDIA Triton with vLLM and NVIDIA NIM using the same model, and publish this as part of the website docs.

Hi @vara-bonthu, in terms of the comparison, there are some limitations here:

  1. The current triton_vllm pattern only provides Llama2 and Mistral-7B models, while the NIM pattern uses Llama3 models.
  2. NIM has built-in vLLM and TRT-LLM. Should I just compare Llama3-8B vs Llama3-70B on NIM, so that it is an apples-to-apples comparison?
  3. I created a script similar to triton_client, but the results differ significantly, so I don't think it would be a valid comparison with the existing one mentioned in the Triton doc.
Loading inputs from `prompts.txt`...
Model meta/llama3-8b-instruct - Request 14: 4877.96 ms
Model meta/llama3-8b-instruct - Request 10: 6582.67 ms
Model meta/llama3-8b-instruct - Request 3: 7919.11 ms
Model meta/llama3-8b-instruct - Request 15: 7972.63 ms
Model meta/llama3-8b-instruct - Request 1: 8646.89 ms
Model meta/llama3-8b-instruct - Request 5: 8933.86 ms
Model meta/llama3-8b-instruct - Request 12: 9068.71 ms
Model meta/llama3-8b-instruct - Request 18: 9932.78 ms
Model meta/llama3-8b-instruct - Request 0: 10393.83 ms
Model meta/llama3-8b-instruct - Request 6: 10416.12 ms
Model meta/llama3-8b-instruct - Request 16: 10688.25 ms
Model meta/llama3-8b-instruct - Request 4: 10735.83 ms
Model meta/llama3-8b-instruct - Request 11: 10938.80 ms
Model meta/llama3-8b-instruct - Request 8: 11117.52 ms
Model meta/llama3-8b-instruct - Request 17: 12112.03 ms
Model meta/llama3-8b-instruct - Request 2: 12302.77 ms
Model meta/llama3-8b-instruct - Request 19: 12796.39 ms
Model meta/llama3-8b-instruct - Request 9: 13636.33 ms
Model meta/llama3-8b-instruct - Request 13: 13731.89 ms
Model meta/llama3-8b-instruct - Request 7: 15027.77 ms
Storing results into `results.txt`...
Total time for all requests: 207.83 seconds (207832.13 milliseconds)
PASS: NVIDIA NIM example

@hustshawn (Contributor, Author)

Hi @vara-bonthu , please review again. Thanks.

@vara-bonthu (Collaborator) commented on Jul 7, 2024

Thanks for updating the PR @hustshawn! I will test your PR and update accordingly.

@askulkarni2 @ratnopamc please review

@vara-bonthu (Collaborator) left a comment

A couple of minor comments on the PR.

@@ -140,6 +145,20 @@ module "eks_blueprints_addons" {
],
}

helm_releases = {
Collaborator, commenting on the hunk above:

Nice!
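
For context on the hunk above: the aws-ia/eks-blueprints-addons module exposes a helm_releases map and creates one helm_release resource per entry. A minimal sketch of the pattern (the entry shown is a generic example, not the exact release added in this PR):

```hcl
module "eks_blueprints_addons" {
  source = "aws-ia/eks-blueprints-addons/aws"

  # ...cluster inputs as already set earlier in this module block...

  # Each map entry below becomes a managed helm_release resource.
  helm_releases = {
    nvidia-device-plugin = {
      description      = "Exposes node GPUs to Kubernetes as schedulable resources"
      chart            = "nvidia-device-plugin"
      repository       = "https://nvidia.github.io/k8s-device-plugin"
      chart_version    = "0.14.5"
      namespace        = "nvidia-device-plugin"
      create_namespace = true
    }
  }
}
```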

Resolved review threads:
  • ai-ml/nvidia-triton-server/nvidia-nim.tf (two threads, resolved)
  • ai-ml/nvidia-triton-server/variables.tf (outdated, resolved)
@hustshawn (Contributor, Author)

@vara-bonthu updated based on our discussion and the latest comment. Please review again. Thanks.

@vara-bonthu (Collaborator) left a comment

LGTM! Thanks for the updates 🙌🏼

@vara-bonthu (Collaborator)

@askulkarni2 please review

@ratnopamc (Collaborator) left a comment

@hustshawn thanks for the PR. Could you please review and address the comments?

@askulkarni2 (Collaborator) left a comment

LGTM!

@vara-bonthu merged commit 78d4e0f into awslabs:main on Jul 11, 2024 (36 of 37 checks passed).
ovaleanu pushed a commit to ovaleanu/data-on-eks that referenced this pull request Aug 10, 2024