feat: NVIDIA NIM on EKS Pattern #565
Merged
Changes from 10 commits
17 commits
fed91ef
feat: added terraform code for nim-llm pattern
hustshawn d9ce21d
feat: added website docs for nvidia-nims pattern
hustshawn 9d5c3d0
fix: fixed minor doc formating issue based on pre-commit suggestions
hustshawn 2e5a271
fix: fixed '<REGION>' from the website doc HTML formatting based on C…
hustshawn 8af9846
fix: fixed the url format in website doc
hustshawn bc32386
refactor: reuse triton-server pattern for NIM
hustshawn 7844cdf
fix: fixed the typo in website/docs/gen-ai/inference/nvidia-nim-llama…
hustshawn 658107b
chore: removed unecessary commented code and updated the testing clie…
hustshawn ac1d01a
feat: added monitoring for NIM
hustshawn 73ced9f
fix: fixed the merge conflicts
hustshawn 5da02c3
chore: change enable_nvidia_nim default to false and updated install …
hustshawn b6fd53f
fix: fixed terraform doc according to pre-commit suggestion
hustshawn e503da6
refactoring: created nim-llm dashboard json locally
hustshawn c1510d2
chore: dyanmically inject prometheus namespace to prometheus-adapter …
hustshawn e3b5cbe
chore: updated the nim-client script to accept cli passed model name
hustshawn 9d6bb92
fix: updated the cleanup for resources smooth destroy
hustshawn 49a1d56
feat: added genai-perf tool with instructions
hustshawn
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
nim-llm/ | ||
planfile |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# ref: https://github.com/NVIDIA/nim-deploy/blob/main/helm/nim-llm/values.yaml | ||
image: | |
  repository: nvcr.io/nim/meta/llama3-8b-instruct | |
  tag: latest | |
model: | |
  ngcAPIKey: ${ngc_api_key} | |
  nimCache: /model-store | |
resources: | |
  limits: | |
    nvidia.com/gpu: 1 | |
  requests: | |
    nvidia.com/gpu: 1 | |
statefulSet: | |
  enabled: true | |
persistence: | |
  enabled: true | |
  existingClaim: ${pvc_name} | |
nodeSelector: | |
  NodeGroupType: g5-gpu-karpenter | |
  type: karpenter | |
tolerations: | |
  - key: "nvidia.com/gpu" | |
    operator: "Exists" | |
    effect: "NoSchedule" | |
metrics: | |
  enabled: true | |
  serviceMonitor: | |
    enabled: true | |
    additionalLabels: | |
      release: prometheus | |
      app: prometheus | |
autoscaling: | |
  enabled: true | |
  minReplicas: 1 | |
  maxReplicas: 5 | |
  scaleDownStabilizationSecs: 300 | |
  metrics: | |
    - type: Pods | |
      pods: | |
        metric: | |
          name: num_requests_running | |
        target: | |
          # Pods metrics only support an AverageValue target | |
          type: AverageValue | |
          averageValue: 5 | |
ingress: | |
  enabled: true | |
  className: nginx | |
  annotations: {} | |
  hosts: | |
    - paths: | |
        - path: / | |
          pathType: ImplementationSpecific | |
          serviceType: openai | |
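As a rough illustration of what the `autoscaling` block asks for, the sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler to a per-pod average of `num_requests_running` against the target of 5. It is a simplification, not the controller's implementation: it ignores the 300-second scale-down stabilization window, pod readiness, and missing metrics, and the function name and sample values are hypothetical.

```python
import math


def desired_replicas(current_replicas, per_pod_values, target_average=5.0,
                     min_replicas=1, max_replicas=5, tolerance=0.1):
    """Approximate the HPA rule for a Pods metric with an average-value target."""
    # Average the metric over the pods that reported a value.
    current_average = sum(per_pod_values) / len(per_pod_values)
    ratio = current_average / target_average

    # No scaling happens while the ratio stays inside the tolerance band.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Clamp to the minReplicas / maxReplicas bounds from the values file.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two pods averaging 10 in-flight requests against a target of 5.
    print(desired_replicas(2, [12, 8]))
```

With two pods averaging 10 running requests, the ratio is 2.0 and the sketch asks for 4 replicas; a single pod reporting 40 would ask for 8 and be capped at `maxReplicas: 5`.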
14 changes: 14 additions & 0 deletions
ai-ml/nvidia-triton-server/helm-values/prometheus-adapter.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# ref: https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus-adapter/values.yaml | ||
prometheus: | |
  url: http://kube-prometheus-stack-prometheus.kube-prometheus-stack | |
  port: 9090 | |
rules: | |
  default: false | |
  custom: | |
    - seriesQuery: '{__name__=~"num_requests_running"}' | |
      resources: | |
        template: <<.Resource>> | |
      name: | |
        matches: "num_requests_running" | |
        as: "" | |
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>) | |
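To make the `metricsQuery` template easier to read, the sketch below fills in its three placeholders the way the adapter is documented to: the series name, a matcher list for the namespace and the requested pods, and the pod label as the grouping key. This is plain string substitution for illustration only. The pod names, the matcher order, and the helper name are assumptions, not output captured from a running adapter.

```python
TEMPLATE = "sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)"


def expand_metrics_query(template, series, label_matchers, group_by):
    """Substitute the adapter-style placeholders in a metricsQuery template."""
    # Each matcher is (label, operator, value), e.g. ("pod", "=~", "a|b").
    matcher_text = ",".join(
        f'{label}{op}"{value}"' for label, op, value in label_matchers
    )
    return (
        template.replace("<<.Series>>", series)
        .replace("<<.LabelMatchers>>", matcher_text)
        .replace("<<.GroupBy>>", group_by)
    )


if __name__ == "__main__":
    # Hypothetical pods in the "nim" namespace created by this pattern.
    query = expand_metrics_query(
        TEMPLATE,
        series="num_requests_running",
        label_matchers=[
            ("namespace", "=", "nim"),
            ("pod", "=~", "nim-llm-0|nim-llm-1"),
        ],
        group_by="pod",
    )
    print(query)
```

Because `resources.template` is `<<.Resource>>`, the label names are assumed to be the plain resource names (`namespace`, `pod`), so the resulting PromQL sums the series per pod.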
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
#--------------------------------------------------------------- | ||
# EFS | ||
#--------------------------------------------------------------- | ||
module "efs" { | |
  count   = var.enable_nvidia_nim ? 1 : 0 | |
  source  = "terraform-aws-modules/efs/aws" | |
  version = "~> 1.6" | |
| |
  creation_token = local.name | |
  name           = local.name | |
| |
  # Mount targets / security group | |
  mount_targets = { | |
    for k, v in zipmap(local.azs, slice(module.vpc.private_subnets, length(module.vpc.private_subnets) - 2, length(module.vpc.private_subnets))) : k => { subnet_id = v } | |
  } | |
  security_group_description = "${local.name} EFS security group" | |
  security_group_vpc_id      = module.vpc.vpc_id | |
  security_group_rules = { | |
    vpc = { | |
      # relying on the defaults provided for EFS/NFS (2049/TCP + ingress) | |
      description = "NFS ingress from VPC private subnets" | |
      cidr_blocks = module.vpc.private_subnets_cidr_blocks | |
    } | |
  } | |
| |
  tags = local.tags | |
} | |
| |
resource "kubernetes_storage_class_v1" "efs" { | |
  count = var.enable_nvidia_nim ? 1 : 0 | |
  metadata { | |
    name = "efs" | |
  } | |
| |
  storage_provisioner = "efs.csi.aws.com" | |
  parameters = { | |
    provisioningMode = "efs-ap" # Dynamic provisioning | |
    fileSystemId     = module.efs[count.index].id | |
    directoryPerms   = "777" | |
  } | |
| |
  mount_options = [ | |
    "iam" | |
  ] | |
| |
  depends_on = [ | |
    module.eks_blueprints_addons.aws_efs_csi_driver | |
  ] | |
} | |
| |
resource "kubernetes_namespace" "nim" { | |
  count = var.enable_nvidia_nim ? 1 : 0 | |
  metadata { | |
    name = "nim" | |
  } | |
| |
  depends_on = [module.eks] | |
} | |
| |
resource "kubernetes_persistent_volume_claim_v1" "efs_pvc" { | |
  count = var.enable_nvidia_nim ? 1 : 0 | |
  metadata { | |
    name      = kubernetes_namespace.nim[count.index].metadata[0].name | |
    namespace = "nim" | |
  } | |
  spec { | |
    access_modes       = ["ReadWriteMany"] | |
    storage_class_name = kubernetes_storage_class_v1.efs[count.index].metadata[0].name | |
    resources { | |
      requests = { | |
        storage = "100Gi" | |
      } | |
    } | |
  } | |
} | |
| |
#--------------------------------------------------------------- | |
# NIM LLM Helm Chart | |
#--------------------------------------------------------------- | |
| |
resource "null_resource" "download_nim_deploy" { | |
  count = var.enable_nvidia_nim ? 1 : 0 | |
  # This trigger ensures the script runs only when the file doesn't exist | |
  triggers = { | |
    script_executed = fileexists("${path.module}/nim-llm/Chart.yaml") ? "false" : "true" | |
  } | |
| |
  provisioner "local-exec" { | |
    command = <<-EOT | |
      if [ ! -d "${path.module}/nim-llm" ]; then | |
        echo "Downloading nim-deploy repo ..." | |
        TEMP_DIR=$(mktemp -d) | |
        git clone https://github.com/NVIDIA/nim-deploy.git "$TEMP_DIR/nim-deploy" | |
        cp -r "$TEMP_DIR/nim-deploy/helm/nim-llm" ${path.module}/nim-llm | |
        rm -rf "$TEMP_DIR" | |
        echo "Download completed." | |
      else | |
        echo "nim-llm directory already exists. Skipping download." | |
      fi | |
    EOT | |
  } | |
} | |
| |
resource "helm_release" "nim_llm" { | |
  count            = var.enable_nvidia_nim ? 1 : 0 | |
  name             = "nim-llm" | |
  chart            = "${path.module}/nim-llm" | |
  create_namespace = true | |
  namespace        = kubernetes_namespace.nim[count.index].metadata[0].name | |
  timeout          = 360 | |
  wait             = false | |
  values = [ | |
    templatefile( | |
      "${path.module}/helm-values/nim-llm.yaml", | |
      { | |
        ngc_api_key = var.ngc_api_key | |
        pvc_name    = kubernetes_persistent_volume_claim_v1.efs_pvc[count.index].metadata[0].name | |
      } | |
    ) | |
  ] | |
| |
  depends_on = [ | |
    null_resource.download_nim_deploy | |
  ] | |
} | |
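The `mount_targets` expression in the EFS module above is dense, so here is a small Python equivalent of what it appears to compute: it takes the last two private subnets and keys them by availability zone. The AZ names and subnet IDs are made up for illustration, and the sketch assumes `local.azs` holds exactly two zones, which the diff does not show.

```python
def mount_targets(azs, private_subnets):
    """Mirror zipmap(local.azs, slice(subnets, len - 2, len)) from the module."""
    # slice(list, start, end) in Terraform is end-exclusive, like Python slicing.
    last_two = private_subnets[len(private_subnets) - 2:len(private_subnets)]

    # zipmap requires both lists to have the same length.
    if len(azs) != len(last_two):
        raise ValueError("azs and selected subnets must have the same length")

    return {az: {"subnet_id": subnet} for az, subnet in zip(azs, last_two)}


if __name__ == "__main__":
    # Hypothetical inputs: two AZs and four private subnets.
    azs = ["us-west-2a", "us-west-2b"]
    subnets = ["subnet-a1", "subnet-b1", "subnet-a2", "subnet-b2"]
    print(mount_targets(azs, subnets))
```

With these inputs, each zone gets one mount target in the last two subnets of the list, which is why the order of `module.vpc.private_subnets` matters to this expression.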
Nice!