
fix: For issue 429 "Unable to deploy llama2 on Eks/Ray Serve/inf2" #430

Merged: 3 commits into awslabs:main on Feb 26, 2024

Conversation

@harishvs (Contributor) commented on Feb 12, 2024:

What does this PR do?

This is a fix for issue 429 - "Unable to deploy llama2 on Eks/Ray Serve/inf2"


  1. Changed the core node group's instance type to m5.2xlarge, since the Ray head pod was not being scheduled due to insufficient memory.
  2. Changed the inf2.24xlarge instances to On-Demand capacity, which is more appropriate for a chat-application workload.
  3. Fixed the labels on the inf2.24xlarge and inf2.48xlarge nodes so that the Ray Serve pod can find them.
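Fix (3) comes down to the Ray Serve worker group's node selector matching the labels that the inf2 nodes actually carry. A minimal sketch of the pod side of that pairing, assuming illustrative values for the blueprint's `instanceType`/`provisionerType` labels:

```yaml
# Sketch only, not the blueprint's actual manifest: a Ray worker-group
# pod template selecting inf2 nodes by label. The label values shown
# here are illustrative.
workerGroupSpecs:
  - groupName: inf2-group
    template:
      spec:
        nodeSelector:
          instanceType: inferentia-inf2   # must match the node's label exactly,
          provisionerType: Karpenter      # or the pod stays Pending
```

If the node carries a different value for either key, the scheduler never places the pod, which is the symptom described in issue 429.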

Motivation

I could not complete the tutorial as outlined here: https://awslabs.github.io/data-on-eks/docs/gen-ai/inference/Llama2

More

  • [x] Yes, I have tested the PR using my local account setup (test evidence is provided under Additional Notes)
  • [ ] Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • [ ] Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • [x] Yes, I ran `pre-commit run -a` with this PR (link for installing pre-commit locally)

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

I don't know whether I broke any other workflows.

@harishvs harishvs changed the title fix: for issue 429 "Unable to deploy llama2 on Eks/Ray Serve/inf2" fix: For issue 429 "Unable to deploy llama2 on Eks/Ray Serve/inf2" Feb 12, 2024
@vara-bonthu (Collaborator) commented:
@harishvs, thank you for your efforts in testing and creating the PR. It's great to see that some of the fixes you've identified align with those we've implemented in the Stable Diffusion model.

Could you please rebase your code with the latest updates from the main branch and then resubmit your PR, particularly focusing on the gradio.app format changes? I am especially keen on addressing the formatting issue present in the Llama2 chat model.

@harishvs (Contributor, author) replied:

@vara-bonthu I will create a separate PR for the Gradio format changes; that work is still in progress. For now, I will rebase this PR with a narrow focus on fixing issue 429.

@harishvs harishvs marked this pull request as draft February 14, 2024 17:15
@harishvs harishvs marked this pull request as ready for review February 14, 2024 23:24
@harishvs (Contributor, author) commented:

@vara-bonthu I rebased on the latest main. Please review and merge.

@vara-bonthu (Collaborator) left a review comment:

@harishvs You can run the Llama2 inference example with Karpenter as of now. If you want to run the model with managed node groups, then you have to change the Ray deployment YAML to match the managed node groups' labels.
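As a concrete illustration of the reviewer's point: EKS managed node groups automatically label their nodes with `eks.amazonaws.com/nodegroup`, so a Ray deployment targeting a managed node group could select on that label instead of the Karpenter provisioner labels. A hedged sketch (the node group name is hypothetical, not from this blueprint):

```yaml
# Sketch only: selecting nodes from a specific EKS managed node group.
# "inf2-24xl-ng" is a hypothetical node group name used for illustration.
nodeSelector:
  eks.amazonaws.com/nodegroup: inf2-24xl-ng
```

The alternative, discussed below, is to attach a unique custom label to the managed node group itself and select on that.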

Comment on lines 125 to 126
instanceType = "mixed-x86"
provisionerType = "Karpenter"
@vara-bonthu (Collaborator) commented:

These labels are for Karpenter to spin up the nodes, not for the managed node groups with the Cluster Autoscaler. I would suggest removing them or updating them as below.

        provisionerType = "ClusterAutoscaler"

If you want to deploy the Llama2 model on managed node groups with these instances, then you have to update the Ray deployment YAML with a unique label that is used only by this node group.

@harishvs (Contributor, author) replied:

Ok, addressed.

instanceType = "inf2-24xl"
provisionerType = "cluster-autoscaler"
instanceType = "inferentia-inf2"
provisionerType = "Karpenter"
@vara-bonthu (Collaborator) commented:

Same as the comment above.

@harishvs (Contributor, author) replied:

Ok, addressed.

instanceType = "inf2-48xl"
provisionerType = "cluster-autoscaler"
instanceType = "inferentia-inf2"
provisionerType = "Karpenter"
@vara-bonthu (Collaborator) commented:

Same as the comment above.

@harishvs (Contributor, author) replied:

Ok, addressed.

@vara-bonthu vara-bonthu merged commit 30c5387 into awslabs:main Feb 26, 2024
52 checks passed