
fix: For issue 429 "Unable to deploy llama2 on Eks/Ray Serve/inf2" #430

Merged: 3 commits into awslabs:main on Feb 26, 2024

Conversation

@harishvs (Contributor) commented on Feb 12, 2024:

What does this PR do?

This is a fix for issue 429 - "Unable to deploy llama2 on Eks/Ray Serve/inf2"


  1. Changed the core node group's instance type to m5.2xlarge, since the Ray head pod was not being scheduled due to insufficient memory.
  2. Changed the inf2.24xlarge instances to On-Demand capacity, which is more appropriate for a chat-application workload.
  3. Fixed the labels on the inf2.24xlarge and inf2.48xlarge nodes so that the Ray Serve pod can find them.
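Fix (3) comes down to the Ray Serve worker group's node selector matching the labels that the inf2 nodes actually carry. A minimal sketch of the pod side of that pairing, assuming illustrative values for the blueprint's `instanceType`/`provisionerType` labels:

```yaml
# Sketch only, not the blueprint's actual manifest: a Ray worker-group
# pod template selecting inf2 nodes by label. The label values shown
# here are illustrative.
workerGroupSpecs:
  - groupName: inf2-group
    template:
      spec:
        nodeSelector:
          instanceType: inferentia-inf2   # must match the node's label exactly,
          provisionerType: Karpenter      # or the pod stays Pending
```

If the node carries a different value for either key, the scheduler never places the pod, which is the symptom described in issue 429.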

Motivation

I could not complete the tutorial as outlined here: https://awslabs.github.io/data-on-eks/docs/gen-ai/inference/Llama2

More

  • [x] Yes, I have tested the PR using my local account setup (test evidence is provided under Additional Notes)
  • [ ] Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • [ ] Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • [x] Yes, I ran `pre-commit run -a` with this PR (link for installing pre-commit locally)

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

I don't know whether I broke any other workflows.

@harishvs harishvs changed the title fix: for issue 429 "Unable to deploy llama2 on Eks/Ray Serve/inf2" fix: For issue 429 "Unable to deploy llama2 on Eks/Ray Serve/inf2" Feb 12, 2024
@vara-bonthu (Collaborator) commented:
@harishvs, thank you for your efforts in testing and creating the PR. It's great to see that some of the fixes you've identified align with those we've implemented in the Stable Diffusion model.

Could you please rebase your code with the latest updates from the main branch and then resubmit your PR, particularly focusing on the gradio.app format changes? I am especially keen on addressing the formatting issue present in the Llama2 chat model.

@harishvs (Contributor, author) replied:

@vara-bonthu I will create a separate PR for the Gradio format changes; that work is still in progress. For now, I will rebase this PR with a narrow focus on fixing issue 429.

@harishvs harishvs marked this pull request as draft February 14, 2024 17:15
@harishvs harishvs marked this pull request as ready for review February 14, 2024 23:24
@harishvs (Contributor, author) commented:

@vara-bonthu I rebased on the latest main. Please review and merge.

@vara-bonthu (Collaborator) left a review comment:

@harishvs You can run the Llama2 inference example with Karpenter as of now. If you want to run the model with managed node groups, then you have to change the Ray deployment YAML to match the managed node groups' labels.
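As a concrete illustration of the reviewer's point: EKS managed node groups automatically label their nodes with `eks.amazonaws.com/nodegroup`, so a Ray deployment targeting a managed node group could select on that label instead of the Karpenter provisioner labels. A hedged sketch (the node group name is hypothetical, not from this blueprint):

```yaml
# Sketch only: selecting nodes from a specific EKS managed node group.
# "inf2-24xl-ng" is a hypothetical node group name used for illustration.
nodeSelector:
  eks.amazonaws.com/nodegroup: inf2-24xl-ng
```

The alternative, discussed below, is to attach a unique custom label to the managed node group itself and select on that.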

Comment on lines 125 to 126
instanceType = "mixed-x86"
provisionerType = "Karpenter"
@vara-bonthu (Collaborator) commented:

These labels are for Karpenter to spin up the nodes, not for the managed node groups with the Cluster Autoscaler. I would suggest removing them or updating them as below.

        provisionerType = "ClusterAutoscaler"

If you want to deploy the Llama2 model on managed node groups with these instances, then you have to update the Ray deployment YAML with a unique label that is used only by this node group.

@harishvs (Contributor, author) replied:

Ok, addressed.

instanceType = "inf2-24xl"
provisionerType = "cluster-autoscaler"
instanceType = "inferentia-inf2"
provisionerType = "Karpenter"
@vara-bonthu (Collaborator) commented:

Same as the comment above.

@harishvs (Contributor, author) replied:

Ok, addressed.

instanceType = "inf2-48xl"
provisionerType = "cluster-autoscaler"
instanceType = "inferentia-inf2"
provisionerType = "Karpenter"
@vara-bonthu (Collaborator) commented:

Same as the comment above.

@harishvs (Contributor, author) replied:

Ok, addressed.

@vara-bonthu vara-bonthu merged commit 30c5387 into awslabs:main Feb 26, 2024
52 checks passed