Add truss example for Qwen1.5-110B with vllm & streaming support #282
base: main
Conversation
Hey Immar! Thank you so much for the contribution. Looks great.
Just a few comments on the config and hardware requirements. Once you address those, we look forward to merging this PR and adding this model to our model library.
Thank you again for the contribution!
qwen/qwen-110B-chat/config.yaml
Outdated
@@ -0,0 +1,19 @@
environment_variables: {CUDA_VISIBLE_DEVICES: "0,1,2,3"}
To use 4 devices, please bump the resources below to grant you access to 4 GPUs by changing the accelerator to A100:4. If you don't need 4 devices, you can drop this env var.
It turns out this error goes away when passing any arbitrary env var, not just CUDA_VISIBLE_DEVICES. I just tried setting {test: "okok"} in environment_variables and model loading was a breeze.
Hi Immar, can you drop the whole environment_variables section from the config? It should work better that way. I think something is off with the YAML for this dictionary, and the default should work well.
Hey Bola, I have made the necessary changes. I think the workflow's awaiting approval from a maintainer.
Best
qwen/qwen-110B-chat/config.yaml
Outdated
requirements:
  - torch==2.1.2
  - transformers==4.37.0
  - vllm
Is it possible to pin the vllm version as well?
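For illustration, a pinned requirements list might look like the sketch below. The vllm version shown is a placeholder, not the version actually committed in this PR:

```yaml
requirements:
  - torch==2.1.2
  - transformers==4.37.0
  # Placeholder: pin whichever vllm release the Truss was tested against.
  - vllm==0.4.0
```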
Done!
qwen/qwen-110B-chat/config.yaml
Outdated
cpu: '40'
memory: 100Gi
Please drop cpu and memory and just keep accelerator.
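The trimmed resources section might then look like the sketch below (assuming Truss's `accelerator` syntax; not necessarily the exact committed config):

```yaml
resources:
  # Four A100s, matching the reviewer's A100:4 suggestion;
  # cpu and memory are left to the platform defaults.
  accelerator: A100:4
```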
This is also done.
command = "ray start --head"
subprocess.check_output(command, shell=True, text=True)
I don't think this is necessary with newer vllm versions.
Thanks for responding. Looks good in general.
This reverts commit 5c90be0.
I have created an example Truss for the recently released Qwen1.5-110B LLM, with vllm and streaming support added.
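As a rough illustration of the streaming pattern this PR adds, a Truss-style predict endpoint can return a generator that yields response fragments as they are produced. The sketch below is generic Python with hypothetical names, not the PR's actual vLLM code:

```python
def stream_response(tokens):
    """Hypothetical sketch of a streaming predict endpoint: yield
    each generated fragment as it becomes available, instead of
    waiting for the full completion (not the PR's actual vLLM code)."""
    for token in tokens:
        yield token

# A client consumes the stream incrementally, printing partial output.
for chunk in stream_response(["Qwen", "1.5", "-110B"]):
    print(chunk, end="")
```

Serving frameworks that support streaming generally detect a generator return value and send each yielded chunk to the client as it arrives.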