Add truss example for Qwen1.5-110B with vllm & streaming support #282

ImmarKarim · 2024-05-01T17:46:39Z

I have created an example truss for the recently released Qwen1.5-110B LLM. It includes vllm and streaming support.

bolasim

Hey Immar! Thank you so much for the contribution. Looks great.

Just a few comments on the config and hardware requirements. Once you address those, we're looking forward to merging this PR and adding this model to our model library.

Thank you again for the contribution!

bolasim · 2024-05-03T05:09:53Z

qwen/qwen-110B-chat/config.yaml

@@ -0,0 +1,19 @@
+environment_variables: {CUDA_VISIBLE_DEVICES: "0,1,2,3"}


To use 4 devices, please raise the resources below to request 4 GPUs by changing the accelerator to A100:4. If you don't need 4 devices, you can drop this env var.

Sure, let me change the resources section in the config file.

However, I am getting this error when dropping this env:

So it turns out this error goes away when passing any arbitrary env var, not just CUDA_VISIBLE_DEVICES.

I just tried setting {test: "okok"} in env and model loading was a breeze.

Hi Immar, can you just drop the whole environment_variables block from the config? It should work better that way. I think something is off with the YAML for this dictionary, and the default should work well.

Hey Bola, I have made the necessary changes. I think the workflow's awaiting approval from a maintainer.

Best
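As a side note on the thread above, here is a minimal sketch of how a model could tolerate `CUDA_VISIBLE_DEVICES` being either set or absent. The helper name `visible_gpu_count` and the fallback of 4 (mirroring `A100:4`) are assumptions for illustration and are not taken from this PR.

```python
import os


def visible_gpu_count(default: int = 4) -> int:
    """Count the devices listed in CUDA_VISIBLE_DEVICES, or return `default`.

    Hypothetical helper: the default of 4 mirrors the A100:4 accelerator
    discussed in this thread and is an assumption, not part of the PR.
    """
    raw = os.environ.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Variable not set: fall back to the assumed accelerator count.
        return default
    # An empty string hides all devices; otherwise count comma-separated ids.
    return len([d for d in raw.split(",") if d.strip()])
```

A value like this could then be passed to vllm as its tensor parallel size, so the config and the model code would not need to agree on a hard-coded device list.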

bolasim · 2024-05-03T05:10:12Z

qwen/qwen-110B-chat/config.yaml

+requirements:
+- torch==2.1.2
+- transformers==4.37.0
+- vllm


Is it possible to pin the vllm version as well?
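To illustrate the pinning request, a small sketch that flags requirement strings lacking an exact `==` pin. The function name `unpinned` and the regular expression are illustrative assumptions; truss itself may validate requirements differently.

```python
import re

# Matches an exact pin such as "torch==2.1.2".
_PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+==[^=\s]+$")


def unpinned(requirements: list[str]) -> list[str]:
    """Return the requirement strings that lack an exact `==` pin."""
    return [r for r in requirements if not _PINNED.match(r.strip())]


# The requirements list from config.yaml in this PR.
reqs = ["torch==2.1.2", "transformers==4.37.0", "vllm"]
print(unpinned(reqs))  # prints ['vllm']
```

Pinning every entry keeps rebuilds of the truss reproducible when a dependency publishes a new release.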

bolasim · 2024-05-03T05:10:35Z

qwen/qwen-110B-chat/config.yaml

+  cpu: '40'
+  memory: 100Gi


Please drop cpu and memory and just keep accelerator.

This is also done.

bolasim · 2024-05-03T05:10:57Z

qwen/qwen-110B-chat/model/model.py

+        command = "ray start --head"
+        subprocess.check_output(command, shell=True, text=True)


I don't think this is still necessary with newer vllm versions.

I was getting this error without this command. I read somewhere online that it is needed when model loading isn't done in the main thread.

During pod startup:

During inference:
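If the `ray start --head` call stays, one possible way to run it is without `shell=True` and with explicit error handling. This is only a sketch: the helper name `run_checked` is hypothetical, and whether the command is needed at all depends on the vllm version, as discussed above.

```python
import subprocess
import sys


def run_checked(args: list[str]) -> str:
    """Run a command without a shell and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit code, with
    stderr attached to the exception.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# In model.py this might be called as: run_checked(["ray", "start", "--head"])
# Demonstrated here with the Python interpreter so the sketch runs anywhere.
print(run_checked([sys.executable, "-c", "print('ok')"]))
```

Passing the arguments as a list avoids shell parsing, and `check=True` surfaces a failed start at load time instead of later during inference.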

bolasim

Thanks for responding. Looks good in general.

bolasim · 2024-05-06T15:42:35Z

qwen/qwen-110B-chat/config.yaml

@@ -0,0 +1,19 @@
+environment_variables: {CUDA_VISIBLE_DEVICES: "0,1,2,3"}


Hi Immar, can you just drop the whole environment_variables block from the config? It should work better that way. I think something is off with the YAML for this dictionary, and the default should work well.

Add truss example for Qwen1.5-110B with vllm & streaming support

1bcdbbb

bolasim reviewed May 3, 2024

Changed the resources section - PR feedback

ecc0070

bolasim approved these changes May 6, 2024

ImmarKarim added 3 commits May 11, 2024 00:29

Removed env from config

5c90be0

Revert "Removed env from config"

ede8d24

This reverts commit 5c90be0.

Fixed all issues

ee5a83a
