[Bug]: No output on WSL (Debian, Windows 11) #3646
Comments
What GPU are you using?
GTX 1660 Super. I can generate text with the same model using Transformers, so I don't think it's a GPU problem.
If you are using pre-built wheels, your GPU is not supported; see #2635. You can try building vLLM yourself from source, which might help. Please check https://docs.vllm.ai/en/latest/getting_started/installation.html#build-from-source .
Okay, I'll try to build it from source and see if that works. Thanks!
In addition, your GPU seems to lack enough memory to load a 7B model.
Weird, since I can use 7B models just fine with Transformers on my GPU. I've generated text with Mistral 7B before.
According to https://www.techpowerup.com/gpu-specs/geforce-gtx-1660-super.c3458 , it has only 6 GB of memory.
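For context, a back-of-the-envelope calculation (assuming fp16 weights and roughly 7 billion parameters; the numbers are purely illustrative) shows why 6 GB cannot even hold the weights:

```python
# Rough lower bound: fp16 weights alone for a ~7B-parameter model.
params = 7e9               # approximate parameter count of Mistral-7B
bytes_per_param = 2        # 2 bytes per parameter in fp16 ("half")
gib = params * bytes_per_param / 1024**3
print(f"~{gib:.1f} GiB just for weights")  # ~13.0 GiB, vs. 6 GB on a GTX 1660 Super
```

And that is before the KV cache and activations, which vLLM also keeps on the GPU.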
My PC has 10 times that amount of memory, at 63.9 GB.
So you are using the CPU with Transformers, while vLLM is designed to work with the GPU.
Odd, since I use 'cuda' when generating with Transformers. Wouldn't that make Transformers use the GPU? Sorry, I'm a bit confused.
Then you need to dive deep into how Transformers actually loads your model.
I'm sorry, but what do you mean by that?
They have a weight offload strategy: https://discuss.huggingface.co/t/big-model-inference-cpu-disk-offloading-for-transformers-using-from-pretrained/75165 .
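Concretely, this is the mechanism in question. A minimal sketch (assuming the `accelerate` package is installed; the model name is taken from this issue) that reveals where the weights actually end up:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" lets Accelerate place as many layers as fit on the GPU and
# silently offload the rest to CPU RAM (and disk, if needed) -- which is why
# generation can "work" on a 6 GB card while mostly running on the CPU.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
    torch_dtype="auto",
)

# Shows which device each submodule actually landed on, e.g. 0, "cpu", or "disk".
print(model.hf_device_map)
```

If the map shows most layers on "cpu", passing `device="cuda"` at generation time does not change where the weights live.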
I see, so I have been using my CPU without realizing it. Sorry for the inconvenience! I'll have to come back to vLLM when I have a better GPU, then.
Your current environment
🐛 Describe the bug
No output when running the offline_inference.py example code with mistralai/Mistral-7B-Instruct-v0.2. This does not happen with transformers. I also have to use dtype="half" because I can't use bfloat16.
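A quick way to confirm the bfloat16 limitation, assuming PyTorch is available: bfloat16 needs compute capability 8.0 (Ampere) or newer, and Turing cards like the GTX 1660 Super report 7.5.

```python
import torch

# bfloat16 requires compute capability >= 8.0 (Ampere or newer); Turing cards
# such as the GTX 1660 Super report (7, 5), so only fp16 ("half") is usable.
print(torch.cuda.get_device_capability())  # (7, 5) on a GTX 1660 Super
print(torch.cuda.is_bf16_supported())      # False on Turing
```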
The exact code follows:
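The snippet below is a sketch of what that code presumably is: vLLM's stock offline_inference.py example with mistralai/Mistral-7B-Instruct-v0.2 and dtype="half" substituted in. The prompts are placeholders.

```python
from vllm import LLM, SamplingParams

# Placeholder prompts standing in for the ones in the original script.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# dtype="half" forces fp16, since bfloat16 is unavailable on this GPU.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", dtype="half")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```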
And I don't see any outputs. When running straight from VSCode, I get
and no generated text; however, when running from the command line, I get
and no generated text.
Sorry if this is unclear; I was in a bit of a rush to post this. I can provide more information if needed.