[Bug]: Error while running inference with LLava 1.6 in v0.5.1 #6224
Comments
@DarkLight1337 this might be related to the refactoring?
Did you get any warnings in the log? If so, can you show them as well?
What are the request_params? Btw, I'm unable to load the model inside 1x L40 (which should have around the same memory as 4x A10G), so it might be an OOM problem that's not being reported correctly.
@DarkLight1337 L40S is 48GB, whereas 4x A10G (24GB each) is 96GB.
@DarkLight1337 Yes, request_params are empty. Let me find out if there are warnings, and let me check the memory for OOM as well. On Saturday I tried with 0.5.0.post1 and was able to test llava-1.6, though. Let me try again and post here.
@DarkLight1337 I'm quite sure the issue is due to the wrong
The tokenizer loaded by vllm seems to have 3 extra tokens [<|reserved000|>, <|reserved001|>, <|reserved002|>] at positions 64000-64002, causing the issue.
Loading the tokenizer using transformers leads to the same issue. This indicates a mismatch between the
In summary, this is not a vllm issue. The simplest working hack is to change
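For reference, a check along these lines surfaces the kind of mismatch described above (a sketch only; the model name and token ids come from this thread, and reading `image_token_index` off the config is an assumption about where the expected id lives):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "llava-hf/llava-v1.6-34b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# Total tokens the tokenizer knows about, including any reserved/added
# tokens appended after the base vocabulary.
print("tokenizer size:", len(tokenizer))

# Which id the <image> placeholder encodes to with this tokenizer version.
print("<image> ->", tokenizer.encode("<image>", add_special_tokens=False))

# What actually sits at the ids mentioned above; with an affected tokenizer
# these come back as <|reserved...|> placeholders.
print("ids 64000-64002:", tokenizer.convert_ids_to_tokens([64000, 64001, 64002]))

# The image token id the model config expects; compare against the encoded
# id above to spot a mismatch.
print("config.image_token_index:", config.image_token_index)
```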
How did you determine this? I'm not getting this issue on my machine.

```python
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-v1.6-34b-hf")
>>> tokenizer.encode("<image>")
[64000]
>>> from vllm.transformers_utils.tokenizer import get_tokenizer
>>> tokenizer = get_tokenizer("llava-hf/llava-v1.6-34b-hf")
>>> tokenizer.encode("<image>")
[64000]
```
@DarkLight1337 Maybe this is an issue with the tokenizer version?
The tokenizer config for that model was last modified 4 months ago (according to HF), so I doubt that's the cause of the problem.
@DarkLight1337 I just tested with the latest transformers (4.42.3) and got the issue.
Can confirm transformers (4.40.1) does not have this issue.
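A quick way to tie the behaviour to the installed transformers version (a sketch; the ids follow the numbers reported earlier in this thread, and the comparison across versions is the point rather than any specific expected output):

```python
import transformers
from transformers import AutoTokenizer

# Record which transformers release is installed, so runs under 4.40.1 and
# 4.42.3 can be compared directly.
print("transformers version:", transformers.__version__)

tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-v1.6-34b-hf")

# Tokenizer size and the tokens sitting at the ids discussed above.
print("len(tokenizer):", len(tokenizer))
print("ids 64000-64002:", tokenizer.convert_ids_to_tokens([64000, 64001, 64002]))
```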
Interesting that the HuggingFace library itself caused this problem... now I have to figure out why. Thanks for looking into this!
The underlying issue has been fixed by HuggingFace. Please upgrade to
Your current environment
🐛 Describe the bug
My LLM Configuration: I have enabled enforce_eager=True and enable_prefix_caching=False
The code we used is something like this:
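A rough sketch of this kind of call, since the original snippet is not shown here; the model name, tensor_parallel_size=4 (matching the 4x A10G setup mentioned above), the prompt template, and the multi_modal_data input format are assumptions based on the newer vLLM multimodal API, not the exact code that was run:

```python
from io import BytesIO

import requests
from PIL import Image
from vllm import LLM, SamplingParams

# Engine configured as described above: eager mode on, prefix caching off.
llm = LLM(
    model="llava-hf/llava-v1.6-34b-hf",
    enforce_eager=True,
    enable_prefix_caching=False,
    tensor_parallel_size=4,  # assumption: one GPU per A10G
)

# The input image referenced below.
url = "https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.jpg"
image = Image.open(BytesIO(requests.get(url).content))

# Prompt containing the <image> placeholder; the surrounding chat template
# is an assumption and may differ from the one actually used.
prompt = (
    "<|im_start|>user\n<image>\nDescribe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```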
My PromptInputs is
My input image is
https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.jpg
I get the same error described in issue #6176.
The error we got: