[Bugfix] Fix with verifying model max len #4885
Conversation
Hello, I was struggling with the OpenAI-compatible API server when I tried to use command-r with its 128k context, in the GPTQ version: https://huggingface.co/Cyleux/command-r-gptq

I passed prompts of approximately 15k tokens and received an error saying they exceed the model's maximum context length. After a little research I found out that the maximum context length is resolved to 8192 for this model, even though command-r's context length is supposed to be 128k.

Starting from this point I began my research and found out that there are two length-related keys in the model config that are iterated through in `_get_and_verify_max_len`, and the minimum of them is picked. The same behaviour shows up when we start the vLLM OpenAI-compatible server: it also reports a maximum context length of 8192.

That is why this PR picks up the max length based on all keys provided in the model config, and not the minimum one.
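To make the mechanism under discussion concrete, here is a minimal sketch of the key scan, assuming a shortened, illustrative `possible_keys` list; the real logic lives in `vllm/config.py` and differs in detail:

```python
# Simplified sketch (not the exact vLLM code) of how the config is scanned for
# length-related keys when deriving max_model_len. The key list is illustrative.
def derive_max_model_len(hf_config):
    possible_keys = [
        "max_position_embeddings",
        "n_positions",
        "seq_length",
        "model_max_length",
    ]
    derived_max_model_len = float("inf")
    for key in possible_keys:
        max_len = getattr(hf_config, key, None)
        if max_len is None:
            continue
        # Current behaviour: keep the smallest value found across all keys.
        # The PR proposes reducing with max(...) instead, so the largest wins.
        derived_max_model_len = min(derived_max_model_len, max_len)
    return derived_max_model_len
```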
Can you update the PR description? (also can you add a test case?)
Here is what is logged after applying the change: it picked up the largest value of the context length among the specified keys. And generation is good, since now I can go over 8192 tokens.
This problem has already been specially handled at line 1169. Could you check why that doesn't work here?
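As far as one can tell from the thread, the special case being referenced is the branch that lets an explicitly passed `max_model_len` exceed the derived value as long as it stays within the config's `model_max_length`. A rough sketch under that assumption (not the literal code at that line):

```python
# Rough sketch of the assumed special case: an explicitly passed max_model_len
# larger than the derived value is still accepted if the config's
# model_max_length allows it. The actual vLLM code may differ.
def resolve_max_model_len(hf_config, max_model_len, derived_max_model_len):
    if max_model_len is None:
        # No explicit override: use the value derived from the config keys.
        return int(derived_max_model_len)
    if max_model_len > derived_max_model_len:
        model_max_length = getattr(hf_config, "model_max_length", None)
        if model_max_length is not None and max_model_len <= model_max_length:
            # Allow the override: above the derived minimum, but still within
            # the config's model_max_length.
            return max_model_len
        raise ValueError(
            f"User-specified max_model_len ({max_model_len}) is greater than "
            f"the derived max_model_len ({derived_max_model_len}).")
    return max_model_len
```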
Hello! It works fine when we specify the maximum length explicitly, and in that case the special handling applies. But in command-r there are multiple length-related keys in the config. You can check the model config here: https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/main/config.json#L17
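To make the two-key situation concrete, here is an illustrative stand-in for the config described above; the numbers mirror the 8192 vs. 128k values mentioned in this thread, so verify them against the linked config.json:

```python
from types import SimpleNamespace

# Illustrative stand-in for the command-r HF config discussed above.
cfg = SimpleNamespace(max_position_embeddings=8192, model_max_length=131072)

present = [
    v for v in (getattr(cfg, k, None)
                for k in ("max_position_embeddings", "model_max_length"))
    if v is not None
]
print(min(present))  # 8192   <- what the current min-based scan derives
print(max(present))  # 131072 <- what this PR would derive instead
```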
@dimaioksha I think this is related to the model config. The current way is safe and conservative. This PR might break models whose config sets one of these keys higher than the context length the model actually supports.
Should we then rename the function, since its name suggests it derives the maximum length? Probably the problem is on my side, not with the naming, but I want to hear your opinion.
@dimaioksha Just to give you more context on the design decision - here's the original PR to fix the context length problem of command-R: |
This PR fixes a bug: when multiple keys related to `max_model_len` are specified inside the HuggingFace model config, the minimum one is picked up. This PR picks up the maximum one instead, which is what the names of the functions used to determine `max_model_len` suggest. The change is applied to the `vllm._get_and_verify_max_len` function.
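A compact illustration of the described change, reusing the same scanning idea as the sketch earlier in the thread; the actual diff may differ:

```python
# Illustration of the proposed change: the reduction over the discovered keys
# flips from taking the smallest value to taking the largest one.
def derive_max_model_len_after_pr(hf_config, possible_keys):
    derived = float("-inf")
    for key in possible_keys:
        max_len = getattr(hf_config, key, None)
        if max_len is not None:
            derived = max(derived, max_len)  # was: min(derived, max_len)
    return derived
```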