Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Fix with verifying model max len #4885

Conversation

dimaioksha
Copy link

@dimaioksha dimaioksha commented May 17, 2024

This PR fixes bug, when multiple keys of max_model_len are specified inside huggingface model config, the minimum one is picked up.

This PR fixes it and picks up the maximum one that is supposed to be by the name of the functions that are used to determine max_model_len.

This PR applies changes to vllm._get_and_verify_max_len function

@dimaioksha
Copy link
Author

dimaioksha commented May 17, 2024

Hello, I was struggling with OpenAI compatible API server, when I tried to use command-r with 128k context with gptq version: https://huggingface.co/Cyleux/command-r-gptq

I passed prompts with size of apprx. 15k tokens and received error that it reaches model's maximum context length. After a little research I found out that maximum context is 8192 for this model. But commands'r context length is supposed to be 128k

Starting from this point I began my research

telegram-cloud-photo-size-2-5348206892205333279-x

I found out that there are two keys that are iterated thorough in _get_and_verify_max_len. But when we find the first key max_positional_embedding = 8192, the second one model_max_length = 131072 is not picked.

The same behaviour is shown when we start vllm openai-like server:

telegram-cloud-document-2-5348206891749100817

as max_seq_len is the same as max_model_len, as shown here: https://github.com/vllm-project/vllm/blob/main/vllm/engine/llm_engine.py#L123

That is why here is a PR that picks up max_length based on all keys provided with model config and not the minimum one.
If my intuition is wrong feel free to point on

Copy link
Collaborator

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the PR description? (also can you add a test case?)

@rkooo567 rkooo567 self-assigned this May 17, 2024
@dimaioksha
Copy link
Author

dimaioksha commented May 17, 2024

Here is what is logged after applying the change:

tg_image_3498943594

So it picked up the largest value of context in specified keys.
Unfortunately, I do not have enough resources to run c4ai with 128k (beeing precise 131072) context, but here is another example with specifiyng --max-model-len 16384 that fits into my memory:

tg_image_1666609912

And generation is good since now I can go over 8192:

telegram-cloud-photo-size-2-5348206892205333756-y

@esmeetu
Copy link
Collaborator

esmeetu commented May 19, 2024

It has been specially processed for this problem in line 1169. Could you check why it doesn't work?

@dimaioksha
Copy link
Author

dimaioksha commented May 19, 2024

It has been specially processed for this problem in line 1169. Could you check why it doesn't work?

Hello! It works fine when we specify --max-model-len like I did in the previous comment. But when we do not specify --max-model-len interpreter does not go into elif block and stays in if one because max_model_len = None.

And in if it picks the min value of the context size, but supposed to pick the maximum one. It is not a problem when only one key of presented is specified:

possible_keys = [ # OPT "max_position_embeddings", # GPT-2 "n_positions", # MPT "max_seq_len", # ChatGLM2 "seq_length", # Command-R "model_max_length", # Others "max_sequence_length", "max_seq_length", "seq_len", ]

But in command-r there are max_position_embeddings and model_max_length which have different values, and the minimum max_position_embeddings = 8192 is picked instead of model_max_length = 131072 by default. But the function _get_and_verify_max_len is supposed to pick the maximum value of context. Also I am not sure that max_position_embeddings is supposed to be model context length by the authors of commandr.

You can check model config here: https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/main/config.json#L17
I think this PR can solve this issue not only for command-r but for any model which has multiple keys from possible_keys list

@esmeetu
Copy link
Collaborator

esmeetu commented May 20, 2024

@dimaioksha I think this is related to the model config. Current way is safe and conservative. This PR might break the model with rope scaling. For example, max_seq_length is 16384 and max_position_embeddings is 4096, if we choose 16384, and then rope length will be 16384*4.
So you can follow current design and using --max-model-len params.

@dimaioksha
Copy link
Author

dimaioksha commented May 20, 2024

@dimaioksha I think this is related to the model config. Current way is safe and conservative. This PR might break the model with rope scaling. For example, max_seq_length is 16384 and max_position_embeddings is 4096, if we choose 16384, and then rope length will be 16384*4. So you can follow current design and using --max-model-len params.

Should we then rename derived_max_model_len to derived_min_model_len variable because we find minimum value around all the keys?
Just to be accurate with namings, because I expected that function will take maximum and that is why misunderstood current design.

Probably there is a problem with me, not with namings, but I want to hear your opinion

@ywang96
Copy link
Member

ywang96 commented May 25, 2024

@dimaioksha Just to give you more context on the design decision - here's the original PR to fix the context length problem of command-R:

@dimaioksha dimaioksha closed this Jun 2, 2024
@dimaioksha dimaioksha force-pushed the dimaioksha/change-max-len-parsing branch from bdcb6f8 to f790ad3 Compare June 2, 2024 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants