[Bugfix] Fix with verifying model max len #4885
Conversation
Hello, I was struggling with the OpenAI-compatible API server when I tried to use command-r with its 128k context, in the GPTQ version: https://huggingface.co/Cyleux/command-r-gptq

I passed prompts of approximately 15k tokens and received an error saying they exceed the model's maximum context length. After a little research I found out that the maximum context length is resolved to 8192 for this model, even though command-r's context length is supposed to be 128k.

Starting from this point I began my research and found out that there are two length-related keys in the model config that are iterated through in `_get_and_verify_max_len`, and the minimum of them is picked. The same behaviour shows up when we start the vLLM OpenAI-compatible server: it also reports a maximum context length of 8192.

That is why this PR picks up the max length based on all keys provided in the model config, and not the minimum one.
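To make the mechanism under discussion concrete, here is a minimal sketch of the key scan, assuming a shortened, illustrative `possible_keys` list; the real logic lives in `vllm/config.py` and differs in detail:

```python
# Simplified sketch (not the exact vLLM code) of how the config is scanned for
# length-related keys when deriving max_model_len. The key list is illustrative.
def derive_max_model_len(hf_config):
    possible_keys = [
        "max_position_embeddings",
        "n_positions",
        "seq_length",
        "model_max_length",
    ]
    derived_max_model_len = float("inf")
    for key in possible_keys:
        max_len = getattr(hf_config, key, None)
        if max_len is None:
            continue
        # Current behaviour: keep the smallest value found across all keys.
        # The PR proposes reducing with max(...) instead, so the largest wins.
        derived_max_model_len = min(derived_max_model_len, max_len)
    return derived_max_model_len
```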
Can you update the PR description? (also can you add a test case?)
Here is what is logged after applying the change: it picked up the largest value of the context length among the specified keys. And generation is good, since now I can go over 8192 tokens.
This problem has already been specially handled at line 1169. Could you check why that doesn't work here?
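As far as one can tell from the thread, the special case being referenced is the branch that lets an explicitly passed `max_model_len` exceed the derived value as long as it stays within the config's `model_max_length`. A rough sketch under that assumption (not the literal code at that line):

```python
# Rough sketch of the assumed special case: an explicitly passed max_model_len
# larger than the derived value is still accepted if the config's
# model_max_length allows it. The actual vLLM code may differ.
def resolve_max_model_len(hf_config, max_model_len, derived_max_model_len):
    if max_model_len is None:
        # No explicit override: use the value derived from the config keys.
        return int(derived_max_model_len)
    if max_model_len > derived_max_model_len:
        model_max_length = getattr(hf_config, "model_max_length", None)
        if model_max_length is not None and max_model_len <= model_max_length:
            # Allow the override: above the derived minimum, but still within
            # the config's model_max_length.
            return max_model_len
        raise ValueError(
            f"User-specified max_model_len ({max_model_len}) is greater than "
            f"the derived max_model_len ({derived_max_model_len}).")
    return max_model_len
```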
Hello! It works fine when we specify the maximum length explicitly, and in that case the special handling applies. But in command-r there are multiple length-related keys in the config. You can check the model config here: https://huggingface.co/CohereForAI/c4ai-command-r-v01/blob/main/config.json#L17
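To make the two-key situation concrete, here is an illustrative stand-in for the config described above; the numbers mirror the 8192 vs. 128k values mentioned in this thread, so verify them against the linked config.json:

```python
from types import SimpleNamespace

# Illustrative stand-in for the command-r HF config discussed above.
cfg = SimpleNamespace(max_position_embeddings=8192, model_max_length=131072)

present = [
    v for v in (getattr(cfg, k, None)
                for k in ("max_position_embeddings", "model_max_length"))
    if v is not None
]
print(min(present))  # 8192   <- what the current min-based scan derives
print(max(present))  # 131072 <- what this PR would derive instead
```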
@dimaioksha I think this is related to the model config. The current way is safe and conservative. This PR might break models whose config sets one of these keys higher than the context length the model actually supports.
Should we then rename the function, since its name suggests it derives the maximum length? Probably the problem is on my side, not with the naming, but I want to hear your opinion.
@dimaioksha Just to give you more context on the design decision - here's the original PR to fix the context length problem of command-R: |
This PR fixes a bug: when multiple keys related to `max_model_len` are specified inside the HuggingFace model config, the minimum one is picked up. This PR picks up the maximum one instead, which is what the names of the functions used to determine `max_model_len` suggest. The change is applied to the `vllm._get_and_verify_max_len` function.
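A compact illustration of the described change, reusing the same scanning idea as the sketch earlier in the thread; the actual diff may differ:

```python
# Illustration of the proposed change: the reduction over the discovered keys
# flips from taking the smallest value to taking the largest one.
def derive_max_model_len_after_pr(hf_config, possible_keys):
    derived = float("-inf")
    for key in possible_keys:
        max_len = getattr(hf_config, key, None)
        if max_len is not None:
            derived = max(derived, max_len)  # was: min(derived, max_len)
    return derived
```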