early_stopping potentially not working via api request #2938

Closed
Maxusmusti opened this issue Feb 20, 2024 · 6 comments

Comments

@Maxusmusti
Contributor

Maxusmusti commented Feb 20, 2024

While using v0.3.1, early_stopping will not toggle to True due to an omission in the protocol definition (see the follow-up comments below). I am prompting like this:

import json

import requests

# api_server is the base URL of the running vLLM OpenAI-compatible server.
api_server = 'https://localhost:8000'  # example value; adjust for your deployment

headers = {
    'Content-Type': 'application/json',
}

json_data = {
    'model': '/mnt/models/',
    'prompt': ['Something', 'Something'],
    'max_tokens': 128,
    'use_beam_search': True,
    'best_of': 4,
    'temperature': 0,
    'early_stopping': True,
    # 'min_tokens': 30,
    # 'stop_token_ids': [50256],
}

response = requests.post(f'{api_server}/v1/completions', headers=headers, json=json_data, verify=False)
print(json.loads(response.text))

and this is what I get on the server side:

INFO 02-20 21:36:30 async_llm_engine.py:433] Received request cmpl-0a8493bd1a77481fb2396bb42c6bd9af-1: prompt: None, prefix_pos: None,sampling_params: SamplingParams(n=1, best_of=4, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=True, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: [5195, 407, 30], lora_request: None.

Everything else I set is there, but early_stopping isn't, even though none of my other options should be incompatible with it:

early_stopping: Union[bool, str] = False,
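
For reference (not part of the original report), here is a minimal sketch showing that SamplingParams itself accepts early_stopping when constructed directly in Python, which points at the OpenAI-compatible request layer, rather than the sampling code, as the place where the value is dropped:

from vllm import SamplingParams

# Constructing SamplingParams directly, with the same values as in the request above.
params = SamplingParams(
    best_of=4,
    use_beam_search=True,
    early_stopping=True,  # accepted and retained here
    temperature=0,
    max_tokens=128,
)
print(params)  # early_stopping=True shows up in the repr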

@Maxusmusti
Contributor Author

Additionally, I can set early_stopping to any random value and it will throw no error, so it seems like it isn't even being passed through to SamplingParams.
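
A likely explanation (my addition, using plain Pydantic rather than vLLM's actual request model): by default, Pydantic silently drops JSON fields that are not declared on the model, so an undeclared early_stopping never reaches SamplingParams and never triggers a validation error. A minimal sketch:

from pydantic import BaseModel

# A toy stand-in for the completion request model; early_stopping is
# deliberately not declared, mirroring the omission in protocol.py.
class DemoCompletionRequest(BaseModel):
    prompt: str
    max_tokens: int = 16

req = DemoCompletionRequest(**{
    "prompt": "Something",
    "max_tokens": 128,
    "early_stopping": "any random value",  # undeclared, so silently ignored
})
print(req)  # early_stopping is simply absent; no error is raised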

@Maxusmusti
Contributor Author

Upon further inspection, it looks like it has simply been left out of the completion/chat-completion APIs, possibly an oversight?: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/protocol.py#L56
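
For illustration only (a sketch of the kind of change, not the actual diff), declaring the field on the request model and forwarding it when building SamplingParams would look roughly like this:

from typing import Optional, Union

from pydantic import BaseModel

from vllm import SamplingParams


class CompletionRequest(BaseModel):
    # ... existing fields elided ...
    use_beam_search: Optional[bool] = False
    best_of: Optional[int] = None
    early_stopping: Optional[Union[bool, str]] = False  # newly declared

    def to_sampling_params(self) -> SamplingParams:
        return SamplingParams(
            # ... existing arguments elided ...
            use_beam_search=self.use_beam_search,
            best_of=self.best_of,
            early_stopping=self.early_stopping,  # newly forwarded
        )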

@Maxusmusti
Contributor Author

Opened a PR with a potential quick four-line fix; let me know if it looks like anything is missing! #2939

@simon-mo
Collaborator

It wasn't originally an oversight, because it is not part of the official API: https://platform.openai.com/docs/api-reference/chat/create

But it does seem needed. Thank you for your PR

@njhill
Member

njhill commented Feb 22, 2024

use_beam_search and length_penalty also aren't part of the official API; it goes along with those, I guess.

@hmellor
Collaborator

hmellor commented Mar 6, 2024

Closed by #2939

@hmellor hmellor closed this as completed Mar 6, 2024