early_stopping potentially not working via api request #2938

Closed
Maxusmusti opened this issue Feb 20, 2024 · 6 comments

Comments

@Maxusmusti
Contributor

Maxusmusti commented Feb 20, 2024

While using v0.3.1, early_stopping will not toggle to True due to an omission in the protocol definition (see the follow-up comments below). I am prompting like this:

import json

import requests

# api_server is the base URL of the running vLLM OpenAI-compatible server.
api_server = 'https://localhost:8000'  # example value; adjust for your deployment

headers = {
    'Content-Type': 'application/json',
}

json_data = {
    'model': '/mnt/models/',
    'prompt': ['Something', 'Something'],
    'max_tokens': 128,
    'use_beam_search': True,
    'best_of': 4,
    'temperature': 0,
    'early_stopping': True,
    # 'min_tokens': 30,
    # 'stop_token_ids': [50256],
}

response = requests.post(f'{api_server}/v1/completions', headers=headers, json=json_data, verify=False)
print(json.loads(response.text))

and this is what I get on the server side:

INFO 02-20 21:36:30 async_llm_engine.py:433] Received request cmpl-0a8493bd1a77481fb2396bb42c6bd9af-1: prompt: None, prefix_pos: None,sampling_params: SamplingParams(n=1, best_of=4, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, use_beam_search=True, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: [5195, 407, 30], lora_request: None.

Everything else I set is there, but early_stopping isn't, even though none of my other options should be incompatible with it:

early_stopping: Union[bool, str] = False,
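
For reference (not part of the original report), here is a minimal sketch showing that SamplingParams itself accepts early_stopping when constructed directly in Python, which points at the OpenAI-compatible request layer, rather than the sampling code, as the place where the value is dropped:

from vllm import SamplingParams

# Constructing SamplingParams directly, with the same values as in the request above.
params = SamplingParams(
    best_of=4,
    use_beam_search=True,
    early_stopping=True,  # accepted and retained here
    temperature=0,
    max_tokens=128,
)
print(params)  # early_stopping=True shows up in the repr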

@Maxusmusti
Contributor Author

Additionally, I can set early_stopping to any random value and it will throw no error, so it seems like it isn't even being passed through to SamplingParams.
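
A likely explanation (my addition, using plain Pydantic rather than vLLM's actual request model): by default, Pydantic silently drops JSON fields that are not declared on the model, so an undeclared early_stopping never reaches SamplingParams and never triggers a validation error. A minimal sketch:

from pydantic import BaseModel

# A toy stand-in for the completion request model; early_stopping is
# deliberately not declared, mirroring the omission in protocol.py.
class DemoCompletionRequest(BaseModel):
    prompt: str
    max_tokens: int = 16

req = DemoCompletionRequest(**{
    "prompt": "Something",
    "max_tokens": 128,
    "early_stopping": "any random value",  # undeclared, so silently ignored
})
print(req)  # early_stopping is simply absent; no error is raised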

@Maxusmusti
Contributor Author

Upon further inspection, it looks like it has simply been left out of the completion/chat-completion APIs, possibly an oversight?: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/protocol.py#L56
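
For illustration only (a sketch of the kind of change, not the actual diff), declaring the field on the request model and forwarding it when building SamplingParams would look roughly like this:

from typing import Optional, Union

from pydantic import BaseModel

from vllm import SamplingParams


class CompletionRequest(BaseModel):
    # ... existing fields elided ...
    use_beam_search: Optional[bool] = False
    best_of: Optional[int] = None
    early_stopping: Optional[Union[bool, str]] = False  # newly declared

    def to_sampling_params(self) -> SamplingParams:
        return SamplingParams(
            # ... existing arguments elided ...
            use_beam_search=self.use_beam_search,
            best_of=self.best_of,
            early_stopping=self.early_stopping,  # newly forwarded
        )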

@Maxusmusti
Contributor Author

Opened a PR with a potential quick four-line fix; let me know if it looks like anything is missing! #2939

@simon-mo
Collaborator

It wasn't originally an oversight, because it is not part of the official API: https://platform.openai.com/docs/api-reference/chat/create

But it does seem needed. Thank you for your PR

@njhill
Member

njhill commented Feb 22, 2024

use_beam_search and length_penalty also aren't part of the official API; it goes along with those, I guess.

@hmellor
Collaborator

hmellor commented Mar 6, 2024

Closed by #2939

@hmellor hmellor closed this as completed Mar 6, 2024