This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Commit

[Bugfix] Fixing max token error message for openai compatible server (v…
jgordley authored and robertgshaw2-redhat committed Apr 26, 2024
1 parent dd092dd commit 650eca0
Showing 1 changed file with 6 additions and 0 deletions.
vllm/entrypoints/openai/serving_engine.py

@@ -214,6 +214,12 @@ def _validate_prompt_and_tokenize(
         token_num = len(input_ids)
 
         if request.max_tokens is None:
+            if token_num >= self.max_model_len:
+                raise ValueError(
+                    f"This model's maximum context length is "
+                    f"{self.max_model_len} tokens. However, you requested "
+                    f"{token_num} tokens in the messages, "
+                    f"Please reduce the length of the messages.", )
+            request.max_tokens = self.max_model_len - token_num
 
         if token_num + request.max_tokens > self.max_model_len:
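
To make the effect of the six added lines concrete, here is a minimal, self-contained sketch of the same validation outside of vLLM. SimpleRequest, SketchServingEngine, and the hard-coded max_model_len=8 are illustrative stand-ins introduced for this sketch, not names from the real serving_engine.py, and the error message is lightly re-punctuated; only the token_num / max_tokens arithmetic mirrors the diff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleRequest:
    # Stand-in for the OpenAI-style request; only max_tokens matters here.
    max_tokens: Optional[int] = None


class SketchServingEngine:
    """Illustrative stand-in for the validation in serving_engine.py."""

    def __init__(self, max_model_len: int) -> None:
        self.max_model_len = max_model_len

    def validate(self, request: SimpleRequest, input_ids: List[int]) -> int:
        token_num = len(input_ids)

        if request.max_tokens is None:
            # New in this commit: if the prompt alone already fills (or
            # exceeds) the context window, reject it with a clear message
            # instead of silently deriving a non-positive max_tokens.
            if token_num >= self.max_model_len:
                raise ValueError(
                    f"This model's maximum context length is "
                    f"{self.max_model_len} tokens. However, you requested "
                    f"{token_num} tokens in the messages. "
                    f"Please reduce the length of the messages.")
            request.max_tokens = self.max_model_len - token_num

        # Pre-existing check: prompt plus completion budget must fit.
        if token_num + request.max_tokens > self.max_model_len:
            raise ValueError("prompt + max_tokens exceeds the context length")
        return request.max_tokens


if __name__ == "__main__":
    engine = SketchServingEngine(max_model_len=8)

    # Prompt fits: max_tokens is filled in with the remaining budget (8 - 3).
    print(engine.validate(SimpleRequest(), input_ids=[1, 2, 3]))  # -> 5

    # Prompt too long with max_tokens omitted: with this commit the request
    # is rejected up front with the context-length message above.
    try:
        engine.validate(SimpleRequest(), input_ids=list(range(10)))
    except ValueError as exc:
        print(exc)
```

As far as the diff shows, the practical effect is that an over-long prompt with max_tokens omitted now fails immediately with an actionable context-length error, rather than continuing with a non-positive max_tokens and surfacing a more confusing failure later.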
