[BugFix][Frontend] Use correct, shared tokenizer in OpenAI server #3512
Conversation
Test failures look unrelated (network blips).
Could we add a test? We can mock some stuff - just to make sure that if we go through the OpenAI server with different lora requests, they are tokenized correctly.
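A minimal sketch of such a test, using only unittest.mock; the accessor names (get_tokenizer_group, get_lora_tokenizer) and the tokenize_prompt helper are illustrative stand-ins for the real front-end code path, not the PR's actual test:

```python
from unittest.mock import MagicMock

def test_lora_requests_use_lora_tokenizer():
    base_tokenizer = MagicMock(name="base_tokenizer")
    lora_tokenizer = MagicMock(name="lora_tokenizer")

    # Mocked tokenizer group: resolves a LoRA-specific tokenizer when a
    # lora_request is present, otherwise the base tokenizer.
    tokenizer_group = MagicMock()
    tokenizer_group.get_lora_tokenizer.side_effect = (
        lambda lora_request: lora_tokenizer if lora_request else base_tokenizer
    )

    engine = MagicMock()
    engine.get_tokenizer_group.return_value = tokenizer_group

    # Stand-in for the OpenAI front-end path that tokenizes a prompt.
    def tokenize_prompt(engine, prompt, lora_request=None):
        tok = engine.get_tokenizer_group().get_lora_tokenizer(lora_request)
        return tok.encode(prompt)

    lora_request = MagicMock(name="lora_request")
    tokenize_prompt(engine, "hello", lora_request=lora_request)

    # The LoRA tokenizer must be used, not the shared base tokenizer.
    lora_tokenizer.encode.assert_called_once_with("hello")
    base_tokenizer.encode.assert_not_called()
```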
Force-pushed from 0aa9277 to 06188e7.
The front-end server code currently doesn't use lora-specific tokenizers. It also won't make use of the recently introduced parallel async tokenization if enabled.
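A minimal sketch of the direction the fix takes: look the tokenizer up per request from the engine's tokenizer group instead of caching one shared base tokenizer at server startup. The get_lora_tokenizer accessor and the helper name are assumptions; the exact API may differ between vLLM versions.

```python
# Illustrative helper (not the PR's actual code): resolve the tokenizer for
# each request so a LoRA adapter that ships custom added tokens is encoded
# with its own tokenizer rather than the base model's.
def get_request_tokenizer(engine, lora_request=None):
    tokenizer_group = engine.get_tokenizer_group()
    # get_lora_tokenizer(None) is assumed to fall back to the base tokenizer.
    return tokenizer_group.get_lora_tokenizer(lora_request)
```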
Force-pushed from 06188e7 to 1db1b92.
Can the same tokenizer be used to apply the chat template as well?
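A sketch of how that could look, reusing the illustrative get_request_tokenizer helper from the sketch above; apply_chat_template is the Hugging Face tokenizer method for rendering chat prompts, and whether the front end calls it this way is an assumption.

```python
def render_chat_prompt(engine, messages, lora_request=None):
    # The same per-request tokenizer renders the chat template, so any
    # adapter-specific special tokens end up in the prompt text.
    tokenizer = get_request_tokenizer(engine, lora_request)
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```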
else:
    return self.engine.get_tokenizer_group()
nit: the `else:` is unnecessary; just return directly:
return self.engine.get_tokenizer_group()
Currently the LoRA tokenizers aren't used in the OpenAI APIs, meaning the behaviour won't be correct if adapters with custom added tokens are used. This PR includes changes to address that. It mostly replaces vllm-project#3512. More work is needed to address remaining inconsistencies in tokenization behaviour between the OpenAI front-end and standalone LLMEngine/AsyncLLMEngine use, including:
- Standalone cases don't honor the truncation and add_special_tokens request parameters (see the sketch after this list)
- OpenAI API cases don't make use of TokenizerGroups for possible parallelization of tokenization

as well as some other inefficiencies; these are to be addressed in follow-on PRs.
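A sketch of what honoring truncation and add_special_tokens per request could look like. The parameter names are assumptions modelled on the OpenAI-server request fields, and the encode call follows the Hugging Face tokenizer API; this is not the code from either PR.

```python
def encode_prompt(tokenizer, prompt, truncate_prompt_tokens=None,
                  add_special_tokens=True):
    # Honor the per-request options that the comment above notes are
    # currently applied inconsistently between the front end and the engines.
    if truncate_prompt_tokens is not None:
        return tokenizer.encode(
            prompt,
            add_special_tokens=add_special_tokens,
            truncation=True,
            max_length=truncate_prompt_tokens,
        )
    return tokenizer.encode(prompt, add_special_tokens=add_special_tokens)
```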
Closing as superseded by #6227.