🚀 The feature, motivation and pitch

Currently the `generate` method supports inference based on `prompt_token_ids`:

```python
def generate(
    self,
    prompts: Optional[Union[str, List[str]]] = None,
    sampling_params: Optional[SamplingParams] = None,
    prompt_token_ids: Optional[List[List[int]]] = None,
    use_tqdm: bool = True,
    lora_request: Optional[LoRARequest] = None,
) -> List[RequestOutput]:
    ...
```
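For example, prompts can already be supplied as pre-tokenized IDs (a minimal sketch; the model name and token IDs below are placeholders):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model

# Token IDs produced elsewhere, e.g. by an external tokenizer service.
prompt_token_ids = [[2, 100, 200, 300]]

outputs = llm.generate(
    prompt_token_ids=prompt_token_ids,
    sampling_params=SamplingParams(max_tokens=16),
)
```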
That means the tokenizer is optional to the LLM engine. However, initializing an LLM engine always calls `_init_tokenizer`, which effectively makes the tokenizer required: the engine cannot be initialized without a valid tokenizer argument.

In our application, we would love to use vLLM's powerful engine for inference, but we want to keep the tokenizer as a separate service.
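What we are asking for, sketched loosely, is an engine-level way to opt out of tokenizer initialization. The `skip_tokenizer_init` flag below is hypothetical, not an existing argument; it only illustrates the intent:

```python
from vllm import LLM

# Hypothetical flag, not part of the current API: the engine would never
# load a tokenizer and would accept and return token IDs only.
llm = LLM(
    model="facebook/opt-125m",  # placeholder model
    skip_tokenizer_init=True,
)
```

Tokenization and detokenization would then live entirely in our separate tokenizer service.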
Alternatives

No response

Additional context

No response
I think the main blocker is that the tokenizer is also used during decode. See #3635.
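For context, the generated token IDs are already available on the request output, so a token-in/token-out caller could in principle detokenize externally (a sketch continuing the example above; whether the engine can avoid producing `text` internally is exactly the open question):

```python
# `outputs` is the List[RequestOutput] returned by llm.generate(...) above.
for request_output in outputs:
    completion = request_output.outputs[0]
    generated_ids = completion.token_ids  # consumable without a local tokenizer
    # completion.text is also populated today, which is why the engine
    # currently needs a tokenizer during decode.
```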