
vLLM/OpenAI Compatible Endpoint #6968

Open
Elsayed91 opened this issue Mar 10, 2024 · 5 comments

Labels
enhancement New feature or request

Comments

@Elsayed91

Is your feature request related to a problem? Please describe.
The vLLM backend works well and is easy to set up, compared to TensorRT, which had me pulling my hair out.

However, it lacks the OpenAI-compatible endpoint that ships with vLLM itself.

The /generate endpoint on its own requires extra work to set up for chat applications (work that I honestly don't know how to do).
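To illustrate the gap: Triton's generate endpoint takes a flat `text_input` string, so an OpenAI-style chat payload has to be flattened into a single prompt by the client. A minimal sketch, assuming a generic role-tag template (real chat models each need their own specific template, and the exact parameter names Triton accepts depend on the backend's model config):

```python
# Illustrative only: the role-tag format below is hypothetical; real chat
# models (Llama, Mistral, ...) each define their own prompt template.
import json


def messages_to_prompt(messages):
    """Flatten OpenAI-style chat messages into one prompt string."""
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    parts.append("<|assistant|>\n")  # cue the model to answer
    return "\n".join(parts)


def build_generate_payload(messages, max_tokens=256):
    """JSON body for POST /v2/models/<model>/generate (Triton generate extension)."""
    return json.dumps({
        "text_input": messages_to_prompt(messages),
        "parameters": {"stream": False, "max_tokens": max_tokens},
    })
```

Every consumer of the bare /generate endpoint ends up re-implementing some variant of this templating and payload-building logic, which is exactly what an OpenAI-compatible layer would absorb.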

In essence, just by adopting Triton's vLLM backend instead of vLLM itself, you have to develop classes and interfaces for all of these things yourself.

Not to mention that LangChain has no LLM implementation for it, and LlamaIndex's is somewhat primitive, undocumented, and buggy.

Describe the solution you'd like
Expose vLLM's OpenAI-compatible endpoint alongside Triton's existing endpoints when using the vLLM backend.

Additional context
Pros:

  • Better integration with LangChain (through ChatOpenAI) and LlamaIndex
  • Triton becomes orders of magnitude easier to set up, run, and migrate to (i.e. you don't have to rebuild your whole toolset to accommodate Triton)
  • Better out-of-the-box integration with the many tools on the market that target OpenAI-compatible endpoints (e.g. Langfuse, LangSmith)

It would be wonderful if this existed as a feature for all backends, but for now, vLLM's implementation could serve as a reference and is probably the best starting point.

https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_chat.py
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py
https://github.com/npuichigo/openai_trtllm/tree/main
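The payoff of the proposal is that any OpenAI client could target Triton just by switching the base URL. A minimal sketch of the request such an endpoint would accept, using only the standard library so no server is needed; the schema is the standard OpenAI chat-completions format, while the URL and model name are placeholders for a hypothetical Triton deployment:

```python
import json
import urllib.request


def chat_completion_request(base_url, model, messages):
    """Build (but do not send) a standard /v1/chat/completions request."""
    body = json.dumps({
        "model": model,
        "messages": messages,
        "max_tokens": 128,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy-key",  # many clients require some key
        },
        method="POST",
    )


req = chat_completion_request(
    "http://localhost:8000",  # placeholder address for a Triton deployment
    "my-model",               # placeholder model name
    [{"role": "user", "content": "Hello"}],
)
```

With such an endpoint in place, tools like LangChain's ChatOpenAI would work against Triton simply by pointing their base URL at the server, with no Triton-specific client code.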

@lkomali lkomali added the enhancement New feature or request label Mar 11, 2024

lkomali commented Mar 13, 2024

@Elsayed91 I filed a feature request to the team.
DLIS-6323

@gongyifeiisme

The lack of OpenAI-style support made me abandon it outright.

@panpan0000

Any update or progress on this?

@nnshah1 nnshah1 self-assigned this Apr 26, 2024

nnshah1 commented Apr 26, 2024

@panpan0000, @Elsayed91: is the goal improved integration with LlamaIndex/LangChain, or direct OpenAI API support?

Would support via the Python in-process API be sufficient, or is a C/C++ implementation required?


panpan0000 commented May 14, 2024

> @panpan0000, @Elsayed91: is the goal improved integration with LlamaIndex/LangChain, or direct OpenAI API support?
>
> Would support via the Python in-process API be sufficient, or is a C/C++ implementation required?

Sorry @nnshah1, I don't quite understand what you mean. Here is a similar issue that may help clarify: #6583
