Is your feature request related to a problem? Please describe.
The vLLM backend works well and is easy to set up, compared to TensorRT, which had me pulling my hair out.
However, it lacks the OpenAI-compatible endpoint that ships with vLLM itself.
The /generate endpoint on its own requires work to set up for chat applications (that I honestly don't know how to do; see the rough sketch below).
In essence, just by adopting the Triton vLLM backend instead of vLLM directly, you have to develop classes and interfaces for all of these things yourself.
Not to mention that LangChain has no LLM implementation for it, and LlamaIndex's is a bit primitive, undocumented, and buggy.
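For context, here is a rough sketch of the kind of glue code the /generate endpoint currently pushes onto the user for chat. The model name `vllm_model`, the port, and the prompt template are assumptions for illustration; the template has to match whatever model is actually deployed, which is exactly the per-model work an OpenAI-compatible endpoint would remove:

```python
import requests

# Assumed local deployment; adjust host, port, and model name to your setup.
TRITON_URL = "http://localhost:8000/v2/models/vllm_model/generate"

def chat(messages):
    # Hand-rolled chat template -- a placeholder format, not the real
    # template of any particular model.
    prompt = ""
    for m in messages:
        prompt += f"<|{m['role']}|>\n{m['content']}\n"
    prompt += "<|assistant|>\n"

    resp = requests.post(
        TRITON_URL,
        json={
            "text_input": prompt,
            "parameters": {"stream": False, "temperature": 0.7},
        },
    )
    resp.raise_for_status()
    return resp.json()["text_output"]

print(chat([{"role": "user", "content": "What is Triton Inference Server?"}]))
```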
Describe the solution you'd like
Include vLLM's OpenAI-compatible endpoint as an available endpoint when using Triton.
Additional context
Pros:
Better integration with LangChain (through ChatOpenAI, as sketched after this list) and LlamaIndex
Triton becomes orders of magnitude easier to set up, run, and migrate to (i.e., you don't have to rebuild your whole toolset to accommodate Triton)
Better out-of-the-box integration with the many tools on the market that support OpenAI-compatible endpoints (e.g., Langfuse, LangSmith)
It would be wonderful if this existed as a feature for all backends, but for now, with vLLM's implementation as a reference, that is probably the best starting point.
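For illustration, here is a minimal sketch of what this would enable on the LangChain side, assuming Triton exposed vLLM's OpenAI-compatible routes under /v1 on the same host (the path, port, and model name below are assumptions, not an existing Triton API):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical Triton OpenAI-compatible endpoint
    api_key="not-needed",                 # no auth assumed for a local deployment
    model="vllm_model",                   # assumed deployed model name
)
print(llm.invoke("What is Triton Inference Server?").content)
```

The same base_url swap would make Triton a drop-in target for any tool that already speaks the OpenAI API.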
References:
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_chat.py
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py
https://github.com/npuichigo/openai_trtllm/tree/main