Support for vLLM and TRT-LLM running in OpenAI compatible mode #6583

Open · vecorro opened this issue Nov 15, 2023 · 19 comments
Labels: enhancement (New feature or request)
@vecorro commented Nov 15, 2023

Is your feature request related to a problem? Please describe.
I'd like to be able to run vLLM emulating the OpenAI-compatible API, so that vLLM can serve as a drop-in replacement for ChatGPT.

Describe the solution you'd like
I'd like Triton to allow me to run vLLM as indicated in the vLLM documentation.

Example:

python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
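
For illustration, once that server is up, a client can call it exactly as it would the OpenAI completions API. A minimal sketch, assuming vLLM's default port 8000 and the model above:

import requests

# Query the OpenAI-compatible completions endpoint exposed by vLLM.
# Assumes the api_server above is running locally on its default port 8000.
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",
        "prompt": "Triton Inference Server is",
        "max_tokens": 32,
        "temperature": 0.0,
    },
)
print(response.json()["choices"][0]["text"])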

Describe alternatives you've considered
It is possible to use the REST API; however, for developers already leveraging OpenAI, serving open-source LLMs through the OpenAI API would provide a faster path to replacing OpenAI.

@krishung5 (Contributor)

Thanks for submitting the feature request! CC @nnshah1 on the request for starting the server with an OpenAI-compatible API.

For the client side, we have introduced the generate endpoint, which is an OpenAI-like endpoint and can improve Triton adoption for LLM use cases.
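
For reference, the generate endpoint is a per-model HTTP route. A minimal sketch of a request against it, assuming a Triton server on localhost:8000 with a model named vllm_model loaded (the model name here is just an example):

import requests

# Call Triton's generate endpoint for a specific model. The JSON fields map
# onto the model's inputs; for the vLLM backend the main input is "text_input".
response = requests.post(
    "http://localhost:8000/v2/models/vllm_model/generate",
    json={
        "text_input": "What is Triton Inference Server?",
        "parameters": {"stream": False, "temperature": 0},
    },
)
print(response.json()["text_output"])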

@nnshah1 (Contributor) commented Nov 16, 2023

@vecorro It's a bit challenging, as the OpenAI API is a moving target, and the completion API is already legacy and out of date with many applications. Since Triton as an inference server serves many types of inference, we've focused on providing ways to customize endpoints, with the expectation that many actual deployments would use an additional API gateway to translate from a service API to the inference API (for example, LangChain).

That being said, your perspective is important. Is the legacy completion API enough, or would we need to support the newer APIs as well? Do you think integration with a project like LangChain or other LLM projects would be a viable approach, or would Triton need to provide the interface directly?

nnshah1 added the enhancement (New feature or request) label Nov 16, 2023
@chymian commented Nov 18, 2023

@nnshah1

That being said, your perspective is important. Is the legacy completion API enough, or would we need to support the newer APIs as well? Do you think integration with a project like LangChain or other LLM projects would be a viable approach, or would Triton need to provide the interface directly?

The key here, to be useful to the community, is staying current and compatible. Every other commonly used tool, like LiteLLM (middleware/API proxy), or loaders like ooba's TGI, vLLM, and FastChat, is racing to implement functions and catch up with v1, since this is essential for local LLMs to be a replacement for the costly OpenAI offerings.
There are a few very popular projects that use the very latest features heavily, like Microsoft's AutoGen and MemGPT.

IMHO: I have been testing all of these and have only had a peek into Triton, but I already assume it would be the far superior solution for running multiple LLMs locally, IF there were out-of-the-box compatibility with the OpenAI API.
Maybe teaming up with https://litellm.ai/ is a quick solution.

@ishaan-jaff

@chymian @nnshah1 I'm the LiteLLM maintainer. What do you need from us?

@vecorro (Author) commented Nov 21, 2023

I agree with @nnshah1; the key is staying current and compatible, and I would add ease of use. For experimentation/dev purposes, vLLM is excellent, as it allows you to try an LLM without the complexity and time required by TensorRT-LLM. Making the existing vLLM implementation of the legacy OpenAI API available could be good enough; meanwhile, vLLM evolves to support newer versions of the OpenAI API. I'm on the VMware team working with NVIDIA on the Private AI initiative. From conversations with customers looking for ways to run LLMs on-premises, they are just getting started on that journey, so a quick drop-in replacement for OpenAI's models would facilitate things for them. Thanks all!

@nnshah1 (Contributor) commented Dec 6, 2023

@npuichigo has mentioned an integration that could be useful here:

NVIDIA/TensorRT-LLM#591

@npuichigo

@npuichigo has mentioned an integration that could be useful here:

NVIDIA/TensorRT-LLM#591

https://github.com/npuichigo/openai_trtllm provides an OpenAI-like API for the trtllm Triton backend, but I think vLLM in Triton would be something similar.

@dyastremsky (Contributor)

Closing this issue due to inactivity. Please reopen if you would like to follow up on it.

@BodhiHu commented Mar 22, 2024

Hello,

It seems Triton supports vLLM now:
https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md#deploying-a-vllm-model-in-triton

But can we use the vLLM OpenAI APIs?

Thanks a lot.

@nnshah1 (Contributor) commented Mar 22, 2024

We don't currently support it directly, but we are still thinking through the best way to add a compatible API that we can maintain.

nnshah1 reopened this Mar 22, 2024
nnshah1 changed the title from "Support for vLLM running in OpenAI compatible mode" to "Support for vLLM and TRT-LLM running in OpenAI compatible mode" Mar 22, 2024
@BodhiHu commented Mar 23, 2024

Thank you for the clarification~~
Though I think we could simply adapt vLLM's vllm.entrypoints.openai.api_server to the Triton HTTP endpoint.
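
To make that idea concrete, here is a rough, unofficial sketch (not an existing Triton feature) of a small FastAPI shim that accepts legacy OpenAI-style completion requests and forwards them to Triton's generate endpoint. The "text_input"/"text_output" field names follow the vLLM backend tutorial; the sampling-parameter names and everything else are assumptions:

import requests
from fastapi import FastAPI
from pydantic import BaseModel

# Hypothetical adapter: translate legacy OpenAI /v1/completions requests
# into calls to Triton's per-model generate endpoint (assumed on port 8000).
TRITON_GENERATE_URL = "http://localhost:8000/v2/models/{model}/generate"

app = FastAPI()

class CompletionRequest(BaseModel):
    model: str
    prompt: str
    max_tokens: int = 16
    temperature: float = 1.0

@app.post("/v1/completions")
def completions(req: CompletionRequest):
    # Forward the prompt to Triton; exact sampling-parameter names may
    # differ per backend, so treat these as placeholders.
    triton_resp = requests.post(
        TRITON_GENERATE_URL.format(model=req.model),
        json={
            "text_input": req.prompt,
            "parameters": {"temperature": req.temperature, "max_tokens": req.max_tokens},
        },
    )
    triton_resp.raise_for_status()
    text = triton_resp.json()["text_output"]
    # Re-wrap the result in the legacy OpenAI completion response shape.
    return {
        "object": "text_completion",
        "model": req.model,
        "choices": [{"index": 0, "text": text, "finish_reason": "stop"}],
    }

(Such a shim would be run with uvicorn on a port other than Triton's own, and clients would point their OpenAI base URL at it.)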

@panpan0000

Is there any workaround so far?

Thank you for the clarification~~ Though I think we could simply adapt vLLM's vllm.entrypoints.openai.api_server to the Triton HTTP endpoint.

@nnshah1 (Contributor) commented Apr 27, 2024

@chymian @nnshah1 I'm the LiteLLM maintainer. What do you need from us?

@ishaan-jaff

Is there a guide on adding an LLM provider to LiteLLM, or a prescribed starting point / skeleton?

@anubhav-agrawal-mu-sigma

Is this planned for any specific release?

@nnshah1 (Contributor) commented Aug 12, 2024

@anubhav-agrawal-mu-sigma We are currently planning a tutorial showcasing how to create an OpenAI-compatible Triton server using Triton's in-process Python API. That tutorial is planned for the September time frame.

@catle2aurecon

@nnshah1: I am interested in the mentioned tutorial; please let me know when you have it online. Many thanks!

@agladboy

@nnshah1 I'm excited to see the tutorial! Would you please let me know when it will be available? Thanks in advance!

@BrandesDenis

@nnshah1 I'm interested too!

@nnshah1 (Contributor) commented Oct 2, 2024

You can follow along with the current preview PR here:

#7561

Some things may change, so please consider this BETA until we finalize a few of the internal structures, but this will be the basis.
