feat: OpenAI Compatible Frontend #7561

Merged · 81 commits · Oct 11, 2024
The diff shown below reflects changes from 1 commit.

Commits (81)
637db32
Initial code migration, start the testing structure
rmccorm4 Jul 31, 2024
e14128b
Restructure to recommended FastAPI project structure, add simple test…
rmccorm4 Aug 2, 2024
a37b0b3
Start a CONTRIBUTING.md
rmccorm4 Aug 2, 2024
7eb1ffc
Add simple /completions endpoint test
rmccorm4 Aug 3, 2024
530c871
Add some plumbing for /v1/models routes, add mock_llm python model to…
rmccorm4 Aug 6, 2024
9eba9c3
Add simple tests for /v1/models and remove chat_completions test unti…
rmccorm4 Aug 6, 2024
fb7ce72
Add some basic chat completions support and testing
rmccorm4 Aug 7, 2024
0cf8fae
WIP: Add OpenAI client test that works when server is already running…
rmccorm4 Aug 7, 2024
3d227dd
Flesh out /completions tests more, refactor to class fixture for runn…
rmccorm4 Aug 8, 2024
4c1ac55
Update chat completions schema to enforce max_tokens >= 0, and lower …
rmccorm4 Aug 8, 2024
5b15877
Add more tests around max_tokens and temperature behavior, as well as…
rmccorm4 Aug 9, 2024
f9f4b07
Remove unused parts from tokenizer.py
rmccorm4 Aug 9, 2024
773aee0
All existing tests passing for both TRT-LLM and vLLM, updated model l…
rmccorm4 Aug 14, 2024
567abf3
Add streaming test placeholders, add test where no tokenizer is defined
rmccorm4 Aug 14, 2024
6e1bfaf
Add OpenAI Python Client tests, add streaming chat completions test, …
rmccorm4 Aug 16, 2024
4e3a441
Add 'echo' parameter test, but skip it for TRT-LLM due to only suppor…
rmccorm4 Aug 16, 2024
523f369
Fix issue with finish_reason for non-streaming completion when using …
rmccorm4 Aug 16, 2024
75f71ce
Move triton response validation into common triton utils
rmccorm4 Aug 16, 2024
118887c
Reduce code copying and global variables, use conftest.py for shared …
rmccorm4 Aug 16, 2024
6cf2e77
Split Dockerfile in 2 to capture llama3.1 requirement for vllm
rmccorm4 Aug 16, 2024
66afc48
Split Dockerfile in 2 to capture llama3.1 requirement for vllm
rmccorm4 Aug 16, 2024
0bbd248
Add configurable model parameter to examples
rmccorm4 Aug 16, 2024
6e59f6e
Fix streaming for genai-perf by setting the content-type to text/even…
rmccorm4 Aug 19, 2024
763b3a4
Update examples to default to vllm model for simplicity
rmccorm4 Aug 19, 2024
0328ea6
Start high level README for other developers
rmccorm4 Aug 19, 2024
43dd329
Move openai source code into server/python/openai folder, and flesh o…
rmccorm4 Aug 19, 2024
363b40e
Move openai code to server/python folder
rmccorm4 Aug 19, 2024
d35d336
Add disclaimer for TRT-LLM to README
rmccorm4 Aug 19, 2024
63fc4a7
Fix README typos
rmccorm4 Aug 19, 2024
4a729c0
Fix relative path for OpenAI server helper after moving locations
rmccorm4 Aug 19, 2024
0f459b1
Add placeholder L0_openai test folder back
rmccorm4 Aug 19, 2024
0b3def0
Add transformers upgrade for Llama3.1 in vllm
rmccorm4 Aug 20, 2024
2e897b9
Add requirements.txt files for use in testing
rmccorm4 Aug 20, 2024
f54a4fa
Add placeholder test script
rmccorm4 Aug 20, 2024
c2786b2
Cleanup test script for local file reference
rmccorm4 Aug 21, 2024
021c577
Fix paths and empty function
rmccorm4 Aug 21, 2024
a69bfd1
Install tritonserver python wheel
rmccorm4 Aug 21, 2024
6361bd1
Add TRT-LLM detection and model repo generation
rmccorm4 Aug 21, 2024
c096ba5
Fix trtllm model count comparison to 4, excluding ensemble
rmccorm4 Aug 21, 2024
5631231
Fail on pytest errors
rmccorm4 Aug 21, 2024
e77f85c
Try copying engines out of NFS mount for faster test I/O
rmccorm4 Aug 21, 2024
b41a6f7
Use model var
rmccorm4 Aug 21, 2024
8251923
Time the duration of copying from nfs mount
rmccorm4 Aug 21, 2024
f928a81
Try rsync over cp
rmccorm4 Aug 21, 2024
81ef479
Remove use of NFS mount due to slow I/O for now
rmccorm4 Aug 21, 2024
42676da
Propagate test failure to job failure and log collection
rmccorm4 Aug 21, 2024
cacaf0b
Add xml files to gitignore
rmccorm4 Aug 21, 2024
b6c3f9e
Test /v1/models with multiple models and remove TODOs
rmccorm4 Aug 21, 2024
5cc80fe
Add openai folder copy to gitignore in testing
rmccorm4 Aug 21, 2024
9f70a1d
Add streaming completion test, remove trtllm models from git repo
rmccorm4 Aug 21, 2024
d00d237
Remove unnecessary TODOs
rmccorm4 Aug 22, 2024
ae2fcd6
Add copyrights and replace dupe test model
rmccorm4 Aug 22, 2024
fc4c15a
Add disclaimer around application state and multiprocessing
rmccorm4 Aug 22, 2024
1ca9889
Address CodeQL warnings
rmccorm4 Aug 22, 2024
92a27e5
Add quickstart vllm dockerfile for sharing purposes
rmccorm4 Aug 23, 2024
9c3ee15
Remove workspace mount mention
rmccorm4 Aug 23, 2024
886ee7d
Review feedback: rename package, move tests out of package, remove ne…
rmccorm4 Aug 23, 2024
21c0996
Review feedback: naming nits, more type hints, helper functions
rmccorm4 Aug 24, 2024
f84aec4
Fix CodeQL import warning
rmccorm4 Aug 24, 2024
b230697
refactor: Use thinner API server with an engine interface (#7570)
rmccorm4 Aug 29, 2024
ea23eeb
Update dockerfile branch, fix CodeQL error
rmccorm4 Aug 29, 2024
156535c
Add tests for custom tokenizers by local file path
rmccorm4 Aug 29, 2024
9b7dc59
Expose --backend request format override to main.py, and expose env v…
rmccorm4 Aug 31, 2024
a1484e4
Fix tokenizer test, remove TODO
rmccorm4 Sep 4, 2024
33eee48
perf: Improve chat completions performance at high concurrency (#7653)
rmccorm4 Sep 25, 2024
0882b60
review feedback: use _to_string helper function, add some clarifying …
rmccorm4 Sep 25, 2024
f073fbf
feat: KServe Bindings to start tritonfrontend (#7662)
KrishnanPrash Sep 26, 2024
2d0f7e6
chore: Fix argparse typo, cleanup argparse groups, make kserve fronte…
rmccorm4 Sep 27, 2024
78e571d
fix: Support sampling parameters of type List for vLLM backend (stop …
rmccorm4 Oct 7, 2024
579ad63
Review feedback: remove examples/ and docker/ folders, update README …
rmccorm4 Oct 9, 2024
815eebe
Add a few FIXMEs for follow-up
rmccorm4 Oct 9, 2024
8f92734
Add requirements.txt back in, fix test and docs accordingly
rmccorm4 Oct 9, 2024
5c0b2e6
Fix TRT-LLM model repo test path
rmccorm4 Oct 9, 2024
44b2282
Explicitly return error on unknown fields not defined in schema, excl…
rmccorm4 Oct 9, 2024
dc7bdf4
Merge branch 'main' of github.com:triton-inference-server/server into…
rmccorm4 Oct 10, 2024
49162be
Add missing copyright headers
rmccorm4 Oct 10, 2024
fe45d39
Review feedback: split app and test requirements to 2 requirements files
rmccorm4 Oct 10, 2024
2261d13
Fix whitespace pre-commit, remove auto 'git add' from copyright tool
rmccorm4 Oct 10, 2024
2e2a190
Disable copyright pre-commit hook until fixed on GitHub Actions side
rmccorm4 Oct 10, 2024
cc8657d
Fix attribution for tokenizer util
rmccorm4 Oct 10, 2024
fa9501e
Fix copyright header on copyright tool, remove unused import
rmccorm4 Oct 10, 2024
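
Taken together, these commits add an OpenAI-compatible HTTP frontend for Triton Inference Server. As a hedged sketch of the usage this enables with the standard openai Python client (the port and model name below are illustrative assumptions, not values taken from this PR):

```python
from openai import OpenAI

# Assumes the frontend is already running locally. The base_url port and
# the model name are hypothetical placeholders for your deployment.
client = OpenAI(base_url="http://localhost:9000/v1", api_key="unused")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```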
22 changes: 21 additions & 1 deletion python/openai/openai_frontend/engine/utils/triton.py
```diff
@@ -36,14 +36,34 @@ def _create_vllm_inference_request(
     model, prompt, request: CreateChatCompletionRequest | CreateCompletionRequest
 ):
     inputs = {}
-    excludes = {"model", "stream", "messages", "prompt", "echo"}
+    # Exclude non-sampling parameters so they aren't passed to vLLM
+    excludes = {
+        "model",
+        "stream",
+        "messages",
+        "prompt",
+        "echo",
+        "store",
+        "metadata",
+        "response_format",
+        "service_tier",
+        "stream_options",
+        "tools",
+        "tool_choice",
+        "parallel_tool_calls",
+        "user",
+        "function_call",
+        "functions",
+        "suffix",
+    }
 
     # NOTE: The exclude_none is important, as internals may not support
     # values of NoneType at this time.
     sampling_parameters = request.model_dump_json(
         exclude=excludes,
         exclude_none=True,
     )
 
     exclude_input_in_output = True
     echo = getattr(request, "echo", None)
     if echo is not None:
```

rmccorm4 (Contributor, Author) commented on the new `excludes` set:

> NOTE: It may make more sense to explicitly "include" supported sampling parameters, but that list would be of similar length or longer. Both approaches likely require periodic updates, either to include a new sampling field or to exclude a new non-sampling field.
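
As a rough sketch of the "include" alternative mentioned in that comment (hypothetical code, not part of this PR; the allow-list below is an illustrative subset of sampling-related request fields, not an exhaustive one):

```python
# Hypothetical "include" variant of the change above. The allow-list is
# illustrative and, as the comment notes, would still need periodic updates
# whenever the schema gains a new sampling field.
includes = {
    "temperature",
    "top_p",
    "frequency_penalty",
    "presence_penalty",
    "max_tokens",
    "seed",
    "stop",
    "n",
    "logprobs",
    "logit_bias",
}

# Pydantic's model_dump_json accepts `include` symmetrically to `exclude`.
sampling_parameters = request.model_dump_json(
    include=includes,
    exclude_none=True,
)
```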
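The NOTE about `exclude_none=True` can be seen in a minimal, self-contained Pydantic v2 sketch (a toy schema, not the frontend's actual one):

```python
from pydantic import BaseModel


class ToyRequest(BaseModel):
    model: str
    temperature: float | None = None
    max_tokens: int | None = None


req = ToyRequest(model="toy-model", max_tokens=16)

# Without exclude_none, unset optional fields serialize as JSON null,
# which downstream internals may not accept.
print(req.model_dump_json(exclude={"model"}))
# -> {"temperature":null,"max_tokens":16}

# With exclude_none=True, None-valued fields are dropped entirely.
print(req.model_dump_json(exclude={"model"}, exclude_none=True))
# -> {"max_tokens":16}
```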
6 changes: 6 additions & 0 deletions python/openai/openai_frontend/schemas/openai.py
```diff
@@ -42,6 +42,9 @@ class PromptItem(RootModel):
 
 
 class CreateCompletionRequest(BaseModel):
+    # Explicitly return errors for unknown fields.
+    model_config: ConfigDict = ConfigDict(extra="forbid")
+
     model: Union[str, Model1] = Field(
         ...,
         description="ID of the model to use. You can use the [List models](/docs/api-reference/models/list) API to see all of your available models, or see our [Model overview](/docs/models/overview) for descriptions of them.\n",
@@ -776,6 +779,9 @@ def content(self):
 
 
 class CreateChatCompletionRequest(BaseModel):
+    # Explicitly return errors for unknown fields.
+    model_config: ConfigDict = ConfigDict(extra="forbid")
+
     messages: List[ChatCompletionRequestMessage] = Field(
         ...,
         description="A list of messages comprising the conversation so far. [Example Python code](https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models).",
```
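
As a minimal illustration of what `extra="forbid"` buys (again a toy schema, not the frontend's actual one), Pydantic now rejects unknown fields at validation time instead of silently dropping them:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class ToyCompletionRequest(BaseModel):
    # Explicitly return errors for unknown fields.
    model_config = ConfigDict(extra="forbid")

    model: str
    prompt: str


try:
    # "max_tokenz" is a deliberate typo for a field the schema doesn't define.
    ToyCompletionRequest(model="toy-model", prompt="Hello", max_tokenz=16)
except ValidationError as err:
    print(err)
    # 1 validation error for ToyCompletionRequest
    # max_tokenz
    #   Extra inputs are not permitted [type=extra_forbidden, ...]
```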