feat: OpenAI Compatible Frontend #7561
Merged
Commits
81 commits
637db32
Initial code migration, start the testing structure
rmccorm4 e14128b
Restructure to recommended FastAPI project structure, add simple test…
rmccorm4 a37b0b3
Start a CONTRIBUTING.md
rmccorm4 7eb1ffc
Add simple /completions endpoint test
rmccorm4 530c871
Add some plumbing for /v1/models routes, add mock_llm python model to…
rmccorm4 9eba9c3
Add simple tests for /v1/models and remove chat_completions test unti…
rmccorm4 fb7ce72
Add some basic chat completions support and testing
rmccorm4 0cf8fae
WIP: Add OpenAI client test that works when server is already running…
rmccorm4 3d227dd
Flesh out /completions tests more, refactor to class fixture for runn…
rmccorm4 4c1ac55
Update chat completions schema to enforce max_tokens >= 0, and lower …
rmccorm4 5b15877
Add more tests around max_tokens and temperature behavior, as well as…
rmccorm4 f9f4b07
Remove unused parts from tokenizer.py
rmccorm4 773aee0
All existing tests passing for both TRT-LLM and vLLM, updated model l…
rmccorm4 567abf3
Add streaming test placeholders, add test where no tokenizer is defined
rmccorm4 6e1bfaf
Add OpenAI Python Client tests, add streaming chat completions test, …
rmccorm4 4e3a441
Add 'echo' parameter test, but skip it for TRT-LLm due to only suppor…
rmccorm4 523f369
Fix issue with finish_reason for non-streaming completion when using …
rmccorm4 75f71ce
Move triton response validation into common triton utils
rmccorm4 118887c
Reduce code copying and global variables, use conftest.py for shared …
rmccorm4 6cf2e77
Split Dockefile in 2 to capture llama3.1 requirement for vllm
rmccorm4 66afc48
Split Dockerfile in 2 to capture llama3.1 requirement for vllm
rmccorm4 0bbd248
Add configurable model parameter to examples
rmccorm4 6e59f6e
Fix streaming for genai-perf by setting the content-type to text/even…
rmccorm4 763b3a4
Update examples to default to vllm model for simplicity
rmccorm4 0328ea6
Start high level README for other developers
rmccorm4 43dd329
Move openai source code into server/python/openai folder, and flesh o…
rmccorm4 363b40e
Move openai code to server/python folder
rmccorm4 d35d336
Add disclaimer for TRT-LLM to README
rmccorm4 63fc4a7
Fix README typos
rmccorm4 4a729c0
Fix relative path for OpenAI server helper after moving locations
rmccorm4 0f459b1
Add placeholder L0_openai test folder back
rmccorm4 0b3def0
Add transformers upgrade for Llama3.1 in vllm
rmccorm4 2e897b9
Add requirements.txt files for use in testing
rmccorm4 f54a4fa
Add placeholder test script
rmccorm4 c2786b2
Cleanup test script for local file reference
rmccorm4 021c577
Fix paths and empty function
rmccorm4 a69bfd1
Install tritonserver python wheel
rmccorm4 6361bd1
Add TRT-LLM detection and model repo generation
rmccorm4 c096ba5
Fix trtllm model count comparison to 4, excluding ensemble
rmccorm4 5631231
Fail on pytest errors
rmccorm4 e77f85c
Try copying engines out of NFS mount for faster test I/O
rmccorm4 b41a6f7
Use model var
rmccorm4 8251923
Time the duration of copying from nfs mount
rmccorm4 f928a81
Try rsync over cp
rmccorm4 81ef479
Remove use of NFS mount due to slow I/O for now
rmccorm4 42676da
Propagate test failure to job failure and log collection
rmccorm4 cacaf0b
Add xml files to gitignore
rmccorm4 b6c3f9e
Test /v1/models with multiple models and remove TODOs
rmccorm4 5cc80fe
Add openai folder copy to gitignore in testing
rmccorm4 9f70a1d
Add streaming completion test, remove trtllm models from git repo
rmccorm4 d00d237
Remove unnecessary TODOs
rmccorm4 ae2fcd6
Add copyrights and replace dupe test model
rmccorm4 fc4c15a
Add disclaimer around application state and multiprocessing
rmccorm4 1ca9889
Address CodeQL warnings
rmccorm4 92a27e5
Add quickstart vllm dockerfile for sharing purposes
rmccorm4 9c3ee15
Remove workspace mount mention
rmccorm4 886ee7d
Review feedback: rename package, move tests out of package, remove ne…
rmccorm4 21c0996
Review feedback: naming nits, more type hints, helper functions
rmccorm4 f84aec4
Fix CodeQL import warning
rmccorm4 b230697
refactor: Use thinner API server with an engine interface (#7570)
rmccorm4 ea23eeb
Update dockerfile branch, fix CodeQL error
rmccorm4 156535c
Add tests for custom tokenizers by local file path
rmccorm4 9b7dc59
Expose --backend request format override to main.py, and expose env v…
rmccorm4 a1484e4
Fix tokenizer test, remove TODO
rmccorm4 33eee48
perf: Improve chat completions performance at high concurrency (#7653)
rmccorm4 0882b60
review feedback: use _to_string helper function, add some clarifying …
rmccorm4 f073fbf
feat: KServe Bindings to start tritonfrontend (#7662)
KrishnanPrash 2d0f7e6
chore: Fix argparse typo, cleanup argparse groups, make kserve fronte…
rmccorm4 78e571d
fix: Support sampling parameters of type List for vLLM backend (stop …
rmccorm4 579ad63
Review feedback: remove examples/ and docker/ folders, update README …
rmccorm4 815eebe
Add a few FIXMEs for follow-up
rmccorm4 8f92734
Add requirements.txt back in, fix test and docs accordingly
rmccorm4 5c0b2e6
Fix TRT-LLM model repo test path
rmccorm4 44b2282
Explicitly return error on unknown fields not defined in schema, excl…
rmccorm4 dc7bdf4
Merge branch 'main' of github.com:triton-inference-server/server into…
rmccorm4 49162be
Add missing copyright headers
rmccorm4 fe45d39
Review feedback: split app and test requirements to 2 requirements files
rmccorm4 2261d13
Fix whitespace pre-commit, remove auto 'git add' from copyright tool
rmccorm4 2e2a190
Disable copyright pre-commit hook until fixed on GitHub Actions side
rmccorm4 cc8657d
Fix attribution for tokenizer util
rmccorm4 fa9501e
Fix copyright header on copyright tool, remove unused import
rmccorm4
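The commit history above builds out OpenAI-compatible routes such as /v1/models and /v1/completions on FastAPI. As a rough illustration of the response shape the /v1/models route returns, here is a minimal stdlib-only sketch; the `owned_by` value is an assumption, and `mock_llm` is borrowed from the test model mentioned in the commits, not the PR's actual implementation:

```python
from dataclasses import dataclass, field, asdict
import time


@dataclass
class Model:
    """One entry in an OpenAI-style model list."""

    id: str
    object: str = "model"
    created: int = field(default_factory=lambda: int(time.time()))
    owned_by: str = "triton"  # assumed owner string, not from the PR


def list_models(names: list[str]) -> dict:
    """Build an OpenAI-style /v1/models response body."""
    return {"object": "list", "data": [asdict(Model(id=n)) for n in names]}
```

In the PR itself, a route handler would return this body as JSON; the sketch only shows the wire format.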
NOTE: It may make more sense to explicitly "include" the supported sampling parameters, but that list would be of similar length or longer. Both approaches likely require periodic updates: either to include a new sampling field, or to exclude a new non-sampling field.
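The "include" approach discussed here pairs with the earlier commit that explicitly returns an error on unknown fields not defined in the schema. A minimal hand-rolled sketch of that check, assuming a hypothetical allow-set of request fields (the PR's real schema is declared via its request models, not this helper, and the field names below are illustrative):

```python
# Hypothetical allow-set of recognized request fields; a new sampling
# parameter would need to be added here, mirroring the maintenance
# trade-off described in the comment above.
KNOWN_FIELDS = {"model", "prompt", "max_tokens", "temperature", "stream", "stop"}


def check_unknown_fields(payload: dict) -> None:
    """Raise if the request body carries fields not defined in the schema."""
    unknown = set(payload) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown request fields: {sorted(unknown)}")
```

Either direction (allow-list or deny-list) centralizes the decision in one set, so the periodic update the comment anticipates is a one-line change.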