LM Format Enforcer Guided Decoding Support #3868
Conversation
Merge conflicts resolved in: requirements-rocm.txt, vllm/config.py, vllm/engine/arg_utils.py
Merge conflicts resolved in: requirements-common.txt, requirements-rocm.txt
The checks caused me to realize some gaps in my original PR. They have all been addressed, and this PR is now fully ready to be reviewed.
Merge conflicts resolved in: requirements-common.txt
This is pretty good! However, it would be even better if you could help separate out the tests in tests/test_entrypoints/test_openai_server that test guided decoding into a separate file, and we can create a fixture over the server (perhaps on different ports) to run end-to-end test cases. That way we would have a lot of confidence in recommending both solutions.
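A minimal sketch of what such a fixture could look like, assuming a subprocess-launched server; the model name, ports, and readiness wait are illustrative placeholders, not the repo's actual test utilities:

```python
import subprocess
import sys
import time

import pytest

# One port per backend so each server gets its own port during the session.
BACKENDS = {"outlines": 8000, "lm-format-enforcer": 8001}


@pytest.fixture(scope="module", params=sorted(BACKENDS))
def guided_decoding_server(request):
    """Start an OpenAI-compatible server for each guided decoding backend."""
    port = BACKENDS[request.param]
    proc = subprocess.Popen([
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--model", "mistralai/Mistral-7B-Instruct-v0.2",  # illustrative model
        "--port", str(port),
        "--guided-decoding-backend", request.param,
    ])
    # Crude readiness wait; a real fixture would poll the /health endpoint.
    time.sleep(60)
    yield f"http://localhost:{port}/v1"
    proc.terminate()
    proc.wait()
```

Every test that takes `guided_decoding_server` would then run end to end against both backends automatically.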
…rmal state after using it
In order to make the tests simpler, I ended up adding a per-request option as well (same idea as the command line argument, applied per request).
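A sketch of what the per-request override might look like, assuming the request field mirrors the CLI flag name; the field name, model, prompt, and `guided_choice` payload here are assumptions for illustration:

```shell
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "prompt": "Is water wet? Answer yes or no:",
        "guided_choice": ["yes", "no"],
        "guided_decoding_backend": "lm-format-enforcer"
    }'
```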
This is great; it is working exactly as expected for me.
Anything else you think is blocking a merge, @simon-mo?
The failing test is due to a temporary networking issue during the run.
@simon-mo any update on the review?
FIX #3713
This PR implements the feature requested by @simon-mo in #3713: adding LMFE (LM Format Enforcer) guided decoding support to the OpenAI server via a command line argument.

It introduces a new command line arg, `--guided-decoding-backend`, which defaults to `outlines` (the current implementation) but also supports `lm-format-enforcer`. Example commands to test with:
If you start the server with
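a command along these lines (a sketch; the model name is illustrative, and `--guided-decoding-backend` is the flag introduced above):

```shell
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.2 \
    --guided-decoding-backend lm-format-enforcer
```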
You can then run
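a request along these lines (a sketch; the prompt and schema are illustrative, and `guided_json` is the existing per-request guided decoding parameter):

```shell
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "prompt": "Give me information about a person as JSON:",
        "max_tokens": 100,
        "guided_json": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"}
            },
            "required": ["name", "age"]
        }
    }'
```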
and get a proper JSON response.