
LM Format Enforcer Guided Decoding Support #3868

Merged
merged 29 commits on Apr 16, 2024

Conversation

noamgat
Contributor

@noamgat noamgat commented Apr 5, 2024

FIX #3713
This PR implements the feature requested by simon-mo: adding LM Format Enforcer (LMFE) guided decoding support to the OpenAI server via a command-line argument.

It introduces a new command-line argument, --guided-decoding-backend, which defaults to outlines (the current implementation) but also supports lm-format-enforcer.

Example curl command to test with:

If you start the server with

python -m vllm.entrypoints.openai.api_server --guided-decoding-backend lm-format-enforcer --model meta-llama/Llama-2-7b-chat-hf

You can then run

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "prompt": "Please create a random person'\''s name.",
    "max_tokens": 100,
    "temperature": 0,
    "guided_json": {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "description": "",
      "type": "object",
      "properties": {
        "name": {
          "type": "object",
          "properties": {
            "first": {"type": "string", "minLength": 1},
            "middle": {"type": "string", "minLength": 1},
            "last": {"type": "string", "minLength": 1}
          },
          "required": ["first", "middle", "last"]
        }
      },
      "required": ["name"]
    }
  }'

And get a proper json response.
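The same request can be sketched from Python using only the standard library. The payload mirrors the curl example above; the server URL and model name are the ones from the example and may differ in your setup.

```python
# Sketch of the curl example above as a Python request.
# Assumes a vLLM OpenAI-compatible server is running locally;
# post_completion() is a hypothetical helper name, not vLLM API.
import json
import urllib.request

PERSON_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "description": "",
    "type": "object",
    "properties": {
        "name": {
            "type": "object",
            "properties": {
                "first": {"type": "string", "minLength": 1},
                "middle": {"type": "string", "minLength": 1},
                "last": {"type": "string", "minLength": 1},
            },
            "required": ["first", "middle", "last"],
        }
    },
    "required": ["name"],
}

payload = {
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "prompt": "Please create a random person's name.",
    "max_tokens": 100,
    "temperature": 0,
    # vLLM-specific extension field carrying the JSON schema to enforce.
    "guided_json": PERSON_SCHEMA,
}

def post_completion(url="http://localhost:8000/v1/completions"):
    # Sends the payload; only works with a running server.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```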

@noamgat
Contributor Author

noamgat commented Apr 5, 2024

The checks made me realize some gaps in my original PR. They have all been addressed, and this PR is 100% ready to be reviewed.

@simon-mo simon-mo self-assigned this Apr 5, 2024
Collaborator

@simon-mo simon-mo left a comment


This is pretty good! However, it would be even better if you could help separate the tests in tests/test_entrypoints/test_openai_server that test guided decoding into their own file; then we can create a fixture over the server (perhaps on different ports) to run end-to-end test cases. That way we would have a lot of confidence in recommending both solutions.
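The suggested test split could be sketched as a pytest fixture parametrized over both backends. All names here (BACKENDS, the fixture, the test body) are illustrative assumptions, not vLLM's actual test code, and the real fixture would start a server per backend.

```python
# Hypothetical sketch of the review suggestion: run the guided-decoding
# test cases once per backend via a parametrized fixture.
import pytest

BACKENDS = ["outlines", "lm-format-enforcer"]

@pytest.fixture(scope="module", params=BACKENDS)
def backend(request):
    # In the real suite this would launch the OpenAI-compatible server
    # with --guided-decoding-backend set to request.param (e.g. on a
    # distinct port per backend) and yield its base URL.
    return request.param

def test_guided_json(backend):
    # Placeholder body: issue a /v1/completions request with guided_json
    # against the server for `backend` and validate the JSON output.
    assert backend in BACKENDS
```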

Review comments on vllm/config.py (outdated, resolved)
@noamgat
Contributor Author

noamgat commented Apr 6, 2024

To make the tests simpler, I ended up adding a per-request option as well (a guided_decoding_backend field alongside guided_json etc.), and the tests now run against both outlines and LMFE.
This required a small change to the outlines implementation: the current one modified the input tokenizer in a breaking fashion, so I changed it to work on a copy of the tokenizer (and added lru_cache to avoid a performance hit).

@jamestwhedbee
Contributor

This is great; it is working exactly as expected for me.

@jamestwhedbee
Contributor

Anything else you think is blocking a merge @simon-mo?

@noamgat
Contributor Author

noamgat commented Apr 10, 2024

The failing test is due to a temporary networking issue during the run.

    E       socket.gaierror: [Errno -3] Temporary failure in name resolution

@noamgat
Contributor Author

noamgat commented Apr 12, 2024

@simon-mo any update on the review?

@simon-mo simon-mo enabled auto-merge (squash) April 16, 2024 05:03
@simon-mo simon-mo merged commit 0543476 into vllm-project:main Apr 16, 2024
46 checks passed
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 21, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 26, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Development

Successfully merging this pull request may close these issues.

[Feature]: Integrate with lm-format-enforcer
3 participants