
LM Format Enforcer Guided Decoding Support #3868

Merged
merged 29 commits on Apr 16, 2024

Conversation

noamgat
Contributor

@noamgat noamgat commented Apr 5, 2024

FIX #3713
This PR implements the feature requested by simon-mo: adding LM Format Enforcer (LMFE) guided decoding support to the OpenAI server via a command-line argument.

It introduces a new command-line argument, --guided-decoding-backend, which defaults to outlines (the current implementation) but also supports lm-format-enforcer.

Example curl command to test with:

If you start the server with

python -m vllm.entrypoints.openai.api_server --guided-decoding-backend lm-format-enforcer --model meta-llama/Llama-2-7b-chat-hf

You can then run

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "prompt": "Please create a random person'\''s name.",
    "max_tokens": 100,
    "temperature": 0,
    "guided_json": {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "description": "",
      "type": "object",
      "properties": {
        "name": {
          "type": "object",
          "properties": {
            "first": {"type": "string", "minLength": 1},
            "middle": {"type": "string", "minLength": 1},
            "last": {"type": "string", "minLength": 1}
          },
          "required": ["first", "middle", "last"]
        }
      },
      "required": ["name"]
    }
  }'

And get a proper json response.
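The same request can be sketched from Python using only the standard library. The payload mirrors the curl example above; the server URL and model name are the ones from the example and may differ in your setup.

```python
# Sketch of the curl example above as a Python request.
# Assumes a vLLM OpenAI-compatible server is running locally;
# post_completion() is a hypothetical helper name, not vLLM API.
import json
import urllib.request

PERSON_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "description": "",
    "type": "object",
    "properties": {
        "name": {
            "type": "object",
            "properties": {
                "first": {"type": "string", "minLength": 1},
                "middle": {"type": "string", "minLength": 1},
                "last": {"type": "string", "minLength": 1},
            },
            "required": ["first", "middle", "last"],
        }
    },
    "required": ["name"],
}

payload = {
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "prompt": "Please create a random person's name.",
    "max_tokens": 100,
    "temperature": 0,
    # vLLM-specific extension field carrying the JSON schema to enforce.
    "guided_json": PERSON_SCHEMA,
}

def post_completion(url="http://localhost:8000/v1/completions"):
    # Sends the payload; only works with a running server.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```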

@noamgat
Contributor Author

noamgat commented Apr 5, 2024

The checks made me realize some gaps in my original PR. They have all been addressed, and this PR is 100% ready to be reviewed.

@simon-mo simon-mo self-assigned this Apr 5, 2024
Collaborator

@simon-mo simon-mo left a comment


This is pretty good! However, it would be even better if you could help separate the tests in tests/test_entrypoints/test_openai_server that test guided decoding into their own file; then we can create a fixture over the server (perhaps on different ports) to run end-to-end test cases. That way we would have a lot of confidence in recommending both solutions.
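The suggested test split could be sketched as a pytest fixture parametrized over both backends. All names here (BACKENDS, the fixture, the test body) are illustrative assumptions, not vLLM's actual test code, and the real fixture would start a server per backend.

```python
# Hypothetical sketch of the review suggestion: run the guided-decoding
# test cases once per backend via a parametrized fixture.
import pytest

BACKENDS = ["outlines", "lm-format-enforcer"]

@pytest.fixture(scope="module", params=BACKENDS)
def backend(request):
    # In the real suite this would launch the OpenAI-compatible server
    # with --guided-decoding-backend set to request.param (e.g. on a
    # distinct port per backend) and yield its base URL.
    return request.param

def test_guided_json(backend):
    # Placeholder body: issue a /v1/completions request with guided_json
    # against the server for `backend` and validate the JSON output.
    assert backend in BACKENDS
```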

Review comments on vllm/config.py (outdated, resolved)
@noamgat
Contributor Author

noamgat commented Apr 6, 2024

To make the tests simpler, I ended up adding a per-request option as well (a guided_decoding_backend field alongside guided_json etc.), and the tests now run against both outlines and LMFE.
This required a small change to the outlines implementation: the current one modified the input tokenizer in a breaking fashion, so I changed it to work on a copy of the tokenizer (and added lru_cache to avoid a performance hit).

@jamestwhedbee
Contributor

This is great; it is working exactly as expected for me.

@jamestwhedbee
Contributor

Anything else you think is blocking a merge @simon-mo?

@noamgat
Contributor Author

noamgat commented Apr 10, 2024

The failing test is due to a temporary networking issue during the run.

    E       socket.gaierror: [Errno -3] Temporary failure in name resolution

@noamgat
Contributor Author

noamgat commented Apr 12, 2024

@simon-mo any update on the review?

@simon-mo simon-mo enabled auto-merge (squash) April 16, 2024 05:03
@simon-mo simon-mo merged commit 0543476 into vllm-project:main Apr 16, 2024
46 checks passed
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 21, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 26, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Development

Successfully merging this pull request may close these issues.

[Feature]: Integrate with lm-format-enforcer
3 participants