[Feature]: Integrate with lm-format-enforcer #3713
Comments
I think I'll be able to execute this integration rather quickly, if we agree on the way the user chooses which decoding backend to use. Are you OK with the flag that you suggested (guided-decoding-backend)?
Yes, the flag sounds natural to me. A more complicated change would be to fall back to lm-format-enforcer while the Outlines FSM is still compiling, but just using the flag should be fine as a first step.
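For concreteness, a hedged sketch of how that flag could surface once wired through `EngineArgs`. The field name `guided_decoding_backend` and the accepted values are assumptions at this point in the thread, not the merged API:

```python
# Hedged sketch: guided_decoding_backend as an EngineArgs field and the
# value "lm-format-enforcer" are assumptions, not the merged API.
from vllm import EngineArgs, LLMEngine

engine_args = EngineArgs(
    model="facebook/opt-125m",  # illustrative model
    guided_decoding_backend="lm-format-enforcer",  # assumed default: "outlines"
)
engine = LLMEngine.from_engine_args(engine_args)
```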
I've started working on it; this commit shows the way information propagates.
Very excited for this!
@noamgat EngineArgs is the right place to put it. Do we still need it in ModelConfig?
The reason I also put it in ModelConfig and not just in EngineArgs is because … In the meantime, I'm continuing with the actual LMFE integration.
Submitted a pull request!
Note: I ended up removing the argument from ModelConfig and adding a new DecodingConfig class instead.
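For illustration, a minimal sketch of what such a DecodingConfig could look like, assuming it only carries the backend choice; the actual class in the pull request may have more fields:

```python
from dataclasses import dataclass

@dataclass
class DecodingConfig:
    """Decoding-related settings, decoupled from ModelConfig.

    Sketch only: the real class in the PR may carry more fields.
    """
    guided_decoding_backend: str = "outlines"
```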
Updated the pull request with a per-request decoding-backend param, to make testing easier (both for unit tests and for people evaluating the different options).
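A hedged sketch of how such a per-request override might be exercised against vLLM's OpenAI-compatible server; the `extra_body` keys `guided_json` and `guided_decoding_backend` are assumed here, not quoted from the pull request:

```python
# Assumed request shape: guided_json carries the schema, and
# guided_decoding_backend overrides the backend for this request only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
schema = {"type": "object", "properties": {"city": {"type": "string"}}}

completion = client.chat.completions.create(
    model="facebook/opt-125m",  # illustrative model
    messages=[{"role": "user", "content": "Name a city as JSON."}],
    extra_body={
        "guided_json": schema,
        "guided_decoding_backend": "lm-format-enforcer",
    },
)
print(completion.choices[0].message.content)
```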
Can we have this support for local serving using AsyncLLMEngine? My scenario/use case is as follows: I am using Mixtral 8x7B on a g5.12xlarge EC2 instance and serving it locally from Python. I want the model to generate output that follows a strict JSON schema using the guided_json feature, and to use AsyncLLMEngine for faster responses. Can someone tell me how I can do that? This is what I do:
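(A hypothetical reconstruction of the snippet; the model name and settings are illustrative.)

```python
# Hypothetical reconstruction of the setup; model name and parallelism
# are illustrative (g5.12xlarge has 4 A10G GPUs).
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=4,
)
```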
And then for serving:
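(Again a hypothetical reconstruction.)

```python
# Hypothetical reconstruction of the generation loop.
sampling_params = SamplingParams(temperature=0.0, max_tokens=512)
outputs = llm.generate(["Extract the user info as JSON: ..."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```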
The code you are using is not 'serving', but rather 'script-based inference'.
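For readers with the same question, a minimal sketch of the script-based route using AsyncLLMEngine together with lm-format-enforcer's own vLLM helpers (`build_vllm_token_enforcer_tokenizer_data` and `build_vllm_logits_processor`, per the lm-format-enforcer docs); the schema and model are illustrative, and this is not the thread's confirmed answer:

```python
import asyncio

from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.vllm import (
    build_vllm_logits_processor,
    build_vllm_token_enforcer_tokenizer_data,
)
from transformers import AutoTokenizer
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # illustrative
SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

async def main() -> None:
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model=MODEL, tensor_parallel_size=4)
    )
    # Build a schema-constrained logits processor from the tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    tokenizer_data = build_vllm_token_enforcer_tokenizer_data(tokenizer)
    logits_processor = build_vllm_logits_processor(
        tokenizer_data, JsonSchemaParser(SCHEMA)
    )

    params = SamplingParams(max_tokens=256, logits_processors=[logits_processor])
    final_output = None
    # AsyncLLMEngine.generate streams partial RequestOutputs; keep the last one.
    async for request_output in engine.generate(
        "Describe the user as JSON: ", params, request_id="request-0"
    ):
        final_output = request_output
    print(final_output.outputs[0].text)

if __name__ == "__main__":
    asyncio.run(main())
```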
🚀 The feature, motivation and pitch
While the existing Outlines state machine provides great state-of-the-art performance, it trades off a one-off compile time when first working with a schema. For endpoint products running models as a service, with customers supplying many different schemas, that cost might not be acceptable. In that case, we should integrate with lm-format-enforcer from @noamgat.
We already have an existing logits processor interface, and guided decoding is already tested, so it should be quite straightforward to add an integration for it. In the end it should come down to a flag for choosing the backend:
--guided-decoding-backend=...
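To make the pitch concrete, a hedged sketch of the logits processor hook that both backends would plug into: a processor is a callable taking the token ids generated so far plus the raw logits for the next token, and returning adjusted logits. The toy processor below is illustrative, not either backend:

```python
from typing import List

import torch

def force_eos_after_limit(token_ids: List[int], logits: torch.Tensor) -> torch.Tensor:
    """Toy processor: after 100 generated tokens, allow only a
    hypothetical EOS token id of 2."""
    if len(token_ids) >= 100:
        mask = torch.full_like(logits, float("-inf"))
        mask[2] = 0.0
        return logits + mask
    return logits
```

A processor like this is passed per request via `SamplingParams(logits_processors=[force_eos_after_limit])`; a guided-decoding backend would instead mask logits according to the schema's current parser state.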
Alternatives
No response
Additional context
No response