[Bug]: get the degree of the outlines FSM compilation progress from vLLM 0.5.0 engine (via a route) #5436
Comments
cc @br3no, do you have a suggestion here?
Maybe Guide is a part of the FSM in Outlines? Another question: the FSM is compiled every time the engine starts and receives a guided JSON request. Could we precompile it before starting the engine, or before receiving a guided JSON request?
There are still FSMs in Outlines underneath the Guide API. The FSMs depend on the tokenizer's vocabulary and the guide rule. Therefore, precomputing them is not really possible, since you would need to know ahead of time what guide rules the requests will contain. Outlines uses a file-system cache to avoid recomputing FSMs across different runs. Are you running vLLM in Docker?
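If the cache is being lost between runs (for example in an ephemeral Docker container), one workaround is to point the Outlines file-system cache at a persistent directory. A minimal sketch, assuming your Outlines version reads the `OUTLINES_CACHE_DIR` environment variable for its disk-cache location (recent releases do) and that `/data/outlines_cache` is a hypothetical volume-mounted path:

```python
import os

# Point the Outlines disk cache at a persistent, volume-mounted directory
# so compiled FSMs survive container restarts. /data/outlines_cache is a
# hypothetical path; mount a Docker volume there.
os.environ["OUTLINES_CACHE_DIR"] = "/data/outlines_cache"

# This must be set before Outlines (or vLLM, which imports it) first
# creates the cache, i.e. before the server process starts up.
```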
I found the same issue, and it seems the output is completely messed up because of this. I fell back to vllm==0.4.3 and now everything works as expected.
@m0g1cian can you give more details about what you mean by "output is completely messed up"? We have just updated to the latest Outlines version, so there should be no problems in generation quality; only improvements due to bug fixes in the latest Outlines version.
I am not sure if the problem comes from vLLM or SGLang. I am going to make a minimal reproducible demo to find out. However, I guess it might be SGLang's issue after all.
@syGOAT it would not be economical to create an extra API for checking the compilation state of a particular guide. As I said, the guides depend on the request parameters. As a pragmatic solution to your use case, it might help to add a warm-up routine to your system, sending requests to vLLM with the guides you expect to receive (see the sketch after the next comment). That way, once the real requests arrive, they will be served swiftly. If you don't know a priori what guides to expect, there is unfortunately nothing you can do to improve things with the Outlines engine at the moment. You can try using lm-format-enforcer instead. AFAIK lm-format-enforcer keeps two data structures: one for the tokenizer vocabulary and one for the guide. Chances are it will be faster than Outlines at compiling short-lived guides. This will not work with CFG guides though, as they are not supported by lm-format-enforcer.
That's a really good idea! Thanks for your suggestion.
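A minimal sketch of the warm-up routine suggested above, assuming an OpenAI-compatible vLLM server on localhost:8000 and using the `guided_json` extra-body parameter that vLLM's OpenAI server accepts; the schema and model name below are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical schema you expect production traffic to use; sending it
# once at startup forces Outlines to compile (and cache) the FSM early.
warmup_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "warm-up"}],
    max_tokens=1,  # a tiny generation is enough to trigger compilation
    extra_body={"guided_json": warmup_schema},
)
```

Run one such request per schema you expect; subsequent real requests with the same schema then hit the already-compiled guide.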
@br3no I posted a request with one JSON schema. Then I shut down the vLLM engine. I restarted the engine and posted a request with the same JSON schema, but the FSM was compiled again. Why didn't the Outlines cache work?
@syGOAT I have the same issue. How can I use the cache when I restart the engine and post a request with the same JSON schema?
@syGOAT I tried outline.get_cache, but it did not work.
@ericperfect me too
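For anyone debugging this, a minimal sketch of how the cache can be inspected, assuming Outlines exposes its diskcache-backed cache via `outlines.caching.get_cache()` (the exact module path is an assumption and may vary by version):

```python
from outlines.caching import get_cache  # module path may vary by version

cache = get_cache()       # a diskcache.Cache instance backed by a directory
print(cache.directory)    # where compiled FSMs are persisted on disk
print(len(cache))         # number of cached entries; 0 right after a restart
                          # suggests the cache directory is not persistent
```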
When I use --engine-use-ray with v0.5.0.post1, I get the "Compiling FSM index" message all the time, and inference performance gets really slow. Without --engine-use-ray it is a bit better, but I still get "Compiling FSM index" from time to time, not just at startup. This is with the --engine-use-ray argument (it never stops): INFO: 127.0.0.1:39166 - "POST /v1/chat/completions HTTP/1.1" 200 OK
I'll try to have a look in the coming days! I have been pretty busy the last few weeks.
I have fixed the issue in #6203.
Your current environment
🐛 Describe the bug
According to https://github.com/vllm-project/vllm/releases/tag/v0.5.0, vLLM updated the Outlines integration from FSM to Guide.
But when I used this command to start the engine:
and posted a guided JSON request, I found the output below from the vLLM server:
Why did vLLM compile an FSM? Wasn't it replaced by Guide?