Error in benchmark model with vllm backend #3230
I couldn't benchmark my model. It seems the benchmark script sends the requests without waiting for the responses, so the following error is raised. I also ran the service and can access it through http://0.0.0.0:8000, but I got this error. Am I missing anything here?

Comments
@hahmad2008 Hello! It seems that none of the requests go through, as you can tell from the output. Are you benchmarking against the vllm API server instead of the OpenAI server? |
@ywang96 Thanks for your response. I run the vllm service; it seems the requests are sent, but the benchmark script doesn't wait for the responses. |
Could you clarify on that? It's possible that there's an additional layer of output processing on top of the actual vllm container running inside this service, so the benchmark script doesn't know how to process the output coming out of it. Also, I would suggest using the OpenAI server if possible. |
It defines a vllm class to serve the LLM. I can access the service while it is running at this link: 'http://0.0.0:8000/generate?prompt=REQUESTED_PROMPT' |
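For reference, probing an endpoint like that by hand could look roughly like the sketch below; only the host, port, and prompt query parameter are taken from the URLs in this thread, and the timeout and response handling are assumptions.

```python
# Minimal sketch for probing the custom /generate endpoint by hand. Only the
# host, port, and "prompt" query parameter come from the URLs in the thread;
# the timeout and the response format are assumptions.
import requests

resp = requests.get(
    "http://0.0.0.0:8000/generate",
    params={"prompt": "Hello, how are you?"},
    timeout=30,
)
print(resp.status_code)
print(resp.text)  # inspect the raw payload to see how the service formats its output
```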
My suggestion is adding a line at vllm/benchmarks/backend_request_func.py, Line 107 in a33ce60. It would also help if you can add one at vllm/benchmarks/backend_request_func.py, Line 120 in a33ce60. |
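For illustration, the kind of debug line being suggested could look something like the following sketch; it is not the actual contents of backend_request_func.py, and the endpoint, payload shape, and variable names are assumptions.

```python
# Hypothetical illustration of the kind of debug line being suggested; this is
# not the actual contents of backend_request_func.py, and the endpoint, payload
# shape, and variable names are assumptions.
import asyncio

import aiohttp


async def debug_request(api_url: str, prompt: str) -> None:
    async with aiohttp.ClientSession() as session:
        async with session.post(api_url, json={"prompt": prompt}) as response:
            print("status:", response.status)      # confirm the request succeeds
            async for chunk in response.content:   # iterate the raw streamed body
                print("raw chunk:", chunk)         # see exactly what the server returns


if __name__ == "__main__":
    asyncio.run(debug_request("http://0.0.0.0:8000/generate", "Hello"))
```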
@ywang96 I printed the response itself before this line, vllm/benchmarks/backend_request_func.py, Line 106 in a33ce60.
I got:
|
@ywang96
This works for me and I get a response,
but this way, it doesn't work: |
Yep - so your service indeed behaves very differently from the vanilla vllm server and does some input parsing from the API URL itself, which is an anti-pattern imo, so this should not be supported by the benchmark script we have in the first place. I'd suggest you take a look at #2992, which is a good example of how to add your own request function. |
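As a rough idea of what adding your own request function involves, a simplified sketch is shown below; the dataclasses are stand-ins rather than the real input/output types from backend_request_func.py, and the payload keys and the "text" response field are assumptions.

```python
# Rough skeleton of a backend-specific request function, in the spirit of the
# ones in benchmarks/backend_request_func.py. The dataclasses below are
# simplified stand-ins (not the real input/output types from that file), and
# the payload keys and the "text" response field are assumptions.
import time
from dataclasses import dataclass

import aiohttp


@dataclass
class SimpleRequestInput:
    prompt: str
    api_url: str
    output_len: int


@dataclass
class SimpleRequestOutput:
    generated_text: str = ""
    success: bool = False
    latency: float = 0.0
    ttft: float = 0.0  # time to first token; left at 0 if the backend cannot stream


async def async_request_custom(request: SimpleRequestInput) -> SimpleRequestOutput:
    """Send one request to the custom backend and record basic metrics."""
    output = SimpleRequestOutput()
    start = time.perf_counter()
    async with aiohttp.ClientSession() as session:
        async with session.post(
            request.api_url,
            json={"prompt": request.prompt, "max_tokens": request.output_len},
        ) as response:
            if response.status == 200:
                body = await response.json()
                output.generated_text = body.get("text", "")  # field name assumed
                output.success = True
            output.latency = time.perf_counter() - start
    return output
```

The key point from the thread is that each request function needs to report back whether the request succeeded, how long it took, and, where streaming is available, the time to first token.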
@ywang96 I got a response for this: |
Does your service support streaming at all? It seems to me that the server does not support streaming, but we need it to calculate TTFT (time to first token). You can look at the request function for deepspeed-mii (which doesn't support streaming either, so TTFT is always set to 0): vllm/benchmarks/backend_request_func.py, Lines 175 to 215 in a33ce60. |
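To make the streaming requirement concrete, here is a minimal sketch of how TTFT is typically measured from a streamed response; the endpoint, payload, and chunk format are assumptions.

```python
# Minimal sketch of why streaming matters for TTFT: the timestamp of the first
# streamed chunk gives time-to-first-token, and the last chunk gives total
# latency. The endpoint, payload, and chunk format are assumptions.
import asyncio
import time

import aiohttp


async def measure_ttft(api_url: str, prompt: str) -> tuple[float, float]:
    ttft = 0.0
    start = time.perf_counter()
    async with aiohttp.ClientSession() as session:
        async with session.post(
            api_url, json={"prompt": prompt, "stream": True}
        ) as response:
            async for _chunk in response.content:
                if ttft == 0.0:
                    ttft = time.perf_counter() - start  # first chunk arrived
    latency = time.perf_counter() - start
    return ttft, latency


if __name__ == "__main__":
    print(asyncio.run(measure_ttft("http://0.0.0.0:8000/generate", "Hello")))
```

If the backend returns everything in a single non-streamed chunk, the first-chunk timestamp is just the total latency, which is why the non-streaming deepspeed-mii path reports a TTFT of 0.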
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you! |
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you! |