[Feature]: Online Inference on local model with OpenAI Python SDK #8631
Comments
Seems to me there are some bots with suspect links 👀
cc @wuisawesome @pooyadavoodi since you two have worked on the batch API
Out of curiosity, can you say more about your Docker setup? Would it unblock you to mount the directory with the data into your Docker container? FWIW, the reason I didn't implement this API originally was that I couldn't think of a way to implement the job management without either introducing foot-guns or creating the first stateful endpoint. This isn't the first time we've heard this request, though, and it is probably worth thinking more about if it becomes a recurring theme.
Hey, okay, I see. I don't know exactly how to implement it, but I think it would improve the usability of vLLM in general.
Totally agree that this is an interesting feature. It would be super nice to have something standardized here (I mentioned something similar here). Our current approach is as follows:
I can share some WIP code on how to parse the JSONL files with Ray, loading it as …
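Since the WIP Ray code isn't shown in the thread, here is a minimal plain-Python stand-in for the parsing step. The field names follow the OpenAI batch input JSONL format (one request object per line); the sample record and model name are illustrative assumptions.

```python
import json

def parse_batch_jsonl(text: str) -> list[dict]:
    """Parse OpenAI-batch-format JSONL: one request object per line."""
    requests = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Each record carries a custom_id, an HTTP method, a target URL,
        # and the per-request body (model, messages, ...).
        requests.append(json.loads(line))
    return requests

sample = (
    '{"custom_id": "r1", "method": "POST", "url": "/v1/chat/completions", '
    '"body": {"model": "my-local-model", '
    '"messages": [{"role": "user", "content": "Hi"}]}}'
)
print(parse_batch_jsonl(sample)[0]["custom_id"])  # → r1
```

The same per-line structure is what a Ray-based loader would read into a dataset before fanning the requests out to workers.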
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
boop
@wuisawesome @pooyadavoodi I'm also interested in seeing this feature implemented! Having an online OpenAI Batch API endpoint on top of vLLM would be super cool. Is there any WIP PR to check out?
🚀 The feature, motivation and pitch
OpenAI recently introduced a new batch inference endpoint (https://platform.openai.com/docs/guides/batch/overview?lang=curl). It would be nice if this worked using OpenAI's batch format but with a local model.
I created a usage issue for this before (#8567).
Something like this:
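For illustration, a sketch of what the requested usage might look like with the OpenAI Python SDK pointed at a local vLLM server. The base URL, port, API key, model name, and file path are assumptions; the `batches` calls mirror OpenAI's hosted Batch API and are exactly the endpoint that vLLM does not serve today.

```python
import json

def make_request_line(custom_id: str, model: str, prompt: str) -> str:
    """Build one line of an OpenAI batch input JSONL file."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model,
                 "messages": [{"role": "user", "content": prompt}]},
    })

line = make_request_line("request-1", "my-local-model", "Hello!")

def submit_batch(path: str):
    """Hypothetical flow against a local vLLM server (currently returns 404)."""
    from openai import OpenAI
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    batch_file = client.files.create(file=open(path, "rb"), purpose="batch")
    return client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
```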
At the moment there will be an error:
NotFoundError: Error code: 404 - {'detail': 'Not Found'}
Advantages for the implementation:
Alternatives
Internal Implementation:
There was a feature implemented using
python -m vllm.entrypoints.openai_batch
as described here (#4777), but that is not compatible with the OpenAI SDK, nor with the Docker setup.
Additional context
No response
Before submitting a new issue...