add support for batched inference #818
Comments
Also, it looks like easy integration here is potentially contingent on this issue: ggerganov/llama.cpp#3478
See #771 – it's one of the pinned issues in this repo.
@abetlen How can I run batch inference of a T5 model with llama-cpp-python? Per the docs, when I run the following code, I get this error:
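(The original snippet and traceback were not preserved in the thread. As a hypothetical reconstruction of the kind of call that fails: the high-level `Llama` API accepts one prompt per call, so passing a list of prompts is not supported. The model path and prompts below are placeholders.)

```python
from llama_cpp import Llama

# Placeholder path -- any GGUF model illustrates the same limitation.
llm = Llama(model_path="./models/t5-base.gguf")

prompts = [
    "translate English to German: Hello, world.",
    "translate English to German: How are you?",
]

# This fails: `prompt` must be a single string (or a list of token ids),
# not a list of prompt strings -- there is no batched entry point in the
# high-level API.
out = llm(prompts, max_tokens=64)
```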
Did you find any solutions? @yugaljain1999
@SylvainVerdy Not yet, so I started using ctranslate2 for batch inference.
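(For anyone landing here: a minimal sketch of what CTranslate2 batch inference for T5 looks like, assuming the model has already been converted with `ct2-transformers-converter` and using placeholder paths. CTranslate2's `translate_batch` takes token strings, one list per input, and runs the whole batch in a single forward pass.)

```python
import ctranslate2
import transformers

# Placeholder paths/names -- adjust to your converted model directory.
translator = ctranslate2.Translator("t5-small-ct2", device="cpu")
tokenizer = transformers.AutoTokenizer.from_pretrained("t5-small")

prompts = [
    "translate English to German: Hello, world.",
    "translate English to German: How are you?",
]

# CTranslate2 expects token *strings*, one list per input.
batch_tokens = [
    tokenizer.convert_ids_to_tokens(tokenizer.encode(p)) for p in prompts
]

# The whole batch is decoded together.
results = translator.translate_batch(batch_tokens)

for result in results:
    tokens = result.hypotheses[0]
    print(tokenizer.decode(tokenizer.convert_tokens_to_ids(tokens)))
```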
Since llama.cpp supports batched inference as of 8.26, can llama-cpp-python support batched inference as well? I just tried the latest version, 0.2.11, and found that it is not possible to make multiple requests simultaneously.
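(Until batched decoding is exposed, a common stopgap is to serialize concurrent requests through a single `Llama` instance behind a lock. This is not true batched inference, just a sketch of a thread-safe workaround; the model path and parameters are placeholders.)

```python
import threading
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf")  # placeholder path
lock = threading.Lock()

def generate(prompt: str) -> str:
    # The underlying llama.cpp context is not safe for concurrent calls,
    # so only one request can be in flight at a time.
    with lock:
        out = llm(prompt, max_tokens=64)
    return out["choices"][0]["text"]
```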