Explore integration options with Ollama and other backends #6
Comments
hi @mcharytoniuk - thanks for this interesting project! We use a combination of llama.cpp server and Ollama, both running in Docker, and have implemented our own Python-based proxy/LB. We are looking to move to something specialist like Paddler. Can we do this today with Paddler?
@aiseei Thank you for reaching out! You can absolutely use Paddler with your llama.cpp setup in production. Personally, I am using it with Auto Scaling groups and llama.cpp. When it comes to Ollama, not at the moment. The issue is that Ollama potentially starts and manages multiple llama.cpp servers internally on its own and does not expose some of llama.cpp's internal endpoints (like the slots endpoint). I might try to get it to work for just the OpenAI-compatible endpoints if there is some interest in having Ollama integration, though. However, that would have some limitations compared to balancing based on slots (slots allow us to predict how many requests a server can handle at most, which allows predictable buffering). Do you think that would be OK for your use case?
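To illustrate what slot-based balancing means in practice, here is a rough Python sketch (not Paddler's actual code) of picking the upstream with the most idle slots. It assumes each llama.cpp server reports slot counters on its health endpoint, as older llama.cpp server builds did; the hostnames and field names are placeholders.

```python
# Rough sketch of slot-aware upstream selection, not Paddler's implementation.
# Assumes /health returns "slots_idle"/"slots_processing" (true for the
# llama.cpp server builds of that era; newer builds expose this under /slots).
import requests

UPSTREAMS = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]  # hypothetical hosts

def pick_upstream():
    """Return the upstream with the most idle slots, or None if all are busy."""
    best, best_idle = None, 0
    for base in UPSTREAMS:
        try:
            health = requests.get(f"{base}/health", timeout=1).json()
        except requests.RequestException:
            continue  # treat unreachable servers as having no capacity
        idle = health.get("slots_idle", 0)
        if idle > best_idle:
            best, best_idle = base, idle
    return best
```

Because the slot counts are bounded, the balancer can tell up front how many requests the fleet can absorb, which is what makes buffering predictable.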
@mcharytoniuk hi - sorry for the late reply. Yes, supporting the OpenAI API style would work. Btw, came across this issue today ollama/ollama#6492 - might be relevant as you support Ollama.
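For context, "OpenAI API style" here means clients would simply point an OpenAI-compatible request at the balancer's address. A hypothetical illustration (host, port, and model name are placeholders, not Paddler defaults):

```python
# OpenAI-style chat completion sent to the balancer instead of a specific backend.
import requests

resp = requests.post(
    "http://paddler-balancer:8080/v1/chat/completions",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```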
Bringing up issues and news like that helps me with maintaining the package; it is easier for me to follow what is relevant in the ecosystem. Thank you!
llama.cpp exposes the /health endpoint, which makes it easy to deal with slots. What about other similar solutions?
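As a minimal sketch of that endpoint, here is how a single llama.cpp server could be probed. The response fields shown are assumptions based on older llama.cpp server builds; newer builds may only return a status and expose slot details under a separate /slots endpoint.

```python
# Quick probe of one llama.cpp server's /health endpoint (field names assumed).
import requests

health = requests.get("http://localhost:8080/health", timeout=2).json()
print(health.get("status"), health.get("slots_idle"), health.get("slots_processing"))
```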