
Explore integration options with Ollama and other backends #6

Open
mcharytoniuk opened this issue Jun 5, 2024 · 5 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@mcharytoniuk (Member)

llama.cpp exposes the /health endpoint, which makes it easy to deal with slots. What about other similar solutions?
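
A minimal sketch of what polling that endpoint can look like, assuming a llama.cpp server on 127.0.0.1:8080; the exact JSON fields (e.g. slots_idle / slots_processing) have varied across llama.cpp versions, so treat the field names here as assumptions:

```python
# Minimal sketch: poll a llama.cpp server's /health endpoint.
# The base URL and the slot-related field names are assumptions for illustration;
# newer llama.cpp builds may expose per-slot detail via /slots instead.
import requests

LLAMACPP_URL = "http://127.0.0.1:8080"  # assumed llama.cpp server address

def check_health(base_url: str) -> dict:
    # GET /health returns a small JSON status document
    resp = requests.get(f"{base_url}/health", timeout=2)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    status = check_health(LLAMACPP_URL)
    print("status:", status.get("status"))
    print("idle slots:", status.get("slots_idle"))        # may be absent on newer builds
    print("busy slots:", status.get("slots_processing"))  # may be absent on newer builds
```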

mcharytoniuk added the "good first issue" label on Jun 5, 2024
@aiseei commented Aug 19, 2024

Hi @mcharytoniuk, thanks for this interesting project! We use a combination of llama.cpp server and Ollama, both running in Docker containers, and have implemented our own Python-based proxy/load balancer. We are looking to move to something specialized like Paddler. Can we do this today with Paddler?

@mcharytoniuk (Member, Author) commented Aug 20, 2024

@aiseei Thank you for reaching out!

You can absolutely use Paddler with your llama.cpp setup in production. Personally, I am using it with llama.cpp in Auto Scaling groups.

When it comes to Ollama, not at the moment.

The issue is that Ollama potentially starts and manages multiple llama.cpp servers internally on its own, and it does not expose some of llama.cpp's internal endpoints (like /health: ollama/ollama#1378) and statuses. Currently, it does not allow hooking into the llama.cpp APIs that Paddler requires to function.

I might try to get it to work with just the OpenAI-style endpoints if there is some interest in having Ollama integration, though. However, that would have some limitations compared to balancing based on slots (slot counts let us predict how many requests a server can handle at most, which makes buffering predictable). Do you think that would be OK for your use case?
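
For illustration, here is a rough Python sketch of the difference between slot-aware balancing and blind round-robin (this is not Paddler's actual implementation; the Upstream type, its free_slots field, and the addresses are assumptions for this example):

```python
# Illustrative sketch only: prefer the upstream with the most free slots when slot
# counts are available, otherwise fall back to plain round-robin (as with backends
# that only expose OpenAI-style endpoints and report no slot information).
import itertools
from dataclasses import dataclass
from typing import Optional

@dataclass
class Upstream:
    url: str
    free_slots: Optional[int] = None  # None: backend exposes no slot information

_round_robin = itertools.count()

def choose_upstream(upstreams: list[Upstream]) -> Upstream:
    slot_aware = [u for u in upstreams if u.free_slots is not None]
    if slot_aware:
        # With slot counts we know how much spare capacity each server has left,
        # so requests can be buffered or rejected predictably.
        return max(slot_aware, key=lambda u: u.free_slots)
    # Without slot counts we can only distribute requests blindly.
    return upstreams[next(_round_robin) % len(upstreams)]

# Example: two llama.cpp servers reporting free slots, one backend that does not.
servers = [
    Upstream("http://10.0.0.1:8080", free_slots=2),
    Upstream("http://10.0.0.2:8080", free_slots=5),
    Upstream("http://10.0.0.3:11434"),
]
print(choose_upstream(servers).url)  # -> http://10.0.0.2:8080
```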

@mcharytoniuk (Member, Author)

@aiseei I think I have a few ideas on how to handle the issue. I will add support for Ollama and other OpenAI-style APIs to Paddler. See also: #18

mcharytoniuk reopened this on Aug 21, 2024
mcharytoniuk added the "enhancement" label on Aug 21, 2024
@aiseei commented Sep 5, 2024

@mcharytoniuk Hi, sorry for the late reply. Yes, supporting the OpenAI API style would work. By the way, I came across this issue today: ollama/ollama#6492. It might be relevant since you support Ollama.

@mcharytoniuk (Member, Author)

Bringing up issues and news like that helps me maintain the package; it makes it easier for me to follow what is relevant in the ecosystem. Thank you!
