Examples/kubernetes dev with model downloading functionality #7

OmegAshEnr01n · 2024-07-21T11:59:35Z

Hi,

I have built the helm chart according to the template you had provided earlier. I think this can still be improved in some ways. Any comments are welcome.

Feature set for the Helm chart

High availability
Multi models
Support of embeddings and completions models
Load balancing
Auto scaling
CUDA support
Downloading functionality
Redownload on upgrade hook (currently the models are downloaded only on the first deployment; there is no redownload on upgrade if required)
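
The missing redownload-on-upgrade item could be approached with a Helm hook Job. The following is only a minimal sketch, not the chart's actual template: the Job name, the `.Values.model.url` and `.Values.model.name` keys, the `curlimages/curl` image, and the `models-pvc` claim name are all assumptions made for illustration.

```yaml
# Hypothetical hook Job: re-runs the model download on every `helm upgrade`.
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-model-download   # hypothetical name
  annotations:
    # Run after install and after each upgrade, once the PVC exists.
    "helm.sh/hook": post-install,post-upgrade
    # Delete the previous hook Job before creating a new one.
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: download
          image: curlimages/curl:latest   # assumption: any image that ships curl
          command: ["sh", "-c"]
          args:
            - |
              set -e
              # --fail makes curl exit non-zero on HTTP errors
              curl -L --fail {{ .Values.model.url | quote }} \
                --output /models/{{ .Values.model.name }}.gguf
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: models-pvc   # hypothetical claim name
```

With a post-install hook the server pods may start before the file exists, so the server would still need to tolerate or wait for a missing model. Depending on the volume's permissions, the download container may also need a suitable securityContext.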

Pending testing

Load balancing
Multi-GPU support using MIG (see the Kubernetes docs and MicroK8s)

phymbert

Thanks for the effort, this is a good start. We need to bring it to the original repo. Let's merge it, then we can discuss it there.

phymbert · 2024-07-22T16:13:03Z

examples/kubernetes/llamacpp/charts/embedding/values.yaml

+
+livenessProbe:
+  httpGet:
+    path: /


We have health endpoints.

OmegAshEnr01n · 2024-07-22

You mean you want to remove this?

phymbert · 2024-07-24

No, the path must be /health.
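
The requested change could look like the values fragment below. It is a sketch only: the port and the timing values are assumptions (8080 is the llama.cpp server's default port) and should match whatever the chart actually exposes.

```yaml
livenessProbe:
  httpGet:
    path: /health   # llama.cpp server health endpoint, instead of "/"
    port: 8080      # assumption: must match the container port used by the chart
  initialDelaySeconds: 30   # illustrative; large models may need longer to load
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5
```

Pointing the readiness probe at the same endpoint keeps traffic away from a pod until the server reports healthy.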

phymbert · 2024-07-22T16:13:38Z

examples/kubernetes/llamacpp/charts/modelRunner/Chart.yaml

+name: modelRunner
+description: A Helm chart for Kubernetes
+
+# A chart can be either an 'application' or a 'library' chart.


Can be deleted

phymbert · 2024-07-22T16:14:14Z

examples/kubernetes/llamacpp/charts/modelRunner/templates/PersistentVolume.yaml

+
+---
+
+{{- end}}


Mind that each file must end with an empty line (a trailing newline).

phymbert · 2024-07-22T16:18:13Z

examples/kubernetes/llamacpp/charts/embedding/templates/deployment.yaml

+          - -c
+          - |
+            set -e
+            if curl -L {{ $modelConfig.url }} --output /models/{{ $modelName }}/{{ $modelName }}.gguf; then


This will not support sharded model files. It is better to let the llama.cpp server handle the initial download.

OmegAshEnr01n · 2024-07-22

OK, but then we won't be able to have a Job running it, which will prevent us from updating it using kubectl apply. Also, I don't believe the llama.cpp server supports auto-download? I know Ollama does. When the llama.cpp server container tries to start, it needs a model file to point to, or else it errors out.

phymbert · 2024-07-24

No, I developed that feature some time ago; see the docs.
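
A sketch of what letting the server handle the download might look like in the deployment template, reusing the `$modelConfig.url` and `$modelName` variables from the diff above. It assumes a llama.cpp server build with curl support, which `--model-url` requires; the image tag, container name, volume name, and port are assumptions, not taken from this PR.

```yaml
containers:
  - name: llama-server                            # hypothetical container name
    image: ghcr.io/ggerganov/llama.cpp:server     # assumption: image built with curl support
    args:
      # Download the model from this URL if it is not already present...
      - --model-url
      - {{ $modelConfig.url | quote }}
      # ...and store/load it at this path on the mounted volume.
      - --model
      - /models/{{ $modelName }}/{{ $modelName }}.gguf
      - --host
      - 0.0.0.0
      - --port
      - "8080"
    volumeMounts:
      - name: models                              # hypothetical volume name
        mountPath: /models
```

This removes the separate curl step from the pod, at the cost of tying the download to server startup rather than to a Job.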

phymbert · 2024-07-22T16:22:21Z

Maybe it would be easier if I push the base branch to the original repo?

OmegAshEnr01n · 2024-07-22T19:53:39Z

Yes, ideally we merge here first and, once finalized, we can push examples/kubernetes to the main repo.

ceddybi · 2024-07-26T22:56:47Z

@phymbert @OmegAshEnr01n Awesome work you've done here. Small question: when this chart is deployed, are the models' APIs compatible with the OpenAI API, the way Together AI works, where I just change the OPENAI_API_KEY and OPENAI_BASE_URL (https://api.together.xyz/v1)?

OmegAshEnr01n · 2024-08-06T02:49:24Z

Hi @ceddybi,

Please check the server API docs from llama.cpp.

POST /v1/chat/completions: OpenAI-compatible Chat Completions API. Given a ChatML-formatted json description in messages, it returns the predicted completion. Both synchronous and streaming mode are supported, so scripted and interactive applications work fine. While no strong claims of compatibility with OpenAI API spec is being made, in our experience it suffices to support many apps. Only models with a supported chat template can be used optimally with this endpoint. By default, the ChatML template will be used.
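
In practice, a client workload in the same cluster could be pointed at the chart's Service through the two variables mentioned in the question. This is a sketch under assumptions: the Service name, namespace, and port are hypothetical and depend on how the chart is installed.

```yaml
# Hypothetical fragment of a client Deployment's container spec
env:
  - name: OPENAI_BASE_URL
    # hypothetical in-cluster Service DNS name and port
    value: "http://llamacpp-completions.default.svc.cluster.local:8080/v1"
  - name: OPENAI_API_KEY
    # placeholder; only relevant if the server was started with --api-key
    value: "sk-no-key-required"
```

Whether a given app works unchanged depends on which OpenAI endpoints it calls, as the quoted docs caution.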

lee-b · 2024-09-20T14:36:18Z

Is it necessary to limit this to MIG? llama.cpp supports pre-Ampere GPUs, so it would be nice to use more standard multi-GPU container techniques.
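
One such standard technique is requesting whole GPUs through the NVIDIA device plugin and letting llama.cpp split the model across them. The fragment below is a sketch, assuming the device plugin is installed on the cluster; the GPU count and layer count are illustrative values.

```yaml
# Hypothetical container fragment: two whole GPUs, no MIG partitioning
resources:
  limits:
    nvidia.com/gpu: 2     # resource exposed by the NVIDIA device plugin
args:
  - --n-gpu-layers
  - "99"                  # illustrative: offload as many layers as fit
  - --split-mode
  - layer                 # split layers across the visible GPUs
```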

Shobhit added 2 commits July 21, 2024 19:52

Added demo chart. Version is functional on single GPU system pending testing. (8029c40)

Updated readme with feature set (0579fbe)

phymbert approved these changes Jul 22, 2024

phymbert mentioned this pull request Jul 22, 2024

kubernetes example ggerganov/llama.cpp#6546

Open
