Examples/kubernetes dev with model downloading functionality #7
base: example/kubernetes
Conversation
Thanks for the effort, this is a good start. We need to bring it to the original repo. Let's merge it, then we can discuss there.
livenessProbe:
  httpGet:
    path: /
We have health endpoints
You mean you want to remove this?
No, the path must be /health.
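Following the thread above, the probe would target the server's /health endpoint rather than /. A minimal sketch (the port and timing values are illustrative assumptions, not from the diff):

```yaml
livenessProbe:
  httpGet:
    path: /health   # health endpoint exposed by the llama.cpp server
    port: 8080      # assumed container port
  initialDelaySeconds: 30
  periodSeconds: 10
```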
name: modelRunner
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
This comment can be deleted.
---

{{- end}}
Mind that each file must end with an empty line.
- -c
- |
  set -e
  if curl -L {{ $modelConfig.url }} --output /models/{{ $modelName }}/{{ $modelName }}.gguf; then
It will not support sharded model files. Better to let the llama.cpp server handle the initial download.
OK, but then we won't be able to have a Job running it, which will prevent us from updating it using kubectl apply. Also, I don't believe the llama.cpp server supports auto-download? I know Ollama does. When the llama.cpp server container tries to start, it needs a model file to point to, or else it errors out.
No, I developed that feature some time ago, see the doc.
Maybe it would be easier if I push the base branch to the original repo?
Yes, ideally we merge here first, and once finalized we can push.
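A rough sketch of the alternative discussed above: let the server fetch the model itself instead of a curl Job. This assumes the llama.cpp server's `--model-url` download option (verify the flag against the current server README) and a hypothetical image tag:

```yaml
containers:
  - name: llama-cpp-server
    image: ghcr.io/ggerganov/llama.cpp:server   # assumed image tag
    args:
      - --model-url
      - "{{ $modelConfig.url }}"                # server downloads the model on first start
      - --model
      - /models/{{ $modelName }}.gguf           # local path the download is saved to
      - --port
      - "8080"
```

With this shape, `kubectl apply` updates the Deployment directly, and the download happens inside the server container rather than in a separate Job.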
@phymbert @OmegAshEnr01n Awesome work you've done here. Small question: when this chart is deployed, are the models' APIs compatible with the OpenAI API, like the way Together AI works, where I just change the OPENAI_API_KEY and OPENAI_BASE_URL (https://api.together.xyz/v1)?
Hi @ceddybi, please check the server API docs from llama.cpp.
Is it necessary to limit to MIG here? llama.cpp supports pre-Ampere GPUs, so it would be nice to use more standard multi-GPU container techniques.
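For comparison, a non-MIG request through the standard NVIDIA device plugin resource looks like this (a sketch; the GPU count is illustrative):

```yaml
resources:
  limits:
    nvidia.com/gpu: 1   # whole GPUs via the standard device plugin, no MIG profile required
```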
Hi,
I have built the Helm chart according to the template you provided earlier. I think this can still be improved in some ways; any comments are welcome.
Feature set for the Helm chart
Pending testing