Add LlamaEdge sysext
This PR creates a sysext for running LlamaEdge on Flatcar. It will allow users to deploy their own LLM on the cluster.
## How to use

Run `create_llamaedge_sysext.sh` to build the `.raw` file. Then, provision the machine with the config from the Configuration section below.
## Testing done

I've verified the behavior on my DigitalOcean instance.
## Configuration

The config is available in both YAML (Butane) and JSON (Ignition) variants.
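Below is a minimal Butane (YAML) sketch of the usual Flatcar sysext wiring. The image name `llamaedge.raw` and the download URL are placeholders; adjust them to wherever you publish the built image.

```yaml
variant: flatcar
version: 1.0.0
storage:
  files:
    # Placeholder URL: point this at wherever the built .raw image is hosted.
    - path: /opt/extensions/llamaedge.raw
      contents:
        source: https://example.com/llamaedge.raw
  links:
    # systemd-sysext activates images that are linked into /etc/extensions.
    - target: /opt/extensions/llamaedge.raw
      path: /etc/extensions/llamaedge.raw
      hard: false
```

The JSON (Ignition) variant can be generated from this with `butane --pretty --strict config.yaml > config.json`.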
### Prepare the model

Choose a model based on your hardware; I chose a smaller model due to the limitations of my DigitalOcean instance.
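For illustration, a small GGUF model can be pulled from Hugging Face; the specific model below is an assumption, and any GGUF chat model that fits in your instance's memory will do.

```bash
# Example model only: a ~0.5B-parameter GGUF fits comfortably on a small droplet.
curl -LO https://huggingface.co/second-state/Qwen2-0.5B-Instruct-GGUF/resolve/main/qwen2-0.5b-instruct-q5_k_m.gguf
```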
### Start the server

The WASM binary ships inside the sysext image; please use the path `/usr/lib/wasmedge/wasm/llama-api-server.wasm`. You can also reduce the `CONTEXT_SIZE` if you are running on a low-memory instance. The server loads the model into memory and then starts the OpenAI-compatible API server.
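A sketch of starting the server manually with WasmEdge, using the standard LlamaEdge flags. The model file and prompt template must match the model you prepared, and the context size here is an assumption you can lower further on small instances (if the sysext ships a systemd unit, `CONTEXT_SIZE` would be adjusted there instead; `--ctx-size` is the equivalent flag).

```bash
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:qwen2-0.5b-instruct-q5_k_m.gguf \
  /usr/lib/wasmedge/wasm/llama-api-server.wasm \
  --prompt-template chatml \
  --ctx-size 1024 \
  --model-name qwen2-0.5b-instruct
```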
The expected output should be:
### Interact with the API server

Please check the LlamaEdge documentation for more details on the options: https://github.com/LlamaEdge/LlamaEdge/tree/main/llama-api-server
#### Get model list

```bash
curl -X GET http://localhost:8080/v1/models -H 'accept: application/json'
```
Expected output:
#### Chat completion
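A sketch of a chat completion request against the OpenAI-compatible endpoint; the model name is an assumption and must match the `--model-name` the server was started with.

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What is the capital of France?"}
        ],
        "model": "qwen2-0.5b-instruct"
      }'
```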
Expected output: