sysext: add llamaedge recipe #103

Merged (1 commit into flatcar:main, Dec 3, 2024)

Conversation

@hydai (Contributor) commented on Nov 28, 2024

Add LlamaEdge sysext

This PR creates a sysext for running LlamaEdge on Flatcar, allowing users to deploy their own LLM on their cluster.

How to use

Run create_llamaedge_sysext.sh to build the .raw file.
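
A minimal invocation sketch; the positional arguments (version, then extension name) mirror the bakery's other create_*_sysext.sh scripts and are an assumption here, not something spelled out in this PR:

# Assumed arguments: <version> <extension name>, following the other bakery scripts.
./create_llamaedge_sysext.sh 0.14.16 llamaedge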

Then, use the following config:

variant: flatcar
version: 1.0.0
storage:
  files:
    - path: /opt/extensions/wasmedge-0.14.1-x86-64.raw
      mode: 0420
      contents:
        source: https://github.com/flatcar/sysext-bakery/releases/download/latest/wasmedge-0.14.1-x86-64.raw
    - path: /opt/extensions/llamaedge-0.14.16-x86-64.raw
      mode: 0420
      contents:
        source: https://github.com/flatcar/sysext-bakery/releases/download/latest/llamaedge-0.14.16-x86-64.raw
  links:
    - target: /opt/extensions/llamaedge-0.14.16-x86-64.raw
      path: /etc/extensions/llamaedge.raw
      hard: false
    - target: /opt/extensions/wasmedge-0.14.1-x86-64.raw
      path: /etc/extensions/wasmedge.raw
      hard: false
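
This is a Butane config, so it needs to be transpiled to Ignition JSON before provisioning. A quick sketch using the Butane container image (the file names are placeholders):

# Transpile the Butane config above into an Ignition file.
docker run --rm -i quay.io/coreos/butane:release --pretty --strict < llamaedge.bu > llamaedge.ign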

Testing done

I've verified the behavior on my Digital Ocean instance.

Configuration

YAML

variant: flatcar
version: 1.0.0
storage:
  files:
    - path: /opt/extensions/wasmedge-0.14.1-x86-64.raw
      mode: 0420
      contents:
        source: https://github.com/second-state/flatcar-sysext-bakery/releases/download/0.0.3/wasmedge-0.14.1-x86-64.raw
    - path: /opt/extensions/llamaedge-0.14.16-x86-64.raw
      mode: 0420
      contents:
        source: https://github.com/second-state/flatcar-sysext-bakery/releases/download/0.0.3/llamaedge-0.14.16-x86-64.raw
  links:
    - target: /opt/extensions/llamaedge-0.14.16-x86-64.raw
      path: /etc/extensions/llamaedge.raw
      hard: false
    - target: /opt/extensions/wasmedge-0.14.1-x86-64.raw
      path: /etc/extensions/wasmedge.raw
      hard: false

JSON

{
   "ignition":{
      "version":"3.3.0"
   },
   "storage":{
      "files":[
         {
            "path":"/opt/extensions/wasmedge-0.14.1-x86-64.raw",
            "contents":{
               "source":"https://github.com/second-state/flatcar-sysext-bakery/releases/download/0.0.3/wasmedge-0.14.1-x86-64.raw"
            },
            "mode":272
         },
         {
            "path":"/opt/extensions/llamaedge-0.14.16-x86-64.raw",
            "contents":{
               "source":"https://github.com/second-state/flatcar-sysext-bakery/releases/download/0.0.3/llamaedge-0.14.16-x86-64.raw"
            },
            "mode":272
         }
      ],
      "links":[
         {
            "path":"/etc/extensions/llamaedge.raw",
            "hard":false,
            "target":"/opt/extensions/llamaedge-0.14.16-x86-64.raw"
         },
         {
            "path":"/etc/extensions/wasmedge.raw",
            "hard":false,
            "target":"/opt/extensions/wasmedge-0.14.1-x86-64.raw"
         }
      ]
   }
}
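
After the machine boots with this configuration, you can confirm that both extensions are merged. A quick check, assuming a shell on the Flatcar node and that the wasmedge sysext puts the runtime on the default PATH:

# List merged system extensions; wasmedge and llamaedge should show up here.
systemd-sysext status
# The runtime shipped by the wasmedge sysext should now be available.
wasmedge --version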

Prepare the model

Choose a model that fits your hardware; I chose a smaller model due to the limitations of my Digital Ocean instance.

wget https://huggingface.co/second-state/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q2_K.gguf

Start the server

The WASM application is shipped inside the sysext image; use the path /usr/lib/wasmedge/wasm/llama-api-server.wasm.

You can also reduce CONTEXT_SIZE when running on an instance with limited memory.

MODEL_FILE="Llama-3.2-1B-Instruct-Q2_K.gguf"
API_SERVER_WASM="/usr/lib/wasmedge/wasm/llama-api-server.wasm"
PROMPT_TEMPLATE="llama-3-chat"
CONTEXT_SIZE=128
MODEL_NAME="llama-3.2-1B"

# Preload the model into the WASI-NN GGML backend and start the API server;
# --dir .:. preopens the current directory so the module can read the model file.
wasmedge \
  --dir .:. \
  --nn-preload default:GGML:AUTO:${MODEL_FILE} \
  ${API_SERVER_WASM} \
  --prompt-template ${PROMPT_TEMPLATE} \
  --ctx-size ${CONTEXT_SIZE} \
  --model-name ${MODEL_NAME}

It will load the model into memory and start the OpenAI-compatible API server.

The expected output should be:

..omitted..
[2024-11-28 09:09:08.909] [info] [WASI-NN] GGML backend: llama_system_info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
[2024-11-28 09:09:08.920] [info] llama_core in crates/llama-core/src/lib.rs:128: running mode: chat
[2024-11-28 09:09:08.923] [info] llama_core in crates/llama-core/src/lib.rs:140: The core context has been initialized
[2024-11-28 09:09:08.923] [info] llama_core in crates/llama-core/src/lib.rs:230: Getting the plugin info
[2024-11-28 09:09:08.923] [info] llama_core in crates/llama-core/src/lib.rs:418: Get the running mode.
[2024-11-28 09:09:08.923] [info] llama_core in crates/llama-core/src/lib.rs:443: running mode: chat
[2024-11-28 09:09:08.923] [info] llama_core in crates/llama-core/src/lib.rs:312: Getting the plugin info by the graph named llama-3.2-1B
[2024-11-28 09:09:08.923] [info] llama_core::utils in crates/llama-core/src/utils.rs:175: Get the output buffer generated by the model named llama-3.2-1B
[2024-11-28 09:09:08.924] [info] llama_core::utils in crates/llama-core/src/utils.rs:193: Output buffer size: 95
[2024-11-28 09:09:08.924] [info] llama_core in crates/llama-core/src/lib.rs:372: Plugin info: b4067(commit 54ef9cfc)
[2024-11-28 09:09:08.924] [info] llama_api_server in llama-api-server/src/main.rs:459: plugin_ggml_version: b4067 (commit 54ef9cfc)
[2024-11-28 09:09:08.930] [info] llama_api_server in llama-api-server/src/main.rs:504: Listening on 0.0.0.0:8080

Interact with the API server

Please check the LlamaEdge documentation for more details on the available options: https://github.com/LlamaEdge/LlamaEdge/tree/main/llama-api-server

Get model list

curl -X GET http://localhost:8080/v1/models -H 'accept:application/json'

Expected output:

{
   "object":"list",
   "data":[
      {
         "id":"llama-3.2-1B",
         "created":1732784948,
         "object":"model",
         "owned_by":"Not specified"
      }
   ]
}

Chat completion

curl -X POST http://localhost:8080/v1/chat/completions \
    -H 'accept:application/json' \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"system", "content": "You are a helpful assistant. Reply in short sentence"}, {"role":"user", "content": "What is the capital of Japan?"}], "model":"llama-3.2-1B"}'

Expected output:

{
   "id":"chatcmpl-cdf8f57f-70ec-4cb3-b1f3-e60054f64981",
   "object":"chat.completion",
   "created":1732785197,
   "model":"llama-3.2-1B",
   "choices":[
      {
         "index":0,
         "message":{
            "content":"The capital of Japan is Tokyo.",
            "role":"assistant"
         },
         "finish_reason":"stop",
         "logprobs":null
      }
   ],
   "usage":{
      "prompt_tokens":33,
      "completion_tokens":9,
      "total_tokens":42
   }
}
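
Since the server implements the OpenAI chat completions API, streaming responses should also work. A hedged example, assuming llama-api-server honours the standard "stream" field:

curl -X POST http://localhost:8080/v1/chat/completions \
    -H 'accept:text/event-stream' \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"user", "content": "Name three cities in Japan."}], "model":"llama-3.2-1B", "stream": true}'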

@tormath1 (Contributor) left a comment

Thanks for this contribution, it's exciting to see this running on Flatcar :)

@tormath1 merged commit dd38a27 into flatcar:main on Dec 3, 2024
@hydai deleted the add_llamaedge branch on December 3, 2024 at 14:56