Skip to content

2.2.12 Backend: SGLang

av edited this page Sep 14, 2024 · 1 revision

Handle: sglang URL: http://localhost:34091

logo

PyPI PyPI - Downloads license issue resolution open issues

SGLang is a fast serving framework for large language models and vision language models.

Starting

# [Optional] Pre-pull the image
harbor pull sglang

Configuration

SGLang is similar to vLLM in the models it can run, so the configuration is similar.

# Quickly lookup some of the compatible quants
harbor hf find awq
harbor hf find gptq

# Download with HF CLI
harbor hf download bartowski/Meta-Llama-3.1-70B-Instruct-GGUF

# Set the model to run using HF specifier
harbor sglang model google/gemma-2-2b-it

# To run a gated model, ensure that you've
# also set your Huggingface API Token
harbor hf token <your-token>

You can specify additional args via harbor sglang args:

# See original CLI help for available options
harbor run sglang --help

# Set the extra arguments via "harbor args"
harbor sglang args --context-length 2048 --disable-cuda-graph
Clone this wiki locally