Rust LLM Serving Framework

Features

- Paged Attention
- Continuous Batching
- Quantization
  - AWQ
  - SqueezeLLM
- Models
  - Llama
  - Gemma
  - ChatGLM

Getting Started

Examples

$ cargo run --release --example llm_engine_example -- --model <llama model dir> --gpu-memory-utilization 0.95 --block-size 8 --max-model-len 1024
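
To give a feel for how --block-size and --max-model-len interact, here is a minimal sketch. It assumes the paged attention here follows the usual design, where the KV cache is split into fixed-size blocks that each hold --block-size tokens; the README does not spell this out. The helper blocks_needed is hypothetical and is not part of this framework's API.

```rust
/// Number of fixed-size KV cache blocks needed to hold `num_tokens` tokens.
/// Hypothetical helper for illustration only; not part of this framework.
fn blocks_needed(num_tokens: usize, block_size: usize) -> usize {
    assert!(block_size > 0, "block size must be positive");
    // Ceiling division: a partially filled last block still occupies a whole block.
    (num_tokens + block_size - 1) / block_size
}

fn main() {
    // Values taken from the example command: --block-size 8 --max-model-len 1024.
    let block_size = 8;
    let max_model_len = 1024;

    // A sequence at the maximum length fills 1024 / 8 = 128 blocks exactly.
    println!(
        "blocks for a full-length sequence: {}",
        blocks_needed(max_model_len, block_size)
    );

    // A 100-token sequence needs 13 blocks (104 slots), leaving 4 slots unused.
    println!(
        "blocks for a 100-token sequence: {}",
        blocks_needed(100, block_size)
    );
}
```

Under this assumption, a smaller block size wastes fewer slots in the last block of each sequence, at the cost of more blocks to track per sequence.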

API Server

$ cargo build --release
$ ./target/release/entrypoints --model <llama model dir> --gpu-memory-utilization 0.95 --block-size 8 --max-model-len 1024 --host 0.0.0.0 --port 8000
