Mixture-of-Depths Across Edge and API Models.
The goal is to use entropy and varentropy to dynamically swap between different models on a token-by-token basis. When the model is confident, we rely on the output of a smaller model running locally; when entropy is high and confidence is low, we switch to a larger model (presumably running behind an API) for the following tokens, until confidence recovers and we can switch back to the smaller, local model.
It is inspired by Entropix's entropy-based sampling.
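The sketch below illustrates how such a routing decision could be made from the local model's next-token logits. It is a minimal example, not the project's actual code: the function names and the threshold values are hypothetical and would need to be tuned for a given model pair.

```python
# Minimal sketch of an entropy/varentropy routing criterion (hypothetical
# function names and thresholds; not the project's actual implementation).
import numpy as np

def entropy_varentropy(logits: np.ndarray) -> tuple[float, float]:
    """Entropy and varentropy (in nats) of the next-token distribution."""
    logits = logits - logits.max()                      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())   # log-softmax
    probs = np.exp(log_probs)
    entropy = -np.sum(probs * log_probs)
    varentropy = np.sum(probs * (log_probs + entropy) ** 2)
    return float(entropy), float(varentropy)

def route_token(logits: np.ndarray,
                entropy_threshold: float = 2.0,      # hypothetical value
                varentropy_threshold: float = 4.0    # hypothetical value
                ) -> str:
    """Decide whether to keep the local model's token or defer to the API model."""
    entropy, varentropy = entropy_varentropy(logits)
    if entropy < entropy_threshold and varentropy < varentropy_threshold:
        return "local"   # confident: keep sampling from the small local model
    return "api"         # uncertain: hand generation off to the larger API model
```

Low entropy with low varentropy indicates the local model is confidently committed to one token; high values indicate uncertainty, which is when handing off to the larger model is most likely to help.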
This is a research project and a work in progress. It is absolutely not optimized for production and will likely not show much in the way of latency gains; it is meant as a starting point and a proof of concept.
Install requirements
pip install -r requirements.txt
cd into the hybrid_llm directory
python main.py
Running the API model
python serve_cloud_model.py
Running the main server
python server.py
cd into the gui directory
npm run dev