Entropy-Based Routing Between Models on a Token-by-Token Basis

ricardo-agz/hybrid-llm

HybridLLM

Mixture-of-Depths Across Edge and API Models.

The goal is to use entropy and varentropy to dynamically swap between different models on a token-by-token basis. When the model is confident, we can rely on the outputs of a smaller model running locally; when entropy is high and confidence is low, we can switch to a larger model (presumably running behind an API) for the following tokens, until confidence recovers and we can consider switching back to the smaller, local model.

It is inspired by Entropix's entropy-based sampling.

This is a research project and a work in progress. It is absolutely not optimized for production use and will likely not have much to show for itself in terms of latency gains. It is meant as a starting point and a proof of concept.
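The routing rule described above can be sketched in a few lines: compute the entropy and varentropy of the small model's next-token distribution, and escalate to the API model when either is high. This is a minimal illustration, not the repo's implementation; the function names and threshold values are hypothetical placeholders, not tuned numbers from this project.

```python
import math

def entropy_varentropy(logits):
    """Return the Shannon entropy (nats) and varentropy of the
    next-token distribution implied by raw logits."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Entropy: H = -sum p * log p
    h = -sum(p * math.log(p) for p in probs if p > 0)
    # Varentropy: variance of the surprisal -log p under p
    v = sum(p * (-math.log(p) - h) ** 2 for p in probs if p > 0)
    return h, v

def pick_model(logits, h_thresh=2.0, v_thresh=4.0):
    """Route to the API model when the local model looks uncertain.
    Thresholds are illustrative, not values from this repo."""
    h, v = entropy_varentropy(logits)
    return "api" if (h > h_thresh or v > v_thresh) else "local"
```

A sharply peaked distribution (one dominant logit) routes to `"local"`, while a near-uniform distribution over many tokens routes to `"api"`; in practice one would also want hysteresis so the router does not flip back and forth on every token.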

Getting Started

Install requirements

pip install -r requirements.txt

cd into the hybrid_llm directory

Running through the CLI

python main.py

Running with the GUI

Running the API model

python serve_cloud_model.py

Running the main server

python server.py

Running the GUI

cd into the gui directory

npm run dev
