
# LLaMa.cpp Gemma Web-UI

This project uses llama.cpp to load a model from a local file, delivering fast and memory-efficient inference. It is currently designed for Google Gemma; support for more models is planned.
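
The README does not show the loading code itself; as a minimal sketch, assuming the project uses the llama-cpp-python bindings, loading a local GGUF file looks like this (the model filename and path are hypothetical):

```python
# Minimal sketch: load a local Gemma GGUF file with llama-cpp-python
# and run a short completion. The actual web-ui code may differ.
from llama_cpp import Llama

# Hypothetical path; point this at the Gemma weights you downloaded.
llm = Llama(model_path="./models/gemma-2b-it.gguf", n_ctx=2048)

output = llm("Explain llama.cpp in one sentence:", max_tokens=64)
print(output["choices"][0]["text"])
```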

## Deployment

### Prerequisites

### Installation

1. Download the Gemma model from Google's repository.
2. Edit the `model-path` entry in `config.yaml` so it points to the actual model file (see the example config after this list).
3. Start the web UI with:

   ```bash
   screen -S "webui" bash ./start-ui.sh
   ```
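
This README only names the `model-path` key, so the file's full schema is unknown; a hypothetical minimal `config.yaml` might look like:

```yaml
# Hypothetical config sketch: only model-path is documented in this README.
model-path: ./models/gemma-2b-it.gguf
```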
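
Running the start script in a named `screen` session keeps the web UI alive after your terminal disconnects: detach with `Ctrl-A` `d`, and reattach later with `screen -r webui`.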