Running LLM Locally: Ollama
(Image credit: Google DeepMind, Unsplash.com.)
Tip
Please see this slide presentation
Ollama is a powerful open-source software that allows you to run large language models (LLMs) locally on your hardware. This means you can enjoy the capabilities of advanced AI models without relying on cloud-based services.
Ollama gives you full control over your language model setup: it supports a variety of models, runs efficiently on different hardware, and is easy to use. With it, a local LLM can assist you with tasks such as creative writing, translation, and question answering.
- Keeping your data private is easier when you run a language model on your computer, as it prevents sensitive information from being sent to outside servers.
- You can access the model anytime, even without the internet, which helps maintain your workflow whether in the office or on the go.
- Using a local model can save money since you avoid subscription fees for cloud services, and it also provides quicker responses since everything is processed on your device.
Ollama's library provides many versions of models from different LLM families, each suited to different purposes. We mention a few of them below; please feel free to explore the library in more detail.
LLM Model | Model Version | Developer
---|---|---
llama | llama3.2 | Meta AI
gemma | gemma2 | Google DeepMind
qwen | qwen2.5 | Alibaba Group
phi | phi3.5 | Microsoft Research
nemotron | nemotron-mini | NVIDIA
mistral | mistral | Mistral AI
mixtral | mixtral | Mistral AI
deepseek | deepseek-coder-v2 | DeepSeek
llava | llava | LLaVA
openchat | openchat | OpenChat
There are at least two ways to install Ollama.
- One is to download it from the Ollama site. The site recognizes your operating system (macOS, Windows, or Linux) and shows the minimum OS version requirements.
- The second is a manual installation for Linux systems (HPC, CyVerse, or other).
We will do the manual installation to show the process; the steps below are for Linux.
1. macOS or Windows: Download and use the automatic installers.
2. Linux systems: We will follow the manual installation notes. The first step is to download the compressed tar archive of binaries; the `curl` command needs to be installed.
We will create a dedicated directory in your `HOME` to work with `ollama`:

```bash
mkdir ~/Ollama
cd ~/Ollama
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
```
then extract the compressed tar file:

```bash
tar -C ~/Ollama -xzf ollama-linux-amd64.tgz
```
All the programs will be installed in the `~/Ollama` directory.
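Optionally, you can add the extracted `bin` directory to your `PATH` so you can type `ollama` instead of the full path (a minimal sketch for bash; adapt for your shell):

```bash
# Make the ollama binary available in this session
export PATH="$HOME/Ollama/bin:$PATH"

# Persist the change for future logins (bash; use ~/.zshrc for zsh)
echo 'export PATH="$HOME/Ollama/bin:$PATH"' >> ~/.bashrc

# Verify the installation
ollama --version
```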
Tip
When working on HPC, it is useful to take advantage of the `tmux` command for managing terminal sessions.

A bare minimum set of `tmux` commands:
Command | Description
---|---
`tmux new -s s1 -n l1` | Creates a new terminal session `s1` with a window labeled `l1`
`Ctrl + b "` | Splits the current tmux window into two panes
`Ctrl + b down-arrow` | Navigates to the pane below
`Ctrl + b up-arrow` | Navigates to the pane above
`tmux kill-session -t s1` | Terminates the tmux session `s1`
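For example, a typical HPC workflow with `tmux` (assuming the `PATH` setup shown earlier) might look like this:

```bash
# Create a named session to keep the Ollama server and client together
tmux new -s ollama -n work

# Inside tmux: split the window with Ctrl+b ", start the server in one pane,
# then move to the other pane with Ctrl+b arrow keys and run a model there.

# Detach with Ctrl+b d and reattach later with:
tmux attach -t ollama
```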
Once you have installed `ollama`, you need to start the server (run it in the background by adding a trailing `&`):

```bash
~/Ollama/bin/ollama serve &
```
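You can confirm the server is running by querying its REST API, which listens on port 11434 by default (the `/api/tags` endpoint lists locally available models):

```bash
# Returns a JSON list of locally downloaded models if the server is up
curl http://localhost:11434/api/tags
```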
Then, in a different terminal window, run `ollama` with one of the available models; in this case, we will use llama3.2:

```bash
~/Ollama/bin/ollama run llama3.2
```
Once it is running, you will notice a change in the prompt, indicating that the LLM is ready for input. You can enter your question at the `ollama` prompt, or use the separator `"""` to begin and close a multi-line message.
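For illustration, a multi-line prompt might look like this (a sketch; the exact continuation prompt may differ by version):

```
>>> """
... Summarize in one sentence:
... Ollama lets you run large language models locally.
... """
```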
`ollama` comes with a set of options:
Command | Description
---|---
serve | Start ollama
create | Create a model from a Modelfile
show | Show information for a model
run | Run a model
stop | Stop a running model
pull | Pull a model from a registry
push | Push a model to a registry
list | List models
ps | List running models
cp | Copy a model
rm | Remove a model
help | Help about any command
Flags:

Flag | Description
---|---
`-h`, `--help` | Help for ollama
`-v`, `--version` | Show version information
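In everyday use, these subcommands combine naturally (a short sketch; the model name is an example):

```bash
# Download a model image without starting an interactive session
~/Ollama/bin/ollama pull llama3.2

# Ask a one-off question directly from the shell
~/Ollama/bin/ollama run llama3.2 "Explain what an LLM is in one sentence."

# Show which models are currently loaded in memory
~/Ollama/bin/ollama ps
```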
Once `ollama` is running, the program has a set of available commands:
Command | Description |
---|---|
/?, /help | The help command |
/set | Set session variables |
/show | Show model information |
/load | Load a session or model |
/save | Save your current session |
/clear | Clear session context |
/bye | Exit |
/? shortcuts | Help for keyboard shortcuts |
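For instance, inside a running session you can inspect the model and adjust session variables (a sketch; `/set parameter` accepts runtime options such as `temperature` in recent versions):

```
>>> /show info
>>> /set parameter temperature 0.5
>>> /save my-session
>>> /bye
```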
You can customize any LLM to suit your purposes:

- Create a file named `Modelfile` with a `FROM` instruction naming the base model (or the local file path to a model) you want to import:
```
FROM llama3.2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```
- Then create the model in Ollama:

```bash
ollama create mario -f Modelfile
```
- Run the model:

```bash
ollama run mario
```
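To confirm the customization took effect, you can print the Modelfile behind the new model (the `--modelfile` flag of `ollama show` prints it in recent Ollama versions):

```bash
# Display the Modelfile used to build the custom model
ollama show mario --modelfile
```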
Tip
You can read more about how to customize your LLM prompts.
Every time you run a new LLM model with `ollama`, you download a model image, which can eat up your disk space.

To list the LLM models you have downloaded, enter:

```bash
ollama list
```

To remove unwanted LLM models, enter:

```bash
ollama rm <model name>
```
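Model images can be large, so it is worth checking how much space they occupy. By default, a user install stores them under `~/.ollama/models` on Linux (this default path is an assumption; it can be overridden with the `OLLAMA_MODELS` environment variable):

```bash
# Report total disk space used by downloaded model images
du -sh ~/.ollama/models
```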
There is a collection of examples showing how to install Ollama and add a GUI to interact with the model. You can check, for example, this ollama-chatbot-gradio.
Let's try this. We can work from the `~/Ollama` directory. Download the two files `requirements.txt` and `gradio_app_v1.py`, then install the dependencies listed in `requirements.txt` using `pip3` (note the `-r` flag, which tells pip to read package names from the file):

```bash
pip3 install -r requirements.txt
```
It will install all the required packages to your local system (Langchain, Gradio).
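You can quickly check that the packages import correctly (a minimal sanity check, assuming the requirements file includes gradio and langchain):

```bash
python3 -c "import gradio, langchain; print(gradio.__version__, langchain.__version__)"
```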
Next, edit the Python code file `gradio_app_v1.py` to define which model you want to use. Here we will use llama3.2, so make sure it has been pulled:

```bash
ollama pull llama3.2
```
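Under the hood, GUI front ends like this typically talk to the Ollama server over its REST API; you can exercise the same endpoint directly (a sketch using the standard `/api/generate` endpoint; make sure `ollama serve` is still running):

```bash
# Send one prompt to the local Ollama server and print the JSON response
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello in five words.",
  "stream": false
}'
```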
Execute the Python script:

```bash
python3 gradio_app_v1.py
```
A new web browser tab will open, and you can interact with the LLM through the new graphical interface. To stop the interface, hit `Ctrl-C` in the terminal window where you launched the web GUI.
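When you are finished, remember to stop the background `ollama serve` process too (a sketch; `pkill -f` matches against the full command line):

```bash
# Stop the Ollama server launched earlier with a trailing &
pkill -f "ollama serve"
```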
Created: 09/20/2024 (C. Lizárraga); Last update: 09/25/2024 (C. Lizárraga)
UArizona DataLab, Data Science Institute, University of Arizona, 2024.