
Running LLM Locally: Ollama


An Overview of Running LLMs locally using Ollama

(Image credit: Google DeepMind, Unsplash.com.)



1. What is Ollama?

Ollama is open-source software that allows you to run large language models (LLMs) locally on your own hardware. This means you can use the capabilities of advanced AI models without relying on cloud-based services.

2. Key features of Ollama

Ollama gives you full control over your language model setup: it supports a wide range of open models, runs efficiently on different hardware, and is straightforward to use from the command line.

Typical uses include creative writing, translating languages, and answering questions with large language models.

3. Benefits of Using Ollama for Local LLMs

  • Keeping your data private is easier when you run a language model on your computer, as it prevents sensitive information from being sent to outside servers.
  • You can access the model anytime, even without the internet, which helps maintain your workflow whether in the office or on the go.
  • Using a local model can save money since you avoid subscription fees for cloud services, and it also provides quicker responses since everything is processed on your device.

4. Ollama Models Library

Ollama's library provides a large collection of models from different LLM families, suited to different purposes. A few are listed below; feel free to explore the library in more detail.

| LLM Family | Model Version | Developer |
|------------|---------------|-----------|
| llama | llama3.2 | Meta AI |
| gemma | gemma2 | Google DeepMind |
| qwen | qwen2.5 | Alibaba Group |
| phi | phi3.5 | Microsoft Research |
| nemotron | nemotron-mini | NVIDIA |
| mistral | mistral | Mistral AI |
| mixtral | mixtral | Mistral AI |
| deepseek | deepseek-coder-v2 | DeepSeek |
| llava | llava | LLaVA team |
| openchat | openchat | OpenChat |
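
Any of these models can be downloaded by name with ollama pull (covered below). An optional tag after a colon selects a particular size or variant; a small sketch (llama3.2:1b is one published tag at the time of writing, so check the library page for current options):

# Pull a specific variant of a model family (family:tag syntax)
ollama pull llama3.2:1b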

5. Installing Ollama

There are at least two ways to install Ollama.

  • One is to download the installer from the Ollama site. The site recognizes your operating system (macOS, Windows, or Linux) and shows the minimum OS version requirements.
  • The second is a manual installation for Linux systems (HPC, CyVerse, and similar environments).

We will do the manual installation to show the process; the steps below target Linux, while macOS and Windows users can simply run the official installers.

1. macOS or Windows: Download and run the automatic installers.

2. Linux system: We will follow the manual installation notes. The first step is to download the compressed tar archive of binaries; the curl command must be installed.

We will create a dedicated directory in your HOME to work with ollama:

mkdir ~/Ollama
cd ~/Ollama
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz

then extract the compressed tar file:

tar -C ~/Ollama -xzf ollama-linux-amd64.tgz

Everything is unpacked under the ~/Ollama directory, with the ollama executable in ~/Ollama/bin.
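
Optionally, add the binary directory to your PATH so you can type ollama instead of the full path. A minimal sketch, assuming a bash shell and the ~/Ollama location used above:

# Make ~/Ollama/bin available in future shells (bash assumed)
echo 'export PATH="$HOME/Ollama/bin:$PATH"' >> ~/.bashrc

# Apply it to the current shell and verify the binary is found
source ~/.bashrc
ollama --version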

6. Running Ollama from a terminal (CLI - Command Line Interface)

Tip

When working in HPC, it is useful to take advantage of the tmux command for CLI management.

A bare minimum of tmux commands:

| Command | Description |
|---------|-------------|
| tmux new -s s1 -n l1 | Creates a new session s1 with a window named l1 |
| Ctrl + b " | Splits the current tmux window into two panes |
| Ctrl + b down-arrow | Moves to the pane below |
| Ctrl + b up-arrow | Moves to the pane above |
| tmux kill-session -t s1 | Terminates the tmux session s1 |
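
A typical workflow on an HPC node might look like this (a sketch; the session and window names are arbitrary):

# Start a named tmux session for the Ollama work
tmux new -s ollama -n llm

# Inside tmux: press Ctrl+b " to split the window into two panes,
# then run the server in one pane and the interactive client in the other.

# When finished, terminate the session:
tmux kill-session -t ollama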

Once you have installed ollama, start the server (append a trailing & to run it in the background):

~/Ollama/bin/ollama serve & 
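
Before moving on, you can verify that the server is up. By default it listens on port 11434 and exposes a small REST API (the port number here assumes that default):

# Ask the running server for its version; a JSON reply means it is up
curl http://localhost:11434/api/version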

Then, in a different terminal window (or tmux pane), run ollama with one of the available models; here we will use llama3.2:

~/Ollama/bin/ollama run llama3.2

Once it is running, you will notice a change in the prompt, indicating that the LLM is ready for input.

You can type your question at the ollama prompt, or wrap a multi-line message in triple quotes (""" to open and """ to close).
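
The same server can also be queried programmatically through its REST API, which is what GUI front ends (like the Gradio example later on this page) use under the hood. A minimal sketch, assuming the default port and an already-pulled llama3.2:

# Send a single prompt to the generate endpoint and receive
# the whole response at once (streaming disabled)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain what a large language model is in one sentence.",
  "stream": false
}'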

Ollama commands

ollama comes with a set of options:

| Command | Description |
|---------|-------------|
| serve | Start ollama |
| create | Create a model from a Modelfile |
| show | Show information for a model |
| run | Run a model |
| stop | Stop a running model |
| pull | Pull a model from a registry |
| push | Push a model to a registry |
| list | List models |
| ps | List running models |
| cp | Copy a model |
| rm | Remove a model |
| help | Help about any command |

Flags:

| Flag | Description |
|------|-------------|
| -h, --help | Help for ollama |
| -v, --version | Show version information |
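
For example, once a model has been downloaded you can inspect it and check what is currently loaded in memory:

# Print a model's parameters, template, and license details
ollama show llama3.2

# List the models currently loaded in memory
ollama ps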

Help commands

Once ollama is running, the program has a set of available commands:

| Command | Description |
|---------|-------------|
| /?, /help | The help command |
| /set | Set session variables |
| /show | Show model information |
| /load | Load a session or model |
| /save | Save your current session |
| /clear | Clear session context |
| /bye | Exit |
| /? shortcuts | Help for keyboard shortcuts |
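
For instance, /set and /show let you adjust and inspect session parameters from inside the prompt (a sketch; see /? for the exact subcommands in your version):

>>> /set parameter temperature 0.5
>>> /show parameters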

Customizing a model

You can customize any LLM to suit your purposes

  1. Create a file named Modelfile, with a FROM instruction naming the base model (or giving the local file path to a model you want to import):
FROM llama3.2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
  2. Then create the model in Ollama:
ollama create mario -f Modelfile
  3. Run the model:
ollama run mario
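
You can confirm that the customized model now exists alongside the base models:

# The new model should appear in the local model list
ollama list | grep mario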

Tip

You can read more about how to customize your LLM prompts in the Ollama Modelfile documentation.

Managing and cleaning downloaded LLM models

Every time you run a new LLM model with ollama, you download a model image, which can quickly consume disk space.

To list which LLM models you have downloaded, you can enter

ollama list

To remove unwanted LLM models you can enter

ollama rm <model name>
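
By default the downloaded model files live under ~/.ollama in your home directory (an assumption based on the standard Linux/macOS setup; adjust if you configured a different location), so you can check how much space they occupy:

# Report the total size of the locally stored model files
du -sh ~/.ollama/models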

Installing an LLM GUI based on LangChain and Gradio

There is a growing collection of examples where you can install Ollama and add a GUI to interact with the model; one example is this ollama-chatbot-gradio.

Let's try this. We can work from the ~/Ollama directory.

Download the two files requirements.txt and gradio_app_v1.py, then install the dependencies with pip3:

pip3 install -r requirements.txt

This installs all the required packages (LangChain, Gradio) on your local system.

Next, edit the Python file gradio_app_v1.py to set which model you want to use; here we will use llama3.2. Make sure that model has already been downloaded:

ollama pull llama3.2

Run the Python script:

python3 gradio_app_v1.py

A new web browser tab will open, and you can interact with the LLM through the graphical interface. To stop the interface, press Ctrl-C in the terminal window where you launched the web GUI.
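
If you are working on a remote machine (e.g., HPC) where no browser tab can open automatically, note that Gradio serves on port 7860 by default (an assumption that the script does not override it); you can confirm the interface is up and then set up SSH port forwarding to reach it:

# Check that the Gradio server is answering on its default port
curl -I http://127.0.0.1:7860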

Created: 09/20/2024 (C. Lizárraga); Last update: 09/25/2024 (C. Lizárraga)

CC BY-NC-SA

UArizona DataLab, Data Science Institute, University of Arizona, 2024.