
Running LLM Locally: Ollama


An Overview of Running LLMs locally using Ollama

(Image credit: Google DeepMind, Unsplash.com.)



1. What is Ollama?

Ollama is open-source software that allows you to run large language models (LLMs) locally on your own hardware. This means you can use the capabilities of advanced AI models without relying on cloud-based services.

2. Key features of Ollama

Ollama gives you full control over your language model setup: it supports a wide range of open models, runs efficiently on different hardware, and is straightforward to use from the command line.

Typical uses include creative writing, translating languages, and answering questions with large language models.

3. Benefits of Using Ollama for Local LLMs

  • Keeping your data private is easier when you run a language model on your computer, as it prevents sensitive information from being sent to outside servers.
  • You can access the model anytime, even without the internet, which helps maintain your workflow whether in the office or on the go.
  • Using a local model can save money since you avoid subscription fees for cloud services, and it also provides quicker responses since everything is processed on your device.

4. Ollama Models Library

Ollama's library provides a large collection of models from different LLM families, suited to different purposes. A few are listed below; feel free to explore the library in more detail.

| LLM Family | Model Version | Developer |
|------------|---------------|-----------|
| llama | llama3.2 | Meta AI |
| gemma | gemma2 | Google DeepMind |
| qwen | qwen2.5 | Alibaba Group |
| phi | phi3.5 | Microsoft Research |
| nemotron | nemotron-mini | NVIDIA |
| mistral | mistral | Mistral AI |
| mixtral | mixtral | Mistral AI |
| deepseek | deepseek-coder-v2 | DeepSeek |
| llava | llava | LLaVA team |
| openchat | openchat | OpenChat |
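
Any of these models can be downloaded by name with ollama pull (covered below). An optional tag after a colon selects a particular size or variant; a small sketch (llama3.2:1b is one published tag at the time of writing, so check the library page for current options):

# Pull a specific variant of a model family (family:tag syntax)
ollama pull llama3.2:1b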

5. Installing Ollama

There are at least two ways to install Ollama.

  • One is to download the installer from the Ollama site. The site recognizes your operating system (macOS, Windows, or Linux) and shows the minimum OS version requirements.
  • The second is a manual installation for Linux systems (HPC, CyVerse, and similar environments).

We will do the manual installation to show the process; the steps below target Linux, while macOS and Windows users can simply run the official installers.

1. macOS or Windows: Download and run the automatic installers.

2. Linux system: We will follow the manual installation notes. The first step is to download the compressed tar archive of binaries; the curl command must be installed.

We will create a dedicated directory in your HOME to work with ollama:

mkdir ~/Ollama
cd ~/Ollama
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz

then extract the compressed tar file:

tar -C ~/Ollama -xzf ollama-linux-amd64.tgz

Everything is unpacked under the ~/Ollama directory, with the ollama executable in ~/Ollama/bin.
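
Optionally, add the binary directory to your PATH so you can type ollama instead of the full path. A minimal sketch, assuming a bash shell and the ~/Ollama location used above:

# Make ~/Ollama/bin available in future shells (bash assumed)
echo 'export PATH="$HOME/Ollama/bin:$PATH"' >> ~/.bashrc

# Apply it to the current shell and verify the binary is found
source ~/.bashrc
ollama --version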

6. Running Ollama from a terminal (CLI - Command Line Interface)

Tip

When working in HPC, it is useful to take advantage of the tmux command for CLI management.

A bare minimum of tmux commands:

| Command | Description |
|---------|-------------|
| tmux new -s s1 -n l1 | Creates a new session s1 with a window named l1 |
| Ctrl + b " | Splits the current tmux window into two panes |
| Ctrl + b down-arrow | Moves to the pane below |
| Ctrl + b up-arrow | Moves to the pane above |
| tmux kill-session -t s1 | Terminates the tmux session s1 |
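
A typical workflow on an HPC node might look like this (a sketch; the session and window names are arbitrary):

# Start a named tmux session for the Ollama work
tmux new -s ollama -n llm

# Inside tmux: press Ctrl+b " to split the window into two panes,
# then run the server in one pane and the interactive client in the other.

# When finished, terminate the session:
tmux kill-session -t ollama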

Once you have installed ollama, start the server (append a trailing & to run it in the background):

~/Ollama/bin/ollama serve & 
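
Before moving on, you can verify that the server is up. By default it listens on port 11434 and exposes a small REST API (the port number here assumes that default):

# Ask the running server for its version; a JSON reply means it is up
curl http://localhost:11434/api/version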

Then, in a different terminal window (or tmux pane), run ollama with one of the available models; here we will use llama3.2:

~/Ollama/bin/ollama run llama3.2

Once it is running, you will notice a change in the prompt, indicating that the LLM is ready for input.

You can type your question at the ollama prompt, or wrap a multi-line message in triple quotes (""" to open and """ to close).
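
The same server can also be queried programmatically through its REST API, which is what GUI front ends (like the Gradio example later on this page) use under the hood. A minimal sketch, assuming the default port and an already-pulled llama3.2:

# Send a single prompt to the generate endpoint and receive
# the whole response at once (streaming disabled)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain what a large language model is in one sentence.",
  "stream": false
}'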

Ollama commands

ollama comes with a set of options:

| Command | Description |
|---------|-------------|
| serve | Start ollama |
| create | Create a model from a Modelfile |
| show | Show information for a model |
| run | Run a model |
| stop | Stop a running model |
| pull | Pull a model from a registry |
| push | Push a model to a registry |
| list | List models |
| ps | List running models |
| cp | Copy a model |
| rm | Remove a model |
| help | Help about any command |

Flags:

| Flag | Description |
|------|-------------|
| -h, --help | Help for ollama |
| -v, --version | Show version information |
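
For example, once a model has been downloaded you can inspect it and check what is currently loaded in memory:

# Print a model's parameters, template, and license details
ollama show llama3.2

# List the models currently loaded in memory
ollama ps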

Help commands

Once ollama is running, the program has a set of available commands:

| Command | Description |
|---------|-------------|
| /?, /help | The help command |
| /set | Set session variables |
| /show | Show model information |
| /load | Load a session or model |
| /save | Save your current session |
| /clear | Clear session context |
| /bye | Exit |
| /? shortcuts | Help for keyboard shortcuts |
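
For instance, /set and /show let you adjust and inspect session parameters from inside the prompt (a sketch; see /? for the exact subcommands in your version):

>>> /set parameter temperature 0.5
>>> /show parameters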

Customizing a model

You can customize any LLM to suit your purposes

  1. Create a file named Modelfile, with a FROM instruction naming the base model (or giving the local file path to a model you want to import):
FROM llama3.2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
  2. Then create the model in Ollama:
ollama create mario -f Modelfile
  3. Run the model:
ollama run mario
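
You can confirm that the customized model now exists alongside the base models:

# The new model should appear in the local model list
ollama list | grep mario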

Tip

You can read more about how to customize your LLM prompts in the Ollama Modelfile documentation.

Managing and cleaning downloaded LLM models

Every time you run a new LLM model with ollama, you download a model image, which can quickly consume disk space.

To list which LLM models you have downloaded, you can enter

ollama list

To remove unwanted LLM models you can enter

ollama rm <model name>
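
By default the downloaded model files live under ~/.ollama in your home directory (an assumption based on the standard Linux/macOS setup; adjust if you configured a different location), so you can check how much space they occupy:

# Report the total size of the locally stored model files
du -sh ~/.ollama/models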

Installing an LLM GUI based on LangChain and Gradio

There is a growing collection of examples where you can install Ollama and add a GUI to interact with the model; one example is this ollama-chatbot-gradio.

Let's try this. We can work from the ~/Ollama directory.

Download the two files requirements.txt and gradio_app_v1.py, then install the dependencies with pip3:

pip3 install -r requirements.txt

This installs all the required packages (LangChain, Gradio) on your local system.

Next, edit the Python file gradio_app_v1.py to set which model you want to use; here we will use llama3.2. Make sure that model has already been downloaded:

ollama pull llama3.2

Run the Python script:

python3 gradio_app_v1.py

A new web browser tab will open, and you can interact with the LLM through the graphical interface. To stop the interface, press Ctrl-C in the terminal window where you launched the web GUI.
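
If you are working on a remote machine (e.g., HPC) where no browser tab can open automatically, note that Gradio serves on port 7860 by default (an assumption that the script does not override it); you can confirm the interface is up and then set up SSH port forwarding to reach it:

# Check that the Gradio server is answering on its default port
curl -I http://127.0.0.1:7860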

Created: 09/20/2024 (C. Lizárraga); Last update: 09/25/2024 (C. Lizárraga)

CC BY-NC-SA

UArizona DataLab, Data Science Institute, University of Arizona, 2024.