How to Run a Local LLM with Ubuntu

Last Update: Jun 24, 2024

I wrote a book! Check out A Quick Guide to Coding with AI.
Become a super programmer!
Learn how to use Generative AI coding tools as a force multiplier for your career.

If you want to run your own large language model like ChatGPT, you’re in luck. There are tons of well-rounded, easy software packages for this. Ollama is one of my favorites by far.

Video for this tutorial:

In this tutorial, we will set up Ollama with a WebUI on your Ubuntu Machine. This is a great way to run your own LLM for learning and experimenting, and it's private—all running on your own machine.

This is an updated version of this article I wrote last year on setting up an Ubuntu machine.

I’ve included some mistakes here and figured it out so you don’t have to make the same ones. Let’s jump in!

My Ubuntu System

Here’s the system I’m starting with. I have a fresh, updated Ubuntu 24.04 LTS. There isn’t much installed on it yet, so I can cover the dependencies you’ll probably need.

“How to install a local LLM in Ubuntu

I have an NVidia card in this machine, which helps tremendously but also adds complexity, so we’ll cover installing with the Nvidia card.

Check Your Drivers

If you have an Nvidia card, you must install the drivers and have it working for Ollama to utilize it.

You can check with

nvidia-smi -a

nvidia smi

to verify functionality.

“How to install a local LLM in Ubuntu

Once it’s installed, you’re good to go!

Getting Ollama

The tool we will work with today to run large language models on our machines is Ollama. It’s such a great product. It has the rare combination of being easy to install and use while being very powerful at the same time.

So we head to Ollama.com

Go to download

and for Linux, you’ll get a script:

curl -fsSL https://ollama.com/install.sh | sh

Some people get nervous about remote shell script execution. If you choose to, you can wget this script, open it up, and check it out to see if there’s anything you don’t like.

wget https://ollama.com/install.sh

“How to install a local LLM in Ubuntu

I did this the first time, and there’s nothing weird here to worry about.

Once you run the script, it should look something like this:

“How to install a local LLM in Ubuntu

Notice it says “NVIDIA GPU installed.” You should see this if you have an Nvidia card that’s properly configured.

Getting Your First Model

Let’s find a large language model to play around with.

Back on the Ollama page, we’ll click on models.

“How to install a local LLM in Ubuntu

On this page, you can choose from a wide range of models if you want to experiment and play around.

Here’s the llama3 model which I’ve tried out recently, It’s really good.

On the model pages, you can see different models available with a dropdown:

“How to install a local LLM in Ubuntu

We can see a 70B, 8B, and instruct and text models with this model.

The 70B is the number of parameters. Generally, bigger is better, but it will take far more GPU power and memory. On a laptop, 70B is possible with some models, but it is very slow. 8B is a good, fast model with a smaller footprint (4.7GB vs 40GB).

Text Model: These models are more optimized for chat and having “conversations” with you.

Instruct Model: These models are fine-tuned to follow prompted instructions and are optimized for being asked to do something.

For this test, we will use the llama3:8b model. So I’ll run

ollama run llama3:8b

At the terminal. The first time you run this, you will need to download the model. It only does this once, and then it loads much faster.

“How to install a local LLM in Ubuntu

One it’s loaded up, you can send the obligatory:

why is the sky blue?

or whatever prompt you want. You’re ready to go! This is a good prompt interface for testing models.

“How to install a local LLM in Ubuntu

But what are some other ways you can utilize this tool? Let’s find out. To exit this interface, type in

/bye

And you can close it out or run another model. Let’s try something different.

Accessing the Ollama API with CURL

With Ollama running, you have an API available. Make a node of the model you downloaded, in my case, it was the llama3:8b model. You can access it with CURL. We’ll create a simple command that calls the local webserver, and generates a request. You need the model, the prompt, and choose whether to “stream” it or not.

Should I stream?

If stream is set to TRUE, it will give you answers one token (word) at a time. You may have seen this behavior with LLMs on the web where each word comes out individually. When stream is FALSE, it returns the whole answer at once. It’s much easier when developing to deal with a single object rather than potentially thousands. But the choice is up to you. I’m choosing to not stream the answers here for simplicity.

curl http://localhost:11434/api/generate -d '
{ 
 "model": "llama3:8b", 
 "prompt": "Why is the sky blue?", 
 "stream": false 
}'

So I send this curl command and quickly get some JSON output.

“How to install a local LLM in Ubuntu

You can, of course, write the output to a text file or read it some other way. But there are also plenty of libraries for implementing it into software.

Accessing the Ollama API with Python

Accessing Ollama with Python is incredibly easy, and you’ll love it.

Create a new Python environment:

python3 -m venv ollamatest
source ollamatest/bin/activate

Then install the Ollama library

pip install ollama

Then, create simple Python Script like this:

import ollama
response = ollama.chat(model='llama3', messages=[
 {
 'role': 'user',
 'content': 'Why is the sky blue?',
 },
])
print(response['message']['content'])

And run it!

“How to install a local LLM in Ubuntu

A surprisingly low amount of code is required to get things done. Check out the Ollama API Documentation for more.

But what if you want a web interface? I covered this in my last article on running LLMs in Ubuntu. The web interface is better now, but it requires a bit more preparation work. Don’t worry; I’ll cover it.

Creating a Web Interface for Ollama

In my previous article with WSL, I showed how to set things up with the “Ollama Web UIIt has been rebranded to the.” Open WebUI. It now supports other things besides Ollama.

It’s far better but trickier to set up because it runs in a Docker container now. This is the better choice for something like this, even if it requires another 5 minutes to set things up.

If you look at the instructions on the OpenUI page, if you want to run OpenUI with Ollama built in, and Nvidia GPU support, they give you a docker command that instantly sets it up. But what if you’re on a new machine? What do you need to install for this?

Here are the steps:

1. Add Docker

sudo apt install docker.io

Add a group for Docker:

sudo groupadd docker

Add yourself to this group:

sudo usermod -aG docker ${USER}

Then log out and log back in.

2. Install the Nvidia Container Toolkit

You’ll need to install this so that applications in Docker containers can use your GPU. The full instructions are here, but here are the commands as of now to get this done.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
 && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
 sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
 sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

Update the packages list:

sudo apt-get update

then install the toolkit:

sudo apt-get install -y nvidia-container-toolkit

and configure it:

sudo nvidia-ctk runtime configure --runtime=docker

Now you can restart docker and you should be ready to go.

sudo systemctl restart docker

Awesome, if you have no errors you’re ready to go.

3. Install the Container

Now run this:

docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

And it should be and running.

The Open Web UI

Once the Web UI loads up, you’ll need to create an account.

“How to install a local LLM in Ubuntu

As far as I know, it’s just a local account on the machine.

Here in the settings, you can download models from Ollama. Note that you can also put in an OpenAI key and use ChatGPT in this interface. It’s a powerful tool you should definitely check out.

“How to install a local LLM in Ubuntu

Now you have a nice chat interface!!

“How to install a local LLM in Ubuntu

Conclusion

Running your own local LLM is fun. You don’t have to worry about monthly fees; it’s totally private, and you can learn a lot about the process. Stay tuned to this blog, as I’ll do more stuff like this in the future.

Also, connect with me on LinkedIn. I’m often involved in fun discussions and share a lot of stuff there.

Published: May 27, 2024 by Jeremy Morgan. Contact me before republishing this content.

My Ubuntu System

Check Your Drivers

Getting Ollama

Getting Your First Model

Accessing the Ollama API with CURL

Should I stream?

Accessing the Ollama API with Python

Creating a Web Interface for Ollama

1. Add Docker

2. Install the Nvidia Container Toolkit

3. Install the Container

The Open Web UI

Conclusion

Stay up to date on the latest in Computer Vision and AI.Get notified when I post new articles!

Stay up to date on the latest in Computer Vision and AI.

Get notified when I post new articles!