
Instructions for Self-Hosting an LLM

The following steps describe how to set up an LLM locally using containerization. We will use a Whisper model to perform a speech translation task, i.e. translating non-English speech to English text.

Prerequisites

  • Podman/Docker installed
  • Streamlit installed

If you are using an Apple MacBook M-series laptop, you will probably need the following configuration:

  • brew tap cfergeau/crc
  • brew install vfkit
  • export CONTAINERS_MACHINE_PROVIDER=applehv
  • Edit your /Users/<your username>/.config/containers/containers.conf file to include:

    [machine]
    provider = "applehv"

  • Ensure your Podman machine has enough resources: at least 8 CPUs and 10 GB of memory are recommended (see the sketch after this list).
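
If you need to recreate your Podman machine with those resources, something along these lines should work. This is a minimal sketch: it assumes the default machine name, the memory flag is in MB, and you should adjust both to your setup.

podman machine stop
podman machine rm -f podman-machine-default
podman machine init --cpus 8 --memory 10240
podman machine start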

Step 1: Download Model

First, we need to download the Whisper model from HuggingFace. Whisper models come in various sizes and can be found here. We will be using the small model, which is about 466 MB.

To download the model into the models folder of this repository:

cd ../models
wget --no-config --quiet --show-progress -O ggml-small.bin <Download URL>
cd ../
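
As a quick sanity check, you can confirm the download landed where Step 4 expects it and that its size looks right (the small model should be roughly 466 MB). This assumes you are back at the repository root after the cd ../ above:

ls -lh models/ggml-small.bin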

Step 2: Containerization

Next, we will containerize the application by building a container image for it to run in. To do this, we have created a Containerfile which:

  • Clones the Whisper repo
  • Runs the whisper.cpp server (a service written in C++ that serves the Whisper model locally)

You can see the entire Containerfile defined here.
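
For orientation, the Containerfile looks roughly like the sketch below. This is only an illustration: the base image, package names, and build commands are assumptions, and the actual Containerfile linked above is authoritative.

# Sketch of a Containerfile for serving Whisper via whisper.cpp
# (base image and build details are assumptions; see the real file above)
FROM fedora:latest
RUN dnf install -y git make gcc-c++ && dnf clean all
# Clone the Whisper repo
RUN git clone https://github.com/ggerganov/whisper.cpp /whisper
WORKDIR /whisper
# Build the whisper.cpp HTTP server example
RUN make server
ENV HOST=0.0.0.0 PORT=8001
EXPOSE 8001
# The model binary is volume-mounted at /models at run time (see Step 4)
CMD ./server --host "${HOST}" --port "${PORT}" -m /models/ggml-small.bin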

To build an image from this Containerfile, run (from this directory):

podman build -t whisper:image .

This builds a container image from your Containerfile: a self-contained, runnable package of the application code and everything it needs.

You should now see this image listed when you run podman images.

Step 3: Download audio files

Download a few sample audio files (WAV, MP3, MP4, FLAC, or MPEG4 format) to your laptop for testing the model. We have also uploaded a few sample audio files to this repository, which are available for use here.
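
If the service struggles with a particular file, converting it to the format whisper.cpp handles most reliably (16 kHz, mono, 16-bit WAV) usually helps. Assuming ffmpeg is installed, and with placeholder file names:

ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le sample.wav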

Step 4: Deploy Model Service

Now, let's deploy the application, volume-mounting the model binary file into the container created from the image we built in the previous step. We mount the ggml-small.bin model downloaded above and run the image.

To do this you can run:

podman run --rm -it \
        -p 8001:8001 \
        -v /local/path/to/whisper-self-hosted-llm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
        -e HOST=0.0.0.0 \
        -e PORT=8001 \
        whisper:image

Make sure to provide the full path to the model binary file downloaded in Step 1 when passing the -v argument in the above command.

When completed successfully, you will see the Whisper model running as a service at http://0.0.0.0:8001.
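
To sanity-check the service from the command line, you can POST one of your sample files to it with curl. This assumes the container runs the standard whisper.cpp server, which exposes an /inference endpoint accepting multipart form data; the file path below is a placeholder:

curl http://localhost:8001/inference \
        -F file="@/path/to/sample.wav" \
        -F response_format="json"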

Step 5: Run Streamlit application

To interact with the application, we have created a simple Streamlit UI. Streamlit is an open-source Python framework that lets data scientists and AI/ML engineers deliver interactive data apps in only a few lines of code. Make sure you have Streamlit installed on your machine.

We have defined a Python file here that builds a simple Streamlit UI for this application. To launch the Streamlit application, run the following:

streamlit run streamlit/whisper_client.py

This will open the Streamlit UI in your web browser at a localhost URL (by default, http://localhost:8501).

Congrats! You have now successfully set up an LLM locally 🥳

Now, try uploading your audio files and see whether the model translates them into English text.