This file is focused on the current stable version of PyTorch. There is another variation of these instructions for the development / nightly version(s) here : https://github.com/nktice/AMD-AI/blob/main/dev.md
2023-07 - I have composed this collection of instructions as they are my notes. I use this setup on my own Linux system with AMD parts, and I've gone over these through many re-installs to get them right. This is what I had hoped to find when I searched for install instructions - so I'm sharing them in the hope that they save time for other people. There may be extra parts in here that aren't strictly needed, but this works for me. Originally plain text, with comments, like a shell script to cut and paste from.
2023-09-09 - I had a report that this doesn't work in virtual machines (virtualbox) as the system there cannot see the hardware, it can't load drivers, etc. While this is not a guide about Windows, Windows users may find it more helpful to try DirectML - https://rocm.docs.amd.com/en/latest/deploy/windows/quick_start.html / https://github.com/lshqqytiger/stable-diffusion-webui-directml
[ ... updates abridged ... ]
2024-07-24 - PyTorch has updated with 2.4 now stable and referring to ROCm 6.1, so there are updates here to reflect those changes.
2024-08-04 - ROCm 6.2 is out, including support for the current version of Ubuntu (24.04 / Noble) so this revision includes changes to emphasize use of the new version. Previous stable has been set aside here - https://github.com/nktice/AMD-AI/blob/main/ROCm-6.1.3-Stable.md - Note I'm getting errors with the 2nd GPU with the new ROCm, bug report is filed, here is a link to that thread so you can follow : ROCm/ROCm#3518
2024-09-11 - Ubuntu 24.04 has been transitioned to 24.04.1... with that they introduced Linux kernel 6.8.0-44-generic, and it turns out this kernel is incompatible with amdgpu-dkms. I had done the normal (daily) sudo apt update -y && sudo apt upgrade -y
and got errors about amdgpu-dkms not installing, and then on the next reboot Ubuntu wouldn't start (black screen at boot). So beware of this upgrade, as things are disastrously broken at the present time. Bug report here : ROCm/ROCm#3701 As a workaround, I've moved to the current dev version of Ubuntu, 24.10 - please see the notes below about it.
2024-09-12 - As a workaround for Ubuntu's issues with kernel updates, I've been attempting to use the new dev version 24.10 ( with an old kernel ). The notes below are my work-in-progress for that at the present time. Stable Diffusion and ComfyUI function, but TGW does not - Oobabooga's loaders are throwing errors I've yet to resolve. I need a break now, so I am posting these here with this added note. The errors are noted below, until I find solutions to get them resolved.
Note that ROCm 6.2's repos target Ubuntu 24.04 (Noble) - on 24.10 (Oracular) we point apt at the noble packages, as shown below.
At this point we assume you've done the system install and you know what that is, have a user, root, etc.
2024-09-12 - Prevent an update to a new kernel ( the new kernels have broken with AMD's drivers, leading to a black screen at boot ) - until that is resolved, it is prudent to prevent automatic kernel upgrades.
sudo apt-mark hold linux-image-6.8.0-31-generic
I am assuming that is enough to prevent automatic upgrades... it is however possible that other parts of the kernel may need holding too. Those can be found with apt-cache search 6.8.0-31 and then held the same way, as sketched below.
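For example, a sketch that holds everything matching that kernel version ( assuming 6.8.0-31 is the kernel you want to keep - adjust to match yours ) :
for pkg in $(apt-cache search 6.8.0-31 | awk '{print $1}'); do sudo apt-mark hold "$pkg"; done
# confirm what is held
apt-mark showhold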
# update system packages
sudo apt update -y && sudo apt upgrade -y
# turn on devel and sources
sudo apt-add-repository -y -s -s
sudo apt install -y "linux-headers-$(uname -r)" \
"linux-modules-extra-$(uname -r)"
This allows installing older versions of Python, by using the "deadsnakes" PPA.
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update -y
If you are using Ubuntu 24.10 and the above has issues because deadsnakes doesn't support oracular, here is a command to point it at noble instead... this is what I needed to get it working on 24.10.
sudo sed -i "s@oracular@noble@g" /etc/apt/sources.list.d/deadsnakes-ubuntu-ppa-oracular.sources
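To confirm the PPA is now usable, you can check where apt would pull python3.10 from ( the deadsnakes PPA should appear in the list ) :
apt-cache policy python3.10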
Make the directory if it doesn't exist yet. This location is recommended by the distribution maintainers.
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
Download the key, convert the signing key to a full keyring required by apt, and store it in the keyring directory.
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
amdgpu repository
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/6.2/ubuntu noble main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update -y
AMDGPU DKMS
sudo apt install -y amdgpu-dkms
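If you'd like to confirm the driver module built against your running kernel, dkms can report its status - the amdgpu module should show as installed :
dkms status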
https://rocmdocs.amd.com/en/latest/deploy/linux/os-native/install.html
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.2 noble main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update -y
This is a lot of stuff, but it's comparatively small and worth including, as things later may pull these in as dependencies without much notice.
# ROCm...
sudo apt install -y rocm-dev rocm-libs rocm-hip-sdk
# ld.so.conf update
sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
/opt/rocm/lib
/opt/rocm/lib64
EOF
sudo ldconfig
# update path
echo "PATH=/opt/rocm/bin:/opt/rocm/opencl/bin:$PATH" >> ~/.profile
sudo /opt/rocm/bin/rocminfo | grep gfx
My 6900 reported as gfx1030, and my 7900 XTX shows up as gfx1100.
Add your user to the video and render groups so it can access the GPU ( the `whoami` below fills in your user name ) :
sudo adduser `whoami` video
sudo adduser `whoami` render
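To verify the group changes took effect ( they apply at your next login ), you can list your groups :
groups `whoami`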
# git and git-lfs ( large file support )
sudo apt install -y git git-lfs
# development library that may be required later...
sudo apt install -y libstdc++-12-dev
# stable diffusion likes TCMalloc...
sudo apt install -y libtcmalloc-minimal4
This section is optional, and as such has been moved to performance tuning.
nvtop - Note : I have had issues where the distro version crashes with 2 GPUs; installing a newer version from source works fine. Instructions for that are included at the bottom, as they depend on things installed between here and there. Project website : https://github.com/Syllo/nvtop
sudo apt install -y nvtop
sudo apt install -y radeontop rovclock
sudo reboot
This system is built to use its own venv ( rather than Conda )...
https://github.com/AUTOMATIC1111/stable-diffusion-webui Get the files...
cd
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
The 1.9.x+ release series breaks the API so that it won't work with Oobabooga's TGW - the following resets to use the 1.8.0 release that does work with Oobabooga.
2024-07-04 - Oobabooga 1.9 resolves this issue - these lines are remarked out for now, but preserved in case someone wants to see how to do something similar in the future...
# git checkout bef51ae
# git reset --hard
sudo apt install -y wget git python3.10 python3.10-venv libgl1
python3.10 -m venv venv
source venv/bin/activate
python3.10 -m pip install -U pip
deactivate
tee --append webui-user.sh <<EOF
# specify compatible python version
python_cmd="python3.10"
## Torch for ROCm
# workaround for ROCm + Torch > 2.4.x - https://github.com/comfyanonymous/ComfyUI/issues/3698
export TORCH_BLAS_PREFER_HIPBLASLT=0
# generic import...
# export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.2"
# nightly pre-release builds for ROCm 6.2 ( pin specific versions here if you want to avoid re-downloading nightlies )
export TORCH_COMMAND="pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.2"
## And if you want to call this from other programs...
export COMMANDLINE_ARGS="--api"
## crashes with 2 cards, so to get it to run on the second card (only), unremark the following
# export CUDA_VISIBLE_DEVICES="1"
EOF
If you don't create webui-user.sh yourself, it will install a default one to get you going. Note that the stock models directory includes files the program needs - you'll want to copy those into the folder where you keep your other models ( to avoid issues ) before linking it in.
#mv models models.1
#ln -s /path/to/models models
Note that the first time it starts it may take a while to go and fetch things - it's not always good about saying what it's up to.
./webui.sh
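Once the first run has finished installing things, a quick sanity check from inside the venv will confirm the ROCm build of PyTorch sees your card ( ROCm torch answers through the cuda API ) :
source venv/bin/activate
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
deactivate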
ComfyUI - a variation of https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/scripts/install-comfyui-venv-linux.sh - includes ComfyUI-Manager.
Same install of packages here as for Stable Diffusion ( included here in case you haven't installed SD and just want ComfyUI... )
sudo apt install -y wget git python3 python3-venv libgl1
cd
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager
cd ..
python3 -m venv venv
source venv/bin/activate
# pre-install torch and torchvision from nightlies - note you may want to update versions...
python3 -m pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.2
python3 -m pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.2
# the last time I ran this, it installed a bunch of nvidia stuff and then complained I didn't have one of their cards -
# so I needed to go back and re-install the previously installed versions of torch and torchvision to get it working
python3 -m pip install --pre torch torchvision -U --index-url https://download.pytorch.org/whl/nightly/rocm6.2
python3 -m pip install -r custom_nodes/ComfyUI-Manager/requirements.txt --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.2
# end venv if needed...
deactivate
Scripts for running the program...
# run_gpu.sh
tee --append run_gpu.sh <<EOF
#!/bin/bash
source venv/bin/activate
python3 main.py --preview-method auto
EOF
chmod +x run_gpu.sh
#run_cpu.sh
tee --append run_cpu.sh <<EOF
#!/bin/bash
source venv/bin/activate
python3 main.py --preview-method auto --cpu
EOF
chmod +x run_cpu.sh
Update the config file to point to Stable Diffusion (presuming it's installed...)
# config file - connect to stable-diffusion-webui
cp extra_model_paths.yaml.example extra_model_paths.yaml
sed -i "s@path/to@`echo ~`@g" extra_model_paths.yaml
# edit config file to point to your checkpoints etc
#vi extra_model_paths.yaml
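To confirm the substitution worked, check that base_path now points at your home directory ( assuming the example file still uses the base_path key ) :
grep base_path extra_model_paths.yaml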
Project Website : https://github.com/oobabooga/text-generation-webui.git
First we'll need Conda... it is required for PyTorch here. Conda provides virtual environments for Python, so that programs with different dependencies can have different environments. Here is more info on managing conda : https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html# Other notes : https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html Download info : https://www.anaconda.com/download/
Anaconda ( if you prefer this to miniconda below )
#cd ~/Downloads/
#wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
#bash Anaconda3-2023.09-0-Linux-x86_64.sh -b
#cd ~
#ln -s anaconda3 conda
Miniconda ( if you prefer this to Anaconda above... ) [ https://docs.conda.io/projects/miniconda/en/latest/ ]
cd ~/Downloads/
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
cd ~
ln -s miniconda3 conda
echo "PATH=~/conda/bin:$PATH" >> ~/.profile
source ~/.profile
conda update -y -n base -c defaults conda
conda install -y cmake ninja
conda init
source ~/.profile
sudo apt install -y python3-pip
pip3 install --upgrade pip
## show outdated packages...
#pip list --outdated
## check dependencies
#pip check
## install specified version
#pip install <packagename>==<version>
conda create -n textgen python=3.11 -y
conda activate textgen
# pre-install
pip install --pre cmake colorama filelock lit numpy Pillow Jinja2 \
    mpmath fsspec MarkupSafe certifi networkx \
    sympy packaging requests \
    --index-url https://download.pytorch.org/whl/nightly/rocm6.2
There are version conflicts, so we specify the versions that we want installed -
#pip install --pre torch torchvision torchtext torchaudio triton pytorch-triton-rocm \
#pip install --pre torch==2.3.1+rocm6.0 torchvision==0.18.1+rocm6.0 torchaudio==2.3.1 triton pytorch-triton-rocm \
# --index-url https://download.pytorch.org/whl/rocm6.0
#pip install --pre torch==2.4.0+rocm6.1 torchvision==0.19.0+rocm6.1 torchaudio==2.4.0 triton pytorch-triton-rocm \
# --index-url https://download.pytorch.org/whl/rocm6.1
pip install --pre torch torchvision torchaudio triton pytorch-triton-rocm \
--index-url https://download.pytorch.org/whl/nightly/rocm6.2
2024-05-12 - For some odd reason, torchtext isn't recognized even though it's there... so we install it explicitly via its URL.
pip install https://download.pytorch.org/whl/cpu/torchtext-0.18.0%2Bcpu-cp311-cp311-linux_x86_64.whl#sha256=c760e672265cd6f3e4a7c8d4a78afe9e9617deacda926a743479ee0418d4207d
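Before building the loaders below, it's worth a quick check that torch imports and sees the GPU(s) ( again, ROCm torch reports through the cuda API ) :
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"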
2024-04-24 - AMD's own ROCm version of bitsandbytes has been updated! - https://github.com/ROCm/bitsandbytes ( ver 0.44.0.dev0 at time of writing )
cd
git clone https://github.com/ROCm/bitsandbytes.git
cd bitsandbytes
pip install .
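Newer bitsandbytes builds include a self-test module you can run as a sanity check - if this errors, the build above likely didn't pick up ROCm :
python -m bitsandbytes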
cd
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
2024-07-26 - Oobabooga release 1.12 changed how requirements are done, including calls that refer to old versions of PyTorch which didn't work for me... So the usual command here is remarked out, and I have instead offered a replacement requirements.txt with minimal includes that, combined with what else is here, gets it up and running ( for me ) using more recent versions of packages.
#pip install -r requirements_amd.txt
tee --append requirements_amdai.txt <<EOF
# alternate simplified requirements from https://github.com/nktice/AMD-AI
accelerate>=0.32
colorama
datasets
einops
gradio>=4.26
hqq>=0.1.7.post3
jinja2>=3.1.4
lm_eval>=0.3.0
markdown
numba>=0.59
numpy>=1.26
optimum>=1.17
pandas
peft>=0.8
Pillow>=9.5.0
psutil
pyyaml
requests
rich
safetensors>=0.4
scipy
sentencepiece
tensorboard
transformers>=4.43
tqdm
wandb
# API
SpeechRecognition>=3.10.0
flask_cloudflared>=0.0.14
sse-starlette>=1.6.5
tiktoken
EOF
pip install -r requirements_amdai.txt --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.2
git clone https://github.com/turboderp/exllamav2 repositories/exllamav2
cd repositories/exllamav2
## force the checkout back to base version 0.0.11
## git reset --hard a4ecea6
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.2
pip install . --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.2
cd ../..
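A quick import check - using the same class the loader imports, as seen in the traceback further below - tells you whether the extension built :
python -c "from exllamav2 import ExLlamaV2"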
2024-06-18 - llama-cpp-python - another loader, highly efficient in resource use, but not very fast. https://github.com/abetlen/llama-cpp-python It needs models in GGUF format ( and not other types ).
## remove old versions
pip uninstall llama_cpp_python -y
pip uninstall llama_cpp_python_cuda -y
## install llama-cpp-python
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git repositories/llama-cpp-python
cd repositories/llama-cpp-python
CC='/opt/rocm/llvm/bin/clang' CXX='/opt/rocm/llvm/bin/clang++' CFLAGS='-fPIC' CXXFLAGS='-fPIC' CMAKE_PREFIX_PATH='/opt/rocm' ROCM_PATH="/opt/rocm" HIP_PATH="/opt/rocm" CMAKE_ARGS="-GNinja -DLLAMA_HIPBLAS=ON -DLLAMA_AVX2=on " pip install --no-cache-dir .
cd ../..
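And a matching import check for the HIP build of llama-cpp-python :
python -c "from llama_cpp import Llama; print('llama-cpp-python OK')"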
Models : If you're new to this - new models can be downloaded from the shell via a python script, or from a form in the interface. There are lots of them - http://huggingface.co Generally the GPTQ models by TheBloke are likely to load... https://huggingface.co/TheBloke The 30B/33B models will load on 24GB of VRAM, but may error, or run out of memory depending on usage and parameters. Worthy of mention, TurboDerp ( author of the exllama loaders ) has been posting exllamav2 ( exl2 ) processed versions of models - https://huggingface.co/turboderp ( for use with exllamav2 loader ) - when downloading, note the --branch option.
To get new models note the ~/text-generation-webui directory has a program " download-model.py " that is made for downloading models from HuggingFace's collection.
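For example ( the model name and branch here are just illustrations - substitute ones you actually want, checking the model page for the real branch names ) :
cd ~/text-generation-webui
python download-model.py turboderp/Llama-3-8B-Instruct-exl2 --branch 4.0bpw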
If you have old models, link your pre-stored models into the models directory :
# cd ~/text-generation-webui
# mv models models.1
# ln -s /path/to/models models
Let's create a script (run.sh) to run the program...
tee --append run.sh <<EOF
#!/bin/bash
## activate conda
conda activate textgen
## command to run server...
python server.py --extensions sd_api_pictures send_pictures gallery
# if you want the server to listen on the local network so other machines can access it, add --listen.
#python server.py --listen --extensions sd_api_pictures send_pictures gallery
conda deactivate
EOF
chmod u+x run.sh
Note that to run the script, source it ( conda activate needs to run in your current shell ) :
source run.sh
2024-09-13 - Notes... unfortunately the above at present is giving me errors when I try to load models.
Exllamav2 error :
19:51:48-149210 ERROR Failed to load the model.
Traceback (most recent call last):
File "/home/n/text-generation-webui/modules/ui_model_menu.py", line 231, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/n/text-generation-webui/modules/models.py", line 93, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/n/text-generation-webui/modules/models.py", line 312, in ExLlamav2_loader
from modules.exllamav2 import Exllamav2Model
File "/home/n/text-generation-webui/modules/exllamav2.py", line 5, in <module>
from exllamav2 import (
File "/home/n/miniconda3/envs/textgen/lib/python3.11/site-packages/exllamav2/__init__.py", line 3, in <module>
from exllamav2.model import ExLlamaV2
File "/home/n/miniconda3/envs/textgen/lib/python3.11/site-packages/exllamav2/model.py", line 35, in <module>
from exllamav2.config import ExLlamaV2Config
File "/home/n/miniconda3/envs/textgen/lib/python3.11/site-packages/exllamav2/config.py", line 5, in <module>
from exllamav2.fasttensors import STFile, cleanup_stfiles
File "/home/n/miniconda3/envs/textgen/lib/python3.11/site-packages/exllamav2/fasttensors.py", line 6, in <module>
from exllamav2.ext import exllamav2_ext as ext_c
File "/home/n/miniconda3/envs/textgen/lib/python3.11/site-packages/exllamav2/ext.py", line 114, in <module>
raise e
File "/home/n/miniconda3/envs/textgen/lib/python3.11/site-packages/exllamav2/ext.py", line 106, in <module>
import exllamav2_ext
ImportError: /home/n/miniconda3/envs/textgen/lib/python3.11/site-packages/exllamav2_ext.cpython-311-x86_64-linux-gnu.so: undefined symbol: __cxa_call_terminate
Llama.cpp error ( the IndexError below means the loader's glob found no *.gguf files in the model directory ) :
Traceback (most recent call last):
File "/home/n/text-generation-webui/modules/ui_model_menu.py", line 231, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/n/text-generation-webui/modules/models.py", line 93, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/n/text-generation-webui/modules/models.py", line 275, in llamacpp_loader
model_file = sorted(Path(f'{shared.args.model_dir}/{model_name}').glob('*.gguf'))[0]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
Here's an example, nvtop, sd console, tgw console... this screencap taken using ROCm 6.1.3 - under this config : https://github.com/nktice/AMD-AI/blob/main/ROCm-6.1.3-Dev.md
Optional - a tool for displaying GPU / memory usage info. The distro package crashes with 2 GPUs, while this newer version built from source works fine. Project website : https://github.com/Syllo/nvtop
sudo apt install -y libdrm-dev libsystemd-dev libudev-dev
cd
git clone https://github.com/Syllo/nvtop.git
mkdir -p nvtop/build && cd nvtop/build
cmake .. -DNVIDIA_SUPPORT=OFF -DAMDGPU_SUPPORT=ON -DINTEL_SUPPORT=OFF
make
sudo make install