Running on AMD GPU
Following the great instructions from August and using the docker image, this runs on the 7900 XTX with a few changes, most notably:
export HSA_OVERRIDE_GFX_VERSION=11.0.0 #7900 xtx natively works with the gfx1100 driver
make hip ROCM_TARGET=gfx1100
The rest of the steps are the same as in the instructions below.
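The only two settings that differ between cards on this page are the HSA override and the build target. A minimal sketch of that pairing follows, assuming a hypothetical helper named rocm_build_settings (not part of Petals or bitsandbytes). It covers only the two pairs that appear on this page and refuses anything else rather than guessing.

```shell
# Hypothetical helper: print the two per-card settings used on this page.
# Only the pairs documented here are known; other targets are rejected.
rocm_build_settings() {
    case "$1" in
        gfx1100) override="11.0.0" ;;   # 7900 XTX, per the note above
        gfx1030) override="10.3.0" ;;   # value used in the docker steps on this page
        *)
            echo "unknown target: $1 (not covered on this page)" >&2
            return 1
            ;;
    esac
    echo "export HSA_OVERRIDE_GFX_VERSION=$override"
    echo "make hip ROCM_TARGET=$1"
}

# Show the settings for the 7900 XTX.
rocm_build_settings gfx1100
```

The helper only prints the commands; run them by hand at the matching steps (the export before the build, the make inside the bitsandbytes checkout).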
Due to the great work of Odonata (Discord, github @edt-xx), the hardware of oceanmasterza (Discord), and the help of epicx (Discord, GitHub @bennmann), we have the below AMD instructions.
According to the author of the bitsandbytes ROCM port, @arlo-phoenix, using a Docker image is recommended (both rocm/pytorch and rocm/pytorch-nightly should work). See port discussion here.
On the host machine, run:
docker pull rocm/pytorch-nightly
sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/pytorch-nightly
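The docker run command above passes /dev/kfd and /dev/dri into the container, so it can be worth confirming that both exist on the host first. A small sketch of such a check follows. The function name check_rocm_devices and its optional directory argument are assumptions added for illustration, so the check can also be tried against a scratch directory.

```shell
# Hypothetical pre-flight check: confirm that the device nodes passed to
# `docker run` via --device exist. The optional first argument is the
# directory to look in (default: /dev).
check_rocm_devices() {
    dev_root="${1:-/dev}"
    missing=0
    for node in kfd dri; do
        if [ ! -e "$dev_root/$node" ]; then
            echo "missing: $dev_root/$node" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_rocm_devices; then
    echo "device nodes found"
else
    echo "ROCm device nodes not found; check the host driver installation"
fi
```

The check only looks for the paths. It does not verify that the current user has permission to open them.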
In the running image, run:
cd /home
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# Install bitsandbytes with ROCM support
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6.git bitsandbytes
cd bitsandbytes
make hip ROCM_TARGET=gfx1030
pip install pip --upgrade
pip install .
# Install Petals
cd ..
pip install --upgrade git+https://github.com/bigscience-workshop/petals
# Run server
python -m petals.cli.run_server petals-team/StableBeluga2 --port <an open port> --torch_dtype float16
Running the model in bfloat16 is also supported but slower than in float16.
Multi-GPU operation (--tensor_parallel_devices) is still not tested (the docker --gpus flag may not function at this time and other virtualization tools may be necessary).
Contributed by: @edt-xx, @bennmann
Tested on:
- AMD 6600 XT tested July 24th, 2023 on Arch Linux with ROCM 5.6.0, mesa 22.1.4
- AMD 6900 XT tested April 18th, 2023 on bare metal Ubuntu 22.04 (no docker/anaconda/container). Tested with ROCM 5.4.2
- Untested on the 7000 series; however, 7000-series cards may perform much better, as AMD added a machine learning tensor library and better hardware support (vs. ray tracing only on the 6000 series)
Guide:
- use the mesa-clover and mesa-rusticl OpenCL variants
- add export HSA_OVERRIDE_GFX_VERSION=10.3.0 to your environment (put it in /home/user/.bashrc on Ubuntu; this tricks ROCM into working on more consumer-based cards like the 6000 series)
- install ROCM. Use this tutorial for Arch Linux: https://wiki.archlinux.org/title/GPGPU
- create and activate a venv for petals using python 3.11
  - python -m venv <yourvenvpath>
  - cd <yourvenvpath>
  - source bin/activate
- in the venv, install the nightly version of PyTorch with the command generated by the website: https://pytorch.org/get-started/locally/
- install the Petals version with AMD GPU support:
pip install git+https://github.com/bigscience-workshop/petals@amd-gpus
This branch uses an older version of bitsandbytes patched to have AMD GPU support (developed by @brontoc and Titaniumtown). This means that you won't be able to use 4-bit quantization (--quant_type nf4) or LoRA adapters (the --adapters argument). The server will use 8-bit quantization (int8) for all models by default.
Tip: You can set your fans to full speed or close to it before starting Petals (the default Linux fan profile for AMD GPUs is not good on some cards):
rocm-smi --setfan 99%
- run petals using:
python -m petals.cli.run_server petals-team/StableBeluga2
Tip: You can monitor temperature and voltage by running this:
rocm-smi && rocm-smi -t
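Since the amd-gpus branch described in the guide does not support 4-bit quantization or LoRA adapters, a launch script could screen its arguments before starting the server. A minimal sketch follows, assuming a hypothetical function named check_amd_branch_args. It is not part of Petals and checks only the two options this page names.

```shell
# Hypothetical argument check for the amd-gpus branch: reject the two
# options this page says are unavailable (--quant_type nf4 and --adapters).
check_amd_branch_args() {
    prev=""
    for arg in "$@"; do
        case "$arg" in
            --adapters|--adapters=*)
                echo "LoRA adapters (--adapters) are not available on this branch" >&2
                return 1
                ;;
            --quant_type=nf4)
                echo "4-bit quantization (nf4) is not available on this branch" >&2
                return 1
                ;;
            nf4)
                # Only reject nf4 when it is the value of --quant_type.
                if [ "$prev" = "--quant_type" ]; then
                    echo "4-bit quantization (nf4) is not available on this branch" >&2
                    return 1
                fi
                ;;
        esac
        prev="$arg"
    done
    return 0
}

if check_amd_branch_args petals-team/StableBeluga2; then
    echo "arguments look compatible with the amd-gpus branch"
fi
```

In a wrapper script, the same call could take "$@" and be followed by python -m petals.cli.run_server "$@" only when the check succeeds.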