
Radeon VII #175

Open

commandline-be opened this issue Nov 4, 2024 · 15 comments

@commandline-be

Owner of a Radeon VII card here. If I can help test code to run well on it, let me know.

@lamikr
Owner

lamikr commented Nov 7, 2024

Hi, that would be really interesting!

Most of the ROCm components seem to have support for the Radeon VII/gfx906 in place by default, and a couple of weeks ago I went through all the common places that typically require patching. I have verified that everything should now build for these cards, but I have not been able to test functionality on a VII.

That said, if you have time it would be great if you could try the build and test it. These steps should help you get started.

git clone git@github.com:lamikr/rocm_sdk_builder.git
cd rocm_sdk_builder
./install_deps.sh
./babs.sh -c   # choose gfx906
./babs.sh -b

Once the build has progressed past rocminfo and amd-smi, those commands are a good way to start checking the build.


source /opt/rocm_sdk_612/bin/env_rocm.sh
rocminfo
amd-smi metric
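
As a quick sanity check (assuming the env script has been sourced), each gfx906 GPU should show up in the rocminfo output:

rocminfo | grep -i gfx906   # should match at least once per Radeon VII / MI50 / MI60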

HIP and OpenCL compiler tests should also be doable pretty soon (no need to wait for the whole build to finish):

source /opt/rocm_sdk_612/bin/env_rocm.sh
cd /opt/rocm_sdk_612/docs/examples/hipcc/hello_world
./build.sh
cd /opt/rocm_sdk_612/docs/examples/opencl/hello_world
./build.sh
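
If you want an even smaller smoke test that does not depend on the bundled examples, a minimal sketch like this should work once hipcc is available (hip_check.cpp is just an ad-hoc file name, not part of the SDK):

cat > /tmp/hip_check.cpp << 'EOF'
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    // hipGetDeviceCount reports how many HIP-capable GPUs are visible
    if (hipGetDeviceCount(&count) != hipSuccess) {
        printf("hipGetDeviceCount failed\n");
        return 1;
    }
    printf("HIP devices found: %d\n", count);
    return 0;
}
EOF
hipcc /tmp/hip_check.cpp -o /tmp/hip_check && /tmp/hip_check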

Once the build has finished, if things work well, PyTorch should also have support for your GPU.
Some basic benchmarks can be run with:

cd benchmarks
./run_and_save_benchmarks.sh
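
Before running them, it is worth a quick check that the built PyTorch actually sees the GPUs (device_count should match the number of cards):

python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"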

If these work, you can also build llama_cpp, stable-diffusion-webui and vllm with this command:

./babs.sh -b binfo/extra/ai_tools.blist

All of those also have their own example apps you can run either on the console or by starting their web server and connecting to it via browser. (I can help more later if needed.)

@commandline-be
Author

Thanks :-) I'll try that ASAP.

@Said-Akbar

Hello @lamikr ,

Thank you for your amazing work! I am really glad I found this repo.

I have two AMD MI60 cards (gfx906). I will also compile this repo and share test results with you!

I am specifically interested in vLLM batch/concurrent inference speeds. So far, I have not been able to compile vLLM against a default installation of ROCm 6.2.2.
Another issue I faced was the lack of flash attention support. I see this repo has aotriton with support for gfx906; I hope the aotriton implementation of flash attention works with this repo. Reference: ROCm/aotriton#39

There is also a composable_kernel-based flash attention implementation at https://github.com/ROCm/flash-attention (v2.6.3). This FA compiles fine with default ROCm 6.2.2 on Ubuntu 22.04, but the exllamav2 backend with Llama 3 8B started generating gibberish text (exllamav2 works fine without FA2, but it is very slow that way). I hope this repo fixes the gibberish text generation problem with FA2.

Thanks again!

@Said-Akbar

Quick update. I did a fresh installation of Ubuntu 24.04.1 today, which takes around 6.5 GB of SSD storage. It installs NVIDIA GPU drivers by default. I assumed this repo would install AMD GPU drivers, but it does not. This should probably be mentioned in the README with a brief description of how to install GPU drivers. So, I installed the AMD GPU drivers as follows:

sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb
sudo apt install ./amdgpu-install_6.2.60204-1_all.deb
sudo apt update
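
After a reboot, these checks (my own additions, not from the README) confirm the kernel side is ready:

lsmod | grep amdgpu                 # amdgpu kernel module is loaded
ls -l /dev/kfd /dev/dri/renderD*    # compute (KFD) and render device nodes exist
groups                              # should include 'render' and 'video' after re-login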

Also, several packages were missing on Ubuntu which I had to install after seeing error messages from ./install_deps.sh:

sudo apt install rpm
sudo apt install python3-pip
sudo apt install git-lfs

Only after that was I able to run ./install_deps.sh without errors.
I selected gfx906 for ./babs.sh -c and now I'm waiting for ./babs.sh -b to finish. So far, it has been running for 1.5 hours on my AMD 5950X CPU with 96 GB of DDR4-3200.
Currently, the script is installing flang_libpgmath.

More feedback: can you please include a global progress indicator in the terminal logs that shows how many packages have been built and how many remain?
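
Something as simple as this sketch would already help (pkg_list and build_pkg are made-up placeholders, not real babs.sh internals):

total=${#pkg_list[@]}
idx=0
for pkg in "${pkg_list[@]}"; do
    idx=$((idx + 1))
    echo "[${idx}/${total}] building ${pkg}"   # one global progress line per package
    build_pkg "${pkg}"
done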

@Said-Akbar

Said-Akbar commented Nov 20, 2024

OK, I want to report an error that occurred while building the source code.
I ran ./babs.sh -b, and after 1.5 hours this is the error message I see:

-- LIBOMPTARGET: Not building hostexec for NVPTX because cuda not found
   -- Building hostexec with LLVM 17.0.0git found with CLANG_TOOL /opt/rocm_sdk_612/bin/clang
-- LIBOMPTARGET: Building the llvm-omp-device-info tool
-- LIBOMPTARGET: Building the llvm-omp-kernel-replay tool
-- LIBOMPTARGET: Building DeviceRTL. Using clang: /opt/rocm_sdk_612/bin/clang, llvm-link: /opt/rocm_sdk_612/bin/llvm-link and opt: /opt/rocm_sdk_612/bin/opt
-- LIBOMPTARGET: DeviceRTLs gfx906: Getting ROCm device libs from /opt/rocm_sdk_612/lib64/cmake/AMDDeviceLibs
 ===================> bc_files: Configuration.cpp-400-gfx906.bc;Debug.cpp-400-gfx906.bc;Kernel.cpp-400-gfx906.bc;LibC.cpp-400-gfx906.bc;Mapping.cpp-400-gfx906.bc;Misc.cpp-400-gfx906.bc;Parallelism.cpp-400-gfx906.bc;Reduction.cpp-400-gfx906.bc;State.cpp-400-gfx906.bc;Synchronization.cpp-400-gfx906.bc;Tasking.cpp-400-gfx906.bc;Utils.cpp-400-gfx906.bc;Workshare.cpp-400-gfx906.bc;ExtraMapping.cpp-400-gfx906.bc;Xteamr.cpp-400-gfx906.bc;Memory.cpp-400-gfx906.bc;Xteams.cpp-400-gfx906.bc;/home/saidp/Downloads/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/libm-400-gfx906.bc;/home/saidp/Downloads/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/libomptarget/hostexec/libhostexec-400-gfx906.bc;/opt/rocm_sdk_612/amdgcn/bitcode/ocml.bc;/opt/rocm_sdk_612/amdgcn/bitcode/ockl.bc;/opt/rocm_sdk_612/amdgcn/bitcode/oclc_wavefrontsize64_on.bc;/opt/rocm_sdk_612/amdgcn/bitcode/oclc_isa_version_906.bc;/opt/rocm_sdk_612/amdgcn/bitcode/oclc_abi_version_400.bc ========================
-- LIBOMPTARGET: DeviceRTLs gfx906: Getting ROCm device libs from /opt/rocm_sdk_612/lib64/cmake/AMDDeviceLibs
 ===================> bc_files: Configuration.cpp-500-gfx906.bc;Debug.cpp-500-gfx906.bc;Kernel.cpp-500-gfx906.bc;LibC.cpp-500-gfx906.bc;Mapping.cpp-500-gfx906.bc;Misc.cpp-500-gfx906.bc;Parallelism.cpp-500-gfx906.bc;Reduction.cpp-500-gfx906.bc;State.cpp-500-gfx906.bc;Synchronization.cpp-500-gfx906.bc;Tasking.cpp-500-gfx906.bc;Utils.cpp-500-gfx906.bc;Workshare.cpp-500-gfx906.bc;ExtraMapping.cpp-500-gfx906.bc;Xteamr.cpp-500-gfx906.bc;Memory.cpp-500-gfx906.bc;Xteams.cpp-500-gfx906.bc;/home/saidp/Downloads/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/libm-500-gfx906.bc;/home/saidp/Downloads/rocm_sdk_builder/builddir/016_03_llvm_project_openmp/libomptarget/hostexec/libhostexec-500-gfx906.bc;/opt/rocm_sdk_612/amdgcn/bitcode/ocml.bc;/opt/rocm_sdk_612/amdgcn/bitcode/ockl.bc;/opt/rocm_sdk_612/amdgcn/bitcode/oclc_wavefrontsize64_on.bc;/opt/rocm_sdk_612/amdgcn/bitcode/oclc_isa_version_906.bc;/opt/rocm_sdk_612/amdgcn/bitcode/oclc_abi_version_500.bc ========================


CMake Error at cmake/OpenMPTesting.cmake:209 (add_custom_target):
  add_custom_target cannot create target
  "check-libomptarget-nvptx64-nvidia-cuda" because another target with the
  same name already exists.  The existing target is a custom target created
  in source directory
  "/home/saidp/Downloads/rocm_sdk_builder/src_projects/llvm-project/openmp/libomptarget/test".
  See documentation for policy CMP0002 for more details.
Call Stack (most recent call first):
  libomptarget/test/CMakeLists.txt:23 (add_openmp_testsuite)


CMake Error at cmake/OpenMPTesting.cmake:209 (add_custom_target):
  add_custom_target cannot create target
  "check-libomptarget-nvptx64-nvidia-cuda-LTO" because another target with
  the same name already exists.  The existing target is a custom target
  created in source directory
  "/home/saidp/Downloads/rocm_sdk_builder/src_projects/llvm-project/openmp/libomptarget/test".
  See documentation for policy CMP0002 for more details.
Call Stack (most recent call first):
  libomptarget/test/CMakeLists.txt:23 (add_openmp_testsuite)

Attaching the full error output (from ./babs.sh -b >> error_output.txt 2>&1 after running it a second time) for reference:
error_output.txt

Short info about my PC:
OS: Ubuntu 24.04.1
CPU: AMD 5950X
RAM: 96 GB DDR4-3200
Storage: 1 TB SSD + HDD
GPUs: RTX 3090 (for video output), 2x AMD MI60 (gfx906)


I ran the following commands and they worked.

source /opt/rocm_sdk_612/bin/env_rocm.sh
rocminfo
amd-smi metric
cd /opt/rocm_sdk_612/docs/examples/hipcc/hello_world
./build.sh
cd /opt/rocm_sdk_612/docs/examples/opencl/hello_world
./build.sh

rocminfo correctly showed the two MI60 cards. The hipcc and opencl examples worked without errors.
Only ./run_and_save_benchmarks.sh did not work, due to a missing torch library.


Please let me know whether I need to install CUDA libraries, or otherwise how to fix the error above.

Thanks!

@Said-Akbar

@lamikr, I think the error I am seeing might be related to spack/spack#45411, but I am not sure how to apply the fix here. Let me know, thanks!
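
For what it's worth, the duplicate nvptx64 targets seem to appear when CMake detects a CUDA toolkit, so checking for leftover CUDA/NVIDIA bits might narrow it down (my own guess, not a confirmed fix):

command -v nvcc                     # is a CUDA compiler on PATH?
ls -d /usr/local/cuda* 2>/dev/null  # are any CUDA toolkit directories present?
dpkg -l | grep -i nvidia            # are any NVIDIA packages installed?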

@Said-Akbar

Said-Akbar commented Nov 21, 2024

Quick update. Installation works after I removed all NVIDIA drivers and restarted my PC.

sudo apt-get purge nvidia*
sudo apt-get autoremove
sudo apt-get autoclean

Now, Ubuntu is using the X.Org Nouveau drivers.

@Said-Akbar

Finally, the ROCm SDK finished installing on my PC after 5 hours. It takes ~90 GB of space in rocm_sdk_builder, 8.5 GB in the triton folder, ~2 GB in the lib/x86_64-linux-gnu folder (mostly LLVM), and ~20 GB in the /opt/rocm_sdk_612 folder: about 120 GB of files in total! Is there a way to create an installable version of my current setup (all 120 GB)? It is huge and time-consuming. For comparison, a ROCm installation from binaries takes around 30 GB.
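
What I have in mind is as simple as archiving the finished install for reuse on an identical machine (just a sketch; I have not verified that the result is relocatable):

tar -C /opt -czf rocm_sdk_612.tar.gz rocm_sdk_612
# on the target machine:
sudo tar -C /opt -xzf rocm_sdk_612.tar.gz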

@Said-Akbar

Here are the benchmark results. I think the flash attention test failed.

./run_and_save_benchmarks.sh
Timestamp for benchmark results: 20241121_190404
Saving to file: 20241121_190404_cpu_vs_gpu_simple.txt
Benchmarking CPU and GPUs
Pytorch version: 2.4.1
ROCM HIP version: 6.1.40093-61a06a2f8
       Device:  AMD Ryzen 9 5950X 16-Core Processor
    'CPU time: 26.503 sec
       Device: AMD Radeon Graphics
    'GPU time: 0.399 sec
       Device: AMD Radeon Graphics
    'GPU time: 0.353 sec
Benchmark ready

Saving to file: 20241121_190404_pytorch_dot_products.txt
Pytorch version: 2.4.1
dot product calculation test
tensor([[[ 0.2042, -0.5683,  0.5711,  1.5666, -0.8859, -0.4255, -0.6103,
          -0.5932],
         [-0.1816, -1.0552,  0.3676,  2.1399, -0.8622,  0.1185, -0.4614,
          -0.4577],
         [ 0.2491, -0.5238,  0.5873,  1.5027, -0.8808, -0.4906, -0.6309,
          -0.6083]],

        [[-0.0812,  0.5027, -0.0134, -0.1771, -1.6389,  0.0154, -1.1964,
          -0.3948],
         [-0.3459, -0.4265,  0.0969,  0.0608, -0.9923, -0.4199, -0.7190,
          -0.0208],
         [-0.2615, -0.6958,  0.1066, -0.1948, -1.2152, -0.1223, -0.6278,
           0.1627]]], device='cuda:0')

Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
Device: AMD Radeon Graphics / cuda:0
    Default benchmark:
:0:/home/saidp/Downloads/rocm_sdk_builder/src_projects/clr/hipamd/src/hip_global.cpp:114 : 8471950880 us: [pid:454884 tid:0x7ad2a9db0b80] Cannot find Symbol with name: Cijk_Alik_Bljk_HHS_BH_MT128x64x16_SE_APM1_AF0EM2_AF1EM1_AMAS3_ASAE01_ASCE01_ASEM2_BL1_BS1_DTLA0_DTLB0_EPS1_FL1_GLVWA4_GLVWB4_GRVW4_GSU1_GSUASB_ISA906_IU1_K1_KLA_LPA0_LPB0_LDL1_LRVW4_MDA1_MMFGLC_NLCA1_NLCB1_ONLL1_PK0_PGR1_PLR1_SIA1_SU32_SUM0_SUS256_SVW4_SNLL0_TT8_4_USFGROn1_VAW2_VSn1_VW4_VWB4_WG16_16_1_WGM1

@Said-Akbar

The error above is causing llama.cpp not to run any models on the GPU. Let me file a bug.

@commandline-be
Author

@lamikr, finally got round to doing the testing.

Initially the build went smooth-ish, then I noticed something had failed.

After doing a ./babs.sh --clean and starting ./babs.sh -b again, I now get an error:

HIP_COMPILER=clang
HIP_RUNTIME=rocclr
ROCM_PATH=/opt/rocm_sdk_612
HIP_ROCCLR_HOME=/opt/rocm_sdk_612
HIP_CLANG_PATH=/opt/rocm_sdk_612/bin
HIP_INCLUDE_PATH=/opt/rocm_sdk_612/include
HIP_LIB_PATH=/opt/rocm_sdk_612/lib
DEVICE_LIB_PATH=/opt/rocm_sdk_612/amdgcn/bitcode
HIP_CLANG_RT_LIB=/opt/rocm_sdk_612/lib/clang/17/lib/linux
hipcc-args: -DENABLE_BACKTRACE -DHAVE_BACKTRACE_H -I/usr/src/rocm_sdk_builder/src_projects/roctracer/src/util -O3 -DNDEBUG -fPIC -Wall -Werror -std=gnu++17 -MD -MT src/CMakeFiles/util.dir/util/debug.cpp.o -MF CMakeFiles/util.dir/util/debug.cpp.o.d -o CMakeFiles/util.dir/util/debug.cpp.o -c /usr/src/rocm_sdk_builder/src_projects/roctracer/src/util/debug.cpp
hipcc-cmd: "/opt/rocm_sdk_612/bin/clang" -isystem "/opt/rocm_sdk_612/include" --offload-arch=gfx906 -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false --hip-path="/opt/rocm_sdk_612" --hip-device-lib-path="/opt/rocm_sdk_612/amdgcn/bitcode" -DENABLE_BACKTRACE -DHAVE_BACKTRACE_H -I/usr/src/rocm_sdk_builder/src_projects/roctracer/src/util -O3 -DNDEBUG -fPIC -Wall -Werror -std=gnu++17 -MD -MT src/CMakeFiles/util.dir/util/debug.cpp.o -MF CMakeFiles/util.dir/util/debug.cpp.o.d -o "CMakeFiles/util.dir/util/debug.cpp.o" -c -x hip /usr/src/rocm_sdk_builder/src_projects/roctracer/src/util/debug.cpp
/usr/src/rocm_sdk_builder/src_projects/roctracer/src/util/debug.cpp:33:10: fatal error: 'backtrace.h' file not found
33 | #include <backtrace.h>
| ^~~~~~~~~~~~~
1 error generated when compiling for gfx906.
make[2]: *** [src/CMakeFiles/util.dir/build.make:76: src/CMakeFiles/util.dir/util/debug.cpp.o] Error 1
make[2]: Leaving directory '/usr/src/rocm_sdk_builder/builddir/011_01_roctracer'
make[1]: *** [CMakeFiles/Makefile2:220: src/CMakeFiles/util.dir/all] Error 2
make[1]: Leaving directory '/usr/src/rocm_sdk_builder/builddir/011_01_roctracer'
make: *** [Makefile:156: all] Error 2
build failed: roctracer

@lamikr
Owner

lamikr commented Nov 24, 2024

Hi, thanks for the reports. Flash attention support for gfx906 would need to be implemented in aotriton.
As it's a gfx9-based GPU, I need to check whether the triton code there that supports the newer gfx9* cards could also be made to work with gfx906.

Although I do not have a gfx906, I will start a new build for it with Ubuntu 24.04 and try to reproduce the build errors. If you have some fixes, are you able to make a pull request?

@commandline-be
Author

Hey @lamikr,

The build is on LinuxMint Debian Edition; if need be, I can make pull requests.
Can you help identify where backtrace.h comes from?

@lamikr
Owner

lamikr commented Nov 25, 2024

I have multiple versions of it under the src_projects directory:

$ cd src_projects/
$ find -name backtrace.h

./rocgdb/libbacktrace/backtrace.h
./rocMLIR/external/llvm-project/compiler-rt/lib/gwp_asan/optional/backtrace.h
./binutils-gdb/libbacktrace/backtrace.h
./openmpi/opal/mca/backtrace/backtrace.h
./llvm-project/compiler-rt/lib/gwp_asan/optional/backtrace.h
./pytorch/third_party/tensorpipe/third_party/libnop/include/nop/utility/backtrace.h
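
On a Debian-based system you can also check whether any system package ships a matching backtrace.h (a quick sketch, assuming apt-file is available):

sudo apt install apt-file
sudo apt-file update
apt-file search backtrace.h   # lists packages that provide a matching header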

I am not sure what is causing it. Maybe the install directory /opt/rocm_sdk_612 should also be removed before starting a clean build. Let's try to reset everything and then start a fresh build.
(Normally this should not be needed; the only commands required would be ./babs.sh -up and ./babs.sh -b to rebuild only the changed projects.)

./babs.sh -ca
./babs.sh -up
./babs.sh --clean
rm -rf /opt/rocm_sdk_612
./babs.sh -b

I have not yet solved the llama.cpp error with gfx906, but I am trying to add more debugging related to it in the next build.
Let's track that issue in #180.

@cb88

cb88 commented Dec 12, 2024

I can get as far as running the HIP and CL hello worlds, but cannot run the run_and_save_benchmarks script.

-- MIGraphX is using hipRTC
-- MIGraphx is using Find-2.0 API of MIOpen
-- MIGraphx is using Find Mode API of MIOpen
-- MIGraphx is using Beta API of rocBLAS
-- MIGraphX is using Beta API of rocBLAS for FP8 computations
-- Enable test package migraphx
-- rocm-cmake: Set license file to /home/cb88/rocm_sdk_builder/src_projects/AMDMIGraphX/LICENSE.
-- Generate ctest file
-- Configuring done (1.4s)
CMake Error in src/py/CMakeLists.txt:
Imported target "pybind11::module" includes non-existent path

"/include"

in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:

  • The path was deleted, renamed, or moved to another location.

  • An install or uninstall procedure did not complete successfully.

  • The installation package was faulty and references files it does not
    provide.

CMake Error in src/py/CMakeLists.txt:
Imported target "pybind11::pybind11" includes non-existent path

"/include"

in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:

  • The path was deleted, renamed, or moved to another location.

  • An install or uninstall procedure did not complete successfully.

  • The installation package was faulty and references files it does not
    provide.

CMake Warning (dev) at /opt/rocm_sdk_612/share/rocmcmakebuildtools/cmake/ROCMTest.cmake:230 (install):
Policy CMP0095 is not set: RPATH entries are properly escaped in the
intermediary CMake install script. Run "cmake --help-policy CMP0095" for
policy details. Use the cmake_policy command to set the policy and
suppress this warning.

RPATH entries for target 'test_verify' will not be escaped in the
intermediary cmake_install.cmake script.
Call Stack (most recent call first):
test/verify/CMakeLists.txt:29 (rocm_install_test)
This warning is for project developers. Use -Wno-dev to suppress it.

-- Generating done (2.5s)
CMake Generate step failed. Build files cannot be regenerated correctly.
configure failed: AMDMIGraphX
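
The "/include" in the imported pybind11 targets looks like an empty install prefix. A quick check of what the pip-installed pybind11 reports (my own debugging sketch, not a confirmed fix):

python3 -m pybind11 --includes                                # compiler include flags pybind11 advertises
python3 -c "import pybind11; print(pybind11.get_include())"   # its include directory path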
