[WIP] Migrate away from centos7 #3
Conversation
rocm rebuild is like 30 mins. not sure --link will work here. we could probably avoid this if llama.cpp were a subtree instead of a submodule, avoiding the need to copy into the image build stuff that changes all the time, like .git.
it still compiles and runs ok on a 7900 XT.
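a minimal sketch of the submodule-to-subtree idea, using two throwaway local repos as stand-ins for ggerganov/llama.cpp and this repo (paths, branch names, and commit messages are illustrative assumptions, not the real migration commands):

```shell
set -e
work=$(mktemp -d)
# "upstream" stands in for ggerganov/llama.cpp
git init -q -b main "$work/upstream"
( cd "$work/upstream" &&
  echo 'llama sources' > ggml.c &&
  git add ggml.c &&
  git -c user.email=ci@example.com -c user.name=ci commit -q -m 'llama.cpp snapshot' )
# "ollama" stands in for this repo
git init -q -b main "$work/ollama"
( cd "$work/ollama" &&
  git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m init &&
  # vendor the sources in-tree: no .gitmodules, and the docker build context
  # no longer needs .git copied in to resolve a submodule
  git -c user.email=ci@example.com -c user.name=ci subtree add --prefix=llama.cpp "$work/upstream" main --squash )
vendored=$(cat "$work/ollama/llama.cpp/ggml.c")
echo "$vendored"
```

after the subtree add, the llama.cpp sources are plain tracked files, so `COPY . .` in a Dockerfile picks them up without any submodule init step.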
Notes:
- rocky9 is going to be a little annoying b/c nvidia only publishes rocky9 images for certain cuda versions. the ollama build will use cuda dlls for 12.4.0 and 11.3.1, so we'd need to either build a rocky9 + cuda 11.3.1 image ourselves or amend the rh_linux_deps script to work for both rocky8 and rocky9 (currently it only works on rocky8).
- the gcc peg to 10.2 looks like it could go away if cuda gets a version bump. ollama currently uses cuda 11.3.1, whose latest supported gcc appears to be 9.x (see https://docs.nvidia.com/cuda/archive/11.3.1/cuda-installation-guide-linux/index.html). cuda 11.7.1 supports gcc 11.x, with rhel9 officially supported (see https://docs.nvidia.com/cuda/archive/11.7.1/cuda-installation-guide-linux/index.html). looking at upstream llama.cpp Dockerfiles, there's evidence it was using cuda 11.7.1 (successfully?) until around 2024-09 (see ggerganov/llama.cpp@66b039a). so I bet if we bump to cuda 11.7.1, we can migrate the builds to rocky8 without too much hassle and remove all the gcc pinning hacks.
- performance tests show no significant difference, indicating things work as well as they did before the changes.
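a hypothetical sketch of how rh_linux_deps could branch on the Rocky major version instead of assuming rocky8; the function names and package choices are assumptions for illustration (rocky9 ships gcc 11.x as the system compiler):

```shell
# parse the major version out of a VERSION_ID value from /etc/os-release,
# e.g. "8.9" -> 8, "9.3" -> 9
rocky_major() {
  echo "${1%%.*}"
}

# pick a toolchain package per major version (illustrative, not the real script)
pick_toolchain() {
  case "$(rocky_major "$1")" in
    8) echo "gcc-toolset-10" ;;  # current rocky8-only behavior
    9) echo "gcc" ;;             # rocky9: system gcc is already 11.x
    *) echo "unsupported" ;;
  esac
}

pick_toolchain "8.9"
pick_toolchain "9.3"
```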
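the gcc-vs-cuda constraint above can be encoded as a fail-fast check so a mismatch surfaces before a long container build; the ceilings here come from NVIDIA's installation guides linked above (cuda 11.3.1 supports gcc up to 9.x, cuda 11.7.1 up to 11.x), and the helper names are assumptions:

```shell
# documented max supported gcc major version per cuda release
max_gcc_for_cuda() {
  case "$1" in
    11.3.1) echo 9 ;;
    11.7.1) echo 11 ;;
    *) echo 0 ;;
  esac
}

# $1 = cuda version, $2 = gcc major (e.g. from `gcc -dumpversion | cut -d. -f1`)
check() {
  if [ "$2" -le "$(max_gcc_for_cuda "$1")" ]; then
    echo "gcc $2 ok for cuda $1"
  else
    echo "gcc $2 too new for cuda $1"
  fi
}

check 11.3.1 11
check 11.7.1 11
```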
force-pushed from d727785 to fd61e1a
based on the git blame for the gcc 10.2 pin, we should probably at least do an arm64 build and make sure it compiles before opening a PR against upstream. I don't have HW available to test the arm build and would prefer not to rent it, so upstream maintainers will have to do the runtime checks. see also dhiltgen@5dacc1e
upstream ollama/ollama just recently made a change to 'vendor in' the upstream llama.cpp code instead of pulling it via submodule. before PRing to upstream, make sure to merge this change into the unit under test. edit (later): done
the arm64 build is really long on an amd64 box using qemu for arm emulation: > 1 hr to get only 48% through the build (with the CPU pegged at 100% the whole time).
if we trigger the gcc 10.3 + nvcc issue mentioned above, this thread seems valuable: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=5357ab75dedef403b0eebf9277d61d1cbeb5898f. the bug is a segfault in gcc, something I haven't been able to reproduce so far on any builds. I can't find any evidence RH or others backported the fix to gcc 10.3.1 (the gcc version used when installing gcc-toolset-10) on rhel8. so it seems best to bump to gcc 11.2+ given:
- there's some evidence the nvcc compile bug was fixed in upstream cuda 11.6.0
- the symptom presented at https://github.com/Ahdhn/nvcc_bug_maybe is similar to the segfault alpaka is talking about
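a small sketch of the "bump to gcc 11.2+" gate: refuse to build CUDA code with the 10.x compilers that can hit the nvcc-triggered segfault. the version strings below are examples; in a real build they would come from `gcc -dumpfullversion`:

```shell
# true (exit 0) iff the given gcc version is >= 11.2
gcc_at_least_11_2() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 11 ] || { [ "$major" -eq 11 ] && [ "$minor" -ge 2 ]; }
}

for v in 10.3.1 11.2.1 12.1.0; do
  if gcc_at_least_11_2 "$v"; then
    echo "$v: ok"
  else
    echo "$v: too old, bump gcc"
  fi
done
```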
initial smoke checks show a successful rocm amd64 compile + equivalent performance in smoke tests
 ARG GOLANG_VERSION=1.22.5
 ARG CMAKE_VERSION=3.22.1
-ARG CUDA_VERSION_11=11.3.1
+ARG CUDA_VERSION_11=11.7.1
11.7.1 was chosen for consistency with upstream llama.cpp (see also ggerganov/llama.cpp@66b039a). 11.8.0 is actually the latest available release in the cuda 11 major version.
looking at https://docs.nvidia.com/cuda/archive/11.8.0/cuda-toolkit-release-notes/index.html, I'm not seeing anything hugely compelling outside of rocky 9 support:
11.8
This release introduces support for both the Hopper and Ada Lovelace GPU families.
Added support for Rocky Linux 9.
Added support for Kylin OS.
Package upgradable CUDA is now available starting CUDA 11.8 for Jetson devices. Refer to https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#upgradable-package-for-jetson for details on how to upgrade to the latest CUDA version on Jetson and the supported JetPack versions.
supplanted by ollama#7265
What
~~try everything on rockylinux 9~~ opting not to do this b/c the feedback loop between tests is too long given current HW availability. rocky8 is good enough (tm) for now.
Why
closes ollama#7260