
[WIP] Migrate away from centos7 #3

Closed · wants to merge 9 commits
Conversation


@cazlo cazlo commented Oct 18, 2024

What

  • migrate the rocm build to rockylinux 8, run a performance test, and observe that it still runs about the same as HEAD of main
  • migrate cuda build layers to rockylinux 8
  • migrate cpu build layers to rockylinux 8
  • try everything on rockylinux 9 (opting not to do this b/c the feedback loop between tests is too slow given current HW availability; rocky8 is good enough™ for now)
  • reopen against ollama/ollama if this generally works ok
    • remove all the rootless docker compose stuff added to facilitate the local testing. if this gets cleaned up and made generic, it would get its own PR against upstream
    • run as many unit tests on your HW as possible before submitting the PR to upstream

Why

closes ollama#7260

rocm rebuild is like 30 mins

not sure --link will work here

could probably avoid this if llama.cpp were a subtree instead of a submodule, avoiding the need to copy frequently-changing files like .git into the image build
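Vendoring llama.cpp as a subtree, as floated above, might look something like the following; the `llm/llama.cpp` prefix path is an assumption for illustration, not necessarily this repo's actual layout:

```shell
# Illustrative only: replace the submodule with a squashed subtree so the
# source lands directly in the repo (no .git plumbing needed at image build).
git submodule deinit llm/llama.cpp
git rm llm/llama.cpp
git subtree add --prefix llm/llama.cpp \
    https://github.com/ggerganov/llama.cpp master --squash
```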
it still compiles and runs ok on 7900xt

cazlo commented Oct 18, 2024

Notes:


cazlo commented Oct 18, 2024

rocky9 is going to be a little annoying b/c nvidia only publishes rocky9 images for cuda 12.3.0 and later.

the ollama build will use cuda libraries for 12.4.0 and 11.3.1

so we'd need to either build a rocky9 + cuda 11.3.1 image or amend the rh_linux_deps script to work for both rocky8 and rocky9 (currently it only works on rocky8)
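A minimal sketch of the second option (branching rh_linux_deps on the detected major release). The function names and toolset choices are illustrative, not the real script's:

```shell
#!/bin/sh
# Hedged sketch: pick a gcc toolset per Rocky/RHEL major version instead of
# assuming rocky8. In the real script the major would come from
# /etc/os-release's VERSION_ID.

os_major() {
  # $1: a VERSION_ID-style value, e.g. "8.9" or "9.3"
  echo "${1%%.*}"
}

pick_gcc_toolset() {
  case "$(os_major "$1")" in
    8) echo "gcc-toolset-11" ;;  # rhel8's system gcc is too old for this build
    9) echo "" ;;                # rhel9 already ships gcc 11.x as the system compiler
    *) echo "unsupported" ;;
  esac
}

pick_gcc_toolset "8.9"   # -> gcc-toolset-11
pick_gcc_toolset "9.3"   # -> (empty: use system gcc)
```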


cazlo commented Oct 18, 2024

the gcc pin to 10.2 looks like it could go away if cuda gets a version bump.

currently ollama is using cuda 11.3.1, whose latest supported gcc appears to be 9.x (see https://docs.nvidia.com/cuda/archive/11.3.1/cuda-installation-guide-linux/index.html)

looking at cuda 11.7.1 it supports gcc 11.x, with rhel9 officially supported (see https://docs.nvidia.com/cuda/archive/11.7.1/cuda-installation-guide-linux/index.html)

looking at upstream llama.cpp Dockerfiles, there is evidence it was using cuda 11.7.1 (successfully?) until around 2024-09 (see ggerganov/llama.cpp@66b039a)

so I bet if we bump to cuda 11.7.1, we can migrate the builds to rocky8 without too much hassle and remove all the gcc pinning hacks

performance tests show no significant difference, indicating it works as well as it did before the changes

cazlo commented Oct 18, 2024

based on the git blame for the gcc 10.2 pin, we should probably at least do an arm64 build and make sure it succeeds before opening a PR against upstream.

I don't have HW available to test the arm build and would prefer not to rent it, so upstream maintainers will have to do runtime checks

see also dhiltgen@5dacc1e
ollama@b8c2be6


cazlo commented Oct 18, 2024

upstream ollama/ollama just recently made a change to 'vendor in' the upstream llama.cpp code instead of pulling it in via submodule

before the PR to upstream, make sure to merge this change into the unit under test

edit later: done


cazlo commented Oct 19, 2024

the arm64 build is really slow on an amd64 box using qemu for arm emulation: > 1 hr to get only 48% through the build (with the CPU pegged at 100% the whole time).

if we trigger the runners CI step (e.g. through changes to the llama dir), it seems like CI will run this on actual arm runners in much less time
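For reference, the emulated cross-build above would be kicked off with something like this (the tag name is a made-up example, not from this PR):

```shell
# Illustrative cross-build invocation. On an amd64 host, buildx runs the
# linux/arm64 stages under qemu user emulation, which is what made the
# build take >1 hr; on a native arm runner the same command is much faster.
docker buildx build --platform linux/arm64 -t ollama:arm64-test .
```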

about the gcc 10.3 + nvcc issue, this thread seems to be valuable:

see also https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=5357ab75dedef403b0eebf9277d61d1cbeb5898f

seems like the bug is a segfault in gcc, something I haven't been able to reproduce so far on any builds

I can't find any evidence RH or others backported this to gcc 10.3.1 (the gcc version installed by gcc-toolset-10) on rhel8.

So it seems best to bump to gcc 11.2+ given:

  • it has the bugfix in it
  • it is listed as supported by cuda and rocm docs
  • rhel8 gives us gcc 11.2.1 if we ask for gcc-toolset-11-gcc
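A sketch of what that last point would look like in a Rocky/RHEL 8 build layer (standard gcc-toolset-11 package and path names; only runnable inside such an image):

```shell
# Assumes a Rocky/RHEL 8 base image; not runnable elsewhere.
dnf install -y gcc-toolset-11-gcc gcc-toolset-11-gcc-c++
source /opt/rh/gcc-toolset-11/enable   # puts gcc 11.2.1 on PATH for this shell
gcc --version
```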


cazlo commented Oct 19, 2024

finding some evidence the nvcc compile bug was fixed in upstream cuda 11.6.0:
https://docs.nvidia.com/cuda/archive/11.6.0/cuda-toolkit-release-notes/index.html

An issue with the use of lambda function when an object is passed-by-value is resolved. https://github.com/Ahdhn/nvcc_bug_maybe

the symptom presented at https://github.com/Ahdhn/nvcc_bug_maybe is similar to the segfault alpaka is talking about
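A quick shell sanity check that the 11.7.1 bump discussed earlier already contains the 11.6.0 fix, using `sort -V` for version ordering:

```shell
# If the version the fix landed in sorts first, the chosen version has it.
fixed_in="11.6.0"
chosen="11.7.1"
oldest="$(printf '%s\n' "$fixed_in" "$chosen" | sort -V | head -n1)"
[ "$oldest" = "$fixed_in" ] && echo "11.7.1 includes the 11.6.0 fix"
# -> prints "11.7.1 includes the 11.6.0 fix"
```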

Dockerfile
initial smokechecks showing successful rocm amd64 compile + equivalent performance in smoke tests
```diff
 ARG GOLANG_VERSION=1.22.5
 ARG CMAKE_VERSION=3.22.1
-ARG CUDA_VERSION_11=11.3.1
+ARG CUDA_VERSION_11=11.7.1
```

@cazlo cazlo Oct 19, 2024


11.7.1 was chosen for consistency with upstream llama.cpp (see also ggerganov/llama.cpp@66b039a). 11.8.0 is actually the latest available in the cuda 11 major version.

looking at https://docs.nvidia.com/cuda/archive/11.8.0/cuda-toolkit-release-notes/index.html, not seeing anything hugely compelling outside of rocky 9 support:

11.8
This release introduces support for both the Hopper and Ada Lovelace GPU families.
Added support for Rocky Linux 9.
Added support for Kylin OS.
Package upgradable CUDA is now available starting CUDA 11.8 for Jetson devices. Refer to https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#upgradable-package-for-jetson for details on how to upgrade to the latest CUDA version on Jetson and the supported JetPack versions.


cazlo commented Oct 19, 2024

supplanted by ollama#7265

@cazlo cazlo closed this Oct 19, 2024
Successfully merging this pull request may close these issues.

Migrate off centos 7 for intermediate build layers in container image builds