[Backport release-23.11] cuda-modules #272784
(cherry picked from commit 4a25023)
(cherry picked from commit 397d95d)
cudaPackages.cuda_compat: ignore missing libs provided at runtime
cudaPackages.gpus: Jetson should never build by default
cudaPackages.flags: don't build Jetson capabilities by default
cudaPackages: re-introduce filter for pre-existing CUDA redist packages in overrides
cudaPackages: only recurseIntoAttrs for the latest of each major version
cudaPackages.nvccCompatabilities: use GCC 10 through CUDA 11.5 to avoid a GLIBC incompatibility
cudaPackages.cutensor: acquire libcublas through cudatoolkit prior to 11.4
cudaPackages.cuda_compat: mark as broken on aarch64-linux if not targeting Jetson
cudaPackages.cutensor_1_4: fix build
cudaPackages: adjust use of autoPatchelfIgnoreMissingDeps
cudaPackages.cuda_nvprof: remove unnecessary override to add addOpenGLRunpath
cudaPackages: use getExe' to avoid patchelf warning about missing meta.mainProgram
cudaPackages: fix evaluation with Nix 2.3
cudaPackages: fix platform detection for Jetson/non-Jetson aarch64-linux
python3Packages.tensorrt: mark as broken if required packages are missing
Note: evaluating the name of the derivation will fail if tensorrt is not present, which is why we wrap the value in `lib.optionalString`.
cudaPackages.flags.getNixSystem: add guard based on jetsonTargets
cudaPackages.cudnn: use explicit path to patchelf
cudaPackages.tensorrt: use explicit path to patchelf
(cherry picked from commit 8e800ce)
(cherry picked from commit bfaefd0)
(cherry picked from commit 0a7dacf)
(cherry picked from commit aaf735e)
…helf (cherry picked from commit 6179d88)
After testing on a Jetson device, it turns out that `cuda_compat` requires libnvdla_runtime.so, which can't be satisfied by autoPatchElf because it is provided by the runtime driver. This commit simply adds this library to the list of dependencies to be ignored by autoPatchElf. (cherry picked from commit a3ac436)
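The mechanism described in the commit above can be illustrated with a minimal, hypothetical package expression. Only `autoPatchelfHook` and its `autoPatchelfIgnoreMissingDeps` attribute are real nixpkgs features; the `pname`, version, and source directory are invented placeholders, and this is a sketch of the technique rather than the actual `cuda_compat` expression.

```nix
# Hypothetical package: pname, version and src are placeholders, not from nixpkgs.
{ stdenv, autoPatchelfHook }:

stdenv.mkDerivation {
  pname = "example-jetson-prebuilt";
  version = "0.1.0";
  src = ./.; # assumed to contain prebuilt .so files

  # autoPatchelfHook rewrites ELF runpaths to point at Nix store paths and
  # fails the build if a needed library cannot be found in its inputs.
  nativeBuildInputs = [ autoPatchelfHook ];

  # libnvdla_runtime.so is supplied by the Jetson driver at runtime, so it can
  # never be found at build time; tell autoPatchelfHook to skip it.
  autoPatchelfIgnoreMissingDeps = [ "libnvdla_runtime.so" ];

  # Prebuilt binaries: nothing to configure or compile.
  dontConfigure = true;
  dontBuild = true;

  installPhase = ''
    runHook preInstall
    mkdir -p $out/lib
    cp ./*.so $out/lib/
    runHook postInstall
  '';
}
```

Assuming the file is saved as `default.nix` next to the prebuilt libraries, it could be built with `nix-build -E 'with import <nixpkgs> { }; callPackage ./default.nix { }'`.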
c5a131e to d883056
I'm not sure whether this fits the backport policy. What is the rationale here?
Before this gets into a release, the following need to be addressed:
(cherry picked from commit a39043f)
(cherry picked from commit 26dd975)
Some NVIDIA devices, such as the Jetson family, support the NVIDIA compatibility package (cuda_compat), which allows running executables built against a higher CUDA major version on a system with an older CUDA driver. On such platforms, the consensus among CUDA maintainers is that there is no downside to always enabling it by default. This commit links to the relevant cuda_compat shared libraries by patching the CUDA core packages' runpaths when cuda_compat is available, in the same way as we currently do for OpenGL drivers. (cherry picked from commit d6c198a)
It does if there are no regressions. A lot of this work is about improving Nixpkgs + CUDA support, which can be viewed as an additive feature. It should be tested well to ensure that it doesn't introduce any new failures.
cc @NixOS/cuda-maintainers, who would have a good idea of potential regressions
@jonringer excellent point about regression tests, doubly so for runtime behavior. We don't have anything in the way of tracking performance over time; we're still working on making sure we have enough builders to keep our cache somewhat populated, let alone automated testing or regression detection. The only runtime tests I have are manual:
@graham33, would you happen to have anything you'd be able to use for regression testing on the 23.11 branch? You're one of the handful of people I know using Nix+CUDA, and I'd like to make sure this doesn't cause fires or headaches :)
(Switching to my other GitHub ID) We've started our internal upgrade process to adopt 23.11, which would at least allow us to do some basic testing. We could definitely try to build against your branch and make sure we can get some basic CUDA validation working.
This resolves crashes in nsys-ui (cherry picked from commit a5b8caa)
I mentioned this in the Matrix chat, but I'll reiterate here: I don't see a particular rush to backport these changes to 23.11 unless there's a specific user/customer who is interested in the new functionality (mostly Jetsons) but is otherwise limited to using the release branches. That said, I don't have any objections either.
(cherry picked from commit 85bcd8c)
Hi @ConnorBaker 👋, we can definitely help with testing if you have specific tests to check before this PR can be merged.
Hi @IlyaNiklyaev, I think backporting new features and breaking interface changes is far from the optimal way to go here:
I too would like to see a model more like rolling releases with merge trains, but right now it's easier to accept that "new features" take up to 6 months to "stabilize". Wdyt?
Hi @SomeoneSerge,
@IlyaNiklyaev I wonder, do you actually "patch" anything, or is it enough to use overlays? E.g.

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";
  inputs.nixpkgs-unstable.url = "...";
  outputs = { nixpkgs, nixpkgs-unstable, ... }: {
    packages.x86_64-linux = let
      pkgs = import nixpkgs {
        ...;
        overlays = [
          (final: prev: {
            # Use the cudaPackages expressions from unstable on top of 23.11
            cudaPackages = final.callPackage "${nixpkgs-unstable}/pkgs/top-level/cuda-packages.nix" { };
            cudaPackages_XX_Y = ...;
            # ...
          })
        ];
      };
    ...

...might suffice to use the current cudaPackages expressions with the older nixpkgs release.
@SomeoneSerge hmm, I haven't even tried that. I guess this is a tree-wide change, so it would be much more difficult to apply it as an overlay?
So
I believe that most of the changes were actually contained to the
@SomeoneSerge wow, that really works :) Thanks, I'll try to follow this approach. It doesn't help much with cross-compilation, though, but I suppose that's not directly related to this backport.
As NixOS 23.11 reaches end-of-life today, this PR is closed as part of the EOL cleanup.
Important
This PR must be rebased to target staging-23.11 after the following PRs are merged:
Important
This PR is an amalgamation of changes and subsequent which landed in master as part of the work surrounding #256324. This PR includes:
PRs to be included once merged:
Description of changes
Things done
Backport of #256324.
nix.conf? (See Nix manual)
sandbox = relaxed
sandbox = true
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed; also see nixpkgs-review usage.
./result/bin/
Add a 👍 reaction to pull requests you find important.