rPackages.torch: use binary package for installation #328980

Open. Wants to merge 3 commits into `master`.
48 changes: 48 additions & 0 deletions doc/languages-frameworks/r.section.md
@@ -94,6 +94,54 @@ Executing `nix-shell` will then drop you into an environment equivalent to the
one above. If you need additional packages just add them to the list and
re-enter the shell.

## torch with GPU acceleration {#torch-gpu-accel}

The `torch` package included in `nixpkgs` is based on the binary packages provided
by the mlverse team. By default, the CPU version of the package is installed. On Linux
distributions other than NixOS, it is possible to enable GPU acceleration by
setting `config.cudaSupport` to `true` when importing `nixpkgs` in a `shell.nix`.
You will also need to use `nixglhost`. Here is an example of such a shell:

```nix
{ pkgs ? import <nixpkgs> { config.cudaSupport = true; }, lib ? pkgs.lib }:

let
nixglhost-sources = pkgs.fetchFromGitHub {
owner = "numtide";
repo = "nix-gl-host";
rev = "main";
# Replace this with the hash Nix will complain about, TOFU style.
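    # (A pinned variant of this fetch is sketched after this example.)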
hash = "";
};
Comment on lines +109 to +115
Contributor:

I think this should be documented in its own chapter rather than in the R section: #264433 (it has been stale for a while; I didn't get the chance to follow up). But maybe it's OK to temporarily merge this section and then replace it with a short reference when the chapter is added.


nixglhost = pkgs.callPackage "${nixglhost-sources}/default.nix" { };

rpkgs = builtins.attrValues {
inherit (pkgs.rPackages)
torch;
};

wrapped_pkgs = pkgs.rWrapper.override {
Contributor:

Hmm, personally I read `rWrapper.override` as a "wrapped interpreter" (wrapped so it knows where to find the packages), not as "wrapped packages"; e.g. I'd write `python = python3.withPackages ...`; I also saw people write `env = python3.withPackages ...`.

`pkgs` makes me think of a package set, which this value is not.

packages = [ rpkgs ];
};
Comment on lines +119 to +126
Contributor:

Why not just `packages = with rPackages; ...`?


in pkgs.mkShell {
buildInputs = [
nixglhost
wrapped_pkgs
];
}

```
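
For reproducibility you will usually want to pin `nixglhost-sources` to a specific
commit instead of tracking `main`. A minimal sketch of such a pinned fetch, as a
drop-in replacement for the binding in the example above (the `rev` shown is a
placeholder, not a real nix-gl-host commit):

```nix
# Hypothetical pinned variant of the fetch above. lib.fakeHash makes the first
# build fail and print the real hash, which you then paste into `hash`
# (the TOFU step mentioned in the comment above).
nixglhost-sources = pkgs.fetchFromGitHub {
  owner = "numtide";
  repo = "nix-gl-host";
  rev = "0000000000000000000000000000000000000000"; # replace with a real commit
  hash = lib.fakeHash;                               # replace with the reported hash
};
```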

After executing `nix-shell` to drop into that environment, call `nixglhost R` to
start a GPU-accelerated R session. Test that everything works by calling
`torch::cuda_memory_summary()`, which should return a list of summary statistics
about the available GPU memory.

We are looking for contributors who could help us package the Darwin binaries
and fix `torch` for NixOS.

## Updating the package set {#updating-the-package-set}

There is a script and associated environment for regenerating the package
31 changes: 27 additions & 4 deletions pkgs/development/r-modules/default.nix
Contributor:

Is there much value in keeping a binary-only package in Nixpkgs rather than out-of-tree? Is it a common dependency?

> it wants to download and install the libtorch and liblantern libraries which doesn't work

Our `python3Packages.torch` can be/is consumed in C++/CMake projects, even though we haven't split libtorch out yet. Is there a way to unvendor libtorch in `rPackages.torch`? I'm not familiar with liblantern, though.

Contributor Author:

I'll have to study `python3Packages.torch`, but I'm quite a beginner with Nix and doubt I'll be able to do better than this PR 😅


liblantern is an intermediate API between libtorch and R, maintained by the developers of the R `torch` package. I think it exists because calling libtorch from R directly was too difficult, and calling libtorch from R via Python was too slow.

It would be good to get everything to compile from source. I would support that effort, but I would need a Nix CUDA expert to help because I kept getting issues with CUDA version compatibility. I was not able to successfully compile liblantern and libtorch despite spending days on the problem.

This PR bypasses all the complexity of trying to compile liblantern and libtorch from source, as well as the complexity of ensuring the correct CUDA libraries are present on the system. Consuming libtorch from python3Packages.torch may be possible, but compiling liblantern against libtorch didn't work easily for @jbedo over at #273536 (comment).

I feel the value of having this binary-only package in nixpkgs is for people who are R users first and Nix users second. torch is in CRAN, and all CRAN packages are automatically included in `pkgs.rPackages`, so by the principle of least surprise, it should just work. The alternative is to cause it to fail in a way that directs the user to the Nix R wiki or something similar with detailed instructions for setting up their first overlay.

Contributor:

Compiling liblantern doesn't seem that difficult in itself; the main issue is that it needs an old version of torch and doesn't work with `python3Packages.torch`. Having it work out of the box, even if we need a binary, is a goal, since we want to maximise CRAN/BioC functionality. We do need to patch these binaries, though, to at least work with NixOS out of the box without tricks.

Contributor Author:

> We do need to patch these binaries though to at least work with NixOS out of the box without tricks.

Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?


> Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?

If wrapping liblantern and libtorch worked, that sounds fine to me, but I am not sure we should wrap R in nixglhost by default on NixOS. I don't know whether it would cause any problems; it just seems excessive.

You could have a look at how I patched the binaries over at #273536. The important parts are the various patchelf commands and `pkgs.cudaPackages.autoAddOpenGLRunpathHook`.
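
For reference, a rough sketch of that kind of override, assuming the same `old`/`pkgs`
bindings as the hunk further down in this diff (this is not the actual code from #273536;
the `library/torch/lib` path is an assumption about the binary tarball layout):

```nix
# Rough sketch only: patch the bundled ELF files so they resolve against Nix
# store paths and the host's driver libraries.
torch = old.torch.overrideAttrs (attrs: {
  nativeBuildInputs = (attrs.nativeBuildInputs or [ ]) ++ [
    pkgs.patchelf              # for the manual patchelf call below
    pkgs.autoPatchelfHook      # rewrites DT_NEEDED entries to Nix store paths
    pkgs.autoAddDriverRunpath  # successor of cudaPackages.autoAddOpenGLRunpathHook
  ];
  postFixup = (attrs.postFixup or "") + ''
    # Example of an ad hoc patchelf step: let liblantern find the libtorch
    # copy that ships next to it inside the package.
    patchelf --add-rpath '$ORIGIN' $out/library/torch/lib/liblantern.so
  '';
});
```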

Contributor:

> It would be good to get everything to compile from source. I would support that effort, but I would need a Nix CUDA expert to help because I kept getting issues with CUDA version compatibility

We can try working together.

> Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?

There shouldn't be any need; you just autoPatchelf and/or addDriverRunpath to link the relevant libraries directly.

> If wrapping liblantern and libtorch worked, that sounds fine to me, but I am not sure we should wrap R in nixglhost by default on NixOS

Correct, we shouldn't.

> `pkgs.cudaPackages.autoAddOpenGLRunpathHook`

Note it's `pkgs.autoAddDriverRunpath` now.

Contributor Author:

I have amended the docs to explain that this is only for non-NixOS Linux and that we are looking for people to fix `torch` for NixOS and Darwin.

For fixing this for NixOS I would need help, as I'm not running NixOS and don't have experience with patching binaries. Should we merge this one and then continue with another PR, or try to fix it as part of this PR?

@@ -1803,10 +1803,33 @@ let
'';
});

torch = old.torch.overrideAttrs (attrs: {
preConfigure = ''
patchShebangs configure
'';
Comment on lines -1806 to -1809
Contributor:

Am I to understand that the original package was source-built? If so, I think we should keep a source-built `torch` and add a separate `torch-bin` attribute for the binary package?

Contributor Author:

The original R torch package was indeed source-built, but it didn't provide the torch library; it suggested users download and install it on first run.

Contributor:

> was indeed source built but didn't provide the torch library, it suggested users to download and install it on first run

oh

torch = let
# Sets the string that selects either the binary for the CPU version
# of the torch package or the CUDA-enabled version.
# To use GPU acceleration, set `config.cudaSupport = true;`
# when importing nixpkgs in your shell.
accel = if pkgs.config.cudaSupport then "cu118" else "cpu";

binary_sha = if pkgs.config.cudaSupport then
"sha256-a80sG89C0svZzkjNRpY0rTR2P1JdvKAbWDGIIghsv2Y=" else
"sha256-qUn8Rot6ME7iTvtNd52iw3ebqMnpLz7kwl/9GoPHD+I=";
in
old.torch.overrideAttrs (attrs: {
nativeBuildInputs = attrs.nativeBuildInputs or [ ] ++ [
pkgs.autoPatchelfHook
pkgs.autoAddDriverRunpath
];
buildInputs = attrs.buildInputs or [ ] ++ [
"${lib.getLib R}/lib/R" # libR
pkgs.zlib
] ++ lib.optionals pkgs.config.cudaSupport [
pkgs.cudaPackages.cuda_cudart
];
Comment on lines +1819 to +1828
Contributor:

These changes are sufficient to do something like

> library(torch)
> torch_sum(torch_randn(c(10, 10), device="cuda"))

However, libtorch_cuda normally links e.g. libcublas, which I didn't have to mention explicitly. I suppose that means they're not listed in DT_NEEDED and are loaded by dlopen instead. Either way, I expect all related functionality to be broken, and unfortunately I can't allocate the time to do more tests.

I also see there's a lot of garbage in the outputs, e.g. a vendored copy of libcudart.so. You could link those by telling autoPatchelf to add $ORIGIN, but I'd advise against it and recommend that you delete the vendored libraries. CUDA libraries tend to need ad hoc patchelfing as well; we've had issues with that before. CUDA libraries that originate from PyPI are usually broken, which is why we link the ones from cudaPackages instead.
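
For illustration, a sketch of how the vendored copies could be removed so that the
fixup hooks resolve CUDA against `cudaPackages` instead (the file names below are
assumptions and would need to be checked against the actual tarball; the matching
`cudaPackages` libraries would also need to be present in `buildInputs`):

```nix
# Hypothetical addition to the overrideAttrs block in this hunk: drop the
# vendored CUDA runtime libraries before autoPatchelfHook runs, so the
# remaining references are resolved against pkgs.cudaPackages.
preFixup = (attrs.preFixup or "") + ''
  rm -f $out/library/torch/lib/libcudart*.so*
  rm -f $out/library/torch/lib/libcublas*.so*
'';
```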

src = pkgs.fetchzip {
url = "https://torch-cdn.mlverse.org/packages/${accel}/0.13.0/src/contrib/torch_0.13.0_R_x86_64-pc-linux-gnu.tar.gz";
sha256 = binary_sha;
};
});

pak = old.pak.overrideAttrs (attrs: {
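
For reviewers who want to try the CUDA path, a minimal expression along these lines
should be enough to build the binary package with GPU support (the `allowUnfree`
setting is an assumption, since the CUDA libraries are unfree):

```nix
# Minimal usage sketch: build rPackages.torch with the CUDA-enabled binary
# selected by the `config.cudaSupport` switch used in the override above.
{ pkgs ? import <nixpkgs> {
    config.cudaSupport = true;
    config.allowUnfree = true; # CUDA packages are unfree
  }
}:

pkgs.rPackages.torch
```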