rPackages.torch: use binary package for installation #328980
base: master
Conversation
When you say "building the package from source", are you referring to the R package? Please compare the code between this PR and #273536. Also have a look at my other attempts to make later versions of

I could not successfully build the GPU-capable C binaries from source, but I eventually managed to get the GPU-capable pre-compiled binaries to work, following the example of libtorch-bin. For non-NixOS systems, you also need https://github.com/numtide/nix-gl-host.

I felt the code I had written in #273536 was very much a hack, and I was not ready to push it to nixpkgs without a few more people looking it over to make improvements. No one commented on the draft PR until now.

You shouldn't need a separate

Apologies if something is unclear; I don't have a lot of time at the moment to proofread, but I'm happy to answer questions when I can get around to it.
If you can get the GPU version to run without all the messy code I added, I will be happy to close my PR and support this one.

This does seem cleaner, if the GPU version can be made to run.
Great! 6 lines of code changed is much nicer than 70, and much more maintainable. I suggest making a wiki entry or very visible doc strings part of this PR if possible, as using nixglhost is not obvious, nor necessary in all cases, and I don't feel comfortable wrapping every call to R with nixglhost silently. The user might also want to know how to download the CPU version, if hard drive space is a concern.

I can't test on NixOS right now as my dev machine is in for service. @jbedo, are you able to test on NixOS? There might be some interest in compiling
Force-pushed from 6790962 to e69d654
Thanks for your help @PhDyellow and @jbedo, this PR is ready to be reviewed!
The `torch` package included in `nixpkgs` is based on the binary packages by the mlverse team. By default, the CPU version of the package is installed. To enable GPU acceleration, set `config.cudaSupport` to true when importing `nixpkgs` in a `shell.nix`. You will also need to use `nixglhost`. Here is an example of such a
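The quoted documentation cuts off before its example. A minimal sketch of such a `shell.nix` might look like the following; note the `rEnv` name and the `nixglhost` invocation are illustrative assumptions, not code taken from this PR:

```nix
# Sketch only: names below are assumptions for illustration.
let
  pkgs = import <nixpkgs> { config.cudaSupport = true; };
  rEnv = pkgs.rWrapper.override {
    packages = [ pkgs.rPackages.torch ];
  };
in
pkgs.mkShell {
  packages = [ rEnv ];
  # On non-NixOS systems, start R through nixglhost so the host's GPU
  # driver libraries can be found, e.g.:  nixglhost R
}
```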
My understanding is that you only need nixglhost if you are not on NixOS.
This should be the case, but I don't see any patching happening in this PR, so it likely won't work on NixOS either. It should be possible to patch the binaries to operate without tricks on NixOS.
Yes, it is possible to patch the liblantern and libtorch binaries; I did it successfully in #273536. After patching, the binaries ran on both NixOS and Nix on CentOS. I didn't try patching the full bundle being installed in this PR, though.

Whether nixglhost works on NixOS is another thing to test.
tested on NixOS and confirmed he needed nixglhost as well.
This is a bug then; the appropriate patchelfing is necessary. Nixglhost should not be required on NixOS.

You can use `LD_DEBUG=libs R` to find out which binaries (are loading CUDA and) need the patching.
I have amended the docs to explain that this is only for non-NixOS Linux, and that we are looking for people to fix torch for NixOS and Darwin.
Couldn't test functionality right now, but apart from the comments I have left, everything looks like it is in order.

Documentation looks good too. It made sense to me, but I am familiar with running torch in R on a GPU, so I can't say whether it is clear to someone who has never tried before.

Maybe we leave a note in the language docs for someone to open a PR or issue if they have the hardware and can test it? I don't think it would be hard to add more logic to the
Is there much value in keeping a binary-only package in Nixpkgs rather than out-of-tree? Is it a common dependency?

> it wants to download and install the libtorch and liblantern libraries which doesn't work

Our python3Packages.torch can be/is consumed in C++/CMake projects, even though we haven't split libtorch out yet. Is there a way to unvendor libtorch in rPackages.torch? I'm not familiar with liblantern, though.
I'll have to study python3Packages.torch, but I'm quite a beginner with Nix and doubt I'll be able to do better than this PR 😅
liblantern is an intermediate API between libtorch and R, maintained by the developers of R torch. I think it exists because calling libtorch from R directly was too difficult, and calling libtorch from R via Python was too slow.
It would be good to get everything to compile from source. I would support that effort, but I would need a Nix CUDA expert to help, because I kept getting issues with CUDA version compatibility. I was not able to successfully compile liblantern and libtorch despite spending days on the problem.

This PR bypasses all the complexity of trying to compile liblantern and libtorch from source, as well as the complexity of ensuring the correct CUDA libraries are present on the system. Consuming libtorch from python3Packages.torch may be possible, but compiling liblantern against libtorch didn't work easily for @jbedo over at #273536 (comment).

I feel the value of having this binary-only package in nixpkgs is for R users who are not Nix users first. torch is on CRAN, and all CRAN packages are automatically included in pkgs.rPackages, so by the principle of least surprise it should just work. The alternative is to make it fail in a way that directs the user to the Nix R wiki (or something similar) with detailed instructions for setting up their first overlay.
Compiling liblantern doesn't seem that difficult in itself; the main issue is that it needs an old version of torch and doesn't work with python3Packages.torch. Having it work out of the box, even if we need a binary, is a goal, since we want to maximise CRAN/BioC functionality. We do need to patch these binaries, though, to at least work on NixOS out of the box without tricks.
> We do need to patch these binaries though to at least work with NixOS out of the box without tricks.

Would a wrapper do the trick? Wrap the binary to use nixglhost OOTB?
> Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?

If wrapping liblantern and libtorch worked, that sounds fine to me, but I am not sure we should wrap R in nixglhost by default on NixOS. I don't know whether it would cause any problems; it just seems excessive.

You could have a look at how I patched the binaries over at #273536. The important parts are the various patchelf commands and pkgs.cudaPackages.autoAddOpenGLRunpathHook.
> It would be good to get everything to compile from source. I would support that effort, but I would need a Nix CUDA expert to help because I kept getting issues with CUDA version compatibility

We can try working together.

> Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?

There shouldn't be any need; you just autoPatchelf and/or addDriverRunpath to link the relevant libraries directly.

> If wrapping liblantern and libtorch worked, that sounds fine to me, but I am not sure we should wrap R in nixglhost by default on NixOS

Correct, we shouldn't.

> pkgs.cudaPackages.autoAddOpenGLRunpathHook

Note it's pkgs.autoAddDriverRunpath now.
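A hedged sketch of what that hook-based approach could look like for the binary torch package; only the two hook names come from this thread, and the surrounding overlay structure (`old`, `pkgs`) is an assumption:

```nix
# Sketch: assumes an R package overlay where `old` is the previous set.
torch = old.torch.overrideAttrs (attrs: {
  nativeBuildInputs = attrs.nativeBuildInputs or [ ] ++ [
    pkgs.autoPatchelfHook      # patch RPATHs of the vendored ELF binaries
    pkgs.autoAddDriverRunpath  # successor to cudaPackages.autoAddOpenGLRunpathHook
  ];
});
```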
> I have amended the docs to explain this is only for non-NixOS linux and that we are looking for people to fix torch for NixOS and darwin.

For fixing this on NixOS I would need help, as I'm not running NixOS and don't have experience with patching binaries. Should we merge this one and then continue with another PR, or try to fix it as part of this PR?
Force-pushed from ced2478 to 7c2dcd4
```nix
pkgs.rPackages.torch
pkgs.R
```
Shouldn't this be an rWrapper.override instead?
If we drop into a shell this is not strictly required, but it would be needed if calling R non-interactively, AFAIK.
I see. This confuses people; let's not advertise this, and use rWrapper instead: https://mastodon.acm.org/@[email protected]/112745541961800402
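For reference, the `rWrapper` form being suggested could be sketched as follows, assuming `pkgs` is an imported nixpkgs with `config.cudaSupport` set as desired (a sketch, not code from this PR):

```nix
# An R interpreter wrapped so it can find the torch package.
pkgs.rWrapper.override {
  packages = with pkgs.rPackages; [ torch ];
}
```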
Thanks, done with the latest push!
Force-pushed from 7c2dcd4 to f990eda
Force-pushed from f990eda to 6b4943e
```nix
torch = old.torch.overrideAttrs (attrs: {
  preConfigure = ''
    patchShebangs configure
  '';
```
Am I to understand that the original package was source-built? If so, I think we should keep a source-built torch and add a separate torch-bin attribute for the binary package?
The original R torch package was indeed source-built, but it didn't provide the torch library; it prompted users to download and install it on first run.
> was indeed source built but didn't provide the torch library, it suggested users to download and install it on first run

Oh.
```nix
nativeBuildInputs = attrs.nativeBuildInputs or [ ] ++ [
  pkgs.autoPatchelfHook
  pkgs.autoAddDriverRunpath
];
buildInputs = attrs.buildInputs or [ ] ++ [
  "${lib.getLib R}/lib/R" # libR
  pkgs.zlib
] ++ lib.optionals pkgs.config.cudaSupport [
  pkgs.cudaPackages.cuda_cudart
];
```
These changes are sufficient to do something like

```r
> library(torch)
> torch_sum(torch_randn(c(10, 10), device = "cuda"))
```

However, normally libtorch_cuda links e.g. libcublas, etc., which I didn't have to mention explicitly. I suppose that means they're not listed in DT_NEEDED and are loaded via dlopen instead. Either way, I expect all related functionality to be broken, and unfortunately I can't allocate the time to do more tests.
I also see there's a lot of garbage in the outputs, e.g. a vendored copy of libcudart.so. You could link those by telling autoPatchelf to add $ORIGIN, but I'd advise against it and recommend that you delete the vendored libraries. CUDA libraries tend to need ad hoc patchelfing as well; we've had issues with that before. CUDA libraries that originate from PyPI are usually broken, which is why we link the ones from cudaPackages instead.
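Deleting the vendored libraries could be sketched as a `postInstall` step in the override; the exact library path inside the installed R package is a hypothetical placeholder here, not taken from this PR:

```nix
postInstall = ''
  # Drop vendored CUDA libraries so the ones from cudaPackages are
  # linked instead (the path below is an assumption; check the output).
  rm -f $out/library/torch/lib/libcudart.so*
'';
```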
```nix
nixglhost-sources = pkgs.fetchFromGitHub {
  owner = "numtide";
  repo = "nix-gl-host";
  rev = "main";
  # Replace this with the hash Nix will complain about, TOFU style.
  hash = "";
};
```
I think this should be documented in its own chapter rather than in the R section: #264433 (it has been stale for a while; I didn't get the chance to follow up). But maybe it's OK to temporarily merge this section and then replace it with a short reference when the chapter is added.
```nix
rpkgs = builtins.attrValues {
  inherit (pkgs.rPackages)
    torch;
};

wrapped_pkgs = pkgs.rWrapper.override {
  packages = [ rpkgs ];
};
```
Why not just `packages = with rPackages; ...`?
```nix
    torch;
};

wrapped_pkgs = pkgs.rWrapper.override {
```
H'm, personally I read rWrapper.override as a "wrapped interpreter" (wrapped so it knows where to find the packages), not as "wrapped packages"; e.g. I'd write `python = python3.withPackages ...`; I also saw people write `env = python3.withPackages ...`. `pkgs` makes me think of a package set, which this value is not.
Description of changes
With this PR, I suggest that we use the prebuilt binary supplied by the mlverse team. Currently, the {torch} package installs, but at first run it wants to download and install the libtorch and liblantern libraries, which doesn't work. Instead, we could use the prebuilt binaries of the {torch} package, which simplifies the whole process, as described in their documentation: https://torch.mlverse.org/docs/articles/installation#pre-built

This PR handles both the CPU and GPU versions for Linux. I have also updated the documentation to explain how to use it in a second commit.
Things done
- Sandboxing enabled in `nix.conf`? (`sandbox = relaxed` or `sandbox = true`; see the Nix manual)
- Tested with `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"` (note: all changes have to be committed; also see nixpkgs-review usage)
- Tested basic functionality of all binary files (usually in `./result/bin/`)