
rPackages.torch: use binary package for installation #328980

Open · wants to merge 3 commits into master from r-torch-cpu

Conversation

b-rodrigues
Contributor

@b-rodrigues b-rodrigues commented Jul 21, 2024

Description of changes

With this PR, I suggest that we use the prebuilt binary supplied by the mlverse team. Currently, the {torch} package installs, but on first run it tries to download and install the libtorch and liblantern libraries, which fails. Instead, we could use the prebuilt binaries of the {torch} package, which simplifies the whole process, as described in their documentation: https://torch.mlverse.org/docs/articles/installation#pre-built

This PR handles both the CPU and GPU versions for Linux. I have also updated the documentation to explain how to use it in a second commit.
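For orientation, here is a minimal sketch of how the resulting package might be consumed. `rPackages.torch` and `config.cudaSupport` come from this PR; the overall `shell.nix` shape is illustrative, not the PR's exact documentation:

```nix
# Illustrative shell.nix: with cudaSupport set, the GPU binary is selected;
# without it, the CPU-only binary is installed.
let
  pkgs = import <nixpkgs> {
    config.cudaSupport = true; # drop this line for the CPU-only binary
  };
in
pkgs.mkShell {
  buildInputs = [
    (pkgs.rWrapper.override {
      packages = [ pkgs.rPackages.torch ];
    })
  ];
}
```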

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@b-rodrigues

This comment was marked as outdated.

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 21, 2024
@PhDyellow

When you say "building the package from source", are you referring to the R package torch or to the C libraries libtorch and liblantern? They are separate build processes. Building the R package is trivial, and the precompiled R package binary does not bundle the C libraries either. If you look at the R source code of torch, you will find that the C libraries (or their sources) are downloaded by the R package either during installation or after the first run.

Please compare the code between this PR and #273536. Also have a look at my other attempts to make later versions of torch work on NixOS at https://github.com/PhDyellow/nixpkgs/tree/r-torch-master and https://github.com/PhDyellow/nixpkgs/tree/r-torch-master-12. I had a major project that benefited massively from GPU acceleration, so I took the time to get it working; as a result, my fork of nixpkgs has a functioning torch package that uses a GPU. The hard part is getting all the CUDA libraries lined up with the liblantern and libtorch binaries and making sure the precompiled binaries search in the right places.

I could not successfully build the GPU-capable C binaries from source, but I eventually managed to get the GPU-capable pre-compiled binaries to work, following the example of libtorch-bin. For non-NixOS systems, you also need https://github.com/numtide/nix-gl-host.

I felt the code I have written in #273536 is very much a hack, and was not ready to push the code to nixpkgs without a few more people looking over it to make improvements. No-one commented on the draft PR until now.

You shouldn't need a separate torch-gpu package, as it falls back to CPU-only by default, and nixpkgs already has flags for enabling CUDA, which we can examine to give the correct version of torch. Moving forward, I suggest taking #273536, or making a new PR from one of my newer branches, and giving it a proper test and code review. It will save you a few weeks of work unless you are already a Nix CUDA master.

Apologies if something is unclear, I don't have a lot of time at the moment to proofread, but happy to answer questions when I am able to get around to it.

@b-rodrigues

This comment was marked as outdated.

@PhDyellow

If you can get the GPU version to run without all the messy code I added, I will be happy to close my PR and support this one.
I do not remember whether I tried to download the whole R package as a precompiled binary with GPU capabilities, but I don't think I did.

@jbedo
Contributor

jbedo commented Jul 23, 2024

This does seem cleaner if GPU can be made to run.

@PhDyellow PhDyellow mentioned this pull request Jul 24, 2024
@b-rodrigues

This comment was marked as outdated.

@b-rodrigues b-rodrigues reopened this Jul 26, 2024
@b-rodrigues

This comment was marked as outdated.

@PhDyellow

Great! 6 lines of code changed is much nicer than 70, and much more maintainable.

I suggest making a wiki entry or adding very visible doc strings as part of this PR if possible, as using nixglhost is neither obvious nor necessary in all cases, and I don't feel comfortable silently wrapping every call to R with nixglhost.

The user might also want to know how to download the CPU version, if hard drive space is a concern.

I can't test on NixOS right now as my dev machine is in for service. @jbedo, are you able to test on NixOS?

There might be some interest in compiling liblantern and libtorch from source, but I think that is a separate PR. We would need help from the CUDA team, because @jbedo and I both had trouble getting them to compile.

@ofborg ofborg bot added 10.rebuild-linux: 1-10 10.rebuild-linux: 1 and removed 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jul 27, 2024
@github-actions github-actions bot added the 8.has: documentation This PR adds or changes documentation label Jul 28, 2024
@b-rodrigues b-rodrigues force-pushed the r-torch-cpu branch 2 times, most recently from 6790962 to e69d654 Compare July 28, 2024 07:46
@b-rodrigues
Contributor Author

thanks for your help @PhDyellow and @jbedo this PR is ready to be reviewed!
There are also prebuilt binaries for macOS, which we could provide as well by using "cpu-intel" or "cpu-m1", but I don't have the hardware to test it.

@b-rodrigues b-rodrigues marked this pull request as ready for review July 28, 2024 07:50
@b-rodrigues b-rodrigues requested a review from jbedo as a code owner July 28, 2024 07:50
@b-rodrigues b-rodrigues marked this pull request as draft July 28, 2024 07:52
@b-rodrigues

This comment was marked as outdated.

@b-rodrigues b-rodrigues marked this pull request as ready for review July 28, 2024 08:00
The `torch` package included in `nixpkgs` is based on the binary packages by the
mlverse team. By default, the CPU version of the package is installed. To enable
GPU acceleration, set `config.cudaSupport` to true when importing `nixpkgs` in a
`shell.nix`. You will also need to use `nixglhost`. Here is an example of such a
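The quoted paragraph ends where its example would begin. A sketch of such a `shell.nix`, assembled from the fragments of the PR's example quoted later in this review; the `callPackage` line is an assumption about nix-gl-host's layout, and the empty hash is left for TOFU-style filling:

```nix
# Sketch: GPU-enabled rPackages.torch on non-NixOS Linux via nixglhost.
let
  pkgs = import <nixpkgs> { config.cudaSupport = true; };

  nixglhost-sources = pkgs.fetchFromGitHub {
    owner = "numtide";
    repo = "nix-gl-host";
    rev = "main";
    # Replace this with the hash Nix will complain about, TOFU style.
    hash = "";
  };

  # Assumption: nix-gl-host exposes a default.nix at its top level.
  nixglhost = pkgs.callPackage "${nixglhost-sources}/default.nix" { };

  wrapped_pkgs = pkgs.rWrapper.override {
    packages = [ pkgs.rPackages.torch ];
  };
in
pkgs.mkShell {
  buildInputs = [ nixglhost wrapped_pkgs ];
}
```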


My understanding is that you only need nixglhost if you are not on NixOS.

Contributor


This should be the case, but I don't see any patching happening in this PR, so it likely won't work on NixOS either. It should be possible to patch the binaries to work without tricks on NixOS.


Yes, it is possible to patch the liblantern and libtorch binaries, I did it successfully in #273536. After patching, the binaries ran on both NixOS and nix in CentOS. I didn't try patching the full bundle being installed in this PR though.

Whether nixglhost works on NixOS is another thing to test.

Contributor Author


@Kupac tested on NixOS and confirmed he needed nixglhost as well. @Kupac could you perchance leave a review and describe the process? Is the description from the documentation aligned with what you had to do?

Contributor

@SomeoneSerge SomeoneSerge Aug 4, 2024


tested on NixOS and confirmed he needed nixglhost as well.

This is a bug, then; the appropriate patchelfing is necessary. nixglhost should not be required on NixOS.

You can use `LD_DEBUG=libs R` to find out which binaries are loading CUDA and need the patching.

Contributor Author


I have amended the docs to explain that this is only for non-NixOS Linux, and that we are looking for people to fix torch for NixOS and Darwin.


@PhDyellow PhDyellow left a comment


Couldn't test functionality right now, but apart from comments I have left, everything looks like it is in order.

Documentation looks good too. It made sense to me, but I am familiar with running torch in R on a GPU, so I can't say whether it is clear to someone who has never tried before.

pkgs/development/r-modules/default.nix (outdated, resolved)
@PhDyellow

thanks for your help @PhDyellow and @jbedo this PR is ready to be reviewed! There are also prebuilt binaries for macOS, which we could provide as well by using "cpu-intel" or "cpu-m1", but I don't have the hardware to test it.

Maybe we leave a note in the language docs for someone to open a PR or issue if they have the hardware and can test it? I don't think it would be hard to add more logic to the accel variable you have set up.

Contributor


Is there much value in keeping a binary-only package in Nixpkgs rather than out-of-tree? Is it a common dependency?

it wants to download and install the libtorch and liblantern libraries which doesn't work

Our python3Packages.torch can be (and is) consumed in C++/CMake projects, even though we haven't split libtorch out yet. Is there a way to unvendor libtorch in rPackages.torch? I'm not familiar with liblantern, though.

Contributor Author


I'll have to study python3Packages.torch but I'm quite a beginner with Nix and doubt I'll be able to do better than this PR 😅


liblantern is an intermediate API between libtorch and R, maintained by the developers of R torch. I think it exists because calling libtorch from R directly was too difficult, and calling libtorch from R via Python was too slow.

It would be good to get everything to compile from source. I would support that effort, but I would need a Nix CUDA expert to help because I kept getting issues with CUDA version compatibility. I was not able to successfully compile liblantern and libtorch despite spending days on the problem.

This PR bypasses all the complexity of trying to compile liblantern and libtorch from source, as well as the complexity of ensuring the correct CUDA libraries are present on the system. Consuming libtorch from python3Packages.torch may be possible, but compiling liblantern against libtorch didn't work easily for @jbedo over at #273536 (comment).

I feel the value of having this binary-only package in nixpkgs is for people who are R users first and Nix users second. torch is on CRAN, and all CRAN packages are automatically included in pkgs.rPackages, so by the principle of least surprise, it should just work. The alternative is to have it fail in a way that directs the user to the Nix R wiki or something similar, with detailed instructions for setting up their first overlay.

Contributor


Compiling liblantern doesn't seem that difficult in itself; the main issue is that it needs an old version of torch and doesn't work with python3Packages.torch. Having it work out of the box, even if we need a binary, is a goal, since we want to maximise CRAN/BioC functionality. We do need to patch these binaries, though, so that they at least work on NixOS out of the box without tricks.

Contributor Author


We do need to patch these binaries though to at least work with NixOS out of the box without tricks.

Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?


Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?

If wrapping liblantern and libtorch worked, that sounds fine to me, but I am not sure we should wrap R in nixglhost by default on NixOS. I don't know whether it would cause any problems, it just seems excessive.

You could have a look at how I patched the binaries over at #273536. The important parts are the various patchelf commands and pkgs.cudaPackages.autoAddOpenGLRunpathHook.

Contributor


It would be good to get everything to compile from source. I would support that effort, but I would need a Nix CUDA expert to help because I kept getting issues with CUDA version compatibility

We can try working together

Would a wrapper do the trick? Wrap the binary to use nixglhost ootb?

There shouldn't be any need; you just autoPatchelf and/or addDriverRunpath to link the relevant libraries directly

If wrapping liblantern and libtorch worked, that sounds fine to me, but I am not sure we should wrap R in nixglhost by default on NixOS

Correct, we shouldn't

pkgs.cudaPackages.autoAddOpenGLRunpathHook.

Note it's pkgs.autoAddDriverRunpath now
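A minimal sketch of that approach; both hooks are real nixpkgs attributes, while the surrounding override shape is illustrative (it mirrors the hunk quoted further down in this review):

```nix
# Sketch: patch the prebuilt libraries at build time instead of wrapping
# R in nixglhost at run time.
torch = old.torch.overrideAttrs (attrs: {
  nativeBuildInputs = attrs.nativeBuildInputs or [ ] ++ [
    pkgs.autoPatchelfHook     # rewrite library lookups to Nix store paths
    pkgs.autoAddDriverRunpath # add the NixOS graphics-driver runpath
  ];
});
```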

Contributor Author


I have amended the docs to explain that this is only for non-NixOS Linux, and that we are looking for people to fix torch for NixOS and Darwin.

For fixing this for NixOS I would need help, as I'm not running NixOS and don't have experience with patching binaries. Should we merge this one and then continue with another PR, or try to fix it as part of this PR?

@b-rodrigues b-rodrigues force-pushed the r-torch-cpu branch 2 times, most recently from ced2478 to 7c2dcd4 Compare August 1, 2024 19:08
Comment on lines 122 to 123
pkgs.rPackages.torch
pkgs.R
Contributor


Shouldn't this be an rWrapper.override instead?

Contributor Author


If we drop into a shell this is not strictly required, but it would be needed when calling R non-interactively, afaik

Contributor


I see. This confuses people, let's not advertise this and use rWrapper instead: https://mastodon.acm.org/@[email protected]/112745541961800402

Contributor Author


thanks, done with latest push!

Comment on lines -1806 to -1809
torch = old.torch.overrideAttrs (attrs: {
preConfigure = ''
patchShebangs configure
'';
Contributor


Am I to understand that the original package was source-built? If so, I think we should keep a source-built torch and add a separate torch-bin attribute for the binary package?

Contributor Author


The original R torch package was indeed source-built, but it didn't provide the torch library; it prompted users to download and install it on first run

Contributor


was indeed source built but didn't provide the torch library, it suggested users to download and install it on first run

oh

Comment on lines +1819 to +1828
nativeBuildInputs = attrs.nativeBuildInputs or [ ] ++ [
pkgs.autoPatchelfHook
pkgs.autoAddDriverRunpath
];
buildInputs = attrs.buildInputs or [ ] ++ [
"${lib.getLib R}/lib/R" # libR
pkgs.zlib
] ++ lib.optionals pkgs.config.cudaSupport [
pkgs.cudaPackages.cuda_cudart
];
Contributor


These changes are sufficient to do something like

> library(torch)
> torch_sum(torch_randn(c(10, 10), device="cuda"))

However, libtorch_cuda normally links e.g. libcublas etc., which I didn't have to mention explicitly. I suppose that means they're not listed in DT_NEEDED and are loaded via dlopen instead. Either way, I expect all related functionality to be broken, and I can't allocate the time to do more tests, unfortunately

I also see there's a lot of garbage in the outputs, e.g. a vendored copy of libcudart.so. You could link those by telling autoPatchelf to add $ORIGIN, but I'd advise against it and recommend that you delete the vendored libraries. Cuda libraries tend to need ad hoc patchelfing as well, we've had issues with that before. Cuda libraries that originate from PyPi are usually broken, which is why we link the ones from cudaPackages instead
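A hedged sketch of the suggested cleanup; the `torch/lib` path inside the output is an assumption about the installed package layout, not taken from the PR:

```nix
# Hypothetical postFixup step: drop the vendored CUDA runtime copies so
# the patched binaries resolve the cudaPackages ones instead.
postFixup = (attrs.postFixup or "") + ''
  rm -f $out/library/torch/lib/libcudart*
'';
```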

Comment on lines +109 to +115
nixglhost-sources = pkgs.fetchFromGitHub {
owner = "numtide";
repo = "nix-gl-host";
rev = "main";
# Replace this with the hash Nix will complain about, TOFU style.
hash = "";
};
Contributor


I think this should be documented in its own chapter rather than in the R section: #264433 (has been stale for a while, didn't get the chance to follow up) but maybe it's ok to temporarily merge this section and then replace it with a short reference when the chapter is added

Comment on lines +119 to +126
rpkgs = builtins.attrValues {
inherit (pkgs.rPackages)
torch;
};

wrapped_pkgs = pkgs.rWrapper.override {
packages = [ rpkgs ];
};
Contributor


Why not just packages = with rPackages; ...?
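That is, a sketch of the suggested simplification:

```nix
# Inline the package list instead of binding a separate rpkgs value.
wrapped_pkgs = pkgs.rWrapper.override {
  packages = with pkgs.rPackages; [ torch ];
};
```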

torch;
};

wrapped_pkgs = pkgs.rWrapper.override {
Contributor


Hmm, personally I read rWrapper.override as a "wrapped interpreter" (wrapped so it knows where to find the packages), not as "wrapped packages"; e.g. I'd write python = python3.withPackages ...; I've also seen people write env = python3.withPackages ...

pkgs makes me think of a package set, which this value is not

@FliegendeWurst FliegendeWurst added the awaiting_changes (old Marvin label, do not use) label Dec 4, 2024