drop unnecessary CUDA stub libraries from $LIBRARY_PATH #2793
Some minor comments, will test it now
LGTM
Need to move the for loop to after stubs_dir has been populated
@ocaisa Any updates?
@Flamefire Sorry for not following up on this, but I would have really liked another set of eyes to look over this and think through some of the potential unintended consequences.
Yes, it would be great to have someone double-check my assumption from above:
In other words: why would the stub libraries be required if we have the full libraries available? The whole purpose of the former is to provide placeholders for linking when the latter aren't available. FWIW: I built PyTorch and TF (with tests) successfully with this change.
Just to confirm if I understand correctly:
By this you mean that the full libraries are also available from the CUDA installation? In that case, I agree with you, I don't see the purpose. As you are no doubt aware, the stubs libraries should be built against when cross compiling on nodes that don't have the driver installed (i.e. non-GPU nodes), and should provide stubs for the libraries that are normally provided by the driver installation (such as |
Yes, see the output in the PR description: Basically all we need is
That's exactly the intention: For every library in the stubs folder, check if it is in the Result:
Note how the stubs folder is only in |
LGTM
I have communicated the change to the other maintainers, and no one raised concerns about modifying the default stubs libraries of CUDA.
Test report by @ocaisa Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (1 easyconfigs in total) |
Test report by @ocaisa Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (1 easyconfigs in total) |
Checked again that it does as expected. Going in, thanks for your patience @Flamefire
We briefly discussed this during the last EasyBuild conf call, and I agree it makes sense to make this change. It does make me wonder though: why are stubs libraries provided when the real libraries are also provided?! Maybe @ajdecon can help us out here?
@boegel : This nerd-sniped me for a bit. 😉 My understanding is that the stub libraries are used to run binaries that are linked against CUDA, but which support running in a CPU-only mode. Those hosts can have the stub libraries installed to avoid link errors at runtime, without having to install the whole CUDA toolkit. (Or include the much larger GPU-enabled CUDA libs.) An example of this is the Triton Inference Server, where we provide container images for both GPU hosts and CPU-only hosts. The build script for the CPU build of the container copies the stub libs into the container rather than installing the full CUDA toolkit.
(created using eb --new-pr)
Found stuff like this in the output of PyTorch builds:
So I checked for duplicate files and found a lot, see command and output below.
Hence I added some code to remove libraries from the stubs folder which are also present in the lib64 folder. I can't imagine having a need for stub libraries where we have the "real" ones, so this should avoid issues.
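
The idea described above can be illustrated with a minimal sketch. This is not the code from this PR: the function names find_duplicate_stubs and drop_duplicate_stubs are hypothetical, the sketch assumes the stubs folder sits at lib64/stubs, and it assumes a stub counts as a duplicate only when a file with exactly the same name exists in lib64.

```python
import os
import tempfile


def find_duplicate_stubs(stubs_dir, lib_dir):
    """Return sorted names of stub files that also exist as files in lib_dir.

    Hypothetical helper: a stub is treated as a duplicate only on an exact
    file name match (an assumption, not necessarily what the PR does).
    """
    if not os.path.isdir(stubs_dir):
        return []
    duplicates = []
    for name in sorted(os.listdir(stubs_dir)):
        stub_path = os.path.join(stubs_dir, name)
        # Skip real subdirectories; symlinks are still considered.
        if os.path.isdir(stub_path) and not os.path.islink(stub_path):
            continue
        # isfile() follows symlinks, so a symlink to a real library counts.
        if os.path.isfile(os.path.join(lib_dir, name)):
            duplicates.append(name)
    return duplicates


def drop_duplicate_stubs(stubs_dir, lib_dir):
    """Delete the duplicate stubs and return the names that were removed."""
    removed = find_duplicate_stubs(stubs_dir, lib_dir)
    for name in removed:
        os.remove(os.path.join(stubs_dir, name))
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory tree, never on a real CUDA installation.
    with tempfile.TemporaryDirectory() as root:
        lib_dir = os.path.join(root, "lib64")
        stubs_dir = os.path.join(lib_dir, "stubs")
        os.makedirs(stubs_dir)
        for path in (
            os.path.join(lib_dir, "libcublas.so"),    # "real" library
            os.path.join(stubs_dir, "libcublas.so"),  # stub shadowed by the real one
            os.path.join(stubs_dir, "libcuda.so"),    # stub with no real counterpart
        ):
            open(path, "w").close()
        print("removed:", drop_duplicate_stubs(stubs_dir, lib_dir))
        print("remaining stubs:", sorted(os.listdir(stubs_dir)))
```

In this sketch a stub without a real counterpart in lib64 is left in place, which matches the reasoning in the discussion that stubs are only needed where the full library is not available.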