Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get device capability #734

Merged
merged 7 commits into from
Nov 20, 2021
Merged

Conversation

rdinnager
Copy link
Collaborator

@rdinnager rdinnager commented Nov 8, 2021

This PR adds a cuda_get_device_properties(device) function which returns the major and minor cuda capability of device (specified as an integer), which fixes #729. This turned out to be more difficult than I imagined (with my limited C++ abilities), and I had to dig around in the torch source code substantially to find examples that helped me figure out what to do. For that reason, this definitely needs a code review, because though what I've done does seem to work, I don't 100% understand everything I did (though I feel like I'm inching closer to full understanding), and so it might not be strictly correct or optimal.

Also, apologies for the extra commits. I accidentally included changes from my previous PR originally. So I made an extra commit to reverse those changes so this doesn't overlap with the previous PR. Hopefully this isn't too annoying. If it is let me know and I can try and squash my commits or something..

@dfalbel dfalbel added the lantern Use this label if your PR affects lantern so it's built in the CI label Nov 8, 2021
Copy link
Member

@dfalbel dfalbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @rdinnager ! Thanks!! The comments are minor things that we need to tweak in order to also be able to build without CUDA.

__CUDACC__ is supposed to tell code is being compiled by nvcc or not according to this SO post.

The full CI failure will be solved in the next commit as GH actions we build lantern manually.
Let me know if you have additional questions.

@@ -4,6 +4,7 @@

#include "lantern/lantern.h"

#include <ATen/cuda/CUDAContext.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to wrap this around something like #ifdef __CUDACC__ to not include this header when CUDA is not available.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. I only tested this on cuda compiled torch previously, and I didn't think about this. This makes perfect sense. Done.

void* _lantern_cuda_get_device_capability(int64_t device)
{
LANTERN_FUNCTION_START
cudaDeviceProp * devprop = at::cuda::getDeviceProperties(device);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simlarly you can wrap the body of the function in #ifdef __CUDACC__ and just raise a std::runtime_error() if that is not defined.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that is done too.

Copy link
Collaborator Author

@rdinnager rdinnager Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, __CUDACC__ does not seem to be defined even when compiling with ENV{CUDA} defined on my local computer (Windows). This: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#nvcc-identification-macro suggests __CUDACC__ is only defined when compiling .cu files, but __NVCC__ should be defined for .cpp files as well. Nevertheless testing for __NVCC__ does not seem to work either on my local computer. Some conversation here: boostorg/signals2#53 suggests that nvcc by default uses VC++ for .cpp files anyway and so __NVCC__ might not be set in that case. I'm not sure if this is just a windows issue. I am going to make a commit to run CI, just to see whether this works on different platforms. We may have to set a custom define in the CMakeLists.txt based on ENV{CUDA} and test for that instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh ! It actually makes sense as it seems that cuda_add_library will only compile .cu files with nvcc.
It seems that we could add: set_source_files_properties(test.cpp PROPERTIES LANGUAGE CUDA) as suggested here when CUDA is enabled - somewhere in here: https://github.com/mlverse/torch/blob/master/lantern/CMakeLists.txt#L133

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, set_source_files_properties(src/Cuda.cpp PROPERTIES LANGUAGE CUDA) at first just caused Cuda.cpp to not get compiled at all. After adding enable_language(CUDA) to the CMakeLists.txt, nvcc was used to compile Cuda.cpp but then there was a compilation error. Likely something to do with incorrect flags or some such. Seemed like more trouble than it was worth, so I ended up just setting a definition using set_source_files_properties(src/Cuda.cpp PROPERTIES COMPILE_DEFINITIONS __NVCC__). This solution works, at least on my machine. Is this an acceptable solution do you think? I could change the flag to something other than __NVCC__ too, in case that makes it easier to remember it is something not being set by the compiler, but manually?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution looks great @rdinnager ! Thanks! Let's leave __NVCC__, we can change it later if something comes up.

@rdinnager
Copy link
Collaborator Author

Well, looks like all CI failed because it still says that _lantern_cuda_get_device_capability is an undefined symbol, even though lantern was built this time. What am I missing here?

@rdinnager
Copy link
Collaborator Author

Oh, do I need to do this: ab03e81 ?

@rdinnager
Copy link
Collaborator Author

Oh, do I need to do this: ab03e81 ?

Oops, quoted out the wrong folder. Will try again after attempting the set_source_files_properties solution.

@dfalbel dfalbel merged commit 5e011e7 into mlverse:master Nov 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lantern Use this label if your PR affects lantern so it's built in the CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

torch.cuda.get_device_capability() for R?
2 participants