Examples fail on Kepler GPU #313
First of all, thank you for reporting this. My work on cuda-api-wrappers is not supported by NVIDIA, nor by any GPU-specializing lab, so I don't have access to machines with a range of GPUs to test this on.

Now, the first thing I'm going to need is the exact error messages you're getting from each of the failing example programs. Let's start with the Titan V - what are the failures?

I'll also mention that older cards may simply not support some of the features I'm trying to use in some of my examples. In that case, I'll need to characterize what it is, exactly, that they're unable to do, and work around that.

Finally, please try using the development branch, just in case some recent fix has somehow affected what you're seeing.
Thank you for getting back! Here is the output from some of the tests. The Titan machine errors may be related to the fact that we have 2 GPUs, and the tests ran (and failed) on device 0, which is a GeForce GTX 1050 Ti. The Tesla K40c machines also have 2 GPUs, but their device 0 is the K40c. --Titan V, simpleStreams; the error may have to do with the fact that we have 2 GPUs on the computer--
--Titan V, vectorAddMMAP, note we have 2 GPUs on the computer, device 0 is NVIDIA GeForce GTX 1050 Ti --
--Tesla K40c, vectorAdd, also have 2 GPUs on the computer, Device 0 is Tesla K40c--
--Tesla K40c, simpleStream, also have 2 GPUs on the computer, Device 0 is Tesla K40c--
So, there seems to be some kind of issue with
The first two bugs this has exposed are not too bad. The third one will need a little more work. Most of them can be overcome by making some appropriate device the "current" device, but I'm intentionally sparing my users from having to know about a global "current device". For now, please retry with the HEAD of the development branch, and let me know if/what has changed. Or you can wait until I'm done with #316, which may take a while longer.
Now the Titan V simpleStreams test is fixed! However, the other 3 tests still behave the same, with the same output and error messages (only the memory-location values differ).
Ok, about vectorAddMMAP: what is your setting
Also, about #316 - I should clarify that it's not actually a bug, nor something you can't work with. The only problem there is that some API calls require you to have set the current context / current device somehow. So, for example, if you've allocated some pinned host memory with one device being current, and you want to copy that memory, that device has to be current, or you might get an error.
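To illustrate the failure mode being described, here is a minimal standalone sketch using the raw CUDA driver API (not the wrapper library's API - the calls and the exact error code shown are my assumption of what happens, not taken from this thread): a copy involving device memory typically fails with `CUDA_ERROR_INVALID_CONTEXT` when no context is current on the calling thread.

```cpp
#include <cuda.h>
#include <cstdio>
#include <vector>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // Retain the device's primary context and make it current on this thread
    CUcontext ctx;
    cuDevicePrimaryCtxRetain(&ctx, dev);
    cuCtxSetCurrent(ctx);

    CUdeviceptr d_buf;
    cuMemAlloc(&d_buf, 1024);            // fine: a context is current
    std::vector<char> h_buf(1024, 0);

    cuCtxSetCurrent(nullptr);            // unbind - now NO context is current
    CUresult r = cuMemcpyHtoD(d_buf, h_buf.data(), 1024);
    // With no current context, this copy is expected to fail
    // (typically with CUDA_ERROR_INVALID_CONTEXT)
    std::printf("copy with no current context: error %d\n", (int)r);

    cuCtxSetCurrent(ctx);                // restore the context...
    r = cuMemcpyHtoD(d_buf, h_buf.data(), 1024);
    std::printf("copy with current context:    error %d\n", (int)r);  // ...and it succeeds

    cuMemFree(d_buf);
    cuDevicePrimaryCtxRelease(dev);
}
```

This is also why "making some appropriate device current" works around most of these failures: it implicitly binds that device's primary context to the calling thread.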
About the
Regarding
It shouldn't have. But - if you:
does that affect the other programs? Specifically, vectorAddMMAP?
…ext, primary contexts, and ensuring their existence in various circumstances:
* Renamed: `context::current::detail_::scoped_current_device_fallback_t` -> `scoped_existence_ensurer_t` / `context::current::detail_::scoped_context_existence_ensurer`
* `context::current::scoped_override_t` now has a ctor which accepts `primary_context_t&&`'s - to hold on to the PC reference which they are about to let go of.
* Moved: `context::current::scoped_override_t` is now implemented in the multi-wrapper implementations directory; consequently, moved the implementations of `module_t::get_kernel()` and `module::create<Creator>` to the multi-wrapper directory, since they use `context::current::scoped_override_t`.
* Added inclusion of `cuda/api/multi_wrapper_impls/module.hpp` to some example code.
* Made a device current in some examples, to avoid having no current context when executing certain operations with no wrappers (e.g. memcpy with host-side addresses).
* When allocating managed or pinned-host memory, now increasing the reference count of some context by 1 (choosing the primary context of device 0, since that's the safest), and decreasing it again on destruction. That guarantees that operations involving that allocated memory will not occur with no constructed contexts.
* Corresponding comment changes on the `allocate()` and `free()` methods for pinned-host and managed memory.
* Factored out the code in `context_t::is_primary()` into a function, `cuda::context::current::detail_::is_primary()`, which can now also be used via `cuda::context::current::is_primary()`.
* Kernel launch functions now ensure a launch only occurs / is enqueued within a current context (any context).
* Getting the current device now ensures its primary context is also active (which getting an arbitrary device does not do).
* Added a doxygen comment for `device::detail_::wrap()` mentioning the primary-context reference behavior.
Reporter, can you please re-check with the latest version of the code (or beta release 0.5.1b3)?
Same behavior. With vectorAddMMAP on the Titan V, I got
This builds in the current directory, not in
Typo - I didn't have the dot in the first place. Results are still the same. The successful simpleStreams test suggests the tests are run on the GeForce, not the Titan V.
@rainli323: Can you run a verbose build (e.g.
Here's what I did:
Here's what I got:
This does not generate the executable in
Well, if you build the kernel for a Volta card (7.0), then vectorAddMMAP, which is hard-coded to use your first GPU (the Pascal 6.1 card), will indeed fail to load the kernel. The thing is, the kernel might not get automatically rebuilt when you change `CMAKE_CUDA_ARCHITECTURES` - I'm not sure why exactly.
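One way to rule out the stale-kernel problem is to reconfigure into a fresh build tree, so the kernels are definitely recompiled for both cards. A sketch, assuming the source is in the current directory and `build` is a disposable build directory; the list `61;70` matches the GTX 1050 Ti and Titan V discussed above:

```shell
# Wipe the build tree so a changed architecture list cannot
# leave stale fatbinaries behind, then build for both GPUs.
rm -rf build
cmake -S . -B build -DCMAKE_CUDA_ARCHITECTURES="61;70"
cmake --build build --verbose
```

With both architectures in the list, the hard-coded device 0 (the 6.1 card) gets a matching kernel binary regardless of which GPU the example picks.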
Ping.
Well, I'm assuming this is resolved. Please reopen if you still see failures.
I'm trying to use this project, but many examples fail on machines that have older GPUs. I have tried on a few machines with a Tesla K40c, and on one with a Titan V. Most tests passed on the Titan V except two, but many tests do not pass on the K40c, including several flavors of vectorAdd. Could you please help? Here are my specs:
--failed calculations and configurations--
Tesla K40c (compute capability 3.5)
NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4
cmake version 3.23.0
c++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
failed tests: asyncAPI, binaryPartitionCG, event_management, execution_control, inlinePTX, p2pBandwidthLatencyTest, simpleIPC, simpleStreams, stream_management, vectorAdd, vectorAddManaged, vectorAddMapped, vectorAddMMAP
--less failed calculations and configurations--
Titan V (compute capability 7.0)
NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4
cmake version 3.23.0
c++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
failed tests: simpleStreams, vectorAddMMAP