Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starting CUDA profiling: initialization error #395

Closed
alsam opened this issue Aug 22, 2022 · 11 comments
Closed

Starting CUDA profiling: initialization error #395

alsam opened this issue Aug 22, 2022 · 11 comments

Comments

@alsam
Copy link

alsam commented Aug 22, 2022

Hello,

the testcase vectorAdd_profiled doesn't work from the box for me:

./vectorAdd_profiled
terminate called after throwing an instance of 'cuda::runtime_error'
  what():  Starting CUDA profiling: initialization error

after adding

git diff .
diff --git a/examples/other/vectorAdd_profiled.cu b/examples/other/vectorAdd_profiled.cu
index a31c937..5d1cffa 100644
--- a/examples/other/vectorAdd_profiled.cu
+++ b/examples/other/vectorAdd_profiled.cu
@@ -27,6 +27,7 @@ __global__ void vectorAdd(const float *A, const float *B, float *C, int numEleme
 
 int main()
 {
+       cuInit(0);
        profile_this_scope();
        cuda::profiling::name_this_thread("The single thread for vectorAdd_profile :-)");
        if (cuda::device::count() == 0) {

I got another error message:

./vectorAdd_profiled
terminate called after throwing an instance of 'cuda::runtime_error'
  what():  Starting CUDA profiling: invalid device context
191│ /**
192│  * Start CUDA profiling for the current process
193│  */
194│ void start()
195│ {
196│         auto status = cuProfilerStart();
197├───────> throw_if_error(status, "Starting CUDA profiling");
198│ }
[New Thread 0x7fffefdff000 (LWP 12056)]
terminate called after throwing an instance of 'cuda::runtime_error'
  what():  Starting CUDA profiling: invalid device context

Thread 1 "vectorAdd_profi" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at p
thread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
#1  0x00007ffff64a1543 in __pthread_kill_internal (signo=6, threadid=<optimized out>)
    at pthread_kill.c:78
#2  0x00007ffff6451998 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff643b53d in __GI_abort () at abort.c:79
#4  0x00007ffff5e99833 in __gnu_cxx::__verbose_terminate_handler ()
    at /usr/src/debug/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#5  0x00007ffff5ea5cfc in __cxxabiv1::__terminate (handler=<optimized out>)
    at /usr/src/debug/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#6  0x00007ffff5ea5d69 in std::terminate ()
    at /usr/src/debug/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#7  0x00007ffff5ea5fcd in __cxxabiv1::__cxa_throw (obj=<optimized out>,
    tinfo=0x555555569d20 <typeinfo for cuda::runtime_error>,
    dest=0x555555561454 <cuda::runtime_error::~runtime_error()>)
    at /usr/src/debug/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
#8  0x000055555555a731 in cuda::throw_if_error (status=CUDA_ERROR_INVALID_CONTEXT, message=...)
    at /home/alsam/work/github/cuda-api-wrappers/src/cuda/api/error.hpp:325
#9  0x00005555555594d2 in cuda::profiling::start ()
    at /home/alsam/work/github/cuda-api-wrappers/src/cuda/nvtx/profiling.hpp:197
#10 0x000055555555eab5 in cuda::profiling::scope::scope (this=0x7fffffffe62b)
    at /home/alsam/work/github/cuda-api-wrappers/src/cuda/nvtx/profiling.hpp:272
#11 0x000055555555979f in main ()
    at /home/alsam/work/github/cuda-api-wrappers/examples/other/vectorAdd_profiled.cu:31

Thanks!

@alsam
Copy link
Author

alsam commented Aug 22, 2022

it is ok with this:

git diff .
diff --git a/examples/other/vectorAdd_profiled.cu b/examples/other/vectorAdd_profiled.cu
index a31c937..22408a1 100644
--- a/examples/other/vectorAdd_profiled.cu
+++ b/examples/other/vectorAdd_profiled.cu
@@ -25,8 +25,23 @@ __global__ void vectorAdd(const float *A, const float *B, float *C, int numEleme
        if (i < numElements) { C[i] = A[i] + B[i]; }
 }
 
+void initCuda()
+{
+    CUresult a;
+    CUcontext pctx;
+    CUdevice device;
+    cuInit(0);
+    cuDeviceGet(&device, 0);
+    a = cuCtxCreate(&pctx, 0, device );
+    assert(a == CUDA_SUCCESS);
+    std::cout << "Initialized CUDA" << std::endl;
+}
+
 int main()
 {
+    auto res = cuProfilerInitialize ( "prof.conf", "profiler.csv", CU_OUT_CSV );
+
+    initCuda();
        profile_this_scope();
        cuda::profiling::name_this_thread("The single thread for vectorAdd_profile :-)");
        if (cuda::device::count() == 0) {

as cuInit(0) is not sufficient and the context should be initialized.
It would be nice to update the example and make it more instructive, e.g. cuProfileInitialize is deprecated follow the Nvidia docs, it would be nice get some profiler stats using some API.
Thanks!

@eyalroz
Copy link
Owner

eyalroz commented Aug 23, 2022

Acknowledged. This is a regression I somehow missed with making sure everything is covered by the appropriate driver-related initializations. Will work on this hopefully today and otherwise later this week. Sorry about this.

eyalroz added a commit that referenced this issue Aug 23, 2022
… profiling

* `context::current::detail_::scoped_existence_ensurer_t` will now initialize the CUDA driver if necessary - as part of creating a context when none exists.
* The profiling `scope` class now has a `scoped_existence_ensurer_t` member - which ensures the driver is initialized and that some context is current when profiling begins.
@eyalroz
Copy link
Owner

eyalroz commented Aug 23, 2022

Please try the HEAD of the development branch, which now has the fix.

@alsam
Copy link
Author

alsam commented Aug 24, 2022

Great! Thank you @eyalroz , it would be nice to enrich example to get some profile stats, e.g. ipc, memory bandwidth as the example belongs to BLAS level I, it is memory bound.

@alsam
Copy link
Author

alsam commented Aug 24, 2022

Hi @eyalroz , just checked from HEAD

[alsam@Noire build_debug2]$ cmake .. -DCMAKE_BUILD_TYPE=Debug -DCAW_BUILD_EXAMPLES=ON
...
cd exampes/bin
$ ./vectorAdd_profiled
terminate called after throwing an instance of 'cuda::runtime_error'
  what():  Starting CUDA profiling: initialization error

@eyalroz
Copy link
Owner

eyalroz commented Aug 24, 2022

@alsam : Are you sure you checked out from the develop branch?

@alsam
Copy link
Author

alsam commented Aug 24, 2022

@eyalroz maybe I missed something? From which branch should I get the code? See your changes in HEAD

git log
commit 4aac489d89a60675bbe48a1d90a8817bd0039086 (HEAD -> master, tag: v0.5.4, origin/master, origin/HEAD)
Author: Eyal Rozenberg <[email protected]>
Date:   Fri Aug 19 21:45:51 2022 +0300

    Version number bump

commit 86eb49b9df0d2d1374387b566834fbe01de3a419
Author: Eyal Rozenberg <[email protected]>
Date:   Tue Aug 9 21:39:18 2022 +0300

    Added some primary context activation check game to the context management test program.

commit a8a4dee2db90fa03fb4d73ff61671dbce54c5460
Author: Eyal Rozenberg <[email protected]>
Date:   Tue Aug 9 21:38:18 2022 +0300

    Fixes #394 Not ensuring driver initialization when getting a device name by device ID.
...

@eyalroz
Copy link
Owner

eyalroz commented Aug 24, 2022

@alsam :Yes, you missed the branch change. You build the master branch - and fixes don't go there, they go on development. So, right after

cd cuda-api-wrappers/

switch the branch by executing

git checkout development

@alsam
Copy link
Author

alsam commented Aug 24, 2022

ok, it works!

git checkout development
...
./vectorAdd_profiled
CUDA kernel launch with 1954 blocks of 256 threads
SUCCESS

Thanks!

eyalroz added a commit that referenced this issue Aug 24, 2022
… profiling

* `context::current::detail_::scoped_existence_ensurer_t` will now initialize the CUDA driver if necessary - as part of creating a context when none exists.
* The profiling `scope` class now has a `scoped_existence_ensurer_t` member - which ensures the driver is initialized and that some context is current when profiling begins.
@eyalroz
Copy link
Owner

eyalroz commented Aug 24, 2022

Thank you for taking the time to report this and checking the fix. If end up using the library for an interesting project, or even if you just feel like giving some library usability feedback, do drop me a line...

@alsam
Copy link
Author

alsam commented Aug 24, 2022

Thank you for the great library @eyalroz ! Envisage a potential usage for e.g. collecting traces for different workloads, replaying them, collecting some of performance counters :-)

eyalroz added a commit that referenced this issue Aug 27, 2022
… profiling

* `context::current::detail_::scoped_existence_ensurer_t` will now initialize the CUDA driver if necessary - as part of creating a context when none exists.
* The profiling `scope` class now has a `scoped_existence_ensurer_t` member - which ensures the driver is initialized and that some context is current when profiling begins.
@eyalroz eyalroz closed this as completed in f103530 Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants