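Hi, I am trying to understand the concept of a thread-scoped barrier. Here is some code I wrote:

```cpp
#include <stdio.h>
#include <stdint.h>      // uint64_t
#include <cuda/barrier>

__global__ void k() {
    // Every thread declares its own barrier object; thread scope means only
    // the owning thread is expected to arrive on it.
    using barrier = cuda::barrier<cuda::thread_scope_thread>;
    barrier bar;
    init(&bar, 1);  // one expected arrival per phase

    // Dump the first 8 bytes of the barrier's internal state (a debugging
    // hack that depends on the implementation's memory layout).
    printf("[%u] phase 0 data 0x%lx\n", threadIdx.x, *reinterpret_cast<uint64_t *>(&bar));
    bar.arrive_and_wait();  // expected to complete phase 0 immediately
    printf("[%u] phase 1 data 0x%lx\n", threadIdx.x, *reinterpret_cast<uint64_t *>(&bar));
    bar.arrive_and_wait();  // expected to complete phase 1 immediately
    printf("[%u] phase 2 data 0x%lx\n", threadIdx.x, *reinterpret_cast<uint64_t *>(&bar));
}

int main(int argc, char **argv) {
    k<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

The code above is compiled with the command:

I thought thread-scoped meant that each thread gets its own copy of the barrier and the copies do not interfere with each other, so the code above should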
But what I get is output like this, and the program just hangs there. It seems that the two thread-scoped barriers from different threads interfere with each other.
Did I understand the concept of thread scope correctly, or am I missing something? My environment is below, in case it matters:
This is probably caused by #2585. Can you test with #2586 to see if that resolves the problem? In all fairness, we should probably redefine the thread-scope barrier to not use atomics in the first place, at least in non-shared memory, but until then, this smells like the above issue to me.