In the section "Shared Memory and Synchronisation" you introduce the variable cacheIndex as threadIdx().x - 1 in the dot product kernel. But after that you always refer to cacheIndex + 1. It might therefore be more concise to drop that -1.
The updated code with my edits highlighted:
function dot(a, b, c, N, threadsPerBlock, blocksPerGrid)
    # Set up shared memory cache for this current block.
    cache = @cuDynamicSharedMem(Int64, threadsPerBlock)

    # Initialise some variables.
    tid = (threadIdx().x - 1) + (blockIdx().x - 1) * blockDim().x
    totalThreads = blockDim().x * gridDim().x
    cacheIndex = threadIdx().x  # <<< HERE >>>
    temp = 0

    # Iterate over vector to do dot product in parallel way
    while tid < N
        temp += a[tid + 1] * b[tid + 1]
        tid += totalThreads
    end

    # set cache values
    cache[cacheIndex] = temp  # <<< HERE >>>

    # synchronise threads
    sync_threads()

    # In the step below, we add up all of the values stored in the cache
    i::Int = blockDim().x ÷ 2
    while i != 0
        if cacheIndex <= i  # <<< HERE >>>
            cache[cacheIndex] += cache[cacheIndex + i]  # <<< HERE >>>
        end
        sync_threads()
        i = i ÷ 2
    end

    # cache[1] now contains the sum of vector dot product calculations done in
    # this block, so we write it to c
    if cacheIndex == 1  # <<< HERE >>>
        c[blockIdx().x] = cache[1]
    end

    return nothing
end
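To sanity-check the 1-based indexing, the reduction step can be replayed on the CPU. The following is only a minimal sketch: `simulate_block_reduction` is a hypothetical helper that is not part of the tutorial, it uses a plain `Vector` in place of the shared-memory cache, and it assumes the number of threads is a power of two.

```julia
# CPU-only sketch of the block-level tree reduction with 1-based cacheIndex.
# `simulate_block_reduction` is a hypothetical name used for illustration only.
function simulate_block_reduction(values::Vector{Int})
    cache = copy(values)       # stands in for the shared-memory cache
    nthreads = length(cache)   # plays the role of blockDim().x (assumed power of two)
    i = nthreads ÷ 2
    while i != 0
        # Every "thread" with cacheIndex <= i adds its partner from the upper half.
        # Reads touch only slots i+1..2i and writes only slots 1..i, so running
        # the threads one after another gives the same result as running them in parallel.
        for cacheIndex in 1:nthreads
            if cacheIndex <= i
                cache[cacheIndex] += cache[cacheIndex + i]
            end
        end
        # On the GPU, sync_threads() would separate the steps here.
        i = i ÷ 2
    end
    return cache[1]
end

# Should agree with the plain sum of the inputs.
simulate_block_reduction(collect(1:8)) == sum(1:8)
```

With `cacheIndex` running from 1 to `blockDim().x`, the comparison `cacheIndex <= i` selects exactly the lower `i` slots, and `cacheIndex + i` never exceeds `2i`, so every access stays within the cache.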
Sorry for not doing a proper pull request, I thought this way it would be a bit easier.
Thank you for taking the time to write such a tutorial!