Examples

Naive matrix multiplication

2304 thread blocks on 68 SMs.
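As a rough illustration of what "naive" means here, the following is a CPU-side sketch in C++, not the kernel used for the figure. It emulates a launch in which every thread computes exactly one output element. The names `blocks_needed` and `naive_matmul`, the square row-major layout, and the `block_dim` parameter are all assumptions made for this sketch. The matrix and block sizes behind the 2304 blocks are not stated here; a hypothetical 768 x 768 matrix with 16 x 16 threads per block would give 48 x 48 = 2304 blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed to cover n elements with block_dim threads per block
// (ceiling division, as is usual when computing a grid size).
inline int blocks_needed(int n, int block_dim) {
    assert(n > 0 && block_dim > 0);
    return (n + block_dim - 1) / block_dim;
}

// CPU emulation of a naive matrix multiplication c = a * b for n x n
// row-major matrices. The four outer loops stand in for blockIdx.y,
// blockIdx.x, threadIdx.y and threadIdx.x; each emulated thread computes
// one element of c. Returns the number of emulated thread blocks.
inline int naive_matmul(const std::vector<float>& a,
                        const std::vector<float>& b,
                        std::vector<float>& c,
                        int n, int block_dim) {
    const std::size_t total = static_cast<std::size_t>(n) * n;
    assert(a.size() == total && b.size() == total && c.size() == total);
    const int grid_dim = blocks_needed(n, block_dim);
    for (int by = 0; by < grid_dim; ++by) {
        for (int bx = 0; bx < grid_dim; ++bx) {
            for (int ty = 0; ty < block_dim; ++ty) {
                for (int tx = 0; tx < block_dim; ++tx) {
                    const int row = by * block_dim + ty;
                    const int col = bx * block_dim + tx;
                    if (row >= n || col >= n) continue;  // thread outside the matrix
                    float sum = 0.0f;
                    for (int k = 0; k < n; ++k) {
                        sum += a[row * n + k] * b[k * n + col];
                    }
                    c[row * n + col] = sum;
                }
            }
        }
    }
    return grid_dim * grid_dim;
}
```

Calling `blocks_needed(768, 16)` returns 48 per grid dimension, which matches the hypothetical 48 x 48 grid mentioned above.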

Minimum shortcuts in a graph (4 versions)

Problem description here.

Baseline approach. Its memory access pattern touches many cache lines but uses only a short part of each, which leads to poor memory transaction coalescing (source).

4 thread blocks on 4 SMs.

Slightly adjusted access pattern in which the threads of each warp access consecutive memory addresses, leading to fewer, wider memory transactions (source).

4 thread blocks on 4 SMs.
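The difference between the first two versions can be sketched by counting how many aligned memory segments one warp touches in a single load. This is a simplified CPU-side model, not taken from the linked sources: the function name `segments_touched`, the warp size of 32, the 4-byte elements and the 128-byte segment size are assumptions chosen to resemble typical NVIDIA hardware, and the baseline is assumed to behave like a strided access.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct aligned segments touched when each thread of a warp
// loads one element. Thread t reads element index base + t * stride.
// Fewer segments means the loads coalesce into fewer memory transactions.
inline std::size_t segments_touched(std::size_t base,
                                    std::size_t stride,
                                    std::size_t warp_size = 32,
                                    std::size_t elem_bytes = 4,
                                    std::size_t segment_bytes = 128) {
    assert(warp_size > 0 && elem_bytes > 0 && segment_bytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < warp_size; ++t) {
        const std::size_t byte_addr = (base + t * stride) * elem_bytes;
        segments.insert(byte_addr / segment_bytes);
    }
    return segments.size();
}
```

Under these assumptions, `segments_touched(0, 1)` models the adjusted pattern (consecutive addresses, a single 128-byte segment), while a large stride such as `segments_touched(0, 1000)` models a baseline in which every thread lands in its own segment.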

Fewer memory accesses, achieved by reusing data in registers (source). The input data is also copied and transposed so that both row-wise and column-wise accesses follow a linear memory access pattern.

1 thread block on 1 SM.
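The transposed copy can be sketched on the CPU as follows. The problem definition is not spelled out in this text, so the sketch assumes it is `r[i][j] = min over k of (d[i][k] + d[k][j])` for an n x n row-major matrix `d`; the names `transpose` and `shortcut_step` are hypothetical. With `t` holding the transpose of `d`, the column access `d[k][j]` becomes `t[j][k]`, so both operands are read linearly in `k`. The register reuse in the actual kernel is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the transpose of a row-major n x n matrix.
inline std::vector<float> transpose(const std::vector<float>& d, int n) {
    assert(n > 0 && d.size() == static_cast<std::size_t>(n) * n);
    std::vector<float> t(d.size());
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            t[j * n + i] = d[i * n + j];
        }
    }
    return t;
}

// Assumed problem: r[i][j] = min_k (d[i][k] + d[k][j]).
// Reading d[k][j] as t[j][k] makes both inner-loop accesses linear in k.
inline std::vector<float> shortcut_step(const std::vector<float>& d, int n) {
    const std::vector<float> t = transpose(d, n);
    std::vector<float> r(d.size());
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float best = std::numeric_limits<float>::infinity();
            for (int k = 0; k < n; ++k) {
                best = std::min(best, d[i * n + k] + t[j * n + k]);
            }
            r[i * n + j] = best;
        }
    }
    return r;
}
```

The extra copy costs O(n^2) time and memory, which is small next to the O(n^3) work of the main loop.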

Buffering memory accesses through shared memory (source).

1 thread block on 1 SM.