
Examples

Naive matrix multiplication

2304 thread blocks on 68 SMs.
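A minimal sketch of what a naive matrix multiplication kernel typically looks like, with one thread per output element. The kernel name, the host wrapper `matmul`, and the 16 × 16 block shape are assumptions for illustration and are not taken from this repository. With that block shape, a 768 × 768 matrix would launch 48 × 48 = 2304 blocks, which is one configuration that matches the count above.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes one element of C = A * B (n x n, row-major).
__global__ void naive_matmul(const float* a, const float* b, float* c, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= n || col >= n) return;  // guard partial blocks at the edges
    float sum = 0.0f;
    for (int k = 0; k < n; ++k)
        sum += a[row * n + k] * b[k * n + col];
    c[row * n + col] = sum;
}

// Hypothetical host wrapper; error checking is omitted for brevity.
std::vector<float> matmul(const std::vector<float>& a,
                          const std::vector<float>& b, int n) {
    size_t bytes = sizeof(float) * n * n;
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block shape
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    naive_matmul<<<grid, block>>>(da, db, dc, n);

    std::vector<float> c(size_t(n) * n);
    // A blocking copy on the default stream also waits for the kernel.
    cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return c;
}
```

Every thread re-reads a full row of `a` and a full column of `b` from global memory, which is what makes this version "naive".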

Minimum shortcuts in a graph (4 versions)

Problem description here.

Baseline approach whose memory access pattern touches many cache lines while using only a small part of each, which leads to poor memory transaction coalescing (source).

4 thread blocks on 4 SMs.
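The poorly coalesced pattern can be sketched as follows. The problem statement is only linked above, so this sketch assumes the task is the min-plus product `r[i][j] = min over k of (d[i][k] + d[k][j])` on an `n × n` cost matrix. The names `shortcut_baseline` and `shortcut`, and the 16 × 16 block shape, are hypothetical and not taken from the linked source. With that shape, `n = 32` would launch 2 × 2 = 4 blocks.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Assumed problem: r[i][j] = min_k (d[i][k] + d[k][j]), row-major n x n.
__global__ void shortcut_baseline(float* r, const float* d, int n) {
    // The row index i comes from threadIdx.x, so neighbouring threads in a
    // warp work on different rows and their addresses are n floats apart.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= n || j >= n) return;
    float v = HUGE_VALF;
    for (int k = 0; k < n; ++k) {
        float x = d[n * i + k];  // strided across the warp: poorly coalesced
        float y = d[n * k + j];  // same j for threads that share threadIdx.y
        v = fminf(v, x + y);
    }
    r[n * i + j] = v;  // strided write
}

// Hypothetical host wrapper; error checking is omitted for brevity.
std::vector<float> shortcut(const std::vector<float>& d, int n) {
    size_t bytes = sizeof(float) * n * n;
    float *dd, *dr;
    cudaMalloc(&dd, bytes);
    cudaMalloc(&dr, bytes);
    cudaMemcpy(dd, d.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block shape
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    shortcut_baseline<<<grid, block>>>(dr, dd, n);

    std::vector<float> r(size_t(n) * n);
    cudaMemcpy(r.data(), dr, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dd);
    cudaFree(dr);
    return r;
}
```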

Slightly adjusted access pattern in which the threads of a warp access consecutive memory addresses, leading to fewer, wider memory transactions (source).

4 thread blocks on 4 SMs.
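A sketch of the adjusted pattern, under the same assumption about the problem (the min-plus product `r[i][j] = min over k of (d[i][k] + d[k][j])`, since the problem statement is only linked above). The names `shortcut_coalesced` and `shortcut` and the 16 × 16 block shape are hypothetical. The only change from a row-per-`threadIdx.x` layout is that the column index now comes from `threadIdx.x`.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Assumed problem: r[i][j] = min_k (d[i][k] + d[k][j]), row-major n x n.
__global__ void shortcut_coalesced(float* r, const float* d, int n) {
    // The column index j comes from threadIdx.x, so neighbouring threads in
    // a warp touch neighbouring columns, i.e. consecutive addresses.
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= n || j >= n) return;
    float v = HUGE_VALF;
    for (int k = 0; k < n; ++k) {
        float x = d[n * i + k];  // same address for threads sharing a row
        float y = d[n * k + j];  // consecutive addresses across the warp
        v = fminf(v, x + y);
    }
    r[n * i + j] = v;  // consecutive addresses across the warp
}

// Hypothetical host wrapper; error checking is omitted for brevity.
std::vector<float> shortcut(const std::vector<float>& d, int n) {
    size_t bytes = sizeof(float) * n * n;
    float *dd, *dr;
    cudaMalloc(&dd, bytes);
    cudaMalloc(&dr, bytes);
    cudaMemcpy(dd, d.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block shape
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    shortcut_coalesced<<<grid, block>>>(dr, dd, n);

    std::vector<float> r(size_t(n) * n);
    cudaMemcpy(r.data(), dr, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dd);
    cudaFree(dr);
    return r;
}
```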

Reduced number of memory accesses by reusing data in registers (source). The input data has been copied and transposed so that both row-wise and column-wise accesses follow a linear memory access pattern.

1 thread block on 1 SM.
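One way register reuse could look, again assuming the problem is the min-plus product `r[i][j] = min over k of (d[i][k] + d[k][j])`. Each thread keeps a small patch of results in local arrays and, for every `k`, loads 8 + 8 values to perform 64 updates. The patch size, the 8 × 8 block shape, the host-side padding and transposition, and all names are assumptions for this sketch, not the linked source. With these sizes any `n ≤ 64` fits in a single block.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

constexpr int TILE = 8;             // each thread computes a TILE x TILE patch
constexpr int BLOCK = 8;            // BLOCK x BLOCK threads per block
constexpr int SPAN = TILE * BLOCK;  // one block covers SPAN x SPAN outputs

// d is the input padded to nn x nn; t is the transpose of d.
__global__ void shortcut_registers(float* r, const float* d, const float* t,
                                   int n, int nn) {
    int ia = threadIdx.x, ja = threadIdx.y;
    int i0 = blockIdx.x * SPAN + ia, j0 = blockIdx.y * SPAN + ja;

    // With constant loop bounds the compiler can keep v, x, y in registers.
    float v[TILE][TILE];
    for (int ib = 0; ib < TILE; ++ib)
        for (int jb = 0; jb < TILE; ++jb)
            v[ib][jb] = HUGE_VALF;

    for (int k = 0; k < n; ++k) {
        float x[TILE], y[TILE];
        for (int ib = 0; ib < TILE; ++ib)
            x[ib] = t[nn * k + i0 + ib * BLOCK];  // d[i][k], read along a row of t
        for (int jb = 0; jb < TILE; ++jb)
            y[jb] = d[nn * k + j0 + jb * BLOCK];  // d[k][j], read along a row of d
        for (int ib = 0; ib < TILE; ++ib)
            for (int jb = 0; jb < TILE; ++jb)
                v[ib][jb] = fminf(v[ib][jb], x[ib] + y[jb]);
    }
    for (int ib = 0; ib < TILE; ++ib)
        for (int jb = 0; jb < TILE; ++jb) {
            int i = i0 + ib * BLOCK, j = j0 + jb * BLOCK;
            if (i < n && j < n) r[n * i + j] = v[ib][jb];
        }
}

// Hypothetical host wrapper: pads and transposes on the CPU for brevity.
std::vector<float> shortcut(const std::vector<float>& in, int n) {
    int nn = (n + SPAN - 1) / SPAN * SPAN;
    std::vector<float> d(size_t(nn) * nn, HUGE_VALF), t(d);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            d[nn * i + j] = t[nn * j + i] = in[n * i + j];

    size_t padded = sizeof(float) * nn * nn, bytes = sizeof(float) * n * n;
    float *dd, *dt, *dr;
    cudaMalloc(&dd, padded);
    cudaMalloc(&dt, padded);
    cudaMalloc(&dr, bytes);
    cudaMemcpy(dd, d.data(), padded, cudaMemcpyHostToDevice);
    cudaMemcpy(dt, t.data(), padded, cudaMemcpyHostToDevice);

    shortcut_registers<<<dim3(nn / SPAN, nn / SPAN), dim3(BLOCK, BLOCK)>>>(
        dr, dd, dt, n, nn);

    std::vector<float> r(size_t(n) * n);
    cudaMemcpy(r.data(), dr, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dd);
    cudaFree(dt);
    cudaFree(dr);
    return r;
}
```

Reading `d[i][k]` through the transposed copy means both loads walk along a row in memory, so neither operand needs a column-wise (strided) access.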

Buffering memory accesses through shared memory (source).

1 thread block on 1 SM.
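A sketch of staging loads through shared memory, assuming the problem is the min-plus product `r[i][j] = min over k of (d[i][k] + d[k][j])` and that the input is available both padded (`d`) and transposed (`t`). The threads of a block cooperatively copy a few rows into shared memory, synchronise, and then every thread reads its operands from there. The chunk size, tile sizes, and all names are assumptions for illustration, not the linked source.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

constexpr int TILE = 8;             // each thread computes a TILE x TILE patch
constexpr int BLOCK = 8;            // BLOCK x BLOCK threads per block
constexpr int SPAN = TILE * BLOCK;  // one block covers SPAN x SPAN outputs
constexpr int CHUNK = 4;            // values of k staged per iteration
static_assert(BLOCK * BLOCK == SPAN, "one thread loads one column per row");
static_assert(SPAN % CHUNK == 0, "padded size must be a multiple of CHUNK");

__global__ void shortcut_shared(float* r, const float* d, const float* t,
                                int n, int nn) {
    int ia = threadIdx.x, ja = threadIdx.y;
    int ija = ja * BLOCK + ia;  // flat thread index, 0 .. SPAN-1
    __shared__ float xx[CHUNK][SPAN];
    __shared__ float yy[CHUNK][SPAN];

    float v[TILE][TILE];
    for (int ib = 0; ib < TILE; ++ib)
        for (int jb = 0; jb < TILE; ++jb)
            v[ib][jb] = HUGE_VALF;

    for (int ks = 0; ks < n; ks += CHUNK) {
        // Cooperative load; rows k >= n hold padding (infinity), so they are harmless.
        for (int f = 0; f < CHUNK; ++f) {
            xx[f][ija] = t[nn * (ks + f) + blockIdx.x * SPAN + ija];
            yy[f][ija] = d[nn * (ks + f) + blockIdx.y * SPAN + ija];
        }
        __syncthreads();  // staged data must be complete before use
        for (int f = 0; f < CHUNK; ++f)
            for (int ib = 0; ib < TILE; ++ib) {
                float x = xx[f][ib * BLOCK + ia];
                for (int jb = 0; jb < TILE; ++jb)
                    v[ib][jb] = fminf(v[ib][jb], x + yy[f][jb * BLOCK + ja]);
            }
        __syncthreads();  // do not overwrite the buffers while others still read
    }
    for (int ib = 0; ib < TILE; ++ib)
        for (int jb = 0; jb < TILE; ++jb) {
            int i = blockIdx.x * SPAN + ib * BLOCK + ia;
            int j = blockIdx.y * SPAN + jb * BLOCK + ja;
            if (i < n && j < n) r[n * i + j] = v[ib][jb];
        }
}

// Hypothetical host wrapper: pads and transposes on the CPU for brevity.
std::vector<float> shortcut(const std::vector<float>& in, int n) {
    int nn = (n + SPAN - 1) / SPAN * SPAN;
    std::vector<float> d(size_t(nn) * nn, HUGE_VALF), t(d);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            d[nn * i + j] = t[nn * j + i] = in[n * i + j];

    size_t padded = sizeof(float) * nn * nn, bytes = sizeof(float) * n * n;
    float *dd, *dt, *dr;
    cudaMalloc(&dd, padded);
    cudaMalloc(&dt, padded);
    cudaMalloc(&dr, bytes);
    cudaMemcpy(dd, d.data(), padded, cudaMemcpyHostToDevice);
    cudaMemcpy(dt, t.data(), padded, cudaMemcpyHostToDevice);

    shortcut_shared<<<dim3(nn / SPAN, nn / SPAN), dim3(BLOCK, BLOCK)>>>(
        dr, dd, dt, n, nn);

    std::vector<float> r(size_t(n) * n);
    cudaMemcpy(r.data(), dr, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dd);
    cudaFree(dt);
    cudaFree(dr);
    return r;
}
```

The two `__syncthreads()` calls bracket the compute phase: the first makes the staged rows visible to all threads, the second keeps the next iteration's loads from overwriting data that is still being read.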
