CUDA Optimization Guide

Acknowledgement

This repo was originally part of my HPC Note. As more CUDA-related content was added to the original note, I decided to open a separate repo dedicated to CUDA optimization.

Corrections are highly welcome. Please open an issue if you find a mistake.

To view the Markdown files with better formatting (e.g. proper image resizing, spacing, sidebar, etc.), Typora is recommended (its beta version is free).

Disclaimer

I do not hold the copyright of some image files included in this note. The copyright belongs to the original authors.

Any content inside this repo is OPEN FOR EDUCATIONAL PURPOSES but NOT ALLOWED FOR COMMERCIAL USE.

File Structure

# Architectural differences between CPU and GPU
CPUvsGPU.md

# Memory model of CUDA and memory-related optimization techniques (including synchronization)
MemoryModel.md

# Programming model of CUDA and program-related optimization techniques (including streams)
ProgramModel.md

# Arithmetic-related topics: accuracy, speed, etc.
Arithmetic.md

# Measuring performance (including Nsight, etc.)
MeasurePerformence.md

# Other commonly used optimization techniques that are not covered under the programming model / memory model
CommonOptimizationTechniques.md

# Compute capability of each GPU generation
ComputeCapacity.md

# Case studies that refer to the optimization techniques above and show how they can be applied to real applications
Cases.md

# Overview of the libraries NVIDIA provides and the functionality of each library
Library.md

Major References

Note: I also refer to other papers / blogs that are not listed below.

Courses
- UIUC ECE 408
- UIUC ECE 508
- UC Berkeley CS 267
- CMU 15-418
Books (CUDA)
- Programming Massively Parallel Processors, 3rd edition
- CUDA C++ Best Practices Guide
- CUDA C++ Programming Guide
- Professional CUDA C Programming
Books (Arch)
- General-Purpose Graphics Processor Architectures
- Processor Microarchitecture: An Implementation Perspective
Papers
- Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems
Blogs & code
- CUTLASS: Fast Linear Algebra in CUDA C++
- CUTLASS GitHub repository

XiaoSong9905/CUDA-Optimization-Guide
