Home
Welcome to the knowledge base. All you need to get started is a CUDA-enabled device on which you can run the example programs, a basic understanding of C/C++ (more complex elements such as memory management and pointers are reviewed as needed), and the desire to learn.
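If you're not sure whether your machine has a CUDA-enabled device, a quick check along the lines of the sketch below will list the CUDA-capable GPUs the runtime can see. This is just an illustrative standalone program (not one of the tutorial's example programs), and the file name is only a suggestion.

```c
// device_check.cu - minimal sketch: list the CUDA-capable devices visible to the runtime.
// Compile with: nvcc device_check.cu -o device_check
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        // Either the runtime reported an error or no devices were found
        printf("No CUDA-enabled device found (%s)\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```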
- What is Parallel Computing?
- What is a GPU?
- Basic CUDA Syntax
- Memory Management on the GPU
  a. CUDA Memory Types
  b. Using CUDA Memory
  c. Performance Experiment: On-GPU vs Off-GPU Bandwidth
- Thread and Block Scheduling
- Common Parallel Applications
  a. Reduction
  b. Matrix Multiplication
- Intro to Asynchronous Computing
- CUDA Streams
- Asynchronous Memory Transfers
- Performance Experiment: Multi-stream Parallelism
- Basic Synchronization Methods
- Events and Dependencies
- Performance Experiment: Event-Based Synchronization vs Explicit Synchronization
- The Graph Model
- Creating a CUDA Graph using Stream Capture
- Performance Experiment: Graphs vs Streams vs Synchronous Kernels
- Performance Experiment: Increasing the Amount of Graph Nodes
- CUDA Graph API
- Synchronization & Dependencies Inside CUDA Graphs
- Using Host Functions in Graphs & Streams
- Graph API Node Glossary & Usage Examples