- Deep Learning from Scratch to GPU - A series on getting started with parallel programming
- (Article) HPC is more parallel than ever
- Radeon Open Compute - Platform for GPU-Enabled HPC and Ultrascale Computing
- Futhark - A small, statically typed, data-parallel, purely functional array language in the ML family, designed to be compiled to efficient parallel code. Its heavily optimising ahead-of-time compiler currently generates GPU code via CUDA and OpenCL, or multi-threaded CPU code.
- CUDA Occupancy Calculator - Computes the multiprocessor occupancy of a GPU for a given CUDA kernel