A curated list of awesome high performance computing resources.

General Info

A Few Upcoming Supercomputers

El Capitan - 2023, AMD-based, ~1.5 exaflops
Tianhe-3 - 2022, ~700 petaflops (Linpack500)
Venado - 2024, Grace Hopper-based, ~10 exaflops (AI performance)

Most Recent List of the Top500 Supercomputers

Top500 (June 2024)
HPCG Top500 (June 2024)
Green500 (June 2024)
io500

History

History of Supercomputing (Wikipedia)
History of Parallel Computing (Wikipedia)
History of the Top500 (Wikipedia)
History of LLNL Computing
The Supermen: The Story of Seymour Cray ... (1997)
Unmatched - 50 Years of Supercomputing (2023)

Trends

Trends in HPC for AI workloads

Software

Popular HPC Programming Libraries/APIs/Tools/Standards/Simulators

alpaka - The alpaka library is a header-only C++17 abstraction library for accelerator development
async-rdma - A framework for writing RDMA applications with high-level abstraction and asynchronous APIs
CAF - An Open Source Implementation of the Actor Model in C++
Chapel - A Programming Language for Productive Parallel Computing on Large-scale Systems
Charm++ - Parallel Programming with Migratable Objects
Cilk Plus - C/C++ Extension for Data and Task Parallelism
Codon - high-performance Python compiler that compiles Python code to native machine code without any runtime overhead
CUDA - High performance NVIDIA GPU acceleration
dask - Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
DeepSpeed - An easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference
DeterminedAI - Distributed deep learning
FastFlow - High-performance Parallel Patterns in C++
Galois - A C++ Library to Ease Parallel Programming with Irregular Parallelism
Halide - A language for fast, portable computation on images and tensors
Heteroflow - Concurrent CPU-GPU Task Programming using Modern C++
highway - Performance portable SIMD intrinsics
HIP - HIP is a C++ Runtime API and Kernel Language for AMD/Nvidia GPU
HPC-X - Nvidia implementation of MPI
HPX - A C++ Standard Library for Concurrency and Parallelism
Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
ISPC - An open-source compiler for high-performance SIMD programming on the CPU and GPU
Intel ISPC - SPMD compiler
Intel TBB - Threading Building Blocks
joblib - Data-flow programming for performance (python)
Kompute - The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)
Kokkos - A C++ Programming Model for Writing Performance Portable Applications on HPC platforms
Kubeflow MPI Operator - MPI Operator for Kubeflow
Legate - Nvidia replacement for numpy based on Legion
Legion - Distributed heterogeneous programming library
MAGMA - Next generation linear algebra (LA) GPU accelerated libraries
Merlin - A distributed task queuing system, designed to allow complex HPC workflows to scale to large numbers of simulations
Metal - Apple's GPU API
Microsoft MPI - Microsoft's implementation of MPI
MOGSLib - User defined schedulers
mpi4jax - Zero-copy mpi for jax arrays
mpi4py - Python bindings for MPI
MPI - OpenMPI implementation of the Message passing interface
MPI - MPICH implementation of the Message passing interface
MPI Standardization Forum - Forum for MPI standardization
MVAPICH - Implementation of MPI
NCCL - The NVIDIA Collective Communication Library for multi-GPU and multi-node communication
cuNumeric - GPU drop-in for numpy
stdpar - GPU accelerated C++ from NVIDIA
numba - A JIT compiler that translates a subset of Python into fast machine code
oneAPI - A unified, multiarchitecture, multi-vendor programming model
OpenACC - "OpenMP for GPUs"
OpenCilk - MIT continuation of Cilk Plus
OpenMP - Multi-platform Shared-memory Parallel Programming in C/C++ and Fortran
PVM - Parallel Virtual Machine: A predecessor to MPI for distributed computing
PMIx - Standard for process management
Pollux - Message Passing Cloud orchestrator
Pyfi - Distributed flow and computation system
RAJA - Architecture and programming model portability for HPC applications
RaftLib - A C++ Library for Enabling Stream and Dataflow Parallel Computation
ray - Scale AI and Python workloads from reinforcement learning to deep learning
ROCm - First open-source software development platform for HPC/Hyperscale-class GPU computing
RS MPI - Rust bindings for MPI
Scalix - Data parallel computing framework
Simgrid - Simulate cluster/HPC environments
SkelCL - A Skeleton Library for Heterogeneous Systems
STAPL - Standard Template Adaptive Parallel Programming Library in C++
STLab - High-level Constructs for Implementing Multicore Algorithms with Minimized Contention
SYCL - C++ Abstraction layer for heterogeneous devices
Taichi - Parallel programming language for high-performance numerical computations in Python
Taskflow - A Modern C++ Parallel Task Programming Library
The Open Community Runtime - Specification for Asynchronous Many Task systems
Transwarp - A Header-only C++ Library for Task Concurrency
Triton - Triton is a language and compiler for parallel programming
Tuplex - Blazing fast python data science
UCX - Optimized production proven-communication framework
ZLUDA - Run unmodified CUDA applications with near-native performance on Intel/AMD GPUs
HyperQueue - HyperQueue is a tool designed to simplify execution of large workflows (task graphs) on HPC clusters.

Cluster Hardware Discovery Tools

cpuid - A software instruction available on Intel, AMD, and other processors that can be used to determine processor type and features.
cpuid instruction note - A detailed note on the CPUID instruction used for processor identification.
cpufetch - A simple yet fancy CPU architecture fetching tool.
gpufetch - A tool similar to cpufetch, but for fetching GPU architecture.
intel cpuinfo - Intel tool providing information about the characteristics of Intel CPUs.
LIKWID - A performance monitoring and benchmarking tool suite that also reports the hardware topology of cluster nodes.
LIKWID.jl - Julia wrapper for LIKWID.
openmpi hwloc - Portable Hardware Locality (hwloc) software project.
PRK - Parallel Research Kernels - A collection of kernels for parallel programming research.

Cluster Management/Tools/Schedulers/Stacks

BeeGFS - A parallel file system designed for performance-critical environments.
Bluebanquise - An open-source cluster management tool.
Bright Cluster Manager - Software for deploying and managing HPC and AI server clusters.
Ceph - An open-source distributed storage system.
DeepOps - Nvidia's GPU infrastructure and automation tools for Kubernetes and Slurm clusters.
E4S - The Extreme Scale HPC Scientific Stack - A collection of open-source software packages for HPC environments.
Easybuild - A package manager for HPC/supercomputers.
EESSI - A shared stack of scientific software installations.
Flux framework - A framework for high-performance computing clusters.
fpsync - A tool for fast parallel data transfer using fpart and rsync.
GPFS - A high-performance parallel file system developed by IBM.
Guix - A package manager for HPC/supercomputers.
Intel DAOS - A software-defined scale-out object store for HPC applications.
LSF - A batch system for HPC and distributed computing environments.
Lmod - A Lua-based module system for software environment management on HPC systems.
Lustre Parallel File System - A high-performance distributed filesystem for large-scale cluster computing.
moosefs - A fault-tolerant, highly available, distributed file system.
NetApp - Intelligent data infrastructure for various workloads.
Open Cluster Scheduler - A scalable HPC/AI workload manager based on SGE.
OpenHPC - A community-led set of HPC components.
OpenOnDemand - A web portal for accessing supercomputing resources.
OpenPBS - A software for workload management and job scheduling.
OpenXdMod - A tool for managing high-performance computing resources.
RADIUSS - Rapid Application Development via an Institutional Universal Software Stack.
rocks - An open-source Linux cluster distribution.
Ruse - A tool for managing software environments in HPC clusters.
SGE - A resource management software for large clusters of computers.
Slurm - A cluster management and job scheduling system for Linux clusters.
Spack - A package manager for HPC/supercomputers.
sstack - A tool to install multiple software stacks such as Spack, EasyBuild, and Conda.
Starfish - Unstructured data management and metadata solution for files and objects.
Warewulf - An operating system provisioning system and cluster management tool.
xCat - A distributed computing management and provisioning tool.
XDMoD - An open-source tool for auditing and reporting usage metrics of high-performance computing resources.
Globus Connect - A fast data transfer tool between supercomputers.
Slurm Web - Open source web dashboard for Slurm HPC clusters.

HPC-specific Operating Systems

Kitten - A lightweight kernel designed for high-performance computing. It focuses on providing low noise and predictable performance for HPC applications.
McKernel - A hybrid kernel that combines Linux and a lightweight kernel designed to provide high performance for HPC applications.
mOS - A specialized operating system for high-performance computing, designed to support large-scale, manycore processors.

Development/Workflow/Monitoring Tools for HPC

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows.
Apptainer (formerly Singularity) - Container platform designed for scientific and high-performance computing (HPC) environments.
arbiter2 - Monitors and protects interactive nodes with cgroups.
Charliecloud - Lightweight container solution for high-performance computing (HPC).
Docker - A set of platform as a service products that use OS-level virtualization to deliver software in packages called containers.
genv - GPU Environment Management for managing and scheduling GPU resources.
Grafana - Open-source platform for monitoring and observability, visualizing metrics.
grpc - A high-performance, open-source universal RPC framework.
HPC Rocket - Allows submitting Slurm jobs in Continuous Integration (CI) pipelines.
HTCondor - An open-source high-throughput computing software framework.
Jacamar-ci - CI/CD tool designed for HPC and scientific computing workflows.
Kubernetes - An open-source system for automating deployment, scaling, and management of containerized applications.
nextflow - A workflow framework to deploy data-driven computational pipelines.
perun - Energy monitor for HPC systems, focusing on performance and energy efficiency.
Prefect - A workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
Prometheus - An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
redun - Workflow engine that emphasizes simplicity, reliability, and scalability.
remora - Tool for monitoring and reporting the performance of batch jobs on HPC systems.
ruptime - A utility for monitoring the status of computational jobs and systems.
Slurmvision slurm dashboard - A dashboard for monitoring and managing Slurm jobs.
slurm docker cluster - A Slurm cluster implemented using Docker containers, for development and testing.
snakemake - A workflow management system that reduces the complexity of creating reproducible and scalable data analyses.
Stui slurm dashboard for the terminal - A terminal-based UI for managing and monitoring Slurm clusters.
Vaex - A Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets.

Debugging Tools for HPC

ddt - A powerful debugger designed for developers to solve complex problems on multi-threaded and multi-process environments in HPC.
marmot MPI checker - A tool for detecting and reporting issues in MPI (Message Passing Interface) applications.
python debugging tools - A collection of tools for debugging Python applications, including pdb and other utilities.
seer modern gui for gdb - A graphical user interface for GDB, aiming to improve the debugging experience with modern features and visuals.
Summary of C/C++ debugging tools - An overview of various debugging tools available for C/C++ applications, focusing on HPC environments.
totalview - A comprehensive source code analysis and debugging tool designed for complex software running on HPC systems, supporting a wide range of languages and architectures.

Performance/Benchmark Tools for HPC

demonspawn - A framework for automated execution of benchmarks and simulations, designed for HPC environments.
Google benchmark - A microbenchmark support library for C++ that tracks performance over time.
HPL benchmark - The High Performance Linpack Benchmark for measuring floating-point computing power of systems.
kerncraft - A tool for analytical modeling of loop performance and cache behavior on HPC systems.
NASA parallel benchmark suite - A set of benchmarks designed to evaluate the performance of parallel supercomputers.
papi - Provides standard APIs for accessing hardware performance counters available on modern microprocessors.
scalasca - A software tool that supports performance analysis of large-scale parallel applications.
scalene - A high-performance, high-precision CPU, GPU, and memory profiler for Python.
Summary of code performance analysis tools - An overview of tools for analyzing HPC application performance.
Summary of profiling tools - A comprehensive list of profiling tools for performance analysis in HPC.
tau - TAU (Tuning and Analysis Utilities) is a profiling and tracing toolkit for performance analysis of parallel programs.
The Bandwidth Benchmark - A tool for measuring memory bandwidth across various CPUs and systems.
vampir - A tool for detailed analysis of MPI program executions by visualizing their event traces.
bytehound memory profiler - A detailed memory profiler for tracking down memory issues and leaks.
Flamegraphs - Visualization tool for profiling software, allowing quick identification of performance bottlenecks.
fio - Flexible I/O tester for benchmarking and stress/hardware verification.
IBM Spectrum Scale Key Performance Indicators (KPI) - Provides key performance indicators for IBM Spectrum Scale, aiding in performance tuning and monitoring.
Ior - A parallel file system I/O benchmarking tool used widely in HPC for testing storage systems.
stress-ng - A versatile tool for stressing various subsystems of a computer to find hardware faults or to benchmark performance.
Hotspot - The Linux perf GUI for in-depth performance analysis and visualization of software behavior.
mixbench - A benchmark suite designed to evaluate CPUs and GPUs across different compute and memory operations.
pmu-tools (toplev) - Performance monitoring tools for modern Intel CPUs, offering detailed insights into hardware and application performance.
SPEC CPU Benchmark - A benchmark suite designed to provide a comparative measure of compute-intensive performance across the widest practical range of hardware.
STREAM Memory Bandwidth Benchmark - Measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.
Intel MPI benchmarks - A set of benchmarks designed to measure the performance and scalability of MPI implementations on Intel architectures.
Ohio State MPI benchmarks - A comprehensive suite of benchmarks for evaluating MPI performance across a variety of message passing patterns and communication protocols.
hpctoolkit - An integrated suite of tools for measurement and analysis of program performance on computers ranging from desktops to supercomputers.
core-to-core-latency - A diagnostic tool designed to measure and report the latency between CPU cores, aiding in the optimization of parallel computing tasks.
speedscope - An interactive, web-based viewer for performance profiles of software. It supports various formats and provides a flamegraph visualization to identify hot paths efficiently.
Differential Flamegraphs - A visualization technique developed by Brendan Gregg that highlights differences between performance profiles, making it easier to spot performance regressions or improvements.
Hyperfine - A command-line benchmarking tool that provides a simple and user-friendly means to compare the performance of commands, featuring statistical analysis across multiple runs.
OpenFOAM HPC benchmark - A benchmarking suite for evaluating the High Performance Computing capabilities of OpenFOAM, an open-source CFD software, under various computational loads.
OSU microbenchmarks - A collection of microbenchmarks designed to evaluate the performance of MPI implementations across various communication protocols and message sizes.
fio flexible I/O tester - A versatile tool for I/O workload simulation and benchmarking, capable of testing a wide array of storage and filesystem configurations.
vftrace - A tracing tool specifically designed for the NEC SX-Aurora TSUBASA Vector Engine, enabling detailed performance analysis of vectorized code.
tinymembench - A simple memory benchmark tool, focusing on benchmarking memory bandwidth and latency with minimal dependencies, suitable for various platforms.
Geekbench - Cross platform benchmarking tool
Empirical Roofline Tool (ERT) - Create empirical roofline plots, alternative to Intel VTune for any machine
Roofline Visualizer for ERT - Visualizer for ERT
Caliper - A Performance Analysis Toolbox in a Library
KDiskMark - Benchmarking Tool For SSD/HDD Drives

IO/Visualization Tools for HPC

ADIOS2 - The Adaptable IO System version 2, designed for flexible and efficient I/O for scientific data, supporting a wide range of HPC simulations.
Amira - A powerful, multifaceted 3D software platform for visualizing, manipulating, and understanding Life Science and bio-medical data coming from all types of sources.
hdf5 - The Hierarchical Data Format version 5 (HDF5), is an open source file format that supports large, complex, heterogeneous data.
paraview - An open-source, multi-platform data analysis and visualization application.
Scientific Visualization Wiki - A comprehensive guide to the field of scientific visualization, detailing techniques, tools, and applications.
the yt project - An open-source, Python-based package for analyzing and visualizing volumetric data.
vedo - A lightweight and powerful python module for scientific analysis and visualization of 3D objects and point clouds based on VTK.
visit - An Open Source, interactive, scalable, visualization, animation and analysis tool.

General Purpose Scientific Computing Libraries for HPC

petsc
ginkgo
GSL
Scalapack
rapids.ai - collection of libraries for executing end-to-end data science pipelines completely in the GPU
trilinos
tnl project

Misc.

mimalloc memory allocator
jemalloc memory allocator
tcmalloc memory allocator
Horde memory allocator
Software utilization at UK National Supercomputing Service, ARCHER2

Wikis

Comparison of cluster software
List of cluster management software

Hardware

Interconnects/Topology

Ethernet
Infiniband
Network topologies
Battle of the infinibands - Omnipath vs Infiniband
Mellanox infiniband cluster config
RoCE - RDMA Over Converged Ethernet
Slingshot interconnect
CXL - Compute Express Link
Infiniband Essentials

CPU

Wikichip
Microarchitecture of Intel/AMD CPUs
Apple M1
Apple M2
Apple M2 Teardown
Apple M1/M2 AMX
Apple M3
List of Intel processors
List of Intel micro architectures
Comparison of Intel processors
Comparison of Apple processors
List of AMD processors
List of AMD CPU micro architectures
Comparison of AMD architectures

GPU

Gpu Architecture Analysis
A trip through the Graphics Pipeline
A100 Whitepaper
MIG
Gentle Intro to GPU Inner Workings
AMD Instinct GPUs
AMD GPU ROCm Support and OS Compatibility
List of AMD GPUs
Comparison of CUDA architectures
Tales of the M1 GPU
List of Intel GPUs
Performance of DGX Cluster

TPU/Tensor Cores

Google TPU
TPU Wiki
NVIDIA Tensor Cores

Many integrated core processor (MIC)

Xeon Phi

Cloud

Awesome Cloud HPC

Vendors

AWS HPC
Azure HPC
rescale
vast.ai
vultr - cheap bare metal CPU, GPU, DGX servers
hetzner - cheap servers incl. 80-core ARM
Ampere ARM cloud-native processors
Scaleway
Chameleon Cloud
Lambda Labs
Runpod

Articles/Papers

The use of Microsoft Azure for high performance cloud computing – A case study
AWS Cluster in the cloud
AWS Parallel Cluster
AWS HPC Workshop
An Empirical Study of Containerized MPI and GUI Application on HPC in the Cloud

Custom/FPGA/ASIC/APU

OpenPiton
Parallela
AMD APU

Certification

Intel Cluster Ready

Student Opportunities / Workshops

Supercomputing Conference Student Opportunities
SCC Student cluster competition
Winter Classic Invitational
Linux Cluster Institute

Other/Wikis

Supercomputer
Supercomputer architecture
Computer cluster
Comparison of Intel processors
Comparison of Apple processors
Comparison of AMD architectures
Comparison of CUDA architectures
Cache
Google TPU
IPMI
FRU
Disk Arrays
RAID
Cray
Digital Signal Processors
Vector Processor

People

Jack Dongarra - 2021 Turing Award - LINPACK, BLAS, LAPACK, MPI
Bill Gropp - 2010 IEEE TCSC Medal for Excellence in Scalable Computing
David Bader - built the first Linux supercomputer
Thomas Sterling - Inventor of Beowulf cluster, ParalleX/HPX
Seymour Cray - Inventor of the Cray Supercomputer
Larry Smarr - HPC Application Pioneer

Resources

Books/Manuals

Free Modern HPC Books by Victor Eijkhout
High Performance Parallel Runtimes
The OpenMP Common Core: Making OpenMP Simple Again
Parallel and High Performance Computing
Algorithms for Modern Hardware
High Performance Computing: Modern Systems and Practices - Thomas Sterling, Maciej Brodowicz, Matthew Anderson 2017
Introduction to High Performance Computing for Scientists and Engineers - Hager 2010
Computer Organization and Design
Optimizing HPC Applications with Intel Cluster Tools: Hunting Petaflops
Introduction to High Performance Scientific Computing - Victor Eijkhout 2021
Parallel Programming for Science and Engineering - Victor Eijkhout 2021
Parallel Programming for Science and Engineering - HTML Version
C++ High Performance
Data Parallel C++ Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL
High Performance Python
C++ Concurrency in Action: Practical Multithreading - Anthony Williams 2012
The Art of Multiprocessor Programming - Maurice Herlihy 2012
Parallel Computing: Theory and Practice - Umut A. Acar 2016
Introduction to Parallel Computing - Zbigniew J. Czech
Practical guide to bare metal C++
Optimizing software in C++
Optimizing subroutines in assembly code
Microarchitecture of Intel/AMD CPUs
Parallel Programming with MPI
HPC, Big Data, AI Convergence Towards Exascale: Challenge and Vision
Introduction to parallel computing - Ananth Grama
The Student Supercomputer Challenge Guide
The Rust Performance Book
E-Zines on Bash, Linux, Perf, etc - Julia Evans
The Art of Writing Efficient Programs: An Advanced Programmer's Guide to Efficient Hardware Utilization and Compiler Optimizations Using C++ Examples
OpenMP Examples - openmp.org
Latest books on OpenMP - openmp.org
Programming Massively Parallel Processors 4th Edition 2023
Software Optimization Cookbook
Power and Performance: Software Analysis and Optimization
Gropp books on MPI
Performance Analysis and Tuning on Modern CPUs
High Performance Computing in Biomimetics Modeling, Architecture and Applications

Courses

HPC Carpentry
Berkeley: Applications of Parallel Computers - Detailed course on HPC
CS6290 High-performance Computer Architecture - Milos Prvulovic and Catherine Gamboa at Georgia Tech
Udacity High Performance Computing
Parallel Numerical Algorithms
Vanderbilt - Intro to HPC
Illinois - Intro to HPC - by the creator of PyCUDA
Archer1 Courses
TACC tutorials
Livermore training materials
Xsede training materials
Parallel Computation Math
Introduction to High-Performance and Parallel Computing - Coursera
Foundations of HPC 2020/2021
Principles of Distributed Computing
High Performance Visualization
Temple course on building/maintaining a cluster
Nvidia Deep Learning Course
Coursera GPU Programming Specialization
Coursera Fundamentals of Parallelism on Intel Architecture
Coursera Introduction to High Performance Computing
Archer2 Shared Memory Programming with OpenMP
Archer2 Message-Passing Programming with MPI
HetSys 2022 Course
Edukamu Introduction to Supercomputing
Heterogeneous Parallel Programming by S K
NCSA HPC Training Moodle
Supercomputing in plain english
Cornell workshop
Carpentries Incubator HPC Intro
UL HPC School
Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran
Performance Engineering of Software Systems (MIT-OCW)
Introduction to Parallel Computing (CMSC 498X/818X)
Infiniband Essentials
Performance Ninja Optimization Course
HPC Administration Virtual Residency 2024

Tutorials/Guides/Articles

General

MpiTutorial - A fantastic mpi tutorial
Beginners Guide to HPC
Rookie HPC Guide
RedHat High Performance Computing 101
Parallel Computing Training Tutorials - Lawrence Livermore National Laboratory
Foundations of Multithreaded, Parallel, and Distributed Programming
Building pipelines using slurm dependencies
Writing Slurm scripts in Python, R, and Bash
Xsede new user tutorials
Supercomputing in plain english
Improving Performance with SIMD intrinsics
Want speed? Pass by value
Introduction to low level bit hacks
How to write fast numerical code: An Introduction
Lecture notes on Loop optimizations
A practical approach to code optimization
Software optimization manuals
Guide into OpenMP: Easy multithreading programming for C++
An Introduction to the Partitioned Global Address Space (PGAS) Programming Model
Jax in 2022
C++ Benchmarking for beginners
Mapping MPI ranks to multiple cuda GPU
Oak Ridge National Lab Tutorials
How to perform large scale data processing in bioinformatics
Step by step SGEMM in OpenCL
Frontier User Guide
Allocating large blocks of memory in bare-metal C programming
Hashmap benchmarks 2022
LLNL HPC Tutorials
High Performance Computing: A Bird's Eye View
The dirty secret of high performance computing
Multiple GPUs with pytorch
Brendan Gregg on Linux Performance
Automatic Slurm build scripts
Fastest unordered_map implementation / benchmarks
Memory bandwidth NapkinMath
Avoiding Instruction Cache Misses
Multi-GPU Programming with Standard Parallel C++
EuroCC National Competence Center Sweden (ENCCS) HPC tutorials
python.org Python Performance Tips
HPC toolset tutorial (cluster management)
OpenMP tutorials
CUDA best practices guide
Understanding CPU Architecture And Performance Using LIKWID
32 OpenMP Traps For C++ Developers
Best practices for running jobs on an HPC cluster
Glossary of HPC related terms
Setting the record straight: What is HPC?

Machine Learning Related

Best practices for machine learning with HPC
How to pick the right hardware for AI - Gigabyte - Part 1
A practitioner's guide to testing and running large GPU clusters for training generative AI models
AWS HPC Workshop
Hardware Acceleration of LLMs: A comprehensive survey and comparison

Review Papers/Articles

Interactive and Urgent HPC Challenges (2024)
The Landscape of Exascale Research: A Data-Driven Literature Analysis (2020)
The Landscape of Parallel Computing Research: A View from Berkeley
Extreme Heterogeneity 2018: Productive Computational Science in the Era of Extreme Heterogeneity
Programming for Exascale Computers - Will Gropp, Marc Snir
On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems (2020)
Advances in Parallel & Distributed Processing, and Applications (conference proceedings)
Designing Heterogeneous Systems: Large Scale Architectural Exploration Via Simulation
Reinventing High Performance Computing: Challenges and Opportunities (2022)
Challenges in Heterogeneous HPC White Paper (2022)
An Evolutionary Technical & Conceptual Review on High Performance Computing Systems (Dec 2021)
New Horizons for High-Performance Computing (2022)
Confidential High-Performance Computing in the Public Cloud
Containerisation for High Performance Computing Systems: Survey and Prospects
Heterogeneous Computing Systems (2023)
Myths and Legends in High-Performance Computing
Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
Ultimate Physical limits to computation - Seth Lloyd
Abstract Machine Models and Proxy Architectures for Exascale Computing, 2014, Sandia National Laboratories and Lawrence Berkeley National Laboratory
Some thoughts on the environmental impact of High Performance Computing
A Research Retrospective on AMD's Exascale Computing Journey

News

InsideHPC
HPCWire
NextPlatform
Datacenter Dynamics
Admin Magazine HPC
Toms hardware
Tech Radar
Phoronix
The Register

Podcasts

This week in HPC
Preparing Applications for Aurora in the Exascale Era
Slurm podcast
HPCPodcast
Developer Stories - The path to a career in high performance computing is not always equitable or clear.
Developer Stories - HPCToolkit

Video Presentations/Courses/Channels

Argonne lectures on Extreme Scale Computing 2022
Argonne supercomputer tour
Containers in HPC - what they fix and what they break
HPC Tech Shorts
CppCon
Create a clustering server
Argonne national lab
Oak Ridge National Lab
Concurrency in C++20 and Beyond - A. Williams
Is Parallel Programming still Hard? - P. McKenney, M. Michael, and M. Wong at CppCon 2017
The Speed of Concurrency: Is Lock-free Faster? - Fedor G Pikus in CppCon 2016
Expressing Parallelism in C++ with Threading Building Blocks - Mike Voss at Intel Webinar 2018
A Work-stealing Runtime for Rust - Aaron Todd in Air Mozilla 2017
C++11/14/17 atomics and memory model: Before the story consumes you - Michael Wong in CppCon 2015
The C++ Memory Model - Valentin Ziegler at C++ Meeting 2014
Sharcnet HPC
Low Latency C++ for fun and profit
scalene Python profiler
Kokkos lectures
EasyBuild Tech Talk I - The ABCs of Open MPI, part 1 (by Jeff Squyres & Ralph Castain)
The Spack 2022 Roadmap
A Not So Simple Matter of Software | Talk by Turing Award Winner Prof. Jack Dongarra
Vectorization/SIMD intrinsics
New Silicon for Supercomputers: A Guide for Software Engineers
TechTechPotato Channel
How to write the perfect hash table
FosDem 2024 HPC Big Data Conference videos
Bright Computing Cluster Management Technical Overview
What is HPC? An introduction by Canonical
Slurm job scheduler basics

Presentation Slides

Task based Parallelism and why it's awesome - Pedro Gonnet
Tuning Slurm Scheduling for Optimal Responsiveness and Utilization
Parallel Programming Models Overview (2020)
Comparative Analysis of Kokkos and Sycl (Jeff Hammond)
Hybrid OpenMP/MPI Programming
Designs, Lessons and Advice from Building Large Distributed Systems - Jeff Dean (Google)
Practical Debugging and Performance Engineering

Building Clusters/Virtual Clusters

Resources for learning about HPC networks and storage r/HPC
Slurm for dummies guide
Build a cluster under 50k
Build a Beowulf cluster
Build a Raspberry Pi Cluster
Puget Systems
Lambda Systems
Titan computers
Temple course on building/maintaining a cluster
Detailed reddit discussion on setting up a small cluster
Tiny titan - build a really cool pi supercomputer
Building an Intel HPC cluster with OpenHPC
Reddit r/HPC post on building clusters
Build a virtual cluster with PelicanHPC
Building a High-performance Computing Cluster Using FreeBSD
Supermicro GPU racks
VirtualOrfeo - Virtual HPC Cluster
Is there a reason to build a Raspberry Pi cluster

Forums

r/hpc
r/homelab
r/slurm

Careers

HPC University Careers search
HPC wire career site
HPC certification
HPC SysAdmin Jobs (reddit)
The United States Research Software Engineer Association
NCSA Internship
AI and Future HPC Job Prospect
HPC sys admin career (reddit)

Membership Clubs

Association for Computing Machinery
ETP4HPC
The SIGHPC Systems Professionals

Blogs

1024 Cores - Dmitry Vyukov
The Black Art of Concurrency - Internal Pointers
Cluster Monkey
Jonathan Dursi
Arm Vendor HPC blog
HPC Notes
Brendan Gregg Performance Blog
Performance engineering blog
Concurrency Freaks
Servers@Home
Dr. Bandwidth Blog
Johnny's Software Lab
Daniel Lemire Blog
Gigabyte HPC Blog

Journals

IEEE Transactions on Parallel and Distributed Systems (TPDS)
Journal of Parallel and Distributed Computing

Conferences

ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP)
ACM Symposium on Parallel Algorithms and Architectures (SPAA)
SC conference (SC)
IEEE International Parallel and Distributed Processing Symposium (IPDPS)
International Conference on Parallel Processing (ICPP)
IEEE High Performance Extreme Computing Conference (HPEC)
FOSDEM

Communities/Chat Groups

HPC Social Discord server
HPC Social slack group
HPC Social
Beowulf Mailing List

Twitters

Top500
HPE HPC
HPC Wire
Rookie HPC
HPC_Guru
Jeff Hammond

Consulting

Redline Performance
R systems
Advanced Clustering

Interview Preparation

Reddit Entry Level HPC interview help

Organizations

PRACE
XSEDE
Compute Canada
RIKEN R-CCS
Pawsey
International Data Corporation
List of Federally funded research and development centers

Interesting r/HPC posts

Finding a supercomputer to use for research

Misc. Wikis

Amdahl's Law
HPC Wiki
FLOPS
Computational complexity of math operations
Many Task Computing
High Throughput Computing
Parallel Virtual Machine
OSI Model
Workflow management
Compute Canada Documentation
Network Interface Controller (NIC)
Just-in-time compilation
List of distributed computing projects
Computer cluster
Quasi-opportunistic supercomputing
Limits of Computation
Bremermann's Limit
Concurrency patterns
Parallel Computing
Server Management

Misc. Papers/Articles

Advanced Parallel Programming in C++
Tools for scientific computing
Quantum Computing for High Performance Computing
Benchmarking data science: Twelve ways to lie with statistics and performance on parallel computers.
Establishing the IO500 Benchmark
NVIDIA High Performance Computing articles
Let's write a superoptimizer
Why I think C++ is still a desirable coding platform compared to Rust
The State of Fortran (arXiv paper 2022)
50 years later, is two-phase locking still the best
Estimating your memory bandwidth

Misc. Repos

Build a Beowulf cluster
libsc - Supercomputing library
Xbyak JIT assembler
cpufetch - pretty CPU info fetcher
RRZE-HPC
Argonne Github
Argonne Leadership Computing Facility
Oak Ridge National Lab Github
Compute Canada
HPCInfo by Jeff Hammond
Texas Advanced Computing Center (TACC) Github
LANL HPC Github
Rust in HPC
University at Buffalo - Center for Computational Research
Center for High Performance Computing - University of Utah

Misc. Theses

Rust programming language in the high-performance computing environment

Misc.

Exascale Project
Pocket HPC Survival Guide
HPC Summer school
Overview of all linear algebra packages
Latency numbers
Nvidia HPC benchmarks
Intel Intrinsics Guide
AWS Cloud calculator
Quickly benchmark C++ functions
LLNL Software repository
BOINC - volunteer computing projects
PRACE Training Events
Nice discussion on FlameGraph profiling
Nice discussion on parts of a supercomputer on reddit
Technical Report on C++ performance
BOINC Compute for science
Count prime numbers using MPI

Games/Challenges

Deadlock Empire - practice concurrency
SadServers - practice Linux server management

Other Curated Lists

Awesome Cloud HPC
Parallel Computing Guide
Awesome Parallel Computing
Princeton resources on OpenMP
Awesome HPC
SIGHPC Education
Fortran Codes On Github
Fortran Tools

Acknowledgements

This repo started from the great curated list https://github.com/taskflow/awesome-parallel-computing

Table of Contents

General Info

A Few Upcoming Supercomputers

Most Recent List of the Top500 Supercomputers

History

Trends

Software

Popular HPC Programming Libraries/APIs/Tools/Standards/Simulators

Cluster Hardware Discovery Tools

Cluster Management/Tools/Schedulers/Stacks

HPC-specific Operating Systems

Development/Workflow/Monitoring Tools for HPC

Debugging Tools for HPC

Performance/Benchmark Tools for HPC

IO/Visualization Tools for HPC

General Purpose Scientific Computing Libraries for HPC

Misc.

Wikis

Hardware

Interconnects/Topology

CPU

GPU

TPU/Tensor Cores

Many integrated core processor (MIC)

Cloud

Vendors

Articles/Papers

Custom/FPGA/ASIC/APU

Certification

Student Opportunities / Workshops

Other/Wikis

People

Resources

Books/Manuals

Courses

Tutorials/Guides/Articles

General

Machine Learning Related

Review Papers/Articles

News

Podcasts

Video Presentations/Courses/Channels

Presentation Slides

Building Clusters/Virtual Clusters

Forums

Careers

Membership Clubs

Blogs

Journals

Conferences

Communities/Chat Groups

Twitters

Consulting

Interview Preparation

Organizations

Interesting r/HPC posts

Misc. Wikis

Misc. Papers/Articles

Misc. Repos

Misc. Theses

Misc.

Games/Challenges

Other Curated Lists

Acknowledgements