C++ links: computer architecture - GPU
See also: computer architecture
- A closer look at GPUs
  - CACM 2008
  - Fatahalian, K., & Houston, M.
  - http://graphics.stanford.edu/~kayvonf/papers/fatahalianCACM.pdf
- AMD’s Cayman GPU Architecture - http://www.realworldtech.com/cayman/
- Benchmarking the cost of thread divergence in CUDA - https://arxiv.org/abs/1504.01650
- Broadcom VideoCore IV GPU
  - Life of a Triangle - https://latchup.blogspot.com/2016/02/life-of-triangle.html
  - VideoCore QPU Pipeline - https://latchup.blogspot.com/2016/03/videocore-qpu-pipeline.html
- Demystifying GPU Microarchitecture through Microbenchmarking - http://www.eecg.toronto.edu/~myrto/gpuarch-ispass2010.pdf - microbenchmark suite: http://www.stuffedcow.net/research/cudabmk
- GPU Concurrency: Weak Behaviours and Programming Assumptions
  - Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2015
  - Alglave, J.; Batty, M.; Donaldson, A. F.; Gopalakrishnan, G.; Ketema, J.; Poetzl, D.; Sorensen, T.; and Wickerson, J.
  - http://johnwickerson.github.io/papers/gpuconcurrency.pdf
  - http://multicore.doc.ic.ac.uk/gpu-litmus/
- GPU Performance Modeling and Optimization - Ang Li
- GPUs and the Future of Parallel Computing
  - IEEE Micro 31(5) 2011
  - Stephen W. Keckler, William J. Dally, Brucek Khailany, Michael Garland, David Glasco
  - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.232.1574&rep=rep1&type=pdf
- HAXWell - Joshua Barczak
  - Code which loads custom ISA on Intel Haswell GPUs - https://github.com/jbarczak/HAXWell
  - You Compiled This, Driver. Trust Me… - http://www.joshbarczak.com/blog/?p=1028
  - SPMD Is Not Intel’s Cup Of Tea - http://www.joshbarczak.com/blog/?p=1120
  - GPU Ray Tracing The Wrong Way - http://www.joshbarczak.com/blog/?p=1197
- Inside Fermi: Nvidia’s HPC Push - http://www.realworldtech.com/fermi/
- Intel Processor Graphics: Microarchitecture and ISA, Tutorial, MICRO 2016
- Low-Level GPU Documentation - http://renderingpipeline.com/graphics-literature/low-level-gpu-documentation/
- NVIDIA Tesla: A Unified Graphics and Computing Architecture
  - IEEE Micro 2008
  - Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J.
  - http://people.cs.umass.edu/~emery/classes/cmpsci691st/readings/Arch/gpu.pdf
- NVIDIA’s GT200: Inside a Parallel Processor - http://www.realworldtech.com/gt200/
- Patterson, Hennessy (2016): Computer Organization and Design: The Hardware/Software Interface ARM Edition - Appendix B Graphics and Computing GPUs - http://booksite.elsevier.com/9780128017333/content/Appendix%20B.pdf
- Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
  - H. Kim, R. Vuduc, S. Baghsorkhi, J. Choi, W.-m. Hwu, 2012.
  - http://impact.crhc.illinois.edu/shared/papers/sara2012.pdf
  - http://impact.crhc.illinois.edu/paper_details.aspx?paper_id=203
- Predicting AMD and Nvidia GPU Performance - http://www.realworldtech.com/amd-nvidia-gpu-performance/
- Understanding Latency Hiding on GPUs
  - Vasily Volkov; EECS Department; University of California, Berkeley; Technical Report No. UCB/EECS-2016-143; August 12, 2016
  - https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.html
- Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning (PPoPP 2017)
- Wilt (2013) "The CUDA Handbook: A Comprehensive Guide to GPU Programming"
  - http://www.cudahandbook.com/
  - Chapter 8 (Streaming Multiprocessors) sample chapter: HTML, PDF
- Intro to Parallel Programming
- CUDA C Programming Guide - http://docs.nvidia.com/cuda/cuda-c-programming-guide/
- CUDA C Best Practices Guide - http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/
- CUDA Toolkit Documentation - http://docs.nvidia.com/cuda/
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
  - 2022; Simon Boehm
  - https://siboehm.com/articles/22/CUDA-MMM
- NVBench: CUDA Kernel Benchmarking Library
- CUDA Memory Model
  - CUDA Community Meetup Group 2021-01-04
  - Georgy Evtushenko
  - https://www.youtube.com/watch?v=VJ1QLrmfQws
  - https://github.com/CUDACommunity/CUDACommunityMeetup2021/tree/master/CUDAMemoryModel
- The CUDA C++ Standard Library
  - Meeting C++ 2021
  - Bryce Adelstein Lelbach
  - https://www.youtube.com/watch?v=-ENnYEWezKo
- GPLGPU
- MIAOW
  - An open-source GPU based on the AMD Southern Islands ISA.
  - http://miaowgpu.org/
  - https://github.com/VerticalResearchGroup/miaow
  - https://github.com/VerticalResearchGroup/miaow/wiki
  - MIAOW Architecture Whitepaper - https://raw.githubusercontent.com/wiki/VerticalResearchGroup/miaow/files/MIAOW_Architecture_Whitepaper.pdf
- Nyuzi Processor
  - Nyuzi is an experimental multicore GPGPU processor. It supports vector floating point, hardware multithreading, virtual memory, and cache coherence. The SystemVerilog-based hardware implementation is synthesizable and runs on FPGA. This project also includes an LLVM-based C++ toolchain.
  - https://github.com/jbush001/NyuziProcessor
  - https://github.com/jbush001/NyuziProcessor/wiki
- ORGFXSoC: ORSoC Graphics Accelerator
  - An example implementation of an open-source graphics accelerator (a fixed-point, fixed-function pipeline GPU).
  - https://github.com/maidenone/ORGFXSoC
  - http://opencores.org/project,orsoc_graphics_accelerator
- Theia GPU Overview - http://opencores.org/project,theia_gpu
- Optimization Techniques for GPU Programming
  - ACM Computing Surveys 2022
  - Pieter Hijma, Stijn Heldens, Alessio Sclocco, Ben Van Werkhoven, Henri Bal
  - https://dl.acm.org/doi/10.1145/3570638
- GPU Vendor/Programming Model Compatibility Table
  - 2022, JSC Accelerating Devices Lab Blog
  - Andreas Herten
  - https://doi.org/10.34732/xdvblg-r1bvif
  - https://x-dev.pages.jsc.fz-juelich.de/2022/11/02/gpu-vendor-model-compat.html
  - https://github.com/AndiH/gpu-lang-compat
- PerfTest: GPU texture/buffer performance tester
  - A simple tool for testing the performance of GPU shader memory operations. The current implementation is based on DirectX 11.0.
  - https://github.com/sebbbi/perftest
- Pyramid Shader Analyzer
  - Pyramid is a free, open GUI tool for offline shader validation and analysis. The UI takes HLSL or GLSL shaders as input and runs them through various shader compilers and static analyzers.
  - https://github.com/jbarczak/Pyramid
- Barra - NVIDIA GPU Architecture Simulator
- GPGPU-Sim
- Integrated gem5 + GPGPU-Sim Simulator
  - http://cpu-gpu-sim.ece.wisc.edu/
- gem5-gpu: A Heterogeneous CPU-GPU Simulator
  - IEEE Computer Architecture Letters 14(1) 2015
  - J. Power, J. Hestness, M.S. Orr, M.D. Hill, D.A. Wood
  - http://ieeexplore.ieee.org/document/6709764/
  - https://www.researchgate.net/publication/274858518_Gem5-gpu_A_heterogeneous_CPU-GPU_simulator
- MacSim
  - A cycle-level, heterogeneous architecture simulator for x86 and NVIDIA PTX instructions.
  - http://comparch.gatech.edu/hparch/macsim.html
  - https://github.com/gthparch/macsim
- Multi2Sim: A Heterogeneous System Simulator
- GPU Architectures and New Programming Model Features
  - Argonne Training Program in Extreme Scale Computing (ATPESC) 2016
  - Nikolai Sakharnykh, NVIDIA
  - video: https://www.youtube.com/watch?v=CWYx0HZ0zYM
  - slides: http://press3.mcs.anl.gov/atpesc/files/2016/08/Sakharnykn_145aug1GPU_architecture.pdf
- Introduction to GPU Architecture and Programming Models
  - Argonne Training Program in Extreme Scale Computing (ATPESC) 2018
  - Tim Warburton, Virginia Tech
  - video: https://www.youtube.com/watch?v=uvVy3CqpVbM&index=4&list=PLGj2a3KTwhRa6Ux64xg5L5ga6Jg8QykoQ
  - examples: https://github.com/tcew/ATPESC18
  - slides: http://extremecomputingtraining.anl.gov/files/2018/08/ATPESC_2018_Track-2_3_8-2_830am_Warburton-Accelerators.pdf
- Portable GPU Programming: Hands-on
  - Argonne Training Program in Extreme Scale Computing (ATPESC) 2016
  - Tim Warburton, Virginia Tech
  - video: https://www.youtube.com/watch?v=I33WSjcvfpI