-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
1c7dcc1
commit 47dadff
Showing
8 changed files
with
434 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
*.pyc | ||
__pycache__ | ||
*.egg-info | ||
.ipynb_checkpoints |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
.reveal, | ||
.reveal h1, | ||
.reveal h2, | ||
.reveal h3, | ||
.reveal h4, | ||
.reveal h5, | ||
.reveal h6 { | ||
font-family: "Nunito"; | ||
} | ||
|
||
.p { | ||
font-family: "Assistant"; | ||
} | ||
|
||
body.notebook_app.rise-enabled { | ||
background: url('img/slide_background.png'); | ||
background-position: left top; | ||
height: 100%; | ||
width: 100%; | ||
background-repeat: no-repeat; | ||
background-size: auto 100%; | ||
padding-top: calc(50px + 2vh); | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,317 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "09f74f14", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
" <div align=\"center\">\n", | ||
" \n", | ||
"# Kernel Tuner demo\n", | ||
"\n", | ||
"<br />\n", | ||
"<br />\n", | ||
"<br />\n", | ||
"<br />\n", | ||
"<br />\n", | ||
"<br />\n", | ||
"<br />\n", | ||
"\n", | ||
"\n", | ||
"By Ben van Werkhoven, Netherlands eScience Center <br />\n", | ||
"[email protected]\n", | ||
" \n", | ||
"</div>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "305a09d5", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "notes" | ||
} | ||
}, | ||
"source": [ | ||
"alt+r to start the slideshow, spacebar or shift+spacebar to move forward to next slide, comma to remove on screen buttons\n", | ||
"\n", | ||
"preparation: run the next code cell, start a second terminal and go the the directory of this notebook" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "4b5e909f", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "skip" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"%%bash\n", | ||
"rm matmul_cache.json\n", | ||
"rm vector_add.cu" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "0fbf8c5a", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# Optimizing GPU Applications\n", | ||
"\n", | ||
"To maximize GPU code performance, you need to find the best combination of:\n", | ||
"\n", | ||
"* Different mappings of the problem to threads and thread blocks\n", | ||
"* Different data layouts in different memories (shared, constant, …)\n", | ||
"* Different ways of exploiting special hardware features\n", | ||
"* Thread block dimensions\n", | ||
"* Code optimizations that may be applied or not\n", | ||
"* Work per thread in each dimension\n", | ||
"* Loop unrolling factors\n", | ||
"* Overlapping computation and communication\n", | ||
"* ...\n", | ||
"\n", | ||
"<font color=red>Problem</font>:\n", | ||
"* Creates a very large design space!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "59e489ff", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# Kernel Tuner\n", | ||
"\n", | ||
"*A Python tool for optimizing and tuning GPU applications*\n", | ||
"\n", | ||
"Started in 2016:\n", | ||
"* As a software development tool for GPU projects at the eScience center\n", | ||
"* To be used directly on existing kernels\n", | ||
"* Without inserting dependences in the kernel code\n", | ||
"* Kernels can still be compiled with regular compilers\n", | ||
"\n", | ||
"Today:\n", | ||
"* Comprehensive toolbox for auto-tuning with several tools being built on top\n", | ||
"* Developed by a team of 7 developers across CWI, Astron, and eScience center\n", | ||
"* Used in over 10 different eScience center projects and by others\n", | ||
"\n", | ||
"https://github.com/KernelTuner/kernel_tuner" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4e2675bf", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# Minimal Example" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "169859e8", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%writefile vector_add.cu\n", | ||
"__global__ void vector_add(float *c, float *a, float *b, int n) {\n", | ||
" int i = blockIdx.x * block_size_x + threadIdx.x;\n", | ||
" if (i<n) {\n", | ||
" c[i] = a[i] + b[i];\n", | ||
" }\n", | ||
"}" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "22f3e90d", | ||
"metadata": { | ||
"scrolled": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import numpy as np\n", | ||
"import kernel_tuner as kt\n", | ||
"\n", | ||
"size = 1000000\n", | ||
"\n", | ||
"a = np.random.randn(size).astype(np.float32)\n", | ||
"b = np.random.randn(size).astype(np.float32)\n", | ||
"c = np.zeros_like(b)\n", | ||
"args = [c, a, b, np.int32(size)]\n", | ||
"\n", | ||
"tune_params = dict()\n", | ||
"tune_params[\"block_size_x\"] = [32, 64, 128, 256, 512]\n", | ||
"\n", | ||
"_ = kt.tune_kernel(\"vector_add\", \"vector_add.cu\", size, args, tune_params)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b63c7ea9", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"<img src=\"img/dashboard_logo.png\" style=\"height:100px;\">\n", | ||
"\n", | ||
"<div align=\"left\">\n", | ||
"\n", | ||
"#### Live visualizations of auto-tuning sessions using Kernel Tuner\n", | ||
"\n", | ||
"<img src=\"img/dashboard.png\" style=\"height:40%;\">\n", | ||
" \n", | ||
"https://github.com/KernelTuner/dashboard\n", | ||
"</div>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "57b5db81", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# Tuning a larger problem" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "fed6931e", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from collections import OrderedDict\n", | ||
"problem_size = (512, 512)\n", | ||
"A = np.random.randn(*problem_size).astype(np.float32)\n", | ||
"B = np.random.randn(*problem_size).astype(np.float32)\n", | ||
"C = np.zeros_like(A)\n", | ||
"\n", | ||
"args = [C, A, B]\n", | ||
"\n", | ||
"tune_params = OrderedDict()\n", | ||
"tune_params[\"block_size_x\"] = [2**i for i in range(0, 11)]\n", | ||
"tune_params[\"block_size_y\"] = [2**i for i in range(0, 11)]\n", | ||
"tune_params[\"tile_size_x\"] = [2**i for i in range(0, 6)]\n", | ||
"tune_params[\"tile_size_y\"] = [2**i for i in range(0, 6)]\n", | ||
"\n", | ||
"restrict = [\"block_size_x == block_size_y * tile_size_y\"]\n", | ||
"grid_div_x = [\"block_size_x\", \"tile_size_x\"]\n", | ||
"grid_div_y = [\"block_size_y\", \"tile_size_y\"]\n", | ||
"\n", | ||
"from kernel_tuner.nvml import NVMLObserver\n", | ||
"nvml_observer = NVMLObserver([\"nvml_energy\", \"temperature\", \"core_freq\"])\n", | ||
"\n", | ||
"metrics = OrderedDict()\n", | ||
"metrics[\"GFLOP/s\"] = lambda p : (2 * 512**3 / 1e9) / (p[\"time\"] / 1e3)\n", | ||
"metrics[\"GFLOPs/W\"] = lambda p : (2 * 512**3 / 1e9) / (p[\"nvml_energy\"])\n", | ||
"\n", | ||
"_ = kt.tune_kernel(\"matmul_kernel\", \"matmul.cu\", problem_size, args, tune_params,\n", | ||
" observers=[nvml_observer], grid_div_y=grid_div_y, grid_div_x=grid_div_x,\n", | ||
" restrictions=restrict, metrics=metrics, cache=\"matmul_cache.json\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "3778bbd1", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "skip" | ||
} | ||
}, | ||
"source": [ | ||
"While the previous cell is running go to second terminal and type \"ktdashboard matmul_cache.json\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a7377204", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# Final remarks\n", | ||
"\n", | ||
"Currently, using Kernel Tuner to optimize and tune code in:\n", | ||
"* Ultrasound Brain Imaging with Erasmus MC (RECRUIT)\n", | ||
"* Atmospheric Modeling (ESiWACE-2 Microhh)\n", | ||
"* Radio Astronomy (CORTEX)\n", | ||
"\n", | ||
"Kernel Tuner can also be used for optimizing the energy efficiency of GPU applications:\n", | ||
"> Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning <br />\n", | ||
"R. Schoonhoven, B. Veenboer, B. van Werkhoven, K. J. Batenburg <br />\n", | ||
"International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at Supercomputing (SC22) 2022 \n", | ||
"\n", | ||
"Main repository: <br />\n", | ||
"https://github.com/KernelTuner/kernel_tuner <br />\n", | ||
"Documentation: <br /> \n", | ||
"https://KernelTuner.github.io <br />\n", | ||
"Tutorial: <br />\n", | ||
"https://github.com/KernelTuner/kernel_tuner_tutorial <br />" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "6f89b4b0", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "skip" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"celltoolbar": "Slideshow", | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.12" | ||
}, | ||
"rise": { | ||
"transition": "none" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Oops, something went wrong.