diff --git a/README.md b/README.md index bb276cefa6..8fcd639473 100644 --- a/README.md +++ b/README.md @@ -29,24 +29,41 @@ For more information, see our [paper](http://www.arxiv.org/abs/1902.10345). See an example SDFG [in the standalone viewer (SDFV)](https://spcl.github.io/dace/sdfv.html?url=https://spcl.github.io/dace/examples/gemm.sdfg). -Tutorials ---------- +Quick Start +----------- + +Install DaCe with pip: `pip install dace` + +Using DaCe in Python is as simple as adding a `@dace` decorator: +```python +import dace +import numpy as np + +@dace +def myprogram(a): + for i in range(a.shape[0]): + a[i] += i + return np.sum(a) +``` + +Calling `myprogram` with any NumPy array or `__{cuda_}array_interface__`-supporting tensor (e.g., PyTorch, Numba) will generate data-centric code, compile, and run it. From here on out, you can _optimize_ (interactively or automatically), _instrument_, and _distribute_ your code. The generated code is compiled into a shared library (DLL/SO file) that can readily be used from any C ABI-compatible language (C/C++, FORTRAN, etc.). 
+ +For more information on how to use DaCe, see the [samples](samples) or tutorials below: * [Getting Started](https://nbviewer.jupyter.org/github/spcl/dace/blob/master/tutorials/getting_started.ipynb) +* [Benchmarks, Instrumentation, and Performance Comparison with Other Python Compilers](https://nbviewer.jupyter.org/github/spcl/dace/blob/master/tutorials/benchmarking.ipynb) * [Explicit Dataflow in Python](https://nbviewer.jupyter.org/github/spcl/dace/blob/master/tutorials/explicit.ipynb) * [NumPy API Reference](https://nbviewer.jupyter.org/github/spcl/dace/blob/master/tutorials/numpy_frontend.ipynb) * [SDFG API](https://nbviewer.jupyter.org/github/spcl/dace/blob/master/tutorials/sdfg_api.ipynb) * [Using and Creating Transformations](https://nbviewer.jupyter.org/github/spcl/dace/blob/master/tutorials/transformations.ipynb) * [Extending the Code Generator](https://nbviewer.jupyter.org/github/spcl/dace/blob/master/tutorials/codegen.ipynb) -Installation and Dependencies ------------------------------ - -To install: `pip install dace` +Dependencies +------------ Runtime dependencies: * A C++14-capable compiler (e.g., gcc 5.3+) - * Python 3.6 or newer + * Python 3.7 or newer (Python 3.6 is supported but not actively tested) * CMake 3.15 or newer Running diff --git a/tutorials/benchmarking.ipynb b/tutorials/benchmarking.ipynb new file mode 100644 index 0000000000..4e7e048bca --- /dev/null +++ b/tutorials/benchmarking.ipynb @@ -0,0 +1,1511 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6d9dce90", + "metadata": {}, + "source": [ + "# Benchmarking\n", + "\n", + "In this tutorial we will compare DaCe with other popular Python-accelerating libraries. 
The NumPy results should be a bit faster if an optimized version is installed (for example, compiled with Intel MKL).\n", + "\n", + "Table of Contents:\n", + "* [Dependencies](#Dependencies)\n", + "* [Simple programs](#Simple-programs-with-multiple-operators)\n", + "* [Loops](#Loops)\n", + " * [Varying sizes](#Varying-sizes)\n", + "* [Auto-parallelization](#Auto-parallelization)\n", + "* [Example: 3D Heat Diffusion](#3D-Heat-Diffusion)\n", + "* [Benchmarking and Instrumentation API](#Benchmarking-and-Instrumentation-API)\n" + ] + }, + { + "cell_type": "markdown", + "id": "d0a07e84", + "metadata": {}, + "source": [ + "TL;DR DaCe is fast:\n", + "\n", + "![performance](performance.png \"performance\")" + ] + }, + { + "cell_type": "markdown", + "id": "93f7d90c", + "metadata": {}, + "source": [ + "## Dependencies\n", + "\n", + "First, let's make sure we have all the frameworks ready to go:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "dbb480ef", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "...\n" + ] + } + ], + "source": [ + "%pip install jax jaxlib\n", + "%pip install numba\n", + "%pip install pythran\n", + "# Your library here" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b2b10c42", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "...\n" + ] + } + ], + "source": [ + "# MKL for performance\n", + "%conda install mkl mkl-include mkl-devel\n", + "\n", + "# matplotlib to draw the results\n", + "%pip install matplotlib" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "927781f2", + "metadata": {}, + "outputs": [], + "source": [ + "# Setup code for plotting\n", + "import matplotlib.pyplot as plt\n", + "\n", + "def barplot(title, labels=False):\n", + " x = ['numpy'] + list(sorted(TIMES.keys() - {'numpy'}))\n", + " bars = [np.median(TIMES[key].timings) for key in x]\n", + " yerr = 
[np.std(TIMES[key].timings) for key in x]\n", + " color = [('#86add9' if 'dace' in key else 'salmon') for key in x]\n", + "\n", + " p = plt.bar(x, bars, yerr=yerr, color=color)\n", + " plt.ylabel('Runtime [s]'); plt.xlabel('Implementation'); plt.title(title); \n", + " if labels:\n", + " plt.gca().bar_label(p)\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "317721fd", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Setup code for benchmarked frameworks\n", + "import numpy as np\n", + "import jax\n", + "import numba\n", + "import dace" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "46238b6e", + "metadata": {}, + "outputs": [], + "source": [ + "# Pythran loads in a separate cell\n", + "%load_ext pythran.magic" + ] + }, + { + "cell_type": "markdown", + "id": "0e44fb94", + "metadata": {}, + "source": [ + "## Simple programs with multiple operators\n", + "\n", + "Let's start with a basic program with three different operations. This example program was taken from the [JAX README](https://github.com/google/jax#compilation-with-jit):" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "d9828ae7", + "metadata": {}, + "outputs": [], + "source": [ + "def slow_f(x):\n", + " return x * x + x * 2.0" + ] + }, + { + "cell_type": "markdown", + "id": "74047bc8", + "metadata": {}, + "source": [ + "First, let's measure the performance of NumPy as-is on this function:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "afe8910d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "68.6 ms ± 2.36 ms per loop (mean ± std. dev. 
of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "a = np.random.rand(5000, 5000)\n", + "\n", + "TIMES = {}\n", + "\n", + "TIMES['numpy'] = %timeit -o slow_f(a)" + ] + }, + { + "cell_type": "markdown", + "id": "0d5afed2", + "metadata": {}, + "source": [ + "Now we can construct a Just-In-Time (JIT) compiled version of this function for each framework:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "f66e04d1", + "metadata": {}, + "outputs": [], + "source": [ + "jax_f = jax.jit(slow_f)\n", + "numba_f = numba.jit(slow_f)\n", + "dace_f = dace.program(auto_optimize=True)(slow_f)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "8b6f4f7b", + "metadata": {}, + "outputs": [], + "source": [ + "%%pythran\n", + "#pythran export pythran_f(float64[:,:])\n", + "def pythran_f(x):\n", + " return x * x + x * 2.0" + ] + }, + { + "cell_type": "markdown", + "id": "de148d29", + "metadata": {}, + "source": [ + "Before measuring, we run each function once as a warm-up so that the JIT compilers can compile it:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "99491394", + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.29 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n", + "323 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n", + "1.23 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" + ] + } + ], + "source": [ + "# On your marks...\n", + "%timeit -r 1 -n 1 jax_f(a).block_until_ready()\n", + "%timeit -r 1 -n 1 numba_f(a)\n", + "%timeit -r 1 -n 1 dace_f(a)\n", + "%timeit -r 1 -n 1 pythran_f(a)\n", + "pass\n", + "# ...get set..." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "067febc9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "43.6 ms ± 4.87 ms per loop (mean ± std. dev.
of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "# ...Go!\n", + "TIMES['jax'] = %timeit -o jax_f(a).block_until_ready()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "e7f811ff", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "27.8 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "TIMES['numba'] = %timeit -o numba_f(a)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "e6d98ce6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "31.3 ms ± 5.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "TIMES['pythran'] = %timeit -o pythran_f(a)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "9db35692", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "25.7 ms ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "TIMES['dace_jit'] = %timeit -o dace_f(a)" + ] + }, + { + "cell_type": "markdown", + "id": "4bb9be46", + "metadata": {}, + "source": [ + "You can also precompile the program to reduce per-call overhead (be aware that the returned array is reused across calls!):" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "a3c0702e", + "metadata": {}, + "outputs": [], + "source": [ + "# Either provide type annotations on the `@dace.program`, or call `compile` with sample arguments\n", + "cprog = dace_f.compile(a)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "d0754f47", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "21.5 ms ± 1.6 ms per loop (mean ± std. dev.
of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "TIMES['dace'] = %timeit -o cprog(a)" + ] + }, + { + "cell_type": "markdown", + "id": "78d0830a", + "metadata": {}, + "source": [ + "We can now plot the results:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "01ae5917", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "image/png": "[base64-encoded PNG data omitted: bar plot 'Simple program, multiple operators']\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "barplot('Simple program, multiple operators')" + ] + }, + { + "cell_type": "markdown", + "id": "16be4882", + "metadata": {}, + "source": [ + "## Loops\n", + "\n", + "Here we test how interpreter overhead can be mitigated by the Python compiling frameworks. Let's take another application from Numba's [5 minute guide](https://numba.readthedocs.io/en/stable/user/5minguide.html):" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "c7134a92", + "metadata": {}, + "outputs": [], + "source": [ + "def go_fast(a):\n", + " trace = 0.0\n", + " for i in range(a.shape[0]):\n", + " trace += np.tanh(a[i, i])\n", + " return a + trace" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "844c1c84", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "b = np.random.rand(1000, 1000)\n", + "\n", + "TIMES = {}" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "69ef66f2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.94 ms ± 109 µs per loop (mean ± std. dev. 
of 7 runs, 100 loops each)\n" + ] + } + ], + "source": [ + "TIMES['numpy'] = %timeit -o go_fast(b)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "1b6aef84", + "metadata": {}, + "outputs": [], + "source": [ + "numba_fast = numba.jit(go_fast)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "e74804c4", + "metadata": {}, + "outputs": [], + "source": [ + "import jax.numpy as jnp\n", + "\n", + "@jax.jit\n", + "def jax_fast(a):\n", + " trace = 0.0\n", + " for i in range(a.shape[0]):\n", + " trace += jnp.tanh(a[i, i])\n", + " return a + trace" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "f88a24c6", + "metadata": {}, + "outputs": [], + "source": [ + "N = dace.symbol('N')\n", + "\n", + "@dace.program(auto_optimize=True)\n", + "def dace_fast(a: dace.float64[N, N]):\n", + " trace = 0.0\n", + " for i in range(N):\n", + " trace += np.tanh(a[i, i])\n", + " return a + trace" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "e6f18b89", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "%%pythran\n", + "from numpy import tanh\n", + "\n", + "#pythran export pythran_fast(float64[:,:])\n", + "def pythran_fast(a):\n", + " trace = 0.0\n", + " for i in range(a.shape[0]):\n", + " trace += tanh(a[i, i])\n", + " return a + trace" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "e7e5ab60", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "DaCe compilation time: 0.5581727027893066 seconds\n" + ] + } + ], + "source": [ + "import time\n", + "start = time.time()\n", + "csdfg = dace_fast.compile(b)\n", + "print('DaCe compilation time:', time.time() - start, 'seconds')" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "a67d01ac", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "11.8 s ± 0 ns per loop (mean ± std. dev. 
of 1 run, 1 loop each)\n", + "147 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" + ] + } + ], + "source": [ + "%timeit -r 1 -n 1 jax_fast(b).block_until_ready()\n", + "%timeit -r 1 -n 1 numba_fast(b)\n", + "%timeit -r 1 -n 1 pythran_fast(b)" + ] + }, + { + "cell_type": "markdown", + "id": "7e0ffab9", + "metadata": {}, + "source": [ + "Note that JAX's slow first run is due to its inspector/executor (tracing) model: the Python `for` loop is unrolled while tracing, so compilation time grows with the size of the array." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "97657722", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2.28 ms ± 538 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" + ] + } + ], + "source": [ + "TIMES['jax'] = %timeit -o jax_fast(b).block_until_ready()" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "6696626d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "970 µs ± 130 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" + ] + } + ], + "source": [ + "TIMES['numba'] = %timeit -o numba_fast(b)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "98a80c82", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "673 µs ± 54.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" + ] + } + ], + "source": [ + "TIMES['pythran'] = %timeit -o pythran_fast(b)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "7a741c90", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "668 µs ± 56.8 µs per loop (mean ± std. dev.
of 7 runs, 1000 loops each)\n" + ] + } + ], + "source": [ + "TIMES['dace'] = %timeit -o csdfg(b, N=b.shape[0])" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "fc4c6fb2", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "[base64-encoded PNG data omitted]
hQzMwsCweKmZll4alXbKN8+Ye/rLT937z4h6a8z+S/eGel7duW5Y3Pn19p+7HoV015H4DBF1+RvU0foZiZWRYOFDMzy8KBYmZmWfgaim2Wzrx0equ7YNZ07T51v49QzMwsCweKmZll4UAxM7MsHChmZpaFA8XMzLKoNFAkHStpvqRuSRfU2T5E0k1p+8OSRpa2TU7l8yUd01+bkq6TtFDS7PQaW+XYzMxsfZXdNixpEHA1cDSwBJglqTMi5paqTQJejoj9JE0ELgdOljQGmAgcCOwJ3COpZ46Mvtr8bETcUtWYzMysd1UeoYwHuiNiQUS8DswAJtTUmQBcn5ZvAY6SpFQ+IyJWRcRCoDu110ibZmbWAlUGyghgcWl9SSqrWyciVgMrgKF97Ntfm1MkPSnpSklD6nVK0lmSuiR1LVu2bOCjMjOzuraki/KTgQOAQ4HdgH+oVykipkbEuIgYN2zYsGb2z8xsi1ZloDwL7F1a3yuV1a0jaVtgZ2B5H/v22mZELI3CKuB7FKfHzMysSaoMlFnAaEmjJG1HcZG9s6ZOJ3B6Wj4BuC8iIpVPTHeBjQJGA4/01aak4elfAccBT1c4NjMzq1HZXV4RsVrSOcCdwCBgWkTMkXQp0BURncC1wHRJ3cBLFAFBqnczMBdYDZwdEWsA6rWZ3vIGScMAAbOBT1U1NjMz21Clsw1HxExgZk3ZRaXllcCJvew7BZjSSJup/MhN7a+ZmW28LemivJmZtZADxczMsnCgmJlZFg4UMzPLwoFiZmZZOFDMzCwLB4qZmWXhQDEzsywcKGZmloUDxczMsnCgmJlZFg4UMzPLwoFiZmZZOFDMzCwLB4qZmWXhQDEzsywcKGZmloUDxczMsnCgmJlZFg4UMzPLwoFiZmZZOFDMzCwLB4qZmWXhQDEzsywcKGZmloUDxczMsnCgmJlZFg4UMzPLwoFiZmZZOFDMzCwLB4qZmWVRaaBIOlbSfEndki6os32IpJvS9ocljSxtm5zK50s6pr82JY1KbXSnNrercmxmZra+ygJF0iDgauBDwBjgFEljaqpNAl6OiP2AK4HL075jgInAgcCxwDWSBvXT5uXAlamtl1PbZmbWJFUeoYwHuiNiQUS8DswAJtTUmQBcn5ZvAY6SpFQ+IyJWRcRCoDu1V7fNtM+RqQ1Sm8dVNzQzM6u1bYVtjwAWl9aXAO/trU5ErJa0Ahiayn9es++ItFyvzaHA7yJidZ3665F0FnBWWn1N0vwBjKkVdgderPQdLvlapc1vgsrHfmGVjW+a6r/vsFV/77fqscOmjv/t9QqrDJTNUkRMBaa2uh+NktQVEeNa3Y9W8Ni3zrHD1j3+dh57lae8ngX2Lq3vlcrq1pG0LbAzsLyPfXsrXw7sktro7b3MzKxCVQbKLGB0uvtqO4qL7J01dTqB09PyCcB9ERGpfGK6C2wUMBp4pLc20z73pzZIbd5e4djMzKxGZae80jWRc4A7gUHAtIiYI+lSoCsiOoFrgemSuoGXKAKCVO9mYC6wGjg7ItYA1GszveU/ADMkfRF4PLW9JWib03MV8Ni3Xlvz+Nt27Cr+uDczM9s0/qS8mZll4UAxM7MsHCjWMpIukfR3re5Hq0n6aav70A4kPSCpLW+nrUfShaXlkZKebmV/cnCgmLVYRBzW6j5YSwz4c7Wlj0ZslhwoFUp/dcyT9F1JcyTdJekt5b+0JO0uaVFa/rik2yTdLWmRpHMk/a2kxyX9XNJuqd4Dkr4uabakpyWNl7SNpGckDUt1tkkTZQ5r2RegDkn/KOmXkh4C9k9lZ0qaJekJSbdKemsq30PSv6fyJyQdlso/KumRNP7vpDne2pak1yTtIOleSY9JekrShLTtUElPStpe0tvSz9G7Wt3nvlT1c5+cVv65T/uPl/SzVP+nkvZv/qjfHPcvJN2Qxn+LpA9Luq1U5+j0M30Z8JY0lhvS5kG1X7O0zwOSrpLUBZwn6c9VTIT7uKR7JO2R6l0iaV
qqv0DSuc3+GjhQqjcauDoiDgR+B/xlP/XfBfwFcCgwBfh9RBwM/Az4WKneWyNiLPB/KW6fXgt8Hzg1bf8g8ERELMs0jk0m6RCKW8PHAh+mGCPADyPi0Ig4CJjHuok9vwH8OJW/B5gj6Y+Ak4H/nca/hnVjbmcrgeMj4j3AEcAVkhQRsyg+l/VF4CvA9yOiHU6NNOXnPpX9AvhAqn8R8KVMY9gY+wPXRMQfAa9QTHB7QOkPuzMo/r9eAPwhIsZGRM/Pb19fs+0iYlxEXAE8BLwvjXcG8PelegcAx1DMe3ixpMGVjLIXm/Xh0xZiYUTMTsuPAiP7qX9/RLwKvKpibrP/SOVPAe8u1bsRICIelLSTpF0o/oPdDlwFfAL4Xob+5/QB4N8j4vcAkno+6PouFZ8f2gXYgeJzRlBM+PkxgPQ5pBWSTgMOAWZJAngL8EKzBlAhAV+S9CfAWoq56PYAfgtcSvGh3pVA0//q3EjN/LnfEbhe0mgggKb+Eq2xOCL+Oy1/n+L7NR34qKTvAe9n/YAs6+trdlNpeS/gJknDge2AhaVt/xkRq4BVkl6g+BlasvHDGRgHSvVWlZbXUPwCXM26o8Pt+6i/trS+lvW/X7UfIIqIWCzpeUlHUvyF0i5/uV8HHBcRT0j6ONDRR10B10fE5Cb0q5lOBYYBh0TEG+l0UM/PxlCKoB2cyv6nJT0cmKb93ANfoAik41U8U+mBje71pqvXv+9RBORK4AelSWxr1fua9Sh/z78JfC0iOiV1AJf00UZTf8f7lFdrLKL4KxvWTRczUCcDSDocWBERK1L5v1L8ZfSDntkFNiMPAsel8+k7An+eyncElqbD83II3gv8NRTP15G0cyo7QdL/SuW7Sao782mb2Rl4IYXJEaw/m+t3gH8CbiA9M6hNLaKan/udWTd338c3oX857CPp/Wn5r4CHIuI54Dngc6x/1uCNjTwlVR7v6X1VbDYHSmt8FfhrSY9TTFW9MVam/f+F9R8m1knx1+zmdrqLiHiM4tD9CeC/KE7jQPHL8mHgvynOh/c4DzhC0lMUpwDGRMRciv+Yd0l6ErgbGN6cEVQmKMJiXBrrx0hfB0kfA96IiH8DLgMOTUeg7aiqn/uvAF9O5a0+6zIfOFvSPGBX4Nup/AaK02HzSnWnAk+WLso36hLgB5IepRnT3A+Ap15pQ5IeAP4uIrrqbBtH8eTKDzS9YzZgkoYCj0XElnCUtVVLp9vuiIgN7sKT9C3g8YjYUuYYrKvVaW4ZSbqA4hRRu1w72apJ2pPifP9XW9wVq1A6kvgf4PxW96VqPkIxM7MsfA3FzMyycKCYmVkWDhQzM8vCgWJbNUmvZWqnQ9IdOdraiPceKemvBlpP0jhJ36i2d7Y1caCYtb+RFB+iG1C9iOiKiHaZysXagAPFjDePMH4s6fY0U+tlkk5VMavxU5L2TfWuk/QvkrpUzJr8Z3Xaelua9fWRNCNsz8zBjc4mva+kH0l6VNJPJB1Qeu9vqJhRd4Gknk+bXwZ8QMXMtZ9JRyI/UTFz8WNKszTXqffmUVWaceA2FTMb/1zSu1N5y2ewtTYSEX75tdW+gNfSvx0UM7wOB4ZQTG3x+bTtPOCqtHwd8COKP8ZGU0y8t33a/45U50vAR9PyLsAvgbdRTAvSTTHVzDBgBfCpVO9K4NNp+V5gdFp+L3Bf6b1/kN57DNBd6vsdpTG9Fdg+LY8GunqpV+7zN4GL0/KRwOy0fAnw0/Q12R1YDgxu9ffNr83z5Q82mq0zKyKWAkj6FXBXKn+KYkr5HjdH8biAZyQtoJgyvOxPgY9o3dMotwf2Scv3Rx+z6kraATiMYmqNnvaGlNq+Lb33XKXnYNQxGPiWpLEUEwS+s/+hczhpuvSIuE/SUEk7pW0tncHW2ocDxWydTZnxtkzAX0bE/PUKpfc28B7bAL+L4pkf/fVRvdT5DPA8cFBqb2Uv9RrV0hlsrX34GorZwJ2o4omY+wLvoJgQsOxO4G+UDj
EkHdxowxHxCrBQ0olpX0k6qJ/dXqU4jdZjZ2BpOpI5DRjUS72yn5Cm7FExJfqLqS9mDXOgmA3cb4BHKGZM/lRE1B4BfIHitNOTkuak9YE4FZgk6QlgDjChn/pPAmtUPCb5M8A1wOlp/wNY9yyN2npllwCHpBmcL2Mzmxbd2oPn8jIbAEnXUVzIvqXVfTHb3PgIxczMsvARipmZZeEjFDMzy8KBYmZmWThQzMwsCweKmZll4UAxM7Ms/j/CQgl8tj9NIQAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "barplot('Loops')" + ] + }, + { + "cell_type": "markdown", + "id": "e888f1f6", + "metadata": {}, + "source": [ + "### Varying sizes\n", + "\n", + "Since the DaCe program was defined symbolically, the input array size can be changed without recompilation:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "fbdc52c3", + "metadata": {}, + "outputs": [], + "source": [ + "sizes = [np.random.randint(700, 5000) for _ in range(10)]\n", + "arrays = [np.random.rand(n, n) for n in sizes]\n", + "\n", + "def vary_size(call):\n", + " for a in arrays:\n", + " call(a)\n", + "\n", + "def vary_size_dace(call):\n", + " for a, n in zip(arrays, sizes):\n", + " call(a, N=n)\n", + " \n", + "def vary_size_jax(call):\n", + " for a in arrays:\n", + " call(a).block_until_ready()\n", + " \n", + "TIMES = {}" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "2aa26e86", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "155 ms ± 2.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", + "125 ms ± 3.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", + "124 ms ± 2.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", + "114 ms ± 8.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", + "334 ms ± 166 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" + ] + } + ], + "source": [ + "TIMES['numpy'] = %timeit -o vary_size(go_fast)\n", + "TIMES['numba'] = %timeit -o vary_size(numba_fast)\n", + "TIMES['pythran'] = %timeit -o vary_size(pythran_fast)\n", + "TIMES['dace'] = %timeit -o vary_size_dace(csdfg)\n", + "TIMES['jax'] = %timeit -o vary_size_jax(jax_fast)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "144b470a", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAZ00lEQVR4nO3de5hcVZ3u8e9LCPcEhEQezYUgxEuOCEoLIwe0YdAJniORURTkYlTgwDHCUXEGRgQEjlwUQRGPREEyKHLziBHjhBFBQQXTQLgkGO2ESAIoAQVBTEjIb/7Yq8lOUd1d3eld1ZX1fp6nHvZl1d6/XSn6rb127VWKCMzMLF+btLoAMzNrLQeBmVnmHARmZplzEJiZZc5BYGaWOQeBmVnmHARmTSRpP0mLmrzPIyTd3Mx9WnuR7yOwqklaChwTET9tdS21JI0D/gC8LiIW16z7AbA4Ik5uSXFmTeIzAstaRDwK3AIcVV4uaXvg3cCsgWxP0qZDV51ZczgIrGUkbS7pYkmPpcfFkjYvrT9WUrekP0uaLenVpXUh6URJSyQ9KemLkgb7fp5FTRAAhwELI+IBSadIWizpWUkLJR1SqmO6pF9KukjSU8BZqd7dSm1eKel5SWMldUpaXlq3VNLJku6X9IykayVtUVr/L5IeT6/PMem4d+3l9ZyeXo9nJT0s6YjS8jtK23uu9Fgt6cq0bltJl6f9PSrpHEkj0rpdJf081fikpGsH+VrbMOQgsFb6LPAPwB7A7sBewGkAkg4AzgU+ALyKovvmmprnHwJ0AG8BpgEfHWQdPwDGSNq3tOwo1p0NLAb2A7YFPg98R9KrSm33BpYAOwJnpzqPLK0/HLglIlb0sv8PAFOBnYE3AdMBJE0FPgUcCOwKdPZ2AJK2Br4KHBQRo4B9gPm17SLigojYJiK2Ad4ArAB6/qhfCaxJ+3oz8C7gmLTubOBm4BXAeOCS3mqxNhQRfvhR6QNYChxYZ/li4N2l+X8Clqbpy4ELSuu2AVYDk9J8AFNL6/83xR/bwdb4LWBmmp4MvAC8spe284FpaXo68EjN+r2BR1h3Da4L+ECa7gSW17w2R5bmLwC+kaavAM4trds1HfeudWraGngaeB+wZc266cAdNcu2BO4G/jXN7wisKj+XIsBuTdP/DswExrf6/eTH0D98RmCt9GqKT/o9/pCWvWxdRDwHPAWMK7Vf1stz11PTFTKxl1pmAYembpmjgLkR8UR6/tGS5kt6WtLTwBuBMb3UQUTcBTwPdEp6PcUf8Nm97Bfgj6Xp5ylCj3Q85W2vt5+aff4N+CBwPPC4pB+nfffmcmBRRJyf5ncCRqbn9hznZcAr0/p/AQT8RtICSYM9+7JhyBe2rJUeo/gDtCDNT0zLyuuAl7o+dgAeLT1/Qi/PXU8U3SD9uQP4M0UX05EUf/iQtBPwTeAfgV9HxIuS5lP8UXxpF3W2Nytt54/ADRGxsoEaaj1O0Q3TY0JfjSNiLjBX0pbAOanu/WrbSToFeG3NumUUZw
RjImJNnW3/ETg2PX9f4KeSfhER3QM6IhuWfEZgzTJS0halx6bA94DT0kXUMcDpwHdS++8BH5G0R7qA/AXgrohYWtrmZyS9QtIE4CTW9XUPWEQERffH+cB2wI/Sqq0p/tCvAJD0EYozgv58h+IaxpFpu4NxHcVr8AZJWwGf662hpB0lTUuBuQp4Dlhbp91BwInAIRHx957lEfE4xTWACyWNlrSJpF0kvSM971BJPaH0F4rX5GXbt/bkILBmmQP8vfQ4k+JTaxdwP/AAcE9aRhT3HHwO+D7FJ+NdKL7JU/ZDin7u+cCPKbo7NsS/U5xZXBsRq1IdC4ELgV8DfwJ2A37Z34YiYlk6ngBuH0wxEfETigvAtwLdwJ1p1ao6zTehuLD8GMWZzTuAE+q0+yAwFnio1F32jbTuaGAzYCHFH/sbKC7UA7wVuEvScxTdXCdFxJLBHJcNP76hzNqSpAAmD+euCUlXAI9FxGlDtL03AA8Cm9frvjEbLJ8RmFVA0iTgn9nAsxRJh6i43+IVFN1WP3II2FBzEJgNMUlnU3xy/2JEPLyBm/tfwBMUX7V9kfrdPWYbxF1DZmaZ8xmBmVnm2u4+gjFjxsSkSZNaXYaZWVu5++67n4yIsfXWtV0QTJo0ia6urlaXYWbWViT9obd17hoyM8ucg8DMLHMOAjOzzDkIzMwy5yAwM8ucg8DMLHMOAjOzzDkIzMwy5yAwM8ucg8BsCHR2dtLZ2dnqMswGxUFgZpY5B4GZWeYcBGZmmXMQmJllzkFgZpY5B4GZWeYcBGZmmXMQmJllzkFgZpY5B4GZWeYcBGZmmXMQmJllzkFgZpY5B4GZWeYcBGZmmXMQmJllzkFgZpa5SoNA0lRJiyR1Szqlj3bvkxSSOqqsx8zMXq6yIJA0ArgUOAiYAhwuaUqddqOAk4C7qqrFzMx6V+UZwV5Ad0QsiYgXgGuAaXXanQ2cD6yssBYzM+tFlUEwDlhWml+elr1E0luACRHx4742JOk4SV2SulasWDH0lZqZZaxlF4slbQJ8Gfh0f20jYmZEdEREx9ixY6svzswsI1UGwaPAhNL8+LSsxyjgjcBtkpYC/wDM9gVjM7PmqjII5gGTJe0saTPgMGB2z8qIeCYixkTEpIiYBNwJHBwRXRXWZGZmNSoLgohYA8wA5gIPAddFxAJJZ0k6uKr9mpnZwGxa5cYjYg4wp2bZ6b207ayyFjMzq893FpuZZc5BYGaWOQeBmVnmHARmZplzEJiZZc5BYGaWOQeBmVnmHARmZplzEJiZZc5BYGaWOQeBmVnmHARmZplzEJiZZc5BYGaWOQeBmVnmHARmZplzEJiZZc5BYGaWOQeBmVnmHARmZplzEJiZZc5BYGaWOQeBmVnmHARmZplzEJiZZc5BYGaWOQeBmVnmHARmZplzEJiZZc5BYGaWOQeBmVnmHARmZplzEJiZZc5BYGaWOQeBmVnmHARmZplzEJiZZc5BYGaWuUqDQNJUSYskdUs6pc764yU9IGm+pDskTamyHjMze7nKgkDSCOBS4CBgCnB4nT/0V0fEbhGxB3AB8OWq6jEzs/qqPCPYC+iOiCUR8QJwDTCt3CAi/lqa3RqICusxM7M6Nq1w2+OAZaX55cDetY0kfRz4FLAZcEC9DUk6DjgOYOLEiUNeqJlZzlp+sTgiLo2IXYB/BU7rpc3MiOiIiI6xY8c2t0Azs41clUHwKDChND8+LevNNcB7K6zHzMzq6LNrSNL2DWxjbUQ8XWf5PGCypJ0pAuAw4EM1258cEb9Ps/8D+D1mZtZU/V0jeCw91EebEcDLOu4jYo2kGcDc1OaKiFgg6SygKyJmAzMkHQisBv4CfHgQx2BmZhugvyB4KCLe3FcDSff2ti4i5gBzapadXpo+qZEizcysOv1dI3hbA9topI2ZmQ1TfQZBRKwEkLSLpM3TdKekEyVtV25jZmbtqdFvDX0feFHSrsBMim8DXV1ZVWZm1jSN3lC2Nl38PQS4JC
Iu6evagNlws/rzn650+7F0cVP2AzDyjAsr34flpdEzgtWSDqf4Vs9NadnIakoyM7NmajQIPkJxUfj/RsTD6d6Aq6ory8zMmqWhrqGIWAicWJp/GDi/qqLMzKx5+jwjkDSzvw000sbMzIav/s4I3iupr6+HCth/COsxM7Mm6y8IPtPANm4fikLMzKw1+gyCiJjVrELMzKw1Wv57BGZm1loOAjOzzA0oCCRtVVUhZmbWGg0FgaR9JC0Efpvmd5f09UorMzOzpmj0jOAi4J+ApwAi4j7g7VUVZWZmzdNw11BELKtZ9OIQ12JmZi3Q6OijyyTtA4SkkcBJwEPVlWVmZs3S6BnB8cDHgXEUP0S/R5o3M7M21+igc08CR1Rci5mZtUBDQZCGnf4EMKn8nIg4uJqyzMysWRq9RnAjcDnwI2BtZdWYmVnTNRoEKyPiq5VWYmZmLdFoEHxF0hnAzcCqnoURcU8lVZmZWdM0GgS7AUcBB7CuayjSvJmZtbFGg+BQ4DUR8UKVxZiZWfM1eh/Bg8B2FdZhZmYt0ugZwXbAbyXNY/1rBP76qJlZm2s0CM6otAozM2uZRu8s/nnVhZiZWWv0GQSS7oiIfSU9S/EtoZdWARERoyutzszMKtffj9fvm/47qjnlmJlZszX6C2VXNbLMzMzaT6NfH/1v5RlJmwJ7Dn05ZmbWbH0GgaRT0/WBN0n6a3o8C/wJ+GFTKjQzs0r1GQQRcW66PvDFiBidHqMiYoeIOLVJNZqZWYUa/froqZLGATux/u8R/KKqwszMrDka/WGa84DDgIWs+9H6APoMAklTga8AI4BvRcR5Nes/BRwDrAFWAB+NiD8M5ADMzGzDNHpn8SHA6yJiVb8tE0kjgEuBdwLLgXmSZkfEwlKze4GOiHhe0gnABcAHG92HmZltuEa/NbQEGDnAbe8FdEfEkjRq6TXAtHKDiLg1Ip5Ps3cC4we4DzMz20CNnhE8D8yXdAvrDzp3Yh/PGQcsK80vB/buo/3HgJ/UWyHpOOA4gIkTJzZYspmZNaLRIJidHpWQdCTQAbyj3vqImAnMBOjo6Ih6bczMbHAa/dbQrEFs+1FgQml+fFq2HkkHAp8F3jGQaxBmZjY0Gv3W0MOsP+gcABHxmj6eNg+YLGlnigA4DPhQzXbfDFwGTI2IJxot2szMhk6jXUMdpektKH66cvu+nhARayTNAOZSfH30iohYIOksoCsiZgNfBLYBrpcE8Ih/7MbMrLka7Rp6qmbRxZLuBk7v53lzgDk1y04vTR/YYJ1mZlaRRruG3lKa3YTiDKHRswkzMxvGGv1jfmFpeg2wlKJ7yMzM2lyjXUP7l+fTXcOHAb+roigzM2ue/oahHp2Gov6apHeqMAPoBj7QnBLNzKxK/Z0RXAX8Bfg1cCzF9/0FHBIR86stzczMmqG/IHhNROwGIOlbwOPAxIhYWXllZmbWFP0NOre6ZyIiXgSWOwTMzDYu/Z0R7C7pr2lawJZpXkBExOhKqzMzs8r1GQQRMaJZhZiZWWs0+nsEZma2kXIQmJllzkFgZpY5B4GZWeYcBGZmmXMQDLHOzk46OztbXYaZWcMcBGZmmcvqNwVWf/7Tle8jli5uyr5GnnFh/43MzBqQVRA0w08/4kFZzay9uGvIzCxzDgIzs8w5CMzMMucgMDPLnIPAzCxzDgIzs8w5CMzMMucgMDPLnIPAzCxzDgIzs8w5CMxsg+U86u7GcOwea8gsA1UPgujBFtubg8DMNmobSwhCdUHoIDCzDZbzqLsbw7H7GoENmY2hr9QsRz4jMBsCG8OnQsuXzwjMzDLnIDAzy5y7hjJy7v//XaXbf+TJvzdlP6f+82sr3b5Zbio9I5A0VdIiSd2STqmz/u2S7pG0RtL7q6zFzMzqqywIJI0ALgUOAqYAh0uaUtPsEWA6cHVVdZiZWd+q7BraC+iOiCUAkq4BpgELexpExNK0bm2FdViTHHvWVa0uwcwGocquoXHAstL88rRswC
QdJ6lLUteKFSuGpDgzMyu0xbeGImJmRHRERMfYsWNbXY6Z2UalyiB4FJhQmh+flpmZ2TBSZRDMAyZL2lnSZsBhwOwK92dmZoNQWRBExBpgBjAXeAi4LiIWSDpL0sEAkt4qaTlwKHCZpAVV1WNmZvVVekNZRMwB5tQsO700PY+iy8jMzFqkLS4Wm5lZdRwEZmaZcxCYmWXOQWBmljkHgZlZ5hwEZmaZcxCYmWXOQWBmljkHgZlZ5hwEZmaZcxCYmWXOQWBmljkHgZlZ5hwEZmaZcxCYmWXOQWBmljkHgZlZ5hwEZmaZcxCYmWXOQWBmljkHgZlZ5hwEZmaZcxCYmWXOQWBmljkHgZlZ5hwEZmaZcxCYmWXOQWBmljkHgZlZ5hwEZmaZcxCYmWXOQWBmljkHgZlZ5hwEZmaZcxCYmWXOQWBmljkHgZlZ5hwEZmaZqzQIJE2VtEhSt6RT6qzfXNK1af1dkiZVWY+Zmb1cZUEgaQRwKXAQMAU4XNKUmmYfA/4SEbsCFwHnV1WPmZnVV+UZwV5Ad0QsiYgXgGuAaTVtpgGz0vQNwD9KUoU1mZlZDUVENRuW3g9MjYhj0vxRwN4RMaPU5sHUZnmaX5zaPFmzreOA49Ls64BFlRQ9dMYAT/bbauPkY89XzsffDse+U0SMrbdi02ZXMhgRMROY2eo6GiWpKyI6Wl1HK/jY8zx2yPv42/3Yq+waehSYUJofn5bVbSNpU2Bb4KkKazIzsxpVBsE8YLKknSVtBhwGzK5pMxv4cJp+P/CzqKqvyszM6qqsaygi1kiaAcwFRgBXRMQCSWcBXRExG7gcuEpSN/BnirDYGLRNN1YFfOz5yvn42/rYK7tYbGZm7cF3FpuZZc5BYGaWOQeBDZikMyWd3Oo6WknSr1pdQzuQdJuktv1aZS1J/1aanpTuhWp7DgKzQYiIfVpdg7XEv/XfZH3pq/HDmoOgRkr5hyR9U9ICSTdL2rL8yUbSGElL0/R0STdK+k9JSyXNkPQpSfdKulPS9qndbZK+Imm+pAcl7SVpE0m/lzQ2tdkkDcBX9+6/VpL0WUm/k3QHxd3dSDpW0jxJ90n6vqSt0vIdJf0gLb9P0j5p+ZGSfpNeg8vSeFRtSdJzkraRdIukeyQ9IGlaWvdWSfdL2kLS1ul99MZW19yXqt73yVHl9316/l6Sfp3a/0rS65p/1C8d928lfTcd/w2S3i3pxlKbd6b383nAlulYvptWj6h9zdJzbpN0saQu4CRJ71ExsOa9kn4qacfU7kxJV6T2SySd2OzXAICI8KP0ACYBa4A90vx1wJHAbUBHWjYGWJqmpwPdwChgLPAMcHxadxHwf9L0bcA30/TbgQfT9BmlNu8Cvt/q16DOa7In8ACwFTA6He/JwA6lNucAn0jT15aOaQTFjYJvAH4EjEzLvw4c3epj24DX5DmKr1+PLr0nuln3TbxzgC9RDLx4aqvrbeB4mv2+Hw1smqYPbNX7Ph13AP89zV8BfAb4LTA2LbsaeE/Pv3t/r1npuL9eavuK0nvjGODCNH0m8Ctg8/T6PtXz/0gzH8P+lKVFHo6I+Wn6bop/8L7cGhHPAs9KeobiDx4UfzzfVGr3PYCI+IWk0ZK2o3jj/RC4GPgo8O0hqH+o7Qf8ICKeB5DUc2PgGyWdA2wHbENxzwjAAcDRABHxIvCMirGm9gTmqRhXcEvgiWYdQEUEfEHS24G1wDhgR+CPwFkUN1WuBFrzKW/gmvm+HwXMkjSZ4g/xyCE5gsFZFhG/TNPfofj3ugo4UtK3gbeR3s919PWaXVuaHg9cK+lVwGbAw6V1P46IVcAqSU9QvIeWD/5wBs5BUN+q0vSLFH+01rCuK22LPtqvLc2vZf3XuPamjYiIZZL+JOkAihFbj9iQwpvsSuC9EXGfpOlAZx9tBcyKiFObUFezHEHxaXjPiFiduk163hs7UITjyLTsby2pcGCa9r4HzqYIkkNU/A7JbYOuesPVq+/bFMG2Erg+It
b08tx6r1mP8r/5JcCXI2K2pE6KM4HettH0v8u+RtC4pRSfaKEYDmMwPgggaV/gmYh4Ji3/FsUnkevTJ+jh5hfAe1Of8SjgPWn5KOBxSSNZP8BuAU6A4ncpJG2blr1f0ivT8u0l7dS0I6jGtsATKQT2B8rHcxnwOeC7tPfvbCylmvf9tqwbe2z6BtQ3FCZKelua/hBwR0Q8BjwGnMb6Z+mr0/t9oMrH++G+GraCg6BxXwJOkHQvRV/eYKxMz/8GxY/y9JhN8elxOHYLERH3UJzm3gf8hKLLA4o/dHcBv6ToU+1xErC/pAcoTpenRMRCiv+pbpZ0P/CfwKuacwSVCIo/8h3pOI8mvQaSjgZWR8TVwHnAW9MZXzuq6n1/AXBuWt7qnolFwMclPUTRl///0vLvUnQbPVRqOxO4v3SxuFFnAtdLupthOFy1h5hoEkm3ASdHRFeddR3ARRGxX9MLswGTtANwT0S0+xlN9lK31E0R8bJvdUn6GnBvRFze9MKarNVJnD0Vv+V8Au11bSBbkl5N0Z/9pRaXYhVKn9z/Bny61bU0g88IzMwy52sEZmaZcxCYmWXOQWBmljkHgbUdSc8N0XY6Jd00FNsaxL4nSfrQQNtJ6pD01Wqrs9w4CMxaYxLFzUsDahcRXRHRLkNWWJtwEFjbSp/ofy7ph2nkxvMkHaFihNMHJO2S2l0p6RuSulSMoPo/62xr6zQK5G/SCJE9I4k2OrrsLpL+Q9Ldkm6X9PrSvr+qYoTNJZJ67s49D9hPxUiWn0yf/G9XMZLpPUojttZp99JZTLo7+0YVI53eKelNafnwGNHS2kcrRvzzw48NeZBGgKQY2+hpijuUN6e4hf/zad1JwMVp+krgPyg++EymGNBri/T8m1KbL7Bu5MjtgN8BW9P4KJu3AJPT9N7Az0r7vj7tewrQXar9ptIxbQVskaYnA129tCvXfAlwRpo+AJifps9kGIxo6Uf7PHxDmbW7eRHxOICkxcDNafkDwP6ldtdFxFrg95KWAK+v2c67gIO17pfXtgAmpulbo49RNiVtA+xDMYRAz/Y2L237xrTvhUrj0NcxEviapD0oBh57bf+Hzr7A+wAi4meSdpA0Oq1r+YiW1j4cBNbuNmQEzDIB74uIRestlPZuYB+bAE9HxB4N1Khe2nwS+BOwe9reyl7aNarlI1pa+/A1AsvFoSp+AW4X4DUUA42VzQU+ofSRXtKbG91wRPwVeFjSoem5krR7P097lqK7qce2wOPpzOEoih/0qdeu7HbS0CQqhjZ+MtViNiAOAsvFI8BvKEZPPT4iaj9xn03RPXO/pAVpfiCOAD4m6T5gATCtn/b3Ay+q+CnPT1L8YtuH0/Nfz7qx7GvblZ0J7JlGcz2PYTi8sbUHjzVkGz1JV1JcYL2h1bWYDUc+IzAzy5zPCMzMMuczAjOzzDkIzMwy5yAwM8ucg8DMLHMOAjOzzP0X/dXcfChwSCcAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "barplot('Loop - Varying sizes')" + ] + }, + { + "cell_type": "markdown", + "id": "16405894", + "metadata": {}, + "source": [ + "## Auto-parallelization\n", + "\n", + "DaCe can use data-centric dependency analysis to not only track and reduce data movement, but also automatically extract parallel regions in code. Here we look at a simple program and how it is run in parallel. We use the `auto_optimize` flag in the `dace.program` decorator to automatically apply optimization heuristics." + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "eb5b28ca", + "metadata": {}, + "outputs": [], + "source": [ + "def element_update(a):\n", + " return a * 5\n", + "\n", + "def someforloop(A):\n", + " for i in range(A.shape[0]):\n", + " for j in range(A.shape[1]):\n", + " A[i, j] = element_update(A[i, j])" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "d80217b2", + "metadata": {}, + "outputs": [], + "source": [ + "a = np.random.rand(1000, 1000)\n", + "daceloop = dace.program(auto_optimize=True)(someforloop)" + ] + }, + { + "cell_type": "markdown", + "id": "f2ba2545", + "metadata": {}, + "source": [ + "Here it is compared with numpy and numba's similar capability:" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "8420d1f0", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "446 ms ± 41.1 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + ":4: NumbaWarning: \u001b[1m\n", + "Compilation is falling back to object mode WITH looplifting enabled because Function \"someforloop\" failed type inference due to: \u001b[1mUntyped global name 'element_update':\u001b[0m \u001b[1m\u001b[1mCannot determine Numba type of \u001b[0m\n", + "\u001b[1m\n", + "File \"\", line 7:\u001b[0m\n", + "\u001b[1mdef someforloop(A):\n", + " \n", + " for j in range(A.shape[1]):\n", + "\u001b[1m A[i, j] = element_update(A[i, j])\n", + "\u001b[0m \u001b[1m^\u001b[0m\u001b[0m\n", + "\u001b[0m\u001b[0m\n", + " def someforloop(A):\n", + ":4: NumbaWarning: \u001b[1m\n", + "Compilation is falling back to object mode WITHOUT looplifting enabled because Function \"someforloop\" failed type inference due to: \u001b[1m\u001b[1mCannot determine Numba type of \u001b[0m\n", + "\u001b[1m\n", + "File \"\", line 5:\u001b[0m\n", + "\u001b[1mdef someforloop(A):\n", + "\u001b[1m for i in range(A.shape[0]):\n", + "\u001b[0m \u001b[1m^\u001b[0m\u001b[0m\n", + "\u001b[0m\u001b[0m\n", + " def someforloop(A):\n", + "/home/user/anaconda3/envs/py38/lib/python3.8/site-packages/numba/core/object_mode_passes.py:151: NumbaWarning: \u001b[1mFunction \"someforloop\" was compiled in object mode without forceobj=True, but has lifted loops.\n", + "\u001b[1m\n", + "File \"\", line 5:\u001b[0m\n", + "\u001b[1mdef someforloop(A):\n", + "\u001b[1m for i in range(A.shape[0]):\n", + "\u001b[0m \u001b[1m^\u001b[0m\u001b[0m\n", + "\u001b[0m\n", + " warnings.warn(errors.NumbaWarning(warn_msg,\n", + "/home/user/anaconda3/envs/py38/lib/python3.8/site-packages/numba/core/object_mode_passes.py:161: NumbaDeprecationWarning: \u001b[1m\n", + "Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.\n", + "\n", + "For more information visit 
https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit\n", + "\u001b[1m\n", + "File \"\", line 5:\u001b[0m\n", + "\u001b[1mdef someforloop(A):\n", + "\u001b[1m for i in range(A.shape[0]):\n", + "\u001b[0m \u001b[1m^\u001b[0m\u001b[0m\n", + "\u001b[0m\n", + " warnings.warn(errors.NumbaDeprecationWarning(msg,\n", + ":4: NumbaWarning: \u001b[1m\n", + "Compilation is falling back to object mode WITHOUT looplifting enabled because Function \"someforloop\" failed type inference due to: \u001b[1mUntyped global name 'element_update':\u001b[0m \u001b[1m\u001b[1mCannot determine Numba type of \u001b[0m\n", + "\u001b[1m\n", + "File \"\", line 7:\u001b[0m\n", + "\u001b[1mdef someforloop(A):\n", + " \n", + " for j in range(A.shape[1]):\n", + "\u001b[1m A[i, j] = element_update(A[i, j])\n", + "\u001b[0m \u001b[1m^\u001b[0m\u001b[0m\n", + "\u001b[0m\u001b[0m\n", + " def someforloop(A):\n", + "/home/user/anaconda3/envs/py38/lib/python3.8/site-packages/numba/core/object_mode_passes.py:151: NumbaWarning: \u001b[1mFunction \"someforloop\" was compiled in object mode without forceobj=True.\n", + "\u001b[1m\n", + "File \"\", line 5:\u001b[0m\n", + "\u001b[1mdef someforloop(A):\n", + "\u001b[1m for i in range(A.shape[0]):\n", + "\u001b[0m \u001b[1m^\u001b[0m\u001b[0m\n", + "\u001b[0m\n", + " warnings.warn(errors.NumbaWarning(warn_msg,\n", + "/home/user/anaconda3/envs/py38/lib/python3.8/site-packages/numba/core/object_mode_passes.py:161: NumbaDeprecationWarning: \u001b[1m\n", + "Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.\n", + "\n", + "For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit\n", + "\u001b[1m\n", + "File \"\", line 5:\u001b[0m\n", + "\u001b[1mdef someforloop(A):\n", + "\u001b[1m for i in range(A.shape[0]):\n", + 
"\u001b[0m \u001b[1m^\u001b[0m\u001b[0m\n", + "\u001b[0m\n", + " warnings.warn(errors.NumbaDeprecationWarning(msg,\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "406 ms ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n", + "549 µs ± 212 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" + ] + } + ], + "source": [ + "numbaloop = numba.jit(parallel=True)(someforloop)\n", + "csdfg = daceloop.compile(a)\n", + "\n", + "TIMES = {}\n", + "TIMES['numpy'] = %timeit -o someforloop(a)\n", + "TIMES['numba'] = %timeit -o numbaloop(a)\n", + "TIMES['dace'] = %timeit -o csdfg(a)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "36a48195", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAmWUlEQVR4nO3de5xVZd338c8XBjJDEAVLGQxxkAQ5pIOCTxipieDteMgENQ+hmaWpUZ5ub8ksi7w109DMTFEzB41HmZIbNfPEEwoDIgrJQRkdEA9wo6AkOPB7/thrxs0whw3MnmFc3/frtV+sda1rXeu39mb2b69rrXUtRQRmZpZebVo6ADMza1lOBGZmKedEYGaWck4EZmYp50RgZpZyTgRmZinnRGCpI+kDST1bOo6GSJoo6efJ9DBJy3Jc72pJf0qm9072tW0TxzZU0sKmbNNalhOB5UzSU5JWS/rMVq4XkoryFVcj235K0jnZZRHRISJea4l4mlNEvJHs68btaaf25xcRz0ZE7+2P0HYUTgSWE0k9gKFAACUtG03rJqmgpWMwy+ZEYLk6A3gOmAicmb2g9q9uSWdJmp5MP5MUv5h0U4xKyr8jaYmk/5VUJmmvrPVD0vclLZa0VtLPJO0r6Z+S1kh6QFL7pG5nSX+T9G5ytPI3SYXJsmvJJK8JybYnZLVflEx/VtINkl6X9L6k6ZI+W3vnq7tnJP2npJWSKiSdlrX8GEkvJPFVSro6a1mPZJtnS3oD+EdS/qCkt5LtPiOpby4fhKS9JE1O9nmppAvrqVe93QJJQ5L3oPr1kaSKpN7BkmZIek/SCkkTst7fLT6/2l1VkvZP/g+8J2m+pJKsZRMl3SLpkeSzfF7SvrnspzUfJwLL1RnAfclruKTP57JSRByWTA5IuikmSToc+CVwMrAn8DpQWmvV4cBBwGDgUuB24FtAd+AA4JSkXhvgLuCLwN7Av4EJybavBJ4FLki2fUEdIV6fbOdQYLdkW5vq2Z0vAF2AbmSS4e2SqrtIPiTzHu0KHAN8T9Lxtdb/KrB/sm8A/wP0AvYA5pB5bxskqQ3wV+DFJI4jgIslDW9ovYiYkbwHHYDOwPPA/cnijcAPk30bkrT5/WS9LT6/WvG0S+J5LNmPHwD3Zb0vAKOBnybbXQJc29h+WvNyIrBGSfoKmS/aByJiNvAqcOp2NHkacGdEzImI9cAVwJCk+6nadRGxJiLmAy8Dj0XEa
xHxPpkv0C8DRMSqiJgcEesiYi2ZL5mv5rhfbYAxwEURsTwiNkbEP5OY6nNVRKyPiKeBR8gkMyLiqYh4KSI2RcQ8Ml+yteO4OiI+jIh/J+vcGRFrk+1dDQyQ1KmRsAcBXSPimojYkJzr+AOZL9tc3QysBa5M4pgdEc9FRFVEVAC/ryP2+gwGOgDjk3j+AfyNTxI1wEMRMTMiqsgku4FbEas1AycCy8WZZL6IVybzf6ZW99BW2ovMUQAAEfEBsIrML9xqb2dN/7uO+Q4AknaW9Puka2cN8Aywq3K7UqYLsBOZxJaL1RHxYdb868m+IOkQSU8m3TXvA+cl7WerrJ6Q1FbSeEmvJnFXZMXUkC8CeyXdMO9Jeg/4TyCnIzRJ3wWGAadGxKakbL+kS+2tJJZf5BBHtb2Ayuq2Eq+z+Wf5Vtb0OpLPznYcTgTWoKS//GTgq8kXxVtkuhEGSBqQVPsQ2DlrtS800uybZL7QqrfxOWB3YPk2hPgjoDdwSER0BKq7MpT829DwuiuBj4Bc+6w7J7FW25vMvkAmOZYB3SOiE3BbVgzVsmM5FTgOOBLoBPSoFXd9KoGlEbFr1muXiBjZWPCShgI/A46LiDVZi34HvAL0St7D/8whjmpvAt2To6tqe7Ntn6W1ECcCa8zxZPqQ+5A5pB9Ipp/7WTJ94gBzgROTX+dFwNm12ngbyL5u/37g25IGKnMp6i+A55Nuia21C5kjhPck7Qb8pJFt10h+xd4J/Do5Ads2Oana0OWxP5XUPvlS/Q/gwaw4/jciPpJ0MI13ne0CrCdzJLQzmfcgFzOBtZIuS050t5V0gKRBDa0kqTvwAHBGRCyqI5Y1wAeSvgR8r9byet9DMuca1gGXSmonaRhwLFue87EdmBOBNeZM4K7kmvS3ql9kTsiepsylkDcCG8h8YdzNlic9rwbuTroyTo6IvwNXAZOBFWR+kW9NH3e23wCfJfPr/jlgWq3lNwEnKXNF0c11rP9j4CVgFvC/wK+o/+/iLWA1mV/B9wHnRcQrybLvA9dIWguMI/Ol25B7yHShLAcWJLE3Krkn4D/IJOSlZPb7DjJHFQ05gkz30V+yrhyanyz7MZnEtZbM+YZJtda9mqzPr1Y8G8h88Y9IYrmVTLJ5BWs15AfTmDUu+aX7p4gobOFQzJqcjwjMzFLOicDMLOXcNWRmlnI+IjAzS7lWN/hVly5dokePHi0dhplZqzJ79uyVEdG1rmWtLhH06NGD8vLylg7DzKxVkfR6fcvcNdQEpk2bRu/evSkqKmL8+PH11ps8eTKSNktk8+bNY8iQIfTt25d+/frx0UcfATB79mz69etHUVERF154IdXncq666ir69+/PwIEDOeqoo3jzzcyNrRHBhRdeSFFREf3792fOnDkAzJ07t6b9/v37M2lS7UvEzSz1IqJVvQ466KDYkVRVVUXPnj3j1VdfjfXr10f//v1j/vz5W9Rbs2ZNDB06NA455JCYNWtWRER8/PHH0a9fv5g7d25ERKxcuTKqqqoiImLQoEExY8aM2LRpUxx99NExderUiIh4//33a9q86aab4rvf/W5ERDzyyCNx9NFHx6ZNm2LGjBlx8MEHR0TEwoULY9GiRRERsXz58vjCF74Qq1evzs+bYWY7LKA86vle9RHBdpo5cyZFRUX07NmT9u3bM3r0aKZMmbJFvauuuorLLruMnXbaqabsscceo3///gwYkBmyZ/fdd6dt27asWLGCNWvWMHjwYCRxxhln8PDDDwPQsWPHmvU//PBDpMyQMFOmTOGMM85AEoMHD+a9995jxYoV7LfffvTq1QuAvfbaiz322IN33303X2+HmbVCTgTbafny5XTv3r1mvrCwkOXLNx9va86cOVRWVnLMMcdsVr5o0SIkMXz4cA488ECuu+66mjYLCwvrbfPKK6+ke/fu3HfffVxzzTU5xzFz5kw2bNjAvvv6uSBm9gkngjzbtGkTY8eO5YYbbthiWVVVFdOnT+e++
+5j+vTpPPTQQzzxxBONtnnttddSWVnJaaedxoQJE3KKY8WKFZx++uncddddtGnjj93MPuFvhO3UrVs3Kitrhpln2bJldOv2yVDsa9eu5eWXX2bYsGH06NGD5557jpKSEsrLyyksLOSwww6jS5cu7LzzzowcOZI5c+bQrVs3li1bVm+b1U477TQmT57caBxr1qzhmGOO4dprr2Xw4MFN/h6YWeuW10Qg6WhJC5V5Nu3ldSw/K3mQx9zkdU5d7ezIBg0axOLFi1m6dCkbNmygtLSUkpJPnu3eqVMnVq5cSUVFBRUVFQwePJiysjKKi4sZPnw4L730EuvWraOqqoqnn36aPn36sOeee9KxY0eee+45IoJ77rmH4447DoDFixfXtD1lyhS+9KUvAVBSUsI999xDRPDcc8/RqVMn9txzTzZs2MAJJ5zAGWecwUknndS8b46ZtQp5u48geULULcDXgWXALEllEbGgVtVJUfezZFuFgoICJkyYwPDhw9m4cSNjxoyhb9++jBs3juLi4s2SQm2dO3dm7NixDBo0CEmMHDmy5jzCrbfeyllnncW///1vRowYwYgRIwC4/PLLWbhwIW3atOGLX/wit912GwAjR45k6tSpFBUVsfPOO3PXXXcB8MADD/DMM8+watUqJk6cCMDEiRMZOHBg/t4UM2tV8jbWkKQhZJ7ROjyZvwIgIn6ZVecsoHhrEkFxcXH4hjIzs60jaXZEFNe1LJ9dQ93IekYrmaOCLTu64RuS5kn6S/IUpS1IOldSuaTytFz6OGzYMIYNG9bSYZhZCrT0yeK/Aj0ioj/wOJmnW20hIm6PiOKIKO7atc6hMszMbBvlMxEsB7J/4RdS64HWEbEqItYns3cAB+UxHjMzq0M+E8EsoJekfSS1J/NM2rLsCpL2zJotAf6Vx3jMzKwOebtqKCKqJF0APAq0Be6MiPmSriEz5kUZcKGkEqCKzIPDz8pXPGZmVre8DkMdEVOBqbXKxmVNXwFckc8YzMysYS19stjMzFqYE4GZWco5EZiZpZwTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyjkRmJmlXF7HGtrRfPzTH7V0CDmLileB1hNzu5/c0NIhmNk28hGBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZjZDm/atGn07t2boqIixo8fv8Xy2267jX79+jFw4EC+8pWvsGDBAgA2bNjAt7/9bfr168eAAQN46qmnataZNGkS/fv3p2/fvlx22WU15T/84Q8ZOHAgAwcOZL/99mPXXXetWXbZZZdxwAEHcMABBzBp0qSa8ojgyiuvZL/99mP//ffn5ptvbvo3IY9SNcSEmbU+Gzdu5Pzzz+fxxx+nsLCQQYMGUVJSQp8+fWrqnHrqqZx33nkAlJWVMXbsWKZNm8Yf/vAHAF566SXeeecdRowYwaxZs1i9ejWXXHIJs2fPpmvXrpx55pk88cQTHHHEEdx444017f72t7/lhRdeAOCRRx5hzpw5zJ07l/Xr1zNs2DBGjBhBx44dmThxIpWVlbzyyiu0adOGd955pxnfoe3nIwIz26HNnDmToqIievbsSfv27Rk9ejRTpkzZrE7Hjh1rpj/88EMkAbBgwQIOP/xwAPbYYw923XVXysvLee211+jVqxddu3YF4Mgjj2Ty5MlbbPv+++/nlFNOqWnrsMMOo6CggM997nP079+fadOmAfC73/2OcePG0aZNm5pttSZOBGa2Q1u+fDndu3evmS8sLGT58uVb1LvlllvYd999ufTSS2u6ZgYMGEBZWRlVVVUsXbqU2bNnU1lZSVFREQsXLqSiooKqqioefvhhKisrN2vv9ddfZ+nSpTWJZMCAAUybNo1169axcuVKnnzyyZp1Xn31VSZNmkRxcTEjRoxg8eLF+Xo78sKJwMw+Fc4//3xeffVVfvWrX/Hzn/8cgDFjxlBYW
EhxcTEXX3wxhx56KG3btqVz58787ne/Y9SoUQwdOpQePXrQtm3bzdorLS3lpJNOqik/6qijGDlyJIceeiinnHIKQ4YMqVm2fv16dtppJ8rLy/nOd77DmDFjmnfnt5MTgZnt0Lp167bZr/Vly5bRrVu3euuPHj2ahx9+GICCggJuvPFG5s6dy5QpU3jvvffYb7/9ADj22GN5/vnnmTFjBr17964pr1ZaWlrTLVTtyiuvZO7cuTz++ONERM06hYWFnHjiiQCccMIJzJs3b7v3uzk5EZjZDm3QoEEsXryYpUuXsmHDBkpLSykpKdmsTnZXzCOPPEKvXr0AWLduHR9++CEAjz/+OAUFBTUnmatP6K5evZpbb72Vc845p6aNV155hdWrVzNkyJCaso0bN7Jq1SoA5s2bx7x58zjqqKMAOP7443nyyScBePrpp7dIKjs6XzW0g/r7t09u6RDMdggFBQVMmDCB4cOHs3HjRsaMGUPfvn0ZN24cxcXFlJSUMGHCBP7+97/Trl07OnfuzN133w1kvuyHDx9OmzZt6NatG/fee29NuxdddBEvvvgiAOPGjdvsy7u0tJTRo0fXnHQG+Pjjjxk6dCiQOTn9pz/9iYKCzFfo5ZdfzmmnncaNN95Ihw4duOOOO/L+vjQlRURLx7BViouLo7y8fJvWbS1P+2qN/IQysx2bpNkRUVzXMncNmVkqDRs2jGHDhrV0GDsEJwIzs5TLayKQdLSkhZKWSLq8gXrfkBSS6jxsMTOz/MlbIpDUFrgFGAH0AU6R1KeOersAFwHP5ysWMzOrXz6PCA4GlkTEaxGxASgFjquj3s+AXwEf5TEWMzOrRz4TQTcg+57tZUlZDUkHAt0j4pE8xmFmZg1osfsIJLUBfg2clUPdc4FzAfbee+/8BmZm26W1XKYdFa8CrSdeyN9l2vk8IlgOdM+aL0zKqu0CHAA8JakCGAyU1XXCOCJuj4jiiCiuHi3QzMyaRj4TwSygl6R9JLUHRgNl1Qsj4v2I6BIRPSKiB/AcUBIR23a3mJmZbZO8JYKIqAIuAB4F/gU8EBHzJV0jqaThtc3MrLnk9RxBREwFptYqG1dP3WH5jMXMzOrmQefMLJU8sOMnPMSEmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyhU0tFDSbjm0sSki3muacMzMrLk1mAiAN5OXGqjTFti7ySIyM7Nm1Vgi+FdEfLmhCpJeaMJ4zMysmTV2jmBIDm3kUsfMzHZQDSaCiPgIQNK+kj6TTA+TdKGkXbPrmJlZ65TrVUOTgY2SioDbge7An/MWlZmZNZtcE8GmiKgCTgB+GxGXAHvmLywzM2suuSaCjyWdApwJ/C0pa5efkMzMrDnlmgi+Teak8LURsVTSPsC9+QvLzMyaS06JICIWRMSFEXF/Mr80In7V2HqSjpa0UNISSZfXsfw8SS9JmitpuqQ+W78LZma2PRpMBJJub6yB+upIagvcAowA+gCn1PFF/+eI6BcRA4HrgF/nErSZmTWdxm4oO15SQ5eHCvhaPcsOBpZExGsAkkqB44AF1RUiYk1W/c8B0WjEZmbWpBpLBJfk0Maz9ZR3Ayqz5pcBh9SuJOl8YCzQHji8roYknQucC7D33h7NwsysKTWYCCLi7nwHEBG3ALdIOhX4LzJXJtWuczuZ+xcoLi72UYOZWRPK5zDUy8nceFatMCmrTylwfB7jMTOzOuQzEcwCeknaR1J7YDRQll1BUq+s2WOAxXmMx8zM6tDYOYLNSNo5ItblUjciqiRdADxKZqjqOyNivqRrgPKIKAMukHQk8DGwmjq6hczMLL9ySgSSDgXuADoAe0saAHw3Ir7f0HoRMRWYWqtsXNb0RVsds
ZmZNalcu4ZuBIYDqwAi4kXgsHwFZWZmzSfncwQRUVmraGMTx2JmZi0g13MElUn3UEhqB1wE/Ct/YZmZWXPJ9YjgPOB8MjeJLQcGJvNmZtbK5XREEBErgdPyHIuZmbWAXK8a2gf4AdAje52IKMlPWGZm1lxyPUfwMPBH4K/AprxFY2ZmzS7XRPBRRNyc10jMzKxF5JoIbpL0E+AxYH11YUTMyUtUZmbWbHJNBP2A08kME13dNRTUM2y0mZm1Hrkmgm8CPSNiQz6DMTOz5pfrfQQvA7vmMQ4zM2shuR4R7Aq8ImkWm58j8OWjZmatXK6J4Cd5jcLMzFpMrncWP53vQMzMrGU0mAgkTY+Ir0haS+YqoZpFQEREx7xGZ2ZmedfYw+u/kvy7S/OEY2ZmzS2nq4Yk3ZtLmZmZtT65Xj7aN3tGUgFwUNOHY2Zmza3BRCDpiuT8QH9Ja5LXWuBtYEqzRGhmZnnVYCKIiF8m5wf+OyI6Jq9dImL3iLiimWI0M7M8yvXy0SskdQO+yObPI3gmX4GZmVnzyPXBNOOB0cACPnlofQBOBGZmrVyudxafAPSOiPWN1jQzs1Yl16uGXgPa5TMQMzNrGbkeEawD5kp6gs0HnbswL1GZmVmzyTURlCUvMzP7lMn1qqG78x2ImZm1jFyvGlrK5oPOARARPZs8IjMza1a5dg0VZ03vRObRlbs1fThmZtbccrpqKCJWZb2WR8RvgGPyG5qZmTWHXLuGDsyabUPmCCHXowkzM9uB5fplfkPWdBVQQaZ7qEGSjgZuAtoCd0TE+FrLxwLnJG2+C4yJiNdzjMnMzJpArlcNfS17XlJbMkNOLKpvnaTOLcDXgWXALEllEbEgq9oLQHFErJP0PeA6YNTW7YKZmW2Pxoah7pgMRT1B0teVcQGwBDi5kbYPBpZExGsRsQEoBY7LrhART0bEumT2OaBw23bDzMy2VWNHBPcCq4EZwHeAK8k8r/iEiJjbyLrdgMqs+WXAIQ3UPxv4n7oWSDoXOBdg7733bmSzZma2NRpLBD0joh+ApDuAFcDeEfFRUwYh6VtkTkB/ta7lEXE7cDtAcXHxFvczmJnZtmssEXxcPRERGyUt24oksBzonjVfmJRtRtKRZI40vurRTc3Mml9jiWCApDXJtIDPJvMCIiI6NrDuLKCXpH3IJIDRwKnZFSR9Gfg9cHREvLMtO2BmZtunwUQQEW23teGIqEpOLD9K5vLROyNivqRrgPKIKAP+G+gAPCgJ4I2IKNnWbZqZ2dbL601hETEVmFqrbFzW9JH53L6ZmTUu1wfTmJnZp5QTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyjkRmJmlnBOBmVnKORGYmaWcE4GZWco5EZiZpZwTgZlZyuU1EUg6WtJCSUskXV7H8sMkzZFUJemkfMZiZmZ1y1sikNQWuAUYAfQBTpHUp1a1N4CzgD/nKw4zM2tYQR7bPhhYEhGvAUgqBY4DFlRXiIiKZNmmPMZhZmYNyGfXUDegMmt+WVK21SSdK6lcUvm7777bJMGZmVlGqzhZHBG3R0RxRBR37dq1pcMxM/tUyWciWA50z5ovTMrMzGwHks9EMAvoJWkfSe2B0UBZHrdnZmbbIG+JICKqgAuAR4F/AQ9ExHxJ10gqAZA0SNIy4JvA7yXNz1c8ZmZWt3xeNURETAWm1ioblzU9i0yXkZmZtZBWcbLYzMzyx4nAzCzlnAjMzFLOicDMLOWcCMzMUs6JwMws5ZwIzMxSzonAzCzlnAjMzFLOicDMLOWcCMzMUs6JwMws5ZwIzMxSzonAzCzlnAjMzFLOicDMLOWcCMzMUs6JwMws5ZwIzMxSzonAz
CzlnAjMzFLOicDMLOWcCMzMUs6JwMws5ZwIzMxSzonAzCzlnAjsU2XatGn07t2boqIixo8fv8Xy9evXM2rUKIqKijjkkEOoqKioWfbLX/6SoqIievfuzaOPPppzmxdeeCEdOnSomZ84cSJdu3Zl4MCBDBw4kDvuuGOz+mvWrKGwsJALLrgAgLVr19bUHThwIF26dOHiiy/ebJ3JkycjifLy8pqyefPmMWTIEPr27Uu/fv346KOPtuq9MqtW0NIBmDWVjRs3cv755/P4449TWFjIoEGDKCkpoU+fPjV1/vjHP9K5c2eWLFlCaWkpl112GZMmTWLBggWUlpYyf/583nzzTY488kgWLVoE0GCb5eXlrF69eotYRo0axYQJE+qM86qrruKwww6rmd9ll12YO3duzfxBBx3EiSeeWDO/du1abrrpJg455JCasqqqKr71rW9x7733MmDAAFatWkW7du227Y2z1PMRgX1qzJw5k6KiInr27En79u0ZPXo0U6ZM2azOlClTOPPMMwE46aSTeOKJJ4gIpkyZwujRo/nMZz7DPvvsQ1FRETNnzmywzY0bN3LJJZdw3XXX5Rzj7NmzefvttznqqKPqXL5o0SLeeecdhg4dWlN21VVXcdlll7HTTjvVlD322GP079+fAQMGALD77rvTtm3bnOMwy+ZEYJ8ay5cvp3v37jXzhYWFLF++vN46BQUFdOrUiVWrVtW7bkNtTpgwgZKSEvbcc88tYpk8eTL9+/fnpJNOorKyEoBNmzbxox/9iOuvv77efSgtLWXUqFFIAmDOnDlUVlZyzDHHbFZv0aJFSGL48OEceOCBW5WMzGpzIjDbBm+++SYPPvggP/jBD7ZYduyxx1JRUcG8efP4+te/XnMEcuuttzJy5EgKCwvrbbe0tJRTTjkFyCSOsWPHcsMNN2xRr6qqiunTp3Pfffcxffp0HnroIZ544okm2jtLG58jsE+Nbt261fz6Bli2bBndunWrs05hYSFVVVW8//777L777g2uW1f5Cy+8wJIlSygqKgJg3bp1FBUVsWTJEnbfffea+ueccw6XXnopADNmzODZZ5/l1ltv5YMPPmDDhg106NCh5gT0iy++SFVVFQcddBCQOTfw8ssvM2zYMADeeustSkpKKCsro7CwkMMOO4wuXboAMHLkSObMmcMRRxzRJO+lpYuPCOxTY9CgQSxevJilS5eyYcMGSktLKSkp2axOSUkJd999NwB/+ctfOPzww5FESUkJpaWlrF+/nqVLl7J48WIOPvjgets85phjeOutt6ioqKCiooKdd96ZJUuWALBixYqa7ZWVlbH//vsDcN999/HGG29QUVHB9ddfzxlnnLHZVUj3339/zdEAQKdOnVi5cmXNNgYPHkxZWRnFxcUMHz6cl156iXXr1lFVVcXTTz+92Ulxs62R1yMCSUcDNwFtgTsiYnyt5Z8B7gEOAlYBoyKiIp8x2adXQUEBEyZMYPjw4WzcuJExY8bQt29fxo0bR3FxMSUlJZx99tmcfvrpFBUVsdtuu1FaWgpA3759Ofnkk+nTpw8FBQXccsstNSdf62qzITfffDNlZWUUFBSw2267MXHixJzif+CBB5g6dWpOdTt37szYsWMZNGgQkhg5cuQW5xHMcqWIyE/DUltgEfB1YBkwCzglIhZk1fk+0D8izpM0GjghIkY11G5xcXFkX0u9NT7+6Y+2aT1rXLufbNmPbenkv7P82Z6/M0mzI6K4rmX57Bo6GFgSEa9FxAagFDiuVp3jgLuT6b8AR6j6cgkzM2sW+ewa6gZUZs0vAw6pr05EVEl6H9gdWJldSdK5wLnJ7AeSFuYl4h1PF2q9Fzusq3/d0hHsCFrP52XVWtdntn1/Z1+sb0GruGooIm4Hbm/pOJqbpPL6DuVsx+PPq/XxZ5aRz66h5UD3rPnCpKzOOpIKgE5kThqbmVkzyWcimAX0krSPpPbAaKCsVp0y4Mxk+iTgH5Gvs9dmZlanvHUNJ
X3+FwCPkrl89M6ImC/pGqA8IsqAPwL3SloC/C+ZZGGfSF13WCvnz6v18WdGHi8fNTOz1sF3FpuZpZwTgZlZyjkRmG0FSVdL+nFLx2H5JekpSam5rNSJwMws5ZwI8kRSD0n/kvQHSfMlPSbps9m/NCR1kVSRTJ8l6WFJj0uqkHSBpLGSXpD0nKTdknpPSbpJ0lxJL0s6WFIbSYsldU3qtJG0pHreto+kKyUtkjQd6J2UfUfSLEkvSposaeek/POSHkrKX5R0aFL+LUkzk8/t98lYXLad8vV3ljg9++8sWf9gSTOS+v+U1Lv597rpORHkVy/glojoC7wHfKOR+gcAJwKDgGuBdRHxZWAGcEZWvZ0jYiDwfTKX5W4C/gScliw/EngxIt5tov1ILUkHkbmseSAwksxnA/B/I2JQRAwA/gWcnZTfDDydlB8IzJe0PzAK+D/J57aRTz4r237N8neWlL0CDE3qjwN+0UT70KJaxRATrdjSiJibTM8GejRS/8mIWAusTcZd+mtS/hLQP6ve/QAR8YykjpJ2JfMfdQrwG2AMcFcTxG8wFHgoItYBSKq+KfIAST8HdgU6kLlfBuBwki+TiNgIvC/pdDJDrc9KxlT8LPBOc+1ACjTn39kuwN2SegEBtGuSPWhhTgT5tT5reiOZL4AqPjkS26mB+puy5jex+WdV++aPiIhKSW9LOpzMyK/+xZlfE4HjI+JFSWcBwxqoK+DuiLiiGeJKo2b7OwN+RiaRnCCpB/DUNke9A3HXUPOrIPPrEDLDamyLUQCSvgK8HxHvJ+V3kOkiejD5NWrb7xng+KTfeRfg2KR8F2CFpHZsnnSfAL4HmWdySOqUlJ0kaY+kfDdJ9Y4EaU2igvz8nXXikzHTztqO+HYoTgTN73rge5JeIDME7rb4KFn/Nj7pm4bM2E0dcLdQk4mIOcAk4EXgf8iMoQVwFfA88P/I9BtXuwj4mqSXyHRT9EkexvRfwGOS5gGPA3s2zx6kVr7+zq4DfpmUf2p6VDzERCsj6SngxxGxxWPakqskboyIoc0emJm1Wp+ajJZ2ki4n0yXhcwNmtlV8RGBmlnI+R2BmlnJOBGZmKedEYGaWck4E1upI+qCJ2hkm6W9N0dY2bLuHpFO3tp6kYkk35zc6SxsnArOW0QNoNBHUrhcR5RFxYZ5ispRyIrBWK/lF/7SkKZJekzRe0mnJKJ8vSdo3qTdR0m2SypNRRP+jjrY+J+nOZN0XJB2XlOc6Kuy+kqZJmi3pWUlfytr2zclIla9Jqr7LdTwwNBnd8ofJL/9nJc1JXofWU6/mKCa5Q/lhSfOSWPon5Vcn+/JUsk0nDmtYRPjlV6t6AR8k/w4jM9rknsBnyNz6/9Nk2UXAb5LpicA0Mj98egHLyIw/Mwz4W1LnF8C3kuldgUXA58gMI7CEzJASXYH3gfOSejcCFyfTTwC9kulDgH9kbfvBZNt9gCVZsf8ta592BnZKpnsB5fXUy475t8BPkunDgbnJ9NXAP5P3pAuwCmjX0p+bXzvuyzeUWWs3KyJWAEh6FXgsKX8J+FpWvQciM1z3YkmvAV+q1c5RQIk+efrYTsDeyfST0cBolZI6AIcCDyaji0LmS7jaw8m2F0j6fD370Q6YIGkgmYHT9mt81/kKyZDLEfEPSbtL6pgseyQi1gPrJb0DfJ5MAjTbghOBtXbbM5JkNgHfiIiFmxVKh+SwjTbAe5EZu76xGFVPnR8CbwMDkvY+qqdermqPyOm/dauXzxFYWnxTmSe37Qv0BBbWWv4o8AMlP+klfTnXhiNiDbBU0jeTdSVpQCOrrSXT3VStE7AiOXI4HWhbT71sz5IMKSJpGLAyicVsqzgRWFq8AcwkM4LoeRFR+xf3z8h0z8yTND+Z3xqnAWdLehGYDxzXSP15wEZlHmf5Q+BW4Mxk/S8BH9ZTL9vVwEHJiKbjgTO3MmYzwGMNWQpImkjmBOtfWjoWsx2RjwjMzFLORwRmZinnIwIzs5RzI
jAzSzknAjOzlHMiMDNLOScCM7OU+//Jmp+/QjFlzQAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "barplot('Automatic parallelization', labels=True)" + ] + }, + { + "cell_type": "markdown", + "id": "3864237f", + "metadata": {}, + "source": [ + "As we can see, the nested call triggered the numba code to stay sequential, whereas the global data dependency analysis in DaCe allowed it to parallelize the code, yielding a performance of **549 µs** vs. 406 ms." + ] + }, + { + "cell_type": "markdown", + "id": "f71c9c78", + "metadata": {}, + "source": [ + "## 3D Heat Diffusion\n", + "\n", + "As a more realistic application, the following program, `heat3d`, is taken from the [NPBench numpy benchmark](https://github.com/spcl/npbench). It runs a three-dimensional stencil repeatedly to perform heat diffusion:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "8b7f6433", + "metadata": {}, + "outputs": [], + "source": [ + "def heat3d(TSTEPS, A, B):\n", + " for t in range(1, TSTEPS):\n", + " B[1:-1, 1:-1,\n", + " 1:-1] = (0.125 * (A[2:, 1:-1, 1:-1] - 2.0 * A[1:-1, 1:-1, 1:-1] +\n", + " A[:-2, 1:-1, 1:-1]) + 0.125 *\n", + " (A[1:-1, 2:, 1:-1] - 2.0 * A[1:-1, 1:-1, 1:-1] +\n", + " A[1:-1, :-2, 1:-1]) + 0.125 *\n", + " (A[1:-1, 1:-1, 2:] - 2.0 * A[1:-1, 1:-1, 1:-1] +\n", + " A[1:-1, 1:-1, 0:-2]) + A[1:-1, 1:-1, 1:-1])\n", + " A[1:-1, 1:-1,\n", + " 1:-1] = (0.125 * (B[2:, 1:-1, 1:-1] - 2.0 * B[1:-1, 1:-1, 1:-1] +\n", + " B[:-2, 1:-1, 1:-1]) + 0.125 *\n", + " (B[1:-1, 2:, 1:-1] - 2.0 * B[1:-1, 1:-1, 1:-1] +\n", + " B[1:-1, :-2, 1:-1]) + 0.125 *\n", + " (B[1:-1, 1:-1, 2:] - 2.0 * B[1:-1, 1:-1, 1:-1] +\n", + " B[1:-1, 1:-1, 0:-2]) + B[1:-1, 1:-1, 1:-1])" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "d8e54447", + "metadata": {}, + "outputs": [], + "source": [ + "# Using the \"L\" size\n", + "TSTEPS, N = 100, 70\n", + "A = np.fromfunction(lambda i, j, k: (i + j + (N - k)) * 10 / N, (N, N, N),\n", + " dtype=np.float64)\n", + "B
= np.copy(A)" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "29ef687a", + "metadata": {}, + "outputs": [], + "source": [ + "dace_heat3d = dace.program(auto_optimize=True)(heat3d)\n", + "numba_heat3d = numba.jit(nopython=True, parallel=True)(heat3d)" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "a7606742", + "metadata": {}, + "outputs": [], + "source": [ + "%%pythran\n", + "#pythran export pythran_heat3d(int, float64[:,:,:], float64[:,:,:])\n", + "def pythran_heat3d(TSTEPS, A, B):\n", + " for t in range(1, TSTEPS):\n", + " B[1:-1, 1:-1,\n", + " 1:-1] = (0.125 * (A[2:, 1:-1, 1:-1] - 2.0 * A[1:-1, 1:-1, 1:-1] +\n", + " A[:-2, 1:-1, 1:-1]) + 0.125 *\n", + " (A[1:-1, 2:, 1:-1] - 2.0 * A[1:-1, 1:-1, 1:-1] +\n", + " A[1:-1, :-2, 1:-1]) + 0.125 *\n", + " (A[1:-1, 1:-1, 2:] - 2.0 * A[1:-1, 1:-1, 1:-1] +\n", + " A[1:-1, 1:-1, 0:-2]) + A[1:-1, 1:-1, 1:-1])\n", + " A[1:-1, 1:-1,\n", + " 1:-1] = (0.125 * (B[2:, 1:-1, 1:-1] - 2.0 * B[1:-1, 1:-1, 1:-1] +\n", + " B[:-2, 1:-1, 1:-1]) + 0.125 *\n", + " (B[1:-1, 2:, 1:-1] - 2.0 * B[1:-1, 1:-1, 1:-1] +\n", + " B[1:-1, :-2, 1:-1]) + 0.125 *\n", + " (B[1:-1, 1:-1, 2:] - 2.0 * B[1:-1, 1:-1, 1:-1] +\n", + " B[1:-1, 1:-1, 0:-2]) + B[1:-1, 1:-1, 1:-1])" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "3b2218b6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3.28 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n", + "3.75 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n", + "216 ms ± 0 ns per loop (mean ± std. dev. 
of 1 run, 1 loop each)\n" + ] + } + ], + "source": [ + "# Warmup\n", + "%timeit -r 1 -n 1 dace_heat3d(TSTEPS, A, B)\n", + "%timeit -r 1 -n 1 numba_heat3d(TSTEPS, A, B)\n", + "%timeit -r 1 -n 1 pythran_heat3d(TSTEPS, A, B)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "d3975c40", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "799 ms ± 11.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + } + ], + "source": [ + "TIMES = {}\n", + "TIMES['numpy'] = %timeit -o heat3d(TSTEPS, A, B)" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "452597f9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "11.2 ms ± 406 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n", + "77.1 ms ± 3.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", + "184 ms ± 573 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "TIMES['dace'] = %timeit -o dace_heat3d(TSTEPS, A, B)\n", + "TIMES['numba'] = %timeit -o numba_heat3d(TSTEPS, A, B)\n", + "TIMES['pythran'] = %timeit -o pythran_heat3d(TSTEPS, A, B)" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "38bb42f7", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAqMUlEQVR4nO3de3hV5Zn38e8PEC0iIhL7MgSKGBEEETWiOFXRiqhVFIuKp8HTOPLqWB214qhobZ1ibW2r4Phaj7WWqHiAsYg4Kh5arYkWqYACQloSrCAVkVoJCff7x17EnRCSkGTntH+f69qX63Cvte69DPveaz1rP48iAjMzy14dWjoBMzNrWS4EZmZZzoXAzCzLuRCYmWU5FwIzsyznQmBmluVcCMyamaSzJc1Nm/9nSUslbZB0iqSvS3pV0ueSftqI4/ynpPuaJmtrz1wIrMVI+rWkjyStl7RE0kVp60ZK2px8OG6QVCLpcUkH17K/fpJCUqdqyx+S9MMmyHekpJI6Yh6SVJZ8iH8u6T1JP5K065aYiHg0Io5N2+wWYGpEdI2IZ4CLgU+AbhFxVUPzjYj/ioiL6o60bOdCYC3pR0C/iOgGjAF+KOmgtPWrIqIrsAtwKPA+8JqkbzV/qtvlxxGxC5ADnE8q999J2nkb8d8AFlabXxT+tac1ExcCazERsTAiNm6ZTV571RAXEVESEZOB+4DbGnNcSYdK+r2kdZLelTQybd35khYn3+aXS/q3ZPnOwHPAP6VdpfxTHe/vy4goJFXkdidVFJB0nqTXk+kPgf7A/yT7nA5MAL6XzB9T/Yqm+pWJpGsllSY5f7ClUEq6WdKv0+LGSFqYvO95kgalrSuWdLWkBZI+k/SYpJ0aeIqtjXEhsBYl6W5JX5D6tv8RMLuOTZ4CDqzl23Vdx+sN/Bb4IdADuBp4UlJOErIaOBHoRuqD+2eSDoyIvwPHk1ylJK9V9TlmRHwOvAAcXsO6vYC/ACcl+zwTeJTUVUXXiPjfOt7PPsBlwMHJVchooLiGuAHAdOAKUlcqs0kVn85pYacDxwF7AkOB8+rz/qztcyGwFhUR/5fUrZ/DSX3Ib6x9C1YBArrXEvNJ8q13naR1wFlp684BZkfE7IjYHBEvAEXACUk+v42ID5OrkFeAudTwAd4Aq0gVnqZWAewI7Ctph4gojogPa4g7A/htRLwQEZuAnwBfAw5Li7kzIlZFxN+A/wGGZSBfa4VcCKzFRURFRLwO5AIT6wjvTeoW0rpaYnpGRPctL+A3aeu+AZxWrVB8E+gFIOl4SW9K+luy7gSgZwPeVk15/60J9lNFRCwj9S3/ZmC1pIJt3LL6J+DPadttBlYmeW3x17TpL4CuTZ2vtU4uBNaadKKGNoJqxgLvJLdqGmIl8Eh6oYiInSNiiqQdgSdJfVv+elJEZpO6AoFUAdpukroCxwCvNTDnvwNd0ub/T/rKiPhNRHyTVJELam5DWZWs35KTgD5AaQNzsnbEhcBahKQ9JI2X1FVSR0mjgTOBF2uIlaTekm4CLgL+sxGH/jVwkqTRyXF3Shpfc4HOpG6zrAHKJR0PpD/m+TGwe/qjoHW8xx2Tp6CeAT4FHmxgzvOBEyT1kPR/SF0BbDnGPpKOTorYl8A/gM017ONx4NuSviVpB+AqUrfhft/AnKwdcSGwlhKkbgOVkPqQ/AlwRUTMSov5J0kbgA1AIbAfMDIi5lbfWb0PGrESOJlUMVlD6grhGqBD0qh7OakPzU9JtS3MStv2fVINrsuT20rbemroe5I+B9YCvwLeBg5rxFXMI8C7pBqB5wKPpa3bEZhC6ncHfwX2AK6rvoOI+IBU+8hdSexJpBqoyxqYk7Uj8qPKZmbZzVcEZmZZzoXAzCzLuRCYmWU5FwIzsyzXqe6Q1qVnz57Rr1+/lk7DzKxNefvttz+JiJya1rW5QtCvXz+KiopaOg0zszZF0p+3tc63hmowZ84c9tlnH/Ly8pgyZcpW66+88kqGDRvGsGH
DGDBgAN27d69cd+211zJkyBCGDBnCY4999bj3Sy+9xIEHHsiQIUOYMGEC5eXlAHz66aeMHTuWoUOHMnz4cN577z0AvvzyS4YPH87+++/P4MGDuemmm7bK4/LLL6drV/cCYGaNFBFt6nXQQQdFJpWXl0f//v3jww8/jI0bN8bQoUNj4cKF24y/88474/zzz4+IiGeffTaOOeaY2LRpU2zYsCHy8/Pjs88+i4qKisjNzY0PPvggIiJuvPHGuO+++yIi4uqrr46bb745IiIWL14cRx99dEREbN68OT7//POIiCgrK4vhw4fHG2+8UXncwsLCOOecc2LnnXdu+pNgZu0OUBTb+Fz1FUE1b731Fnl5efTv35/OnTszfvx4Zs6cuc346dOnc+aZZwKwaNEijjjiCDp16sTOO+/M0KFDmTNnDmvXrqVz584MGDAAgFGjRvHkk09WbnP00UcDMHDgQIqLi/n444+RVPltf9OmTWzatIlU9zBQUVHBNddcw49//OOMnQczyx4uBNWUlpbSp0+fyvnc3FxKS2vul+vPf/4zK1asqPwg33///ZkzZw5ffPEFn3zyCS+//DIrV66kZ8+elJeXV7ZtzJgxg5UrV1Zu89RTTwGpIvTnP/+ZkpLUmCMVFRUMGzaMPfbYg1GjRnHIIYcAMHXqVMaMGUOvXr0ycxLMLKu0ucbi1qSgoIBx48bRsWNHAI499lgKCws57LDDyMnJYcSIEXTs2BFJFBQUcOWVV7Jx40aOPfbYym0mTZrEd7/7XYYNG8Z+++3HAQccULmuY8eOzJ8/n3Xr1jF27Fjee+89evTowRNPPMG8efNa6m2bWTvjQlBN7969K7+tA5SUlNC7d+8aYwsKCpg2bVqVZddffz3XX389AGeddVbl7aARI0bw2mupXojnzp3LkiVLAOjWrRsPPpjqlDIi2HPPPenfv3+VfXbv3p2jjjqKOXPmMGjQIJYtW0ZeXh4AX3zxBXl5eSxbtqyxb93MspRvDVVz8MEHs3TpUlasWEFZWRkFBQWMGTNmq7j333+fTz/9lBEjRlQuq6ioYO3atQAsWLCABQsWcOyxqV6MV69eDcDGjRu57bbbuOSSSwBYt24dZWWpDiDvu+8+jjjiCLp168aaNWtYt24dAP/4xz944YUXGDhwIN/+9rf561//SnFxMcXFxXTp0sVFwMwaxVcE1XTq1ImpU6cyevRoKioquOCCCxg8eDCTJ08mPz+/sigUFBQwfvz4ygZcSDXqHn54alTDbt268etf/5pOnVKn+Pbbb+fZZ59l8+bNTJw4sbJdYfHixUyYMAFJDB48mPvvvx+Ajz76iAkTJlBRUcHmzZs5/fTTOfHEE5vzVJhZlshoN9SSjgN+AXQE7ouIKdXW9wUeJjX+bEdgUkTUOnh5fn5++AdlZmbbR9LbEZFf07qM3RqS1BGYBhwP7AucKWnfamE3AI9HxAHAeODuTOXT0kaOHMnIkSNbOg0zs61kso1gOLAsIpZHahSkAlIjQ6ULoFsyvSupcVXNzKwZZbKNoDepYQC3KAEOqRZzMzBX0r8DO5Ma4NvMzJpRSzcWnwk8FBE/lTQCeETSkIioMvi2pIuBiwH69u3b4INt+v5Vjcm1UaL4wxbPYYebftpixzaz1iuTt4ZKgT5p87nJsnQXkhoonIh4A9gJ6Fl9RxFxb0TkR0R+Tk6NvaiamVkDZbIQFAJ7S9pTUmdSjcGzqsX8BfgWgKRBpArBmgzmZGZm1WTs1lBElEu6DHie1KOhD0TEQkm3kOoFbxZwFfBLSVeSajg+LzL5PGsL+t/zT2/pFMzMapTRNoLkNwGzqy2bnDa9CPjnTOZgZma1cxcTZmZZzoXAzCzLuRCYmWU5FwIzsyznQmBmluVcCMzMspwLgZlZlnMhMDPLci4EZmZZzoXAzCzLuRCYmWU5FwIzsyznQmBmluVcCMzMspwLgZlZlnMhMDPLci4EZmZZLqOFQNJxkj6QtEzSpBrW/0z
S/OS1RNK6TOZjZmZby9hQlZI6AtOAUUAJUChpVjI8JQARcWVa/L8DB2QqHzMzq1kmrwiGA8siYnlElAEFwMm1xJ8JTM9gPmZmVoNMFoLewMq0+ZJk2VYkfQPYE3hpG+svllQkqWjNmjVNnqiZWTZrLY3F44EZEVFR08qIuDci8iMiPycnp5lTMzNr3zJZCEqBPmnzucmymozHt4XMzFpEJgtBIbC3pD0ldSb1YT+repCkgcBuwBsZzMXMzLYhY4UgIsqBy4DngcXA4xGxUNItksakhY4HCiIiMpWLmZltW8YeHwWIiNnA7GrLJlebvzmTOZiZWe1aS2OxmZm1EBcCM7Ms50JgZpblXAjMzLKcC4GZWZZzITAzy3IuBGZmWc6FwMwsy7kQmJllORcCM7Ms50JgZpblXAjMzLKcC4GZWZZzITAzy3IuBGZmWc6FwMwsy2W0EEg6TtIHkpZJmrSNmNMlLZK0UNJvMpmPmZltLWMjlEnqCEwDRgElQKGkWRGxKC1mb+A64J8j4lNJe2QqHzMzq1kmrwiGA8siYnlElAEFwMnVYv4VmBYRnwJExOoM5mNmZjXIZCHoDaxMmy9JlqUbAAyQ9DtJb0o6LoP5mJlZDTI6eH09j783MBLIBV6VtF9ErEsPknQxcDFA3759mzlFM7P2LZNXBKVAn7T53GRZuhJgVkRsiogVwBJShaGKiLg3IvIjIj8nJydjCZuZZaNMFoJCYG9Je0rqDIwHZlWLeYbU1QCSepK6VbQ8gzmZmVk1GSsEEVEOXAY8DywGHo+IhZJukTQmCXseWCtpEfAycE1ErM1UTmZmtrWMthFExGxgdrVlk9OmA/iP5GVmZi3Avyw2M8tyLgRmZlnOhcDMLMu5EJiZZTkXAjOzLOdCYGaW5VwIzMyynAuBmVmWcyEwM8tyLgRmZlnOhcDMLMu5EJiZZblaO52T1KMe+9hcfSAZMzNrO+rqfXRV8lItMR0BDxtmZtZG1VUIFkfEAbUFSPpjE+ZjZmbNrK42ghH12Ed9YszMrJWqtRBExJcAkvaStGMyPVLS5ZK6p8eYmVnbVN+nhp4EKiTlAfeSGpT+N3VtJOk4SR9IWiZpUg3rz5O0RtL85HXRdmVvZmaNVt+hKjdHRLmkscBdEXFXXW0DkjoC04BRQAlQKGlWRCyqFvpYRFy23ZmbmVmTqO8VwSZJZwITgGeTZTvUsc1wYFlELI+IMqAAOLlhaZqZWabUtxCcT6pR+NaIWCFpT+CROrbpDaxMmy9JllX3HUkLJM2Q1KemHUm6WFKRpKI1a9bUM2UzM6uPehWCiFgUEZdHxPRkfkVE3NYEx/8foF9EDAVeAB7exvHvjYj8iMjPyclpgsOamdkWtRYCSffWtYNaYkpJNSpvkZssqxQRayNiYzJ7H3BQXcczM7OmVVdj8SmSans8VMBR21hXCOyd3EYqBcYDZ1XZWOoVER8ls2OAxXWnbGZmTamuQnBNPfbxWk0Lk6eMLgOeJ9UNxQMRsVDSLUBRRMwCLpc0BigH/gacV+/MzcysSdRaCCKixnv29RURs4HZ1ZZNTpu+DriuMccwM7PGcTfUZmZZzoXAzCzLbVchkNQlU4mYmVnLqFchkHSYpEXA+8n8/pLuzmhmZmbWLOp7RfAzYDSwFiAi3gWOyFRSZmbWfOp9aygiVlZbVNHEuZiZWQuob++jKyUdBoSkHYDv4h9/mZm1C/W9IrgEuJRUp3GlwLBk3szM2rh6XRFExCfA2RnOxczMWkC9CkHSX9C/A/3St4mIMZlJy8zMmkt92wieAe4n1W305oxlY2Zmza6+heDLiLgzo5mYmVmLqG8h+IWkm4C5wJbxA4iIdzKSlZmZNZv6FoL9gHOBo/nq1lAk82Zm1obVtxCcBvRPBqE3M7N2pL6/I3gP6J7BPMzMrIXU94qgO/C+pEKqthH48VEzszauvoXgpobsXNJxwC9IDVV5X0RM2Ubcd4AZwMERUdSQY5mZWcPU95f
Fr2zvjiV1BKYBo4ASoFDSrIhYVC1uF1J9F/1he49hZmaNV2sbgaTXk/9+Lml92utzSevr2PdwYFlELE8amQuAk2uI+wFwG/BlA/I3M7NGqrUQRMQ3k//uEhHd0l67RES3OvbdG0jvurokWVZJ0oFAn4j4bW07knSxpCJJRWvWrKnjsGZmtj3qO0LZI/VZtj0kdQDuAK6qKzYi7o2I/IjIz8nJacxhzcysmvo+Pjo4fUZSJ+CgOrYpBfqkzecmy7bYBRgCzJNUDBwKzJKUX8+czMysCdTVRnCdpM+BoentA8DHwMw69l0I7C1pT0mdgfHArC0rI+KziOgZEf0ioh/wJjDGTw2ZmTWvutoIfhQRuwC3V2sf2D0irqtj23LgMuB5UqOZPR4RCyXdIsm/PzAzayXq+/jodZJ6A9+g6ngEr9ax3WxgdrVlk7cRO7I+uZiZWdOq78A0U0jd2lnEV4PWB1BrITAzs9avvr8sHgvsExEb64w0M7M2pb5PDS0HdshkImZm1jLqe0XwBTBf0otU7XTu8oxkZWZmzaa+hWAWaY9+mplZ+1Hfp4YeznQiZmbWMur71NAKUk8JVRER/Zs8IzMza1b1vTWU3u3DTqSGruzR9OmYmVlzq9dTQxGxNu1VGhE/B76d2dTMzKw51PfW0IFpsx1IXSHU92rCzMxasfp+mP80bbocKCZ1e8jMzNq4+j41dFT6fDIM5XhgSSaSMjOz5lNXN9Tdkq6op0oapZTLgGXA6c2TopmZZVJdVwSPAJ8CbwD/ClwPCBgbEfMzm5qZmTWHugpB/4jYD0DSfcBHQN+I8EDzZmbtRF2Pj27aMhERFUCJi4CZWftSVyHYv9oQlVuGrPxc0vq6di7pOEkfSFomaVIN6y+R9CdJ8yW9Lmnfhr4RMzNrmFpvDUVEx4buOHmyaBowCigBCiXNiohFaWG/iYh7kvgxwB3AcQ09ppmZbb/6jkfQEMOBZRGxPCLKgALg5PSAiEi/qtiZGvozMjOzzMrkr4N7AyvT5kuAQ6oHSboU+A+gM3B0TTuSdDFwMUDfvn2bPFEzs2yWySuCeomIaRGxF3AtcMM2Yu6NiPyIyM/JyWneBM3M2rlMFoJSoE/afG6ybFsKgFMymI+ZmdUgk4WgENhb0p6SOpPqkqLKKGeS9k6b/TawNIP5mJlZDTLWRhAR5Ul3FM8DHYEHImKhpFuAooiYBVwm6RhSv1f4FJiQqXzMzKxmGe1KOiJmA7OrLZucNv3dTB7fzMzq1uKNxWZm1rJcCMzMspwLgZlZlnMhMDPLci4EZmZZzoXAzCzLuRCYmWU5FwIzsyznQmBmluVcCMzMspwLgZlZlnMhMDPLci4EZmZZzoXAzCzLuRCYWbszZ84c9tlnH/Ly8pgyZcpW61999VUOPPBAOnXqxIwZM6qs+973vsfgwYMZNGgQl19+ORFRZf2YMWMYMmRI5fyNN97I0KFDGTZsGMceeyyrVq2qEl9YWLjVceo6RnNzITCzdqWiooJLL72U5557jkWLFjF9+nQWLVpUJaZv37489NBDnHXWWVWW//73v+d3v/sdCxYs4L333qOwsJBXXnmlcv1TTz1F165dq2xzzTXXsGDBAubPn8+JJ57ILbfcUiWXa6+9lmOPPbbex2gJLgRm1q689dZb5OXl0b9/fzp37sz48eOZOXNmlZh+/foxdOhQOnSo+hEoiS+//JKysjI2btzIpk2b+PrXvw7Ahg0buOOOO7jhhhuqbNOtW7fK6b///e9Iqpy/6667+M53vsMee+xRr2O0lIwWAknHSfpA0jJJk2pY/x+SFklaIOlFSd/IZD5m1v6VlpbSp0+fyvnc3FxKS0vrte2IESM46qij6NWrF7169WL06NEMGjQISN0Cuuqqq+jSpctW211//fX06dOHRx99tPKKoLS0lKeffpqJEyfW+xgtJWOFQFJHYBpwPLAvcKakfauF/RHIj4ihwAzgx5nKx8ysLsuWLWPx4sWUlJR
QWlrKSy+9xGuvvcb8+fP58MMPGTt2bI3b3XrrraxcuZKzzz6bqVOnAnDFFVdw2223bXXVsa1jtKRMjlk8HFgWEcsBJBUAJwOVN+si4uW0+DeBczKYj5llgd69e7Ny5crK+ZKSEnr37l2vbZ9++mkOPfTQynaA448/njfeeINddtmFoqIi+vXrR3l5OatXr2bkyJHMmzevyvZnn302J5xwAt///vcpKipi/PjxAHzyySfMnj2bTp06sXTp0hqPcfjhhzfBu2+YTN4a6g2sTJsvSZZty4XAczWtkHSxpCJJRWvWrGnCFM2svTn44INZunQpK1asoKysjIKCAsaMGVOvbfv27csrr7xCeXk5mzZt4pVXXmHQoEFMnDiRVatWUVxczOuvv86AAQMqi8DSpUsrt585cyYDBw4EYMWKFRQXF1NcXMy4ceO4++67OeWUU7Z5jJbUKhqLJZ0D5AO317Q+Iu6NiPyIyM/JyWne5MysTenUqRNTp06tvPd++umnM3jwYCZPnsysWbOA1COdubm5PPHEE/zbv/0bgwcPBmDcuHHstdde7Lfffuy///7sv//+nHTSSbUeb9KkSQwZMoShQ4cyd+5cfvGLX9Qa35BjZJoy9fyqpBHAzRExOpm/DiAiflQt7hjgLuDIiFhd137z8/OjqKioQTlt+v5VDdquvdjhpp+2dApm1kIkvR0R+TWty+QVQSGwt6Q9JXUGxgOzqiV2APD/gDH1KQJmZtb0MtZYHBHlki4Dngc6Ag9ExEJJtwBFETGL1K2grsATybO3f4mI+t3MM7Os46v6zFzVZ/KpISJiNjC72rLJadPHZPL4ZmZWt1bRWGxmZi3HhcDMLMu5EJiZZTkXAjOzLOdCYGaW5VwIzMyynAuBmVmWcyEwM8tyLgRmZlnOhcDMLMu5EJiZZTkXAjOzLOdCYGaW5VwIzMyynAuBmVmWcyEwawXmzJnDPvvsQ15eHlOmTNlq/caNGznjjDPIy8vjkEMOobi4GIBHH32UYcOGVb46dOjA/PnzASgrK+Piiy9mwIABDBw4kCeffBKAhx56iJycnMpt7rvvvsrjHHfccXTv3p0TTzyxyvFXrFjBIYccQl5eHmeccQZlZWV17svajowWAknHSfpA0jJJk2pYf4SkdySVSxqXyVzMWquKigouvfRSnnvuORYtWsT06dNZtGhRlZj777+f3XbbjWXLlnHllVdy7bXXAnD22Wczf/585s+fzyOPPMKee+7JsGHDALj11lvZY489WLJkCYsWLeLII4+s3N8ZZ5xRud1FF11Uufyaa67hkUce2SrHa6+9liuvvJJly5ax2267cf/999e5L2s7MlYIJHUEpgHHA/sCZ0rat1rYX4DzgN9kKg+z1u6tt94iLy+P/v3707lzZ8aPH8/MmTOrxMycOZMJEyYAMG7cOF588UUiokrM9OnTGT9+fOX8Aw88wHXXXQdAhw4d6NmzZ525fOtb32KXXXapsiwieOmllxg3LvVdbcKECTzzzDPb/T6t9crkFcFwYFlELI+IMqAAODk9ICKKI2IBsDmDeZi1aqWlpfTp06dyPjc3l9LS0m3GdOrUiV133ZW1a9dWiXnsscc488wzAVi3bh0AN954IwceeCCnnXYaH3/8cWXsk08+ydChQxk3bhwrV66sNb+1a9fSvXt3OnXqVGN+27Mva50yWQh6A+l/FSXJsu0m6WJJRZKK1qxZ0yTJmbUnf/jDH+jSpQtDhgwBoLy8nJKSEg477DDeeecdRowYwdVXXw3ASSedRHFxMQsWLGDUqFGVVxoN0ZT7spbTJhqLI+LeiMiPiPycnJyWTsesSfXu3bvKN+mSkhJ69+69zZjy8nI+++wzdt9998r1BQUFlVcDALvvvjtdunTh1FNPBeC0007jnXfeqVy34447AnDRRRfx9ttv15rf7rvvzrp16ygvL98qv+3dl7VOmSwEpUCftPncZJmZpTn44INZunQpK1asoKysjIKCAsa
MGVMlZsyYMTz88MMAzJgxg6OPPhpJAGzevJnHH3+8SvuAJE466STmzZsHwIsvvsi++6aa6D766KPKuFmzZjFo0KBa85PEUUcdxYwZMwB4+OGHOfnkkxu0L2udOmVw34XA3pL2JFUAxgNnZfB4Zm1Sp06dmDp1KqNHj6aiooILLriAwYMHM3nyZPLz8xkzZgwXXngh5557Lnl5efTo0YOCgoLK7V999VX69OlD//79q+z3tttu49xzz+WKK64gJyeHBx98EIA777yTWbNm0alTJ3r06MFDDz1Uuc3hhx/O+++/z4YNG8jNzeX+++9n9OjR3HbbbYwfP54bbriBAw44gAsvvLDOfVnboepPHjTpzqUTgJ8DHYEHIuJWSbcARRExS9LBwNPAbsCXwF8jYnBt+8zPz4+ioqIG5bPp+1c1aLv2YoebftrSKZg1iv8NN/zfsKS3IyK/pnWZvCIgImYDs6stm5w2XUjqlpGZZdjIkSMBKm8XmW2R0UJgZl9p6W+zUfxhi+fhq9LWyYXALEv87/mnt3QK1kq1icdHzcwsc1wIzMyynAuBmVmWcyEwM8tyLgRmZlnOhcDMLMu5EFiTaOgIW2vXruWoo46ia9euXHbZZVW2efvtt9lvv/3Iy8vj8ssvr+x//91332XEiBHst99+nHTSSaxfvx6A4uJivva1r1WOlnXJJZdslceYMWMqe+jc4q677mLgwIEMHjyY733ve01xOszaFBcCa7TGjLC100478YMf/ICf/OQnW+134sSJ/PKXv2Tp0qUsXbqUOXPmAKleLqdMmcKf/vQnxo4dy+233165zV577VU5WtY999xTZX9PPfUUXbt2rbLs5ZdfZubMmbz77rssXLiwsqtms2ziQmCN1pgRtnbeeWe++c1vstNOO1WJ/+ijj1i/fj2HHnookviXf/mXylGxlixZwhFHHAHAqFGjKsfirc2GDRu44447uOGGG6os/+///m8mTZpU2ZXyHnvs0aBzYNaWuRBYozXVCFvV43Nzv+qGKn2fgwcPriw0TzzxRJW+/FesWMEBBxzAkUceyWuvvVa5/MYbb+Sqq66iS5cuVY6zZMkSXnvtNQ455BCOPPJICgsLt/ftm7V5LgTW5jzwwAPcfffdHHTQQXz++ed07twZgF69evGXv/yFP/7xj9xxxx2cddZZrF+/nvnz5/Phhx8yduzYrfZVXl7O3/72N958801uv/12Tj/99K3GAjZr79zXkDXa9oywlZubW+MIWzXts6SkpMZ9Dhw4kLlz5wKpb/S//e1vAdhxxx0rb/EcdNBB7LXXXixZsoTCwkKKioro168f5eXlrF69mpEjRzJv3jxyc3M59dRTkcTw4cPp0KEDn3zyCR4Jz7KJrwis0Ro7wlZNevXqRbdu3XjzzTeJCH71q19Vjoq1evVqIDUy1w9/+MPKp4PWrFlDRUUFAMuXL2fp0qX079+fiRMnsmrVKoqLi3n99dcZMGBAZVfMp5xyCi+//DKQKiplZWX07Nmz6U6OWRvgKwJrtMaOsNWvXz/Wr19PWVkZzzzzDHPnzmXffffl7rvv5rzzzuMf//gHxx9/PMcffzwA06dPZ9q0aQCceuqpnH/++UBqpK7Jkyezww470KFDB+655x569OhRa+4XXHABF1xwAUOGDKFz5848/PDDtRYos/YooyOUZYJHKGs49wXfsrL97w8a/zeY7eewTY5QJuk44Bekhqq8LyKmVFu/I/Ar4CBgLXBGRBRnMidruB89taSlU2hR1506oKVTMMuIjLURSOoITAOOB/YFzpS0b7WwC4FPIyIP+BlwW6byMTOzmmWysXg4sCwilkdEGVAAnFwt5mTg4WR6BvAt+QatmVmzylgbgaRxwHERcVEyfy5wSERclhbzXhJTksx/mMR8Um1fFwMXJ7P7AB9kJOnM6wl8UmeUbYvPX+P5HDZOWz5/34iIGp+LbhNPDUXEvcC9LZ1HY0kq2lZjjdXN56/xfA4bp72
ev0zeGioF+qTN5ybLaoyR1AnYlVSjsZmZNZNMFoJCYG9Je0rqDIwHZlWLmQVMSKbHAS9FW3ue1cysjcvYraGIKJd0GfA8qcdHH4iIhZJuAYoiYhZwP/CIpGXA30gVi/aszd/eamE+f43nc9g47fL8tbkflJmZWdNyX0NmZlnOhcDMLMu5EFirIulmSR4vshlJmiep3T0S2VCS/jNtul/ye6d2zYXAzKyq/6w7pKrk8fc2y4VgOyTfDhZL+qWkhZLmSvpa+jcqST0lFSfT50l6RtILkoolXSbpPyT9UdKbknokcfMk/ULSfEnvSRouqYOkpZJykpgOkpZtmW9PJF0vaYmk10n9chxJ/yqpUNK7kp6U1CVZ/nVJTyfL35V0WLL8HElvJefw/yV9XbUrmfr7S5yb/veXbD9c0htJ/O8l7dP877rxkvP2vqRHk/M3Q9IJkp5JixmV/F1NAb6WnItHk9Udq5/zZJt5kn4uqQj4rqSTJP0hOV//K+nrSdzNkh5I4pdLury5z0FdXAi2397AtIgYDKwDvlNH/BDgVOBg4Fbgi4g4AHgD+Je0uC4RMQz4v6Qetd0M/Bo4O1l/DPBuRKxpovfRKkg6iNRjw8OAE0idJ4CnIuLgiNgfWEyqg0KAO4FXkuUHAgslDQLOAP45OYcVfHXe2ptm+ftLlr0PHJ7ETwb+q4neQ0vYB7g7IgYB64HBwMC0L1bnk/p3Nwn4R0QMi4gtf0O1nfPOEZEfET8FXgcOTc5XAfC9tLiBwGhSfbDdJGmHjLzLBmrTlzMtZEVEzE+m3wb61RH/ckR8Dnwu6TPgf5LlfwKGpsVNB4iIVyV1k9Sd1D/ImcDPgQuAB5sg/9bmcODpiPgCQNKWHx0OkfRDoDvQldTvUQCOJvkAi4gK4DOl+rE6CChUqs/CrwGrm+sNNLPm/PvbBXhY0t5AAK3qw2s7rYyI3yXTvwYuBx4BzpH0IDCCqoUxXW3n/LG06VzgMUm9gM7AirR1v42IjcBGSauBrwMltBIuBNtvY9p0BakPnXK+urraqZb4zWnzm6l6/qv/oCMiYqWkjyUdTeqbRHv9lluTh4BTIuJdSecBI2uJFfBwRFzXDHm1tGb7+wN+QKqQjJXUD5jX4KxbXk3v70FShfFL4ImIKN/GtjWd8y3+njZ9F3BHRMySNBK4uZZ9tKrPXt8aahrFpL6RQqqrjIY4A0DSN4HPIuKzZPl9pL7BPJF8A25vXgVOSe517wKclCzfBfgouYROL4AvAhMhNeaFpF2TZeMk7ZEs7yHpG832DlpeMZn5+9uVr/oHO68R+bUGfSWNSKbPAl6PiFXAKuAGql5tb2rgrZv08zWhtsDWxoWgafwEmCjpj6S6qW2IL5Pt7+Gr++GQ6o+pK+3zthAR8Q6py+t3gedI9VEFcCPwB+B3pO5Vb/Fd4ChJfyJ1mb5vRCwi9Y95rqQFwAtAr+Z5B61Cpv7+fgz8KFneqr7BNsAHwKWSFgO7Af+dLH+U1G2jxWmx9wIL0hqL6+tm4AlJb9PGuqp2FxOtgKR5wNURsdVgzMnTID+LiMObPTGzdiC5rfVsRAypYd1U4I8RcX+zJ9aKtPUq365JmkTqNkg2tQ2YNYvkm/vfgataOpeW5isCM7Ms5zYCM7Ms50JgZpblXAjMzLKcC4G1OZI2NNF+Rkp6tin21YBj95N01vbGScqXdGdms7Ns40Jg1jL6kfph03bFRURRRLS6TsusbXMhsDYr+Ub/iqSZSa+OUySdrVQvpH+StFcS95CkeyQVKdXL6Yk17GvnpIfIt5LeI09Olte3B9m9JM2R9Lak1yQNTDv2nUr13rlc0pZf/k4BDk96ubwy+eb/mqR3ktdh24irvIpJfkH9jKQFSS5Dk+WtvrdLa2Uiwi+/2tQL2JD8dySp3iB7ATuS+nn/95N13wV+nkw/BMwh9cVnb1Kdfe2UbP9sEvNfwDnJdHdgCbAzqa4VlpHq8iIH+Ay4JIn7GXBFMv0isHc
yfQjwUtqxn0iOvS+wLC33Z9PeUxdgp2R6b6BoG3HpOd8F3JRMHw3MT6ZvBn6fnJOewFpgh5b+/+ZX6335B2XW1hVGxEcAkj4E5ibL/wQclRb3eKS69l4qaTmpboHTHQuM0Vejo+0E9E2mX45aevCU1BU4jFT3Alv2t2Pavp9Jjr1ISR/1NdgBmCppGKlOyQbU/db5JkmXyBHxkqTdJXVL1rXq3i6tdXEhsLauMb1rphPwnYj4oMpC6ZB6HKMDsC5S/fnXlaO2EXMl8DGwf7K/L7cRV1+turdLa13cRmDZ4jSlRnnbC+hPqhOydM8D/67kK72kA+q744hYD6yQdFqyrSTtX8dmn5O63bTFrsBHyZXDuUDHbcSle42k+5Gk2+NPklzMtosLgWWLvwBvkerh9JKIqP6N+wekbs8skLQwmd8eZwMXSnoXWAicXEf8AqBCqeE2rwTuBiYk2w/kq37uq8eluxk4KOlxdQptrOtjaz3c15C1e5IeItXAOqOlczFrjXxFYGaW5XxFYGaW5XxFYGaW5VwIzMyynAuBmVmWcyEwM8tyLgRmZlnu/wOb+O/AAHYh6AAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "barplot('3D Heat Diffusion', labels=True)" + ] + }, + { + "cell_type": "markdown", + "id": "c08e4fd2", + "metadata": {}, + "source": [ + "## Benchmarking and Instrumentation API\n", + "\n", + "When optimizing programs in DaCe, it is useful to know how long the compiled program, or any of its components, takes to run. For this purpose, DaCe includes an instrumentation API, which allows you to time each SDFG, state, map, or tasklet directly from the code.\n", + "\n", + "The instrumentation providers included in DaCe can measure different metrics: wall-clock time, GPU (CUDA/HIP) events, PAPI performance counters, and more (the set of providers is extensible).\n", + "\n", + "Performance results are saved as report files in CSV format or the `chrome://tracing` JSON format for easy timeline viewing." + ] + }, + { + "cell_type": "markdown", + "id": "8cc5a3cd", + "metadata": {}, + "source": [ + "### Profiling API\n", + "First, we demonstrate the profiling API, which is a simple low-level timer that will run every called DaCe program a number of times and print out the median runtime." + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "06675f18", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Profiling...\n", + "Profiling: 100%|██████████| 100/100 [00:00<00:00, 1106.55it/s]\n", + "DaCe 0.2954999217763543 ms\n" + ] + } + ], + "source": [ + "# Temporarily set the DACE_profiling config to True\n", + "with dace.config.set_temporary('profiling', value=True):\n", + " # You can control the number of times a program is run with the treps configuration\n", + " with dace.config.set_temporary('treps', value=100):\n", + " daceloop(a)" + ] + }, + { + "cell_type": "markdown", + "id": "b624ef29", + "metadata": {}, + "source": [ + "This can also be controlled with environment variables.
Setting `DACE_profiling=1` and `DACE_treps=100` achieves the same effect on the entire script.\n", + "\n", + "The report is saved as a CSV file in the `.dacecache//profiling` folder, where `` is the program or SDFG name." + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "e8f7f6b1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Program Optimization Problem_Size Runtime_sec\n", + "0 someforloop_0 DaCe -f 0.006098\n", + "1 someforloop_0 DaCe -f 0.000454\n", + "2 someforloop_0 DaCe -f 0.000398\n", + "3 someforloop_0 DaCe -f 0.000367\n", + "4 someforloop_0 DaCe -f 0.000271\n", + "5 someforloop_0 DaCe -f 0.000304\n", + "6 someforloop_0 DaCe -f 0.000249\n", + "7 someforloop_0 DaCe -f 0.004182\n", + "8 someforloop_0 DaCe -f 0.000413\n", + "9 someforloop_0 DaCe -f 0.000379" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "df = pd.read_csv('.dacecache/someforloop/profiling/results-1644308750891.csv')\n", + "df.head(10)" + ] + }, + { + "cell_type": "markdown", + "id": "68cef56c", + "metadata": {}, + "source": [ + "### Instrumentation API\n", + "\n", + "The Instrumentation API allows more fine-grained control over measuring program metrics. It creates a JSON report in `.dacecache//perf`, which can be obtained with the API or viewed with any Chrome Tracing capable viewer. More usage information and how to use the API to tune programs can be found in the [program tuning sample](https://github.com/spcl/dace/blob/master/samples/optimization/tuning.py)." + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "6ecccc92", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@dace.program\n", + "def twomaps(A):\n", + " B = np.sin(A)\n", + " return B * 2.0\n", + "\n", + "sdfg = twomaps.to_sdfg(a)\n", + "sdfg" + ] + }, + { + "cell_type": "markdown", + "id": "91a124d6", + "metadata": {}, + "source": [ + "We will now instrument each of the maps in the program separately, to see which one is a potential bottleneck:" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "4dc5531f", + "metadata": {}, + "outputs": [], + "source": [ + "# Get all maps\n", + "maps = [n for n, _ in sdfg.all_nodes_recursive() if isinstance(n, dace.nodes.MapEntry)]\n", + "\n", + "# Instrument with wall-clock timer\n", + "for m in maps:\n", + " m.instrument = dace.InstrumentationType.Timer" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": "429a942d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0.57072424, 1.62590182, 0.54045806, ..., 1.42865334, 0.74420338,\n", + " 1.34051505],\n", + " [0.56169953, 0.33241204, 1.18265858, ..., 1.18433834, 0.45687267,\n", + " 0.03173654],\n", + " [0.21026808, 1.38539332, 1.13363577, ..., 1.20282264, 1.26179853,\n", + " 0.94529241],\n", + " ...,\n", + " [0.58080043, 1.38410909, 1.12745291, ..., 1.54076988, 0.73878048,\n", + " 0.76149314],\n", + " [1.34720999, 1.08957421, 0.75846927, ..., 1.01317063, 0.13351551,\n", + " 1.13468273],\n", + " [1.2947957 , 1.0325859 , 1.50298925, ..., 0.56601298, 1.08368357,\n", + " 1.29880744]])" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Run SDFG and create report\n", + "sdfg(A)" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "bc980e7c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Instrumentation report\n", + "SDFG Hash:
0f02b642249b861dc94b7cbc729190d4b27cab79607b8f28c7de3946e62d5977\n", + "---------------------------------------------------------------------------\n", + "Element Runtime (ms) \n", + " Min Mean Median Max \n", + "---------------------------------------------------------------------------\n", + "SDFG (0) \n", + "|-State (0) \n", + "| |-Node (0) \n", + "| | |Map _numpy_sin__map: \n", + "| | | 11.654 11.654 11.654 11.654 \n", + "| |-Node (5) \n", + "| | |Map _Mult__map: \n", + "| | | 1.524 1.524 1.524 1.524 \n", + "---------------------------------------------------------------------------\n", + "\n" + ] + } + ], + "source": [ + "# Get the latest instrumentation report from .dacecache/twomaps/perf\n", + "report = sdfg.get_latest_report()\n", + "\n", + "# Print report in a nicely readable format\n", + "print(report)" + ] + }, + { + "cell_type": "markdown", + "id": "5e2a0e77", + "metadata": {}, + "source": [ + "As we can see, the `np.sin` statement is more expensive than the multiplication statement." 
+ ] + }, + { + "cell_type": "markdown", + "id": "590a0a2d", + "metadata": {}, + "source": [ + "These reports can also be loaded directly into the Visual Studio Code plugin to overlay the information on the graph, as shown below:\n", + "\n", + "![](https://raw.githubusercontent.com/spcl/dace-vscode/master/images/analysis.gif)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + }, + "metadata": { + "interpreter": { + "hash": "ef60a094ca1873cf2e62a8dbe2e76beaf211a154f1b9ff0db0c7157806bcfce0" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/performance.png b/tutorials/performance.png new file mode 100644 index 0000000000..aa9546dde1 Binary files /dev/null and b/tutorials/performance.png differ diff --git a/tutorials/transformations.ipynb b/tutorials/transformations.ipynb index 17c39ebda2..13b0711485 100644 --- a/tutorials/transformations.ipynb +++ b/tutorials/transformations.ipynb @@ -539,7 +539,7 @@ "\n", "Selecting nodes (through single click, ctrl-click, or the box select mode at the top pane) will add transformations and matching subgraph transformations to the \"Selection\" pane. History appears at the bottom and saves as part of the SDFG file, which you can then use to revert and apply new transformations.
See the example below:\n", "\n", - "![vscode plugin](vscode.gif \"vscode plugin\")" + "![vscode plugin](https://raw.githubusercontent.com/spcl/dace-vscode/master/images/sdfg_optimization.gif \"vscode plugin\")" ] }, { @@ -779,7 +779,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.7" + "version": "3.8.5" } }, "nbformat": 4, diff --git a/tutorials/vscode.gif b/tutorials/vscode.gif deleted file mode 100644 index b23c5b5db5..0000000000 Binary files a/tutorials/vscode.gif and /dev/null differ
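The profiling cell in the benchmarking tutorial reads a report whose file name embeds a run-specific timestamp (`results-1644308750891.csv`), so that exact path will not exist on another machine. The following is a minimal sketch of a helper that picks the newest report instead. Based only on the example path and the printed table, it assumes that reports are named `results-<integer timestamp>.csv` under `.dacecache/<program>/profiling` and have a `Runtime_sec` column. `latest_profiling_median` is a hypothetical helper, not part of the DaCe API.

```python
import csv
import statistics
from pathlib import Path
from typing import Optional


def latest_profiling_median(program: str, cache_dir: str = ".dacecache") -> Optional[float]:
    """Return the median Runtime_sec of the newest profiling report, or None.

    Hypothetical helper: assumes reports are named results-<timestamp>.csv
    and contain a Runtime_sec column, as in the tutorial's example output.
    """
    folder = Path(cache_dir) / program / "profiling"
    # Sort by the integer timestamp embedded in the file name (newest last);
    # a missing folder simply yields no matches.
    reports = sorted(folder.glob("results-*.csv"),
                     key=lambda p: int(p.stem.split("-", 1)[1]))
    if not reports:
        return None
    with reports[-1].open(newline="") as f:
        runtimes = [float(row["Runtime_sec"]) for row in csv.DictReader(f)]
    # The median is robust to the occasional slow runs seen in the report table
    return statistics.median(runtimes) if runtimes else None
```

In the notebook, `latest_profiling_median("someforloop")` would replace the hard-coded `pd.read_csv(...)` path. Sorting numerically rather than by modification time keeps the choice stable if files are copied between machines.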
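The `heat3d` benchmark in the tutorial times each version once as a warmup before measuring, because the first call to the DaCe and Numba versions includes compilation (3.28 s and 3.75 s in the warmup cell, versus 11.2 ms and 77.1 ms per call afterwards). Outside IPython, where `%timeit` is not available, the same warmup-then-measure pattern can be sketched with the standard library. `benchmark` and `speedups` are hypothetical helper names, and this sketch reports the median rather than the mean that `%timeit` prints.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 1, repeat: int = 7) -> float:
    """Median wall-clock time of fn() in seconds, excluding warmup calls."""
    for _ in range(warmup):
        fn()  # first calls may trigger code generation/compilation; not measured
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedups(times: Dict[str, float], baseline: str = "numpy") -> Dict[str, float]:
    """Speedup of every entry over the baseline (values above 1 are faster)."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}
```

For example, in the notebook one could call `benchmark(lambda: dace_heat3d(TSTEPS, A, B))`. Plugging the printed mean times for NumPy (799 ms) and DaCe (11.2 ms) into `speedups({"numpy": 0.799, "dace": 0.0112})` gives a DaCe speedup of roughly 71.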