From c5742c6e43f77eeb873d9a104c9aa362f9496f45 Mon Sep 17 00:00:00 2001 From: Arne Symons Date: Mon, 25 Nov 2024 13:05:22 +0100 Subject: [PATCH] update getting started --- docs/source/getting-started.rst | 66 ++++----------------------------- 1 file changed, 8 insertions(+), 58 deletions(-) diff --git a/docs/source/getting-started.rst b/docs/source/getting-started.rst index 9134764b..6011b9b6 100644 --- a/docs/source/getting-started.rst +++ b/docs/source/getting-started.rst @@ -2,66 +2,16 @@ Getting Started =============== -Stream allows you to run a design space exploration for both, traditional layer-by-layer processing as well as layer-fused processing of DNN workloads. The framework can be used to explore the performace of a workload on multi-core and single-core architectures. +Tutorial +-------- -In a first run, we are going to run ResNet-18 on quad-core architecture similar to a TPU like hardware [1]. We provide an `onnx `_ model of this network in ``stream/inputs/examples/workload/resnet18.onnx`` and the HW architecture in ``stream/inputs/examples/hardware/TPU_like_quad_core.py``. +The recommended way to get started with Stream is through the tutorial labs. You can find them in the `tutorial` branch of the repository. You should start with lab1 `here `_. -The onnx model has been shape inferred, which means that besied the input and output tensor shapes, all intermediate tensor shapes have been inferred, which is information required by Stream. +Manual run +---------- -.. warning:: - ZigZag requires an inferred onnx model, as it needs to know the shapes of all intermediate tensors to correctly infer the layer shapes. You can find more information on how to infer an onnx model `here `_. +You can also run the main script directly. This will run a genetic algorithm on the inputs defined in the main file. -Besides the workload and HW architecture, a mapping file must be provided which, as the name suggests, provides information about which layer can be mapped to which core in the hardware architecture. The mapping is provided in ``stream/inputs/examples/mapping/tpu_like_quad_core.py``. +.. code-block:: bash -The framework is generally ran through a main file which parses the provided inputs and contains the program flow through the stages defined in the main file. - -.. note:: - - You can find more information in the :doc:`stages` document. - -Layer-by-layer processing of workload -===================================== - -Now, we would like to run the previously introduced workload in a layer-by-layer fashion, which means that one layer is exectued at once on a certain core and the next layer can only start as soon as all previous layers are completely done. - -For this we have to exectue - -.. code:: sh - - python main_stream.py - -which parses the given workload, hw architecture and the corresponding mapping. Stream will now evaluate how efficently the workload can be executed on the given hardware with a layer-by-layer approach. - -Layer-fused processing of workload -================================== - -In a second run, we would like to run the same workload on the same HW with the same mapping. The difference will be that a layer-fused approach is used instead of a layer-by-layer approach. - -For this we have to execute - -.. code:: sh - - python main_stream_layer_splitting.py - -which starts another run of Stream. Now the given inputs are processed in a layer-fused approach which means that each layer is split in several smaller parts. - -Analyzing results -================= - -During the run of each experiement, Streams saves the results in the ``outputs`` folder based on the paths provided in the ``main_stream.py`` and ``main_stream_layer_splitting.py`` files. In this folder, there will be four ``.png`` files. Two of them show the schedule of workload's layer on the different cores of the hw architecture (one file for the layer-by-layer approach and one file for the layer-fused approach). Besides this, the other two ``.png`` files show the memory utilization of the different cores in the system for the two different experiements. More explanation about the results can be found on the :doc:`outputs` page. - -[1] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, -S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, -C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. -Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, -D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, -A. Kaplan, H. Khaitan, D. Killebrew, A. Koch, N. Kumar, S. Lacy, -J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, -G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, -R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, -N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, -C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, -M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, -R. Walter, W. Wang, E. Wilcox, and D. H. Yoon, β€œIn-datacenter -performance analysis of a tensor processing unit,” SIGARCH Comput. -Archit. News, vol. 45, no. 2, p. 1–12, jun 2017. \ No newline at end of file + $ python main_stream_ga.py