Prepare for release, merge main (AMDResearch#7)

* Fix link to Ryzen AI webpage (AMDResearch#31)
* extra symbols to wsl path resolution (AMDResearch#10)
* Latest version of numpy causing issues, constraining it to v1.* (AMDResearch#35)
* Latest version of numpy causing issues, constraining it to v1.*
* readability fix
* Enable Kernel fusion (AMDResearch#34)
* Include library of kernels in the compilation
* Fix typos
* Include kernels header file
* Prepare kernels for superkernel
* Include test name
* Initial superkernel test
* Simplyfy test
* Add second superkernel
* Rename test to better reflect the nature of PR
* Rename test to better reflect the nature of PR
* Align code
* Fix call
* More fixes
* Remove fake kernels
* Kernel fusion working!!!!
* Fix duplicate variable name
* Add another test
* Remove unnecesary flag
* Remove
* revert to original
* Revert to original
* Revert to original
* Revert to original
* Revert to original
* Include kernels
* Copy kernel
* Import shutil
* Allow to load the same Riallto app up to 4 times (AMDResearch#36)
* Allow multiple instances of the same app to be run
* Try to fix test
* Add more tests
* Flake8
* More appropiate test name
* Make sure AppRunner has device, device is not found is previous call had an space issue
* Made code more pythonic
* Handle AppRunner object even if it fails
* Flake8
* Report app start column (AMDResearch#37)
* Allow multiple instances of the same app to be run
* Try to fix test
* Add more tests
* Flake8
* More appropiate test name
* Make sure AppRunner has device, device is not found is previous call had an space issue
* Made code more pythonic
* Handle AppRunner object even if it fails
* Flake8
* Try to add start column
* Add starting column
* Flake8
* Update check
* Fix where assert is done
* Fix issue with counting
* Check that xbutil count is working
* Do not run xbutil test for Linux
* Add missing os import
* Skip test in Linux

---------

Co-authored-by: Sarunas Kalade <[email protected]>
Co-authored-by: Shane Fleming <[email protected]>
skalade · Jun 26, 2024 · 78b0c7e · 78b0c7e
1 parent 0069c6b
commit 78b0c7e

Showing 40 changed files with 538 additions and 287 deletions.
diff --git a/notebooks/1_1_ryzenai.ipynb b/notebooks/1_1_ryzenai.ipynb
@@ -16,7 +16,7 @@
     "\n",
     "## References\n",
     "\n",
-    "**[AMD Ryzen™ AI - Windows Laptops with AI Built In](https://www.amd.com/en/products/ryzen-ai)**\n",
+    "**[AMD Ryzen™ AI - Windows Laptops with AI Built In](https://www.amd.com/en/products/processors/consumer/ryzen-ai.html)**\n",
     "\n",
     "**[AMD Ryzen™ AI: What It Is, and How It Will Change What Your Laptop Can Do](https://webinar.amd.com/loRg4PP11pg9ZLp1O0pU/en)**\n",
     "\n",
@@ -134,7 +134,7 @@
    "source": [
     "## Ryzen AI NPU\n",
     "\n",
-    "The [Ryzen 7000 desktop and laptop chips](https://www.amd.com/en/processors/ryzen) were introduced in 2023. Alongside the main x86 CPU, Ryzen 7000 has a new type of coprocessor, a *Neural Processing Unit* (NPU), based on the XDNA™ AI Engine architecture. This new NPU is called [Ryzen AI](https://www.amd.com/en/products/ryzen-ai).\n",
+    "The [Ryzen 7000 desktop and laptop chips](https://www.amd.com/en/processors/ryzen) were introduced in 2023. Alongside the main x86 CPU, Ryzen 7000 has a new type of coprocessor, a *Neural Processing Unit* (NPU), based on the XDNA™ AI Engine architecture. This new NPU is called [Ryzen AI](https://www.amd.com/en/products/processors/consumer/ryzen-ai.html).\n",
     "\n",
     "<center><img src=\"./images/png/ryzen_ai_labels.png\"  style=\"max-width:40%\"></center>\n",
     "<center><strong>Ryzen 7040 'Phoenix' mobile processor</strong></center>\n",

diff --git a/notebooks/3_1_Color_threshold_example.ipynb b/notebooks/3_1_Color_threshold_example.ipynb
@@ -62,7 +62,7 @@
     "\n",
     "## System Architecture\n",
     "\n",
-    "These examples assume you are using a laptop with a [Ryzen 7040 \"Phoenix\" APU with the Ryzen AI NPU](https://www.amd.com/en/products/ryzen-ai) and an integrated webcam,  or compatible hardware. The typical architecture of the system for the examples is shown below."
+    "These examples assume you are using a laptop with a [Ryzen 7040 \"Phoenix\" APU with the Ryzen AI NPU](https://www.amd.com/en/products/processors/consumer/ryzen-ai.html) and an integrated webcam,  or compatible hardware. The typical architecture of the system for the examples is shown below."
    ]
   },
   {

diff --git a/npu/__init__.py b/npu/__init__.py
@@ -26,15 +26,15 @@
 """
 
 from .utils.test_device import get_driver_version, version_to_tuple
-from sys import platform
+import platform
 
 __supported_driver__ = "10.1109.8.100"
 
-if not platform == 'linux':
+if platform.system() == 'Windows':
     __installed_driver__ = get_driver_version()
 
     if version_to_tuple(__installed_driver__) < version_to_tuple(__supported_driver__):
-        raise ValueError(f"""Detected driver: {__installed_driver__}, supported driver version is >={__supported_driver__}, 
+        raise ValueError(f"""Detected driver: {__installed_driver__}, supported driver version is >={__supported_driver__},
                   go to https://riallto.ai/prerequisites-driver.html for driver setup instructions.""")
 
 from .repr_dict import ReprDict
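
Review note: the driver guard above compares versions as tuples rather than as strings. The body of `version_to_tuple` is not part of this diff, so the following is only a minimal sketch of how such a comparison could work. `version_to_tuple_sketch` and `driver_is_supported` are hypothetical names introduced for illustration, not the project's code.

```python
from typing import Tuple

# Value taken from __supported_driver__ in the hunk above.
SUPPORTED_DRIVER = "10.1109.8.100"


def version_to_tuple_sketch(version: str) -> Tuple[int, ...]:
    """Hypothetical stand-in for version_to_tuple: split on dots, convert to ints."""
    return tuple(int(part) for part in version.split("."))


def driver_is_supported(installed: str, supported: str = SUPPORTED_DRIVER) -> bool:
    """Return True when the installed driver is at least the supported version."""
    return version_to_tuple_sketch(installed) >= version_to_tuple_sketch(supported)
```

Comparing integer tuples avoids the lexicographic trap of plain strings: as strings, `"10.200.0.0"` sorts after `"10.1109.8.100"`, while as tuples `(10, 200, 0, 0)` is correctly older than `(10, 1109, 8, 100)`.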

diff --git a/npu/build/appbuilder.py b/npu/build/appbuilder.py
@@ -9,7 +9,7 @@
 from .appxclbinbuilder import AppXclbinBuilder
 from .utils import check_wsl_install
 from typing import Optional
-from sys import platform
+import platform
 import json
 
 class AppBuilder:
@@ -42,7 +42,7 @@ class AppBuilder:
     def __init__(self, name=None) -> None:
         """Return a new AppBuilder object."""
 
-        if not platform == 'linux': 
+        if platform.system() == 'Windows':
             check_wsl_install()
 
         self.name = type(self).__name__ if name is None else name
@@ -69,27 +69,27 @@ def callgraph(self):
     def to_metadata(self, *args):
         """ The application is converted into the AppMetadata after tracing the callgraph() call."""
         self.previous_build_args = args
-        self.kernels, self.connections = self.fxtracer.to_trace(*args)     
+        self.kernels, self.connections = self.fxtracer.to_trace(*args)
 
         return AppMetada(self.name,
-                         self.unique_named(self.kernels), 
+                         self.unique_named(self.kernels),
                          self.unique_named(self.connections),
                          self.to_sequence())
 
     def to_handoff(self, *args, file=None):
         """ Converts the application into a serializable JSON file."""
-        self.previous_build_args = args        
+        self.previous_build_args = args
         with open(file, 'w') as f:
             json.dump(self.to_json(*args), f, default = lambda o: '<not serialisable>')
 
     def to_json(self, *args):
         """ Converts the application into JSON."""
         self.previous_build_args = args
         return self.to_metadata(*args).to_json()
-    
+
     @property
     def metadata(self, *args):
-        """ Generates the application JSON and displays inside a IPython environment."""                
+        """ Generates the application JSON and displays inside a IPython environment."""
         from npu import ReprDict
         self.validate_previous_build_args()
         return ReprDict(self.to_json(*self.previous_build_args), rootname=self.name)
@@ -106,7 +106,7 @@ def to_sequence(self):
 
     def display(self)->None:
         """ Generates the application SVG and displays inside a IPython environment."""
-        from npu.utils.appviz import AppViz        
+        from npu.utils.appviz import AppViz
         self.validate_previous_build_args()
         _viz = AppViz(self.to_json(*self.previous_build_args))
         _viz.show
@@ -133,15 +133,15 @@ def build(self, *args, debug=False, mlir:Optional[str]=None):
             self.ab.build(self.name, f"{self.name}.mlir", self.kernels, debug)
         else:
             self.ab.build(self.name, mlir, self.kernels, debug)
-            
+
     def __add__(self, app_component):
         if isinstance(app_component, Connection):
             self.merge_applications(app_component.kernels, [app_component])
             return self
-        
+
         if isinstance(app_component, AppBuilder):
             self.merge_applications(app_component.kernels, app_component.connections)
-            return self    
+            return self
 
         raise TypeError(f"{app_component} of type {type(app_component)} is not supported")
 
@@ -152,7 +152,7 @@ def validate_previous_build_args(self):
 
     def merge_applications(self, newkernels, newconnections):
         self.connections.extend(newconnections)
-        self.kernels.extend(newkernels)        
+        self.kernels.extend(newkernels)
 
     def unique_named(self, objs):
         unique_objs = list(set(objs))
@@ -162,4 +162,4 @@ def unique_named(self, objs):
 
         unique_objs_byname_list.sort(key= lambda x : x.name)
 
-        return unique_objs_byname_list 
+        return unique_objs_byname_list
diff --git a/npu/build/build_template/kernel_build.sh b/npu/build/build_template/kernel_build.sh
@@ -8,6 +8,6 @@ cd $SCRIPT_DIR
 
 source /opt/mlir_settings.sh
 
-xchesscc $CHESSCC2_FLAGS -c $1.cc -o $1.o #2>&1 | tee $1.log
+xchesscc $CHESSCC2_FLAGS -I kernels -c $1.cc -o $1.o #2>&1 | tee $1.log
 
 echo "Successfully built $1.o"
diff --git a/npu/build/kernelbuilder.py b/npu/build/kernelbuilder.py
@@ -7,6 +7,10 @@
 from .utils import wsl_prefix
 import hashlib
 import glob
+import shutil
+
+
+KERNELS_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'lib', 'kernels', 'cpp')
 
 class KernelObjectBuilder(WSLBuilder):
     """This class builds ComputeTile kernel C/C++ into object files for linking into applications.
@@ -63,6 +67,7 @@ def build(self, debug=False):
 
             with open(os.path.join(self.build_path, f"{self.name}.cc"), "w") as fp:
                 fp.write(self.srccode)
+            shutil.copytree(KERNELS_DIR, self.build_path + '/kernels/')
 
             if self.srcfile is not None or self.getheaders:
                 for extension in ['*.h', '*.hh', '*.hpp', '*.hxx', '*.h++']:
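
Review note: by default `shutil.copytree` raises `FileExistsError` when the destination directory already exists. The hunk above does not show whether `self.build_path` is always fresh at this point, so this may or may not matter here. As a sketch only, assuming Python 3.8 or newer, a copy that tolerates a pre-existing `kernels/` directory could look like the following. `copy_kernels` is a hypothetical helper, not part of the commit.

```python
import os
import shutil


def copy_kernels(src_dir: str, build_path: str) -> str:
    """Copy the kernel sources into <build_path>/kernels and return that path.

    dirs_exist_ok=True (Python 3.8+) lets the copy succeed when the
    destination already exists, for example on a repeated build.
    """
    dest = os.path.join(build_path, "kernels")
    shutil.copytree(src_dir, dest, dirs_exist_ok=True)
    return dest
```

Using `os.path.join` instead of string concatenation also keeps the destination path consistent across platforms.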

diff --git a/npu/build/mlirsequencebuilder.py b/npu/build/mlirsequencebuilder.py
@@ -286,7 +286,7 @@ def _generate_rtps(self, indent='')->str:
         return s
 
     def _to_seq_portsig(self)->str:
-        """ Generates the portsignature for the sequence func.func call in the generated MLIR."""
+        """ Generates the port signature for the sequence func.func call in the generated MLIR."""
         s = ''
         for i,ub in enumerate(self._ingress_egress_ub.values()):
             s += f"%{ub.ubname} : memref<{self._generate_ub_memref(ub)}>"

diff --git a/npu/build/utils.py b/npu/build/utils.py
@@ -1,16 +1,15 @@
 # Copyright (C) 2023 Advanced Micro Devices, Inc. All rights reserved.
 # SPDX-License-Identifier: MIT
 
-import os
-import re
+import platform
 import subprocess
 
 def is_win()->bool:
-    """ Returns true if we are running this on windows."""
-    return os.name == "nt"
+    """ Returns true if we are running this on Windows."""
+    return platform.system() == 'Windows'
 
 def is_win_path(path:str)->bool:
-    """ Returns true if the path above is a windows path """
+    """ Returns true if the path above is a Windows path """
     newpath = path.split('\\')
     return newpath[0].endswith(':')
 

diff --git a/npu/lib/kernels/cpp/addWeighted.cpp b/npu/lib/kernels/cpp/addWeighted.cpp
@@ -6,13 +6,13 @@
 #include <stdlib.h>
 #include <aie_api/aie.hpp>
 
-const int32_t SRS_SHIFT = 14;
+const int32_t SHIFT = 14;
 
 template <typename T, int N, int MAX>
 void addweighted_aie(const T* in_buffer1, const T*  in_buffer2, T* out_buffer,
                         const uint32_t nbytes,
                         const int16_t alphaFixedPoint, const int16_t betaFixedPoint, const T gamma) {
-    
+
     ::aie::set_saturation(aie::saturation_mode::saturate); // Needed to saturate properly to uint8
 
     ::aie::vector<int16_t, N> coeff(alphaFixedPoint, betaFixedPoint);
@@ -31,9 +31,9 @@ void addweighted_aie(const T* in_buffer1, const T*  in_buffer2, T* out_buffer,
             in_buffer2 += N;
             ::aie::accum<acc32, N> acc = ::aie::accumulate<N>(
                 gamma_acc, coeff, 0, data_buf1, data_buf2); // weight[0] * data_buf1 + weight[1] * data_buf2
-            ::aie::store_v(out_buffer, acc.template to_vector<T>(SRS_SHIFT));
+            ::aie::store_v(out_buffer, acc.template to_vector<T>(SHIFT));
             out_buffer += N;
-        }        
+        }
 }
 
 extern "C" {

diff --git a/npu/lib/kernels/cpp/bitwiseOr.cpp b/npu/lib/kernels/cpp/bitwiseOr.cpp
@@ -31,4 +31,4 @@ extern "C" {
     void bitwiseOr(uint8_t *in_buffer1, uint8_t *in_buffer2, uint8_t *out_buffer, int32_t nbytes) {
         bitwiseOR_aie<uint8_t, 64>(in_buffer1, in_buffer2, out_buffer, nbytes);
     }
-} 
+}
diff --git a/npu/lib/kernels/cpp/filter2d.h b/npu/lib/kernels/cpp/filter2d.h
@@ -3,7 +3,7 @@
 
 #include <aie_api/aie.hpp>
 
-const int32_t SRS_SHIFT = 12;
+const int32_t ACC_SHIFT = 12;
 
 #define KERNEL_WIDTH 3
 
@@ -64,7 +64,7 @@ void filter2d_3lines_aie(uint8_t *lineIn0, uint8_t *lineIn1, uint8_t *lineIn2, u
     acc       = mul_ops::mac(acc, kernel_vec, 2*Points, data_buf3, 0);
 
     // Store result
-    ::aie::store_v(out_buffer, acc.to_vector<uint8>(SRS_SHIFT-8)); out_buffer+=VecFactor;
+    ::aie::store_v(out_buffer, acc.to_vector<uint8>(ACC_SHIFT-8)); out_buffer+=VecFactor;
 
     // middle of line, no border extension needed
     for (int i = 2*VecFactor; i < nbytes-1; i+=VecFactor) {
@@ -90,7 +90,7 @@ void filter2d_3lines_aie(uint8_t *lineIn0, uint8_t *lineIn1, uint8_t *lineIn2, u
         acc       = mul_ops::mac(acc, kernel_vec, 2*Points, data_buf3, 0);
 
         // Store result
-        ::aie::store_v(out_buffer, acc.to_vector<uint8>(SRS_SHIFT-8)); out_buffer+=VecFactor;
+        ::aie::store_v(out_buffer, acc.to_vector<uint8>(ACC_SHIFT-8)); out_buffer+=VecFactor;
     }
 
     // right of line, border extension by mirroring
@@ -113,6 +113,6 @@ void filter2d_3lines_aie(uint8_t *lineIn0, uint8_t *lineIn1, uint8_t *lineIn2, u
     acc       = mul_ops::mac(acc, kernel_vec, 2*Points, data_buf3, 0);
 
     // Store result
-    ::aie::store_v(out_buffer, acc.to_vector<uint8>(SRS_SHIFT-8)); out_buffer+=VecFactor;
+    ::aie::store_v(out_buffer, acc.to_vector<uint8>(ACC_SHIFT-8)); out_buffer+=VecFactor;
 }
 
diff --git a/npu/lib/kernels/cpp/filter2d_1080p.cpp b/npu/lib/kernels/cpp/filter2d_1080p.cpp
@@ -4,12 +4,10 @@
 #include "linebuffer.h"
 #include <aie_api/aie.hpp>
 
-extern "C" {
-
-void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
-	      int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
-	      int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
-	      int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+void filter2d_1080p_aie(uint8_t *in_buffer, uint8_t *out_buffer,
+        int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+        int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+        int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
 
      int16_t filter[3][3];
      filter[0][0] = coeff_0_0;
@@ -25,7 +23,18 @@ void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
 
      linebuffer_t lb = linebuffer<1920>(in_buffer, 1079);
      filter2d_3lines_aie(lb.line0, lb.line1, lb.line2, out_buffer, 1920, filter_ptr);
+}
+
+extern "C" {
+
+void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
+        int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+        int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+        int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+
+     filter2d_1080p_aie(in_buffer, out_buffer, coeff_0_0, coeff_0_1, coeff_0_2,
+                        coeff_1_0, coeff_1_1, coeff_1_2, coeff_2_0, coeff_2_1,
+                        coeff_2_2);
   }
 
 }
-
diff --git a/npu/lib/kernels/cpp/filter2d_1080p_scalar.cpp b/npu/lib/kernels/cpp/filter2d_1080p_scalar.cpp
@@ -4,12 +4,10 @@
 #include "linebuffer.h"
 #include <aie_api/aie.hpp>
 
-extern "C" {
-
-void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
-	      int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
-	      int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
-	      int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+void filter2d_1080p_aie_scalar(uint8_t *in_buffer, uint8_t *out_buffer,
+        int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+        int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+        int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
 
      int16_t filter[3][3];
      filter[0][0] = coeff_0_0;
@@ -27,5 +25,16 @@ void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
      filter2d_3lines_aie_scalar(lb.line0, lb.line1, lb.line2, out_buffer, 1920, filter_ptr);
   }
 
-}
+extern "C" {
 
+void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
+        int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+        int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+        int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+
+     filter2d_1080p_aie_scalar(in_buffer, out_buffer, coeff_0_0, coeff_0_1,
+                               coeff_0_2, coeff_1_0, coeff_1_1, coeff_1_2,
+                               coeff_2_0, coeff_2_1, coeff_2_2);
+  }
+
+}
diff --git a/npu/lib/kernels/cpp/filter2d_720p.cpp b/npu/lib/kernels/cpp/filter2d_720p.cpp
@@ -4,12 +4,10 @@
 #include "linebuffer.h"
 #include <aie_api/aie.hpp>
 
-extern "C" {
-
-void filter2d_720p(uint8_t *in_buffer, uint8_t *out_buffer,
-	      int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
-	      int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
-	      int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+void filter2d_720p_aie(uint8_t *in_buffer, uint8_t *out_buffer,
+        int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+        int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+        int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
 
      int16_t filter[3][3];
      filter[0][0] = coeff_0_0;
@@ -27,5 +25,15 @@ void filter2d_720p(uint8_t *in_buffer, uint8_t *out_buffer,
      filter2d_3lines_aie(lb.line0, lb.line1, lb.line2, out_buffer, 1280, filter_ptr);
   }
 
-}
+extern "C" {
+void filter2d_720p(uint8_t *in_buffer, uint8_t *out_buffer,
+        int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+        int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+        int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
 
+     filter2d_720p_aie(in_buffer, out_buffer, coeff_0_0, coeff_0_1, coeff_0_2,
+                       coeff_1_0, coeff_1_1, coeff_1_2, coeff_2_0, coeff_2_1,
+                       coeff_2_2);
+   }
+
+}