Prepare for release, merge main (AMDResearch#7)
* Fix link to Ryzen AI webpage (AMDResearch#31)

* extra symbols to wsl path resolution (AMDResearch#10)

* Latest version of numpy causing issues, constraining it to v1.* (AMDResearch#35)

* Latest version of numpy causing issues, constraining it to v1.*

* readability fix

* Enable Kernel fusion (AMDResearch#34)

* Include library of kernels in the compilation

* Fix typos

* Include kernels header file

* Prepare kernels for superkernel

* Include test name

* Initial superkernel test

* Simplify test

* Add second superkernel

* Rename test to better reflect the nature of PR

* Rename test to better reflect the nature of PR

* Align code

* Fix call

* More fixes

* Remove fake kernels

* Kernel fusion working!!!!

* Fix duplicate variable name

* Add another test

* Remove unnecessary flag

* Remove

* revert to original

* Revert to original

* Revert to original

* Revert to original

* Include kernels

* Copy kernel

* Import shutil

* Allow to load the same Riallto app up to 4 times (AMDResearch#36)

* Allow multiple instances of the same app to be run

* Try to fix test

* Add more tests

* Flake8

* More appropriate test name

* Make sure AppRunner has device, device is not found if previous call had a space issue

* Made code more pythonic

* Handle AppRunner object even if it fails

* Flake8

* Report app start column (AMDResearch#37)

* Allow multiple instances of the same app to be run

* Try to fix test

* Add more tests

* Flake8

* More appropriate test name

* Make sure AppRunner has device, device is not found if previous call had a space issue

* Made code more pythonic

* Handle AppRunner object even if it fails

* Flake8

* Try to add start column

* Add starting column

* Flake8

* Update check

* Fix where assert is done

* Fix issue with counting

* Check that xbutil count is working

* Do not run xbutil test for Linux

* Add missing os import

* Skip test in Linux

---------

Co-authored-by: Sarunas Kalade <[email protected]>
Co-authored-by: Shane Fleming <[email protected]>
3 people authored Jun 26, 2024
1 parent 0069c6b commit 78b0c7e
Showing 40 changed files with 538 additions and 287 deletions.
4 changes: 2 additions & 2 deletions notebooks/1_1_ryzenai.ipynb
@@ -16,7 +16,7 @@
"\n",
"## References\n",
"\n",
-"**[AMD Ryzen™ AI - Windows Laptops with AI Built In](https://www.amd.com/en/products/ryzen-ai)**\n",
+"**[AMD Ryzen™ AI - Windows Laptops with AI Built In](https://www.amd.com/en/products/processors/consumer/ryzen-ai.html)**\n",
"\n",
"**[AMD Ryzen™ AI: What It Is, and How It Will Change What Your Laptop Can Do](https://webinar.amd.com/loRg4PP11pg9ZLp1O0pU/en)**\n",
"\n",
@@ -134,7 +134,7 @@
"source": [
"## Ryzen AI NPU\n",
"\n",
-"The [Ryzen 7000 desktop and laptop chips](https://www.amd.com/en/processors/ryzen) were introduced in 2023. Alongside the main x86 CPU, Ryzen 7000 has a new type of coprocessor, a *Neural Processing Unit* (NPU), based on the XDNA™ AI Engine architecture. This new NPU is called [Ryzen AI](https://www.amd.com/en/products/ryzen-ai).\n",
+"The [Ryzen 7000 desktop and laptop chips](https://www.amd.com/en/processors/ryzen) were introduced in 2023. Alongside the main x86 CPU, Ryzen 7000 has a new type of coprocessor, a *Neural Processing Unit* (NPU), based on the XDNA™ AI Engine architecture. This new NPU is called [Ryzen AI](https://www.amd.com/en/products/processors/consumer/ryzen-ai.html).\n",
"\n",
"<center><img src=\"./images/png/ryzen_ai_labels.png\" style=\"max-width:40%\"></center>\n",
"<center><strong>Ryzen 7040 'Phoenix' mobile processor</strong></center>\n",
2 changes: 1 addition & 1 deletion notebooks/3_1_Color_threshold_example.ipynb
@@ -62,7 +62,7 @@
"\n",
"## System Architecture\n",
"\n",
-"These examples assume you are using a laptop with a [Ryzen 7040 \"Phoenix\" APU with the Ryzen AI NPU](https://www.amd.com/en/products/ryzen-ai) and an integrated webcam, or compatible hardware. The typical architecture of the system for the examples is shown below."
+"These examples assume you are using a laptop with a [Ryzen 7040 \"Phoenix\" APU with the Ryzen AI NPU](https://www.amd.com/en/products/processors/consumer/ryzen-ai.html) and an integrated webcam, or compatible hardware. The typical architecture of the system for the examples is shown below."
]
},
{
6 changes: 3 additions & 3 deletions npu/__init__.py
@@ -26,15 +26,15 @@
"""

from .utils.test_device import get_driver_version, version_to_tuple
-from sys import platform
+import platform

__supported_driver__ = "10.1109.8.100"

-if not platform == 'linux':
+if platform.system() == 'Windows':
__installed_driver__ = get_driver_version()

if version_to_tuple(__installed_driver__) < version_to_tuple(__supported_driver__):
-raise ValueError(f"""Detected driver: {__installed_driver__}, supported driver version is >={__supported_driver__},
+raise ValueError(f"""Detected driver: {__installed_driver__}, supported driver version is >={__supported_driver__},
go to https://riallto.ai/prerequisites-driver.html for driver setup instructions.""")

from .repr_dict import ReprDict
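
The hunk above swaps a `sys.platform` test for `platform.system()` and keeps the tuple-based driver comparison. A minimal sketch of why both matter, assuming a hypothetical `version_to_tuple` that splits on dots (the real helper lives in `npu/utils/test_device.py` and is not shown in this diff):

```python
import platform
import sys


def version_to_tuple(version: str) -> tuple:
    """Hypothetical stand-in for the repo helper: '10.1109.8.100' -> (10, 1109, 8, 100)."""
    return tuple(int(part) for part in version.split("."))


def driver_is_supported(installed: str, minimum: str) -> bool:
    """Compare versions field by field as integers, not as strings."""
    return version_to_tuple(installed) >= version_to_tuple(minimum)


# sys.platform is 'win32' on Windows, 'linux' on Linux and 'darwin' on macOS,
# so `not sys.platform == 'linux'` is also true on macOS.
# platform.system() returns 'Windows', 'Linux' or 'Darwin', which makes the
# Windows-only intent explicit.
print(sys.platform, platform.system())

# Integer comparison: 999 < 1109, so this driver is too old.
print(driver_is_supported("10.999.8.100", "10.1109.8.100"))
# A plain string comparison would get this wrong, because '9' sorts after '1'.
print("10.999.8.100" >= "10.1109.8.100")
```

With the old check, the driver lookup would also run on any non-Linux system; the new check restricts it to Windows.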
26 changes: 13 additions & 13 deletions npu/build/appbuilder.py
@@ -9,7 +9,7 @@
from .appxclbinbuilder import AppXclbinBuilder
from .utils import check_wsl_install
from typing import Optional
-from sys import platform
+import platform
import json

class AppBuilder:
@@ -42,7 +42,7 @@ class AppBuilder:
def __init__(self, name=None) -> None:
"""Return a new AppBuilder object."""

-if not platform == 'linux':
+if platform.system() == 'Windows':
check_wsl_install()

self.name = type(self).__name__ if name is None else name
@@ -69,27 +69,27 @@ def callgraph(self):
def to_metadata(self, *args):
""" The application is converted into the AppMetadata after tracing the callgraph() call."""
self.previous_build_args = args
-self.kernels, self.connections = self.fxtracer.to_trace(*args)
+self.kernels, self.connections = self.fxtracer.to_trace(*args)

return AppMetada(self.name,
-self.unique_named(self.kernels),
+self.unique_named(self.kernels),
self.unique_named(self.connections),
self.to_sequence())

def to_handoff(self, *args, file=None):
""" Converts the application into a serializable JSON file."""
-self.previous_build_args = args
+self.previous_build_args = args
with open(file, 'w') as f:
json.dump(self.to_json(*args), f, default = lambda o: '<not serialisable>')

def to_json(self, *args):
""" Converts the application into JSON."""
self.previous_build_args = args
return self.to_metadata(*args).to_json()

@property
def metadata(self, *args):
-""" Generates the application JSON and displays inside a IPython environment."""
+""" Generates the application JSON and displays inside a IPython environment."""
from npu import ReprDict
self.validate_previous_build_args()
return ReprDict(self.to_json(*self.previous_build_args), rootname=self.name)
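
`to_handoff` in the hunk above passes `default=lambda o: '<not serialisable>'` to `json.dump`. A small sketch of what that argument does, using a hypothetical `Opaque` class as the unserialisable value (the actual objects inside the application metadata are not shown in this diff):

```python
import io
import json


class Opaque:
    """Hypothetical object that the json module cannot serialise on its own."""


def dump_with_placeholder(data) -> str:
    """Serialise `data`, replacing unsupported objects with a placeholder string."""
    buffer = io.StringIO()
    # `default` is called only for objects json cannot encode natively;
    # its return value is encoded in place of the object.
    json.dump(data, buffer, default=lambda o: '<not serialisable>')
    return buffer.getvalue()


result = dump_with_placeholder({"name": "app", "kernel": Opaque()})
print(result)
```

Without `default`, the same call raises `TypeError`; with it, the handoff file is always written, at the cost of silently losing whatever the placeholder replaced.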
@@ -106,7 +106,7 @@ def to_sequence(self):

def display(self)->None:
""" Generates the application SVG and displays inside a IPython environment."""
-from npu.utils.appviz import AppViz
+from npu.utils.appviz import AppViz
self.validate_previous_build_args()
_viz = AppViz(self.to_json(*self.previous_build_args))
_viz.show
@@ -133,15 +133,15 @@ def build(self, *args, debug=False, mlir:Optional[str]=None):
self.ab.build(self.name, f"{self.name}.mlir", self.kernels, debug)
else:
self.ab.build(self.name, mlir, self.kernels, debug)

def __add__(self, app_component):
if isinstance(app_component, Connection):
self.merge_applications(app_component.kernels, [app_component])
return self

if isinstance(app_component, AppBuilder):
self.merge_applications(app_component.kernels, app_component.connections)
-return self
+return self

raise TypeError(f"{app_component} of type {type(app_component)} is not supported")

@@ -152,7 +152,7 @@ def validate_previous_build_args(self):

def merge_applications(self, newkernels, newconnections):
self.connections.extend(newconnections)
-self.kernels.extend(newkernels)
+self.kernels.extend(newkernels)

def unique_named(self, objs):
unique_objs = list(set(objs))
@@ -162,4 +162,4 @@ def unique_named(self, objs):

unique_objs_byname_list.sort(key= lambda x : x.name)

-return unique_objs_byname_list
+return unique_objs_byname_list
2 changes: 1 addition & 1 deletion npu/build/build_template/kernel_build.sh
@@ -8,6 +8,6 @@ cd $SCRIPT_DIR

source /opt/mlir_settings.sh

-xchesscc $CHESSCC2_FLAGS -c $1.cc -o $1.o #2>&1 | tee $1.log
+xchesscc $CHESSCC2_FLAGS -I kernels -c $1.cc -o $1.o #2>&1 | tee $1.log

echo "Successfully built $1.o"
5 changes: 5 additions & 0 deletions npu/build/kernelbuilder.py
@@ -7,6 +7,10 @@
from .utils import wsl_prefix
import hashlib
import glob
+import shutil
+
+
+KERNELS_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'lib', 'kernels', 'cpp')

class KernelObjectBuilder(WSLBuilder):
"""This class builds ComputeTile kernel C/C++ into object files for linking into applications.
@@ -63,6 +67,7 @@ def build(self, debug=False):

with open(os.path.join(self.build_path, f"{self.name}.cc"), "w") as fp:
fp.write(self.srccode)
+shutil.copytree(KERNELS_DIR, self.build_path + '/kernels/')

if self.srcfile is not None or self.getheaders:
for extension in ['*.h', '*.hh', '*.hpp', '*.hxx', '*.h++']:
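
The added `shutil.copytree` call copies the kernel library next to the generated source so that `-I kernels` in `kernel_build.sh` can find the headers. By default `copytree` raises `FileExistsError` when the destination already exists. Whether `self.build_path` is always fresh is not visible in this diff, so the sketch below is only a defensive variant, with `copy_kernels` as a hypothetical helper name and `dirs_exist_ok=True` requiring Python 3.8 or later:

```python
import os
import shutil
import tempfile


def copy_kernels(src_dir: str, build_path: str) -> str:
    """Copy the kernel sources into <build_path>/kernels, tolerating re-runs."""
    dest = os.path.join(build_path, "kernels")
    # dirs_exist_ok=True lets a second build reuse the same directory
    # instead of failing with FileExistsError.
    shutil.copytree(src_dir, dest, dirs_exist_ok=True)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "cpp")
    os.makedirs(src)
    with open(os.path.join(src, "filter2d.h"), "w") as fp:
        fp.write("// header\n")

    build = os.path.join(tmp, "build")
    os.makedirs(build)

    copy_kernels(src, build)
    copy_kernels(src, build)  # second call succeeds because of dirs_exist_ok
    copied = sorted(os.listdir(os.path.join(build, "kernels")))

print(copied)
```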
2 changes: 1 addition & 1 deletion npu/build/mlirsequencebuilder.py
@@ -286,7 +286,7 @@ def _generate_rtps(self, indent='')->str:
return s

def _to_seq_portsig(self)->str:
-""" Generates the portsignature for the sequence func.func call in the generated MLIR."""
+""" Generates the port signature for the sequence func.func call in the generated MLIR."""
s = ''
for i,ub in enumerate(self._ingress_egress_ub.values()):
s += f"%{ub.ubname} : memref<{self._generate_ub_memref(ub)}>"
9 changes: 4 additions & 5 deletions npu/build/utils.py
@@ -1,16 +1,15 @@
# Copyright (C) 2023 Advanced Micro Devices, Inc. All rights reserved.
# SPDX-License-Identifier: MIT

-import os
import re
+import platform
import subprocess

def is_win()->bool:
-""" Returns true if we are running this on windows."""
-return os.name == "nt"
+""" Returns true if we are running this on Windows."""
+return platform.system() == 'Windows'

def is_win_path(path:str)->bool:
-""" Returns true if the path above is a windows path """
+""" Returns true if the path above is a Windows path """
newpath = path.split('\\')
return newpath[0].endswith(':')
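
`is_win_path` above decides by looking at the first backslash-separated component. A runnable copy of that logic with a few illustrative inputs (the sample paths are made up for this sketch):

```python
def is_win_path(path: str) -> bool:
    """Return True if the first backslash-separated component ends with ':'."""
    first_component = path.split('\\')[0]
    return first_component.endswith(':')


samples = [
    r"C:\Users\dev\kernel.cc",     # drive letter followed by a backslash
    "/mnt/c/Users/dev/kernel.cc",  # WSL-style path
    "C:/Users/dev/kernel.cc",      # Windows path written with forward slashes
]
for candidate in samples:
    print(candidate, is_win_path(candidate))
```

The third sample is reported as not a Windows path, because without a backslash the whole string is the first component and it does not end with a colon.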

8 changes: 4 additions & 4 deletions npu/lib/kernels/cpp/addWeighted.cpp
@@ -6,13 +6,13 @@
#include <stdlib.h>
#include <aie_api/aie.hpp>

-const int32_t SRS_SHIFT = 14;
+const int32_t SHIFT = 14;

template <typename T, int N, int MAX>
void addweighted_aie(const T* in_buffer1, const T* in_buffer2, T* out_buffer,
const uint32_t nbytes,
const int16_t alphaFixedPoint, const int16_t betaFixedPoint, const T gamma) {

::aie::set_saturation(aie::saturation_mode::saturate); // Needed to saturate properly to uint8

::aie::vector<int16_t, N> coeff(alphaFixedPoint, betaFixedPoint);
@@ -31,9 +31,9 @@ void addweighted_aie(const T* in_buffer1, const T* in_buffer2, T* out_buffer,
in_buffer2 += N;
::aie::accum<acc32, N> acc = ::aie::accumulate<N>(
gamma_acc, coeff, 0, data_buf1, data_buf2); // weight[0] * data_buf1 + weight[1] * data_buf2
-::aie::store_v(out_buffer, acc.template to_vector<T>(SRS_SHIFT));
+::aie::store_v(out_buffer, acc.template to_vector<T>(SHIFT));
out_buffer += N;
}
-}
+}

extern "C" {
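
The kernel above accumulates `weight[0] * data_buf1 + weight[1] * data_buf2` on top of `gamma_acc`, then converts the accumulator back with a shift of 14 and saturation. A scalar Python model of that arithmetic, under assumptions that this diff does not confirm: the weights are taken to be Q14 fixed point, `gamma` is taken to be pre-scaled by the same shift, and the shift is modelled as a plain floor shift because the hardware rounding mode is not shown. `to_fixed_point` and `add_weighted_pixel` are hypothetical names.

```python
SHIFT = 14  # fractional bits, matching `const int32_t SHIFT = 14` in the kernel


def to_fixed_point(value: float) -> int:
    """Encode a weight as a Q14 integer (assumed encoding of alphaFixedPoint/betaFixedPoint)."""
    return int(round(value * (1 << SHIFT)))


def add_weighted_pixel(a: int, b: int, alpha: float, beta: float, gamma: int) -> int:
    """Scalar model: saturate_u8((gamma << SHIFT + alpha_fp * a + beta_fp * b) >> SHIFT)."""
    acc = (gamma << SHIFT) + to_fixed_point(alpha) * a + to_fixed_point(beta) * b
    result = acc >> SHIFT             # drop the fractional bits (floor, an assumption)
    return max(0, min(255, result))   # saturate to the uint8 range


print(add_weighted_pixel(100, 200, 0.5, 0.5, 0))     # 150
print(add_weighted_pixel(255, 255, 0.75, 0.75, 10))  # 255 after saturation
```

Since the weights are `int16_t`, this encoding can only represent values below 2.0.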
2 changes: 1 addition & 1 deletion npu/lib/kernels/cpp/bitwiseOr.cpp
@@ -31,4 +31,4 @@ extern "C" {
void bitwiseOr(uint8_t *in_buffer1, uint8_t *in_buffer2, uint8_t *out_buffer, int32_t nbytes) {
bitwiseOR_aie<uint8_t, 64>(in_buffer1, in_buffer2, out_buffer, nbytes);
}
-}
+}
8 changes: 4 additions & 4 deletions npu/lib/kernels/cpp/filter2d.h
@@ -3,7 +3,7 @@

#include <aie_api/aie.hpp>

-const int32_t SRS_SHIFT = 12;
+const int32_t ACC_SHIFT = 12;

#define KERNEL_WIDTH 3

@@ -64,7 +64,7 @@ void filter2d_3lines_aie(uint8_t *lineIn0, uint8_t *lineIn1, uint8_t *lineIn2, u
acc = mul_ops::mac(acc, kernel_vec, 2*Points, data_buf3, 0);

// Store result
-::aie::store_v(out_buffer, acc.to_vector<uint8>(SRS_SHIFT-8)); out_buffer+=VecFactor;
+::aie::store_v(out_buffer, acc.to_vector<uint8>(ACC_SHIFT-8)); out_buffer+=VecFactor;

// middle of line, no border extension needed
for (int i = 2*VecFactor; i < nbytes-1; i+=VecFactor) {
@@ -90,7 +90,7 @@ void filter2d_3lines_aie(uint8_t *lineIn0, uint8_t *lineIn1, uint8_t *lineIn2, u
acc = mul_ops::mac(acc, kernel_vec, 2*Points, data_buf3, 0);

// Store result
-::aie::store_v(out_buffer, acc.to_vector<uint8>(SRS_SHIFT-8)); out_buffer+=VecFactor;
+::aie::store_v(out_buffer, acc.to_vector<uint8>(ACC_SHIFT-8)); out_buffer+=VecFactor;
}

// right of line, border extension by mirroring
@@ -113,6 +113,6 @@ void filter2d_3lines_aie(uint8_t *lineIn0, uint8_t *lineIn1, uint8_t *lineIn2, u
acc = mul_ops::mac(acc, kernel_vec, 2*Points, data_buf3, 0);

// Store result
-::aie::store_v(out_buffer, acc.to_vector<uint8>(SRS_SHIFT-8)); out_buffer+=VecFactor;
+::aie::store_v(out_buffer, acc.to_vector<uint8>(ACC_SHIFT-8)); out_buffer+=VecFactor;
}

23 changes: 16 additions & 7 deletions npu/lib/kernels/cpp/filter2d_1080p.cpp
@@ -4,12 +4,10 @@
#include "linebuffer.h"
#include <aie_api/aie.hpp>

-extern "C" {
-
-void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
-int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
-int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
-int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+void filter2d_1080p_aie(uint8_t *in_buffer, uint8_t *out_buffer,
+int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {

int16_t filter[3][3];
filter[0][0] = coeff_0_0;
@@ -25,7 +23,18 @@ void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,

linebuffer_t lb = linebuffer<1920>(in_buffer, 1079);
filter2d_3lines_aie(lb.line0, lb.line1, lb.line2, out_buffer, 1920, filter_ptr);
}

+extern "C" {
+
+void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
+int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+
+filter2d_1080p_aie(in_buffer, out_buffer, coeff_0_0, coeff_0_1, coeff_0_2,
+coeff_1_0, coeff_1_1, coeff_1_2, coeff_2_0, coeff_2_1,
+coeff_2_2);
+}

}

23 changes: 16 additions & 7 deletions npu/lib/kernels/cpp/filter2d_1080p_scalar.cpp
@@ -4,12 +4,10 @@
#include "linebuffer.h"
#include <aie_api/aie.hpp>

-extern "C" {
-
-void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
-int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
-int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
-int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+void filter2d_1080p_aie_scalar(uint8_t *in_buffer, uint8_t *out_buffer,
+int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {

int16_t filter[3][3];
filter[0][0] = coeff_0_0;
@@ -27,5 +25,16 @@ void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
filter2d_3lines_aie_scalar(lb.line0, lb.line1, lb.line2, out_buffer, 1920, filter_ptr);
}

}
+extern "C" {
+
+void filter2d_1080p(uint8_t *in_buffer, uint8_t *out_buffer,
+int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+
+filter2d_1080p_aie_scalar(in_buffer, out_buffer, coeff_0_0, coeff_0_1,
+coeff_0_2, coeff_1_0, coeff_1_1, coeff_1_2,
+coeff_2_0, coeff_2_1, coeff_2_2);
+}

}
22 changes: 15 additions & 7 deletions npu/lib/kernels/cpp/filter2d_720p.cpp
@@ -4,12 +4,10 @@
#include "linebuffer.h"
#include <aie_api/aie.hpp>

-extern "C" {
-
-void filter2d_720p(uint8_t *in_buffer, uint8_t *out_buffer,
-int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
-int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
-int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+void filter2d_720p_aie(uint8_t *in_buffer, uint8_t *out_buffer,
+int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {

int16_t filter[3][3];
filter[0][0] = coeff_0_0;
@@ -27,5 +25,15 @@ void filter2d_720p(uint8_t *in_buffer, uint8_t *out_buffer,
filter2d_3lines_aie(lb.line0, lb.line1, lb.line2, out_buffer, 1280, filter_ptr);
}

}
+extern "C" {
+void filter2d_720p(uint8_t *in_buffer, uint8_t *out_buffer,
+int16_t coeff_0_0, int16_t coeff_0_1, int16_t coeff_0_2,
+int16_t coeff_1_0, int16_t coeff_1_1, int16_t coeff_1_2,
+int16_t coeff_2_0, int16_t coeff_2_1, int16_t coeff_2_2) {
+
+filter2d_720p_aie(in_buffer, out_buffer, coeff_0_0, coeff_0_1, coeff_0_2,
+coeff_1_0, coeff_1_1, coeff_1_2, coeff_2_0, coeff_2_1,
+coeff_2_2);
+}

}