Merge remote-tracking branch 'upstream/main'

chentong319 committed Mar 11, 2024
2 parents 42d0890 + 18f4e07 commit 7c88851
Showing 96 changed files with 1,090 additions and 1,495 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -187,3 +187,4 @@ dmypy.json
# Visual Studio Code Files
.vscode
.devcontainer
.vs
12 changes: 6 additions & 6 deletions docs/DevicePlacement-NNPA.md
@@ -8,7 +8,7 @@ Device placement is how the compiler places an operation on CPU or NNPA.

There are two ways to know which device an operation is placed on:
- Using `onnx-mlir --EmitONNXIR --maccel=NNPA model.onnx`, or
- Using `onnx-mlir --save-device-placement-file=cfg.json model.onnx`.
- Using `onnx-mlir --nnpa-save-device-placement-file=cfg.json model.onnx`.

1. Using `--EmitONNXIR --maccel=NNPA`

@@ -25,7 +25,7 @@ Below is an example of the output of `--EmitONNXIR --maccel=NNPA`:
%3 = "onnx.Sigmoid"(%2) {device="nnpa", onnx_node_name = "Sigmoid_0"} : (tensor<?x?x?xf32>) -> tensor<?x?x?xf32>
```

2. Using `--save-device-placement-file=cfg.json`
2. Using `--nnpa-save-device-placement-file=cfg.json`

This option saves the device placement configuration into a JSON file. It is convenient when users do not want to interrupt the compilation.

@@ -63,15 +63,15 @@ Below is one example of a JSON file:
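
The example itself is collapsed in this view. As a rough sketch of its expected shape — the `device_placement` top-level key is an assumption here, and the record fields are taken from the records shown later in this diff — the file looks like:

```
{
  "device_placement": [
    {
      "device": "cpu",
      "node_type": "onnx.Relu",
      "onnx_node_name": "Relu_(1|2)"
    }
  ]
}
```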

## Set device placement manually.

We allow users to force one opeartion to run on a specific device. However, at this moment, only placing on CPU is guaranted to be successful done. It means that even when `device=NNPA` is specified, it is not guaranted that the operation will run on NNPA.
We allow users to force one operation to run on a specific device. However, at this moment, only placement on CPU is guaranteed to succeed. In other words, even when `device=NNPA` is specified, the operation is not guaranteed to run on NNPA.

There are two ways to change the device of an operation:
- by editing the output of `--EmitONNXIR --maccel=NNPA` directly and compiling again.
- by passing a JSON file for device placement to the compiler by using `--load-device-placement-file=json`.
- by passing a JSON file for device placement to the compiler by using `--nnpa-load-device-placement-file=json`.

The former option is straightforward: simply change the value of the `device` attribute of an operation, for example, from `device=nnpa` to `device=cpu`.
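
For instance, starting from the `--EmitONNXIR` output shown above, forcing the `Sigmoid` onto CPU amounts to editing its line to:

```
%3 = "onnx.Sigmoid"(%2) {device="cpu", onnx_node_name = "Sigmoid_0"} : (tensor<?x?x?xf32>) -> tensor<?x?x?xf32>
```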

For the later option, users can obtain a template file from `--save-device-placement-file`, and use it as the starting point of modification.
For the latter option, users can obtain a template file from `--nnpa-save-device-placement-file` and use it as the starting point for modifications.
We use the C++ `std::regex_match` function to match operations based on `node_type` and `onnx_node_name`. Both `node_type` and `onnx_node_name` must match.
The JSON file contains a list of records for operation matching. The order of the records matters: the device of an operation is set by the first record it matches. Once an operation matches a record and has its device set, it will not be set again, even if it also matches later records in the list; conversely, an operation that does not match a record can still have its device set by a later one.
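
As an illustration of this first-match-wins behavior, consider the following hypothetical pair of records (the `device_placement` key and the node names are assumptions for the sake of the example). An operation named `Relu_1` matches the first record and is placed on NNPA; `Relu_2` does not match the first record but matches the second, so it is placed on CPU:

```
{
  "device_placement": [
    {
      "device": "nnpa",
      "node_type": "onnx.Relu",
      "onnx_node_name": "Relu_1"
    },
    {
      "device": "cpu",
      "node_type": "onnx.Relu",
      "onnx_node_name": "Relu_.*"
    }
  ]
}
```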

@@ -161,7 +161,7 @@ func.func @test_load_config_file_all_on_cpu(%arg0: tensor<?x?x?xf32>) -> tensor<
"onnx_node_name": "Sigmoid_0"
},
{
"device": "nnpa",
"device": "cpu",
"node_type": "onnx.Relu",
"onnx_node_name": "Relu_(1|2)"
}
105 changes: 22 additions & 83 deletions docs/Dialects/krnl.md
@@ -313,37 +313,6 @@ intend to optimize.
| :----: | ----------- |
| &laquo;unnamed&raquo; | variadic of any type

### `krnl.dim` (KrnlDimOp)

_Krnl dimensions operation._

Emits the dimension of a MemRef independent of the MemRef alloc:

```
"krnl.dim"(%memref, %index)
```

The index identifies the dimension within the shape which is going to be emitted.
Initially the krnl.dim operation depends on the alloc of the MemRef.
Unlike the std.dim operation which maintains a dependency on the alloc of the MemRef, the dimension emitted by krnl.dim will not depend on the alloc operation of the MemRef once the krnl.dim operation is lowered.

Any changes to the original MemRef size after the krnl.dim has been lowered will not be picked up by the emitted dimension. This allows the original MemRef to be safely modified via code transformations or affine map normalization without the risk of changing the value already emitted via krnl.dim.

Traits: MemRefsNormalizable

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `alloc` | memref of any type values
| `index` | index

#### Results:

| Result | Description |
| :----: | ----------- |
| `dimension` | index

### `krnl.entry_point` (KrnlEntryPointOp)

_Indicate ONNX entry point_
@@ -429,34 +398,6 @@ current tile being iterated over.
| :----: | ----------- |
| `ind_var_vals` | variadic of any type

### `krnl.getref` (KrnlGetRefOp)

_Krnl a MemRef from within another MemRef starting at a specific offset._

Retrieves a MemRef from within another MemRef:

```
"krnl.getref"(%memref, %offset)
```
The offset is an integer which is used as an index into the input MemRef. It works
just like an array index.

Traits: MemRefsNormalizable

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `mempool` | memref of any type values
| `offset` | integer
| `value` | variadic of index

#### Results:

| Result | Description |
| :----: | ----------- |
| `output` | memref of any type values

### `krnl.global` (KrnlGlobalOp)

_Krnl global operation_
@@ -917,6 +858,28 @@ are nested imperfectly between an "eager" and a "lazy" loop.

Traits: SingleBlock, SingleBlockImplicitTerminator<KrnlTerminatorOp>

### `krnl.noValue` (KrnlNoneOp)

_An operation representing the absence of a value._

This operation represents the absence of a value. It is typically used as an
argument to operations that have optional parameters, and it is converted to
`nullptr` during krnl-to-llvm lowering. Most commonly, it supplies the
optional arguments of KrnlCallOp.

#### Attributes:

<table>
<tr><th>Attribute</th><th>MLIR Type</th><th>Description</th></tr>
<tr><td><code>value</code></td><td>::mlir::UnitAttr</td><td>unit attribute</td></tr>
</table>

#### Results:

| Result | Description |
| :----: | ----------- |
| `none_val` | none type
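
As a sketch of the generic form — not shown in this diff, inferred from the attribute and result tables above — the operation would appear in IR along these lines, producing a `none`-typed value that can be passed wherever an optional operand is expected:

```
%none = "krnl.noValue"() {value} : () -> none
```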

### `krnl.parallel` (KrnlParallelOp)

_Mark Krnl loops as parallel loops_
@@ -1212,30 +1175,6 @@ Traits: MemRefsNormalizable
| `seq` | memref of any type values
| `index` | index

### `krnl.shape` (KrnlShapeOp)

_Krnl operation to retrieve the shape of a MemRef._

Extracts the shape of a MemRef:
```
"krnl.shape"(%memref)
```
The return result is of `shape.type`.

Traits: MemRefsNormalizable

#### Operands:

| Operand | Description |
| :-----: | ----------- |
| `alloc` | memref of any type values

#### Results:

| Result | Description |
| :----: | ----------- |
| `shape` | memref of any type values

### `krnl.specialized_kernel` (KrnlSpecializedKernel)

_Krnl specialized kernel op_
4 changes: 2 additions & 2 deletions docs/Dialects/onnx.md
@@ -3883,7 +3883,7 @@ Effects: MemoryEffects::Effect{}

| Operand | Description |
| :-----: | ----------- |
| `X` | tensor of 32-bit float values or tensor of 64-bit float values
| `X` | tensor of bfloat16 type values or tensor of 16-bit float values or tensor of 32-bit float values or tensor of 64-bit float values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values

#### Results:

@@ -3907,7 +3907,7 @@ Effects: MemoryEffects::Effect{}

| Operand | Description |
| :-----: | ----------- |
| `X` | tensor of 16-bit float values or tensor of 32-bit float values or tensor of 64-bit float values or tensor of bfloat16 type values
| `X` | tensor of bfloat16 type values or tensor of 16-bit float values or tensor of 32-bit float values or tensor of 64-bit float values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values

#### Results:

6 changes: 3 additions & 3 deletions docs/SupportedONNXOps-NNPA.md
@@ -3,11 +3,11 @@

# Supported ONNX Operation for Target *NNPA*.

Onnx-mlir currently supports ONNX operations targeting up to opset 19. Limitations are listed when applicable. This documentation highlights the minimum and maximum opset versions that are fully supported by onnx-mlir and not the version changes.
Onnx-mlir currently supports ONNX operations targeting up to opset 20. Limitations are listed when applicable. This documentation highlights the minimum and maximum opset versions that are fully supported by onnx-mlir and not the version changes.

* Operations are defined by the [ONNX Standard](https://github.com/onnx/onnx/blob/main/docs/Operators.md).
* **Supported Opsets** indicates the lowest and highest opset a model may have for onnx-mlir to support compiling a model with the operator.
* A * indicates onnx-mlir is compatible with the latest version of that operator available as of opset 19.
* **Supported Opsets** indicates the lowest and highest opset a model may have for onnx-mlir to support compiling a model with the operator.
* A * indicates onnx-mlir is compatible with the latest version of that operator available as of opset 20.


NNPA has hardware limitations in dimension index size and tensor size, which are described in [NNPALimit.h](../src/Accelerators/NNPA/Support/NNPALimit.h). They are large enough for normal use cases, but if your model exceeds the limitations, CPU is used instead of NNPA.
30 changes: 15 additions & 15 deletions docs/SupportedONNXOps-cpu.md
@@ -6,7 +6,7 @@
Onnx-mlir currently supports ONNX operations targeting up to opset 20. Limitations are listed when applicable. This documentation highlights the minimum and maximum opset versions that are fully supported by onnx-mlir and not the version changes.

* Operations are defined by the [ONNX Standard](https://github.com/onnx/onnx/blob/main/docs/Operators.md).
* **Supported Opsets** indicates the lowest and highest opset a model may have for onnx-mlir to support compiling a model with the operator.
* **Supported Opsets** indicates the lowest and highest opset a model may have for onnx-mlir to support compiling a model with the operator.
* A * indicates onnx-mlir is compatible with the latest version of that operator available as of opset 20.


@@ -36,7 +36,7 @@ Onnx-mlir currently supports ONNX operations targeting up to opset 20. Limitations are listed when applicable.
| **BitwiseOr** |18 - * | | |
| **BitwiseXor** |18 - * | | |
| **BlackmanWindow** |none | | | |
| **Cast** |6 - 18 |Cast only between float and double types. Only ppc64le and MacOS platforms support float16. | |
| **Cast** |6 - * |Cast only between float and double types. Only ppc64le and MacOS platforms support float16. | |
| **CastLike** |19 - * |CastLike only between float and double types. Only ppc64le and MacOS platforms support float16. | |
| **CastMap** |none | | | |
| **CategoryMapper** |none | | | |
@@ -48,15 +48,15 @@ Onnx-mlir currently supports ONNX operations targeting up to opset 20. Limitations are listed when applicable.
| **Compress** |9 - * | | |
| **Concat** |6 - * | | |
| **ConcatFromSequence** |none | | | |
| **Constant** |6 - 18 | | |
| **ConstantOfShape** |9 - * | | |
| **Constant** |6 - * | | |
| **ConstantOfShape** |9 - 19 | | |
| **Conv** |6 - * | | |
| **ConvInteger** |none | | | |
| **ConvTranspose** |6 - * |Unknown dimension in spatial dimensions (such as H and W) not supported. | |
| **Cos** |7 - * | | |
| **Cosh** |9 - * | | |
| **CumSum** |11 - * | | |
| **DFT** |none | | | |
| **DFT** |17 - 19 | | |
| **DeformConv** |none | | | |
| **DepthToSpace** |13 - * | | |
| **DequantizeLinear** |10 - * |Only support for per-tensor or layer dequantization. No support for per-axis dequantization. | |
@@ -67,7 +67,7 @@ Onnx-mlir currently supports ONNX operations targeting up to opset 20. Limitations are listed when applicable.
| **DynamicQuantizeLinear** |11 - * | | |
| **Einsum** |12 - * |Limited to the types supported by ReduceSum and MatMul (which we decompose to in most cases) which exclude integers with width < 32. | |
| **Elu** |6 - * | | |
| **Equal** |7 - 18 | | |
| **Equal** |7 - * | | |
| **Erf** |9 - * | | |
| **Exp** |6 - * | | |
| **Expand** |8 - * | | |
@@ -98,8 +98,8 @@ Onnx-mlir currently supports ONNX operations targeting up to opset 20. Limitations are listed when applicable.
| **If** |16 - * |Sequence and Optional outputs are not supported. | |
| **Imputer** |none | | | |
| **InstanceNormalization** |6 - * | | |
| **IsInf** |10 - * | | |
| **IsNaN** |9 - * | | |
| **IsInf** |20 - * |Currently no support for float16 infinity value. Only for float32 and float64. | |
| **IsNaN** |20 - * | | |
| **LRN** |6 - * | | |
| **LSTM** |7 - * | | |
| **LabelEncoder** |none | | | |
@@ -142,11 +142,11 @@ Onnx-mlir currently supports ONNX operations targeting up to opset 20. Limitations are listed when applicable.
| **OptionalHasElement** |none | | | |
| **Or** |7 - * | | |
| **PRelu** |6 - * | | |
| **Pad** |6 - 18 |axes input not supported. | |
| **Pad** |6 - * |axes input not supported. | |
| **Pow** |7 - * |No support for power with integer types. | |
| **QLinearConv** |none | | | |
| **QLinearMatMul** |none | | | |
| **QuantizeLinear** |10 - 18 |Do not support per-axis and i8 quantization. | |
| **QuantizeLinear** |10 - * |Does not support per-axis and i8 quantization. | |
| **RNN** |7 - * | | |
| **RandomNormal** |none | | | |
| **RandomNormalLike** |none | | | |
@@ -158,15 +158,15 @@ Onnx-mlir currently supports ONNX operations targeting up to opset 20. Limitations are listed when applicable.
| **ReduceL2** |13 - * |do_not_keep_dim not supported. | |
| **ReduceLogSum** |13 - * |do_not_keep_dim not supported. | |
| **ReduceLogSumExp** |13 - * |do_not_keep_dim not supported. | |
| **ReduceMax** |6 - * |do_not_keep_dim not supported. | |
| **ReduceMax** |6 - 19 |do_not_keep_dim not supported. | |
| **ReduceMean** |6 - * |do_not_keep_dim not supported. | |
| **ReduceMin** |6 - * |do_not_keep_dim not supported. | |
| **ReduceMin** |6 - 19 |do_not_keep_dim not supported. | |
| **ReduceProd** |13 - * |do_not_keep_dim not supported. | |
| **ReduceSum** |6 - * |Default axis and do_not_keep_dim not supported. |Default axis and do_not_keep_dim temporarily removed due to changes in onnx 1.8.1. |
| **ReduceSumSquare** |13 - * |Default axis and do_not_keep_dim not supported. | |
| **Relu** |6 - * | | |
| **Reshape** |6 - * |allowzero not supported. | |
| **Resize** |10 - 18 |Missing support for linear, cubic, crop, pytorch_half_pixel, and floor. Attributes antialias, axes and keep_aspect_ratio_policy are not supported. | |
| **Resize** |10 - * |Missing support for linear, cubic, crop, pytorch_half_pixel, and floor. Attributes antialias, axes and keep_aspect_ratio_policy are not supported. | |
| **ReverseSequence** |10 - * | | |
| **RoiAlign** |none | | | |
| **Round** |11 - * | | |
@@ -193,13 +193,13 @@ Onnx-mlir currently supports ONNX operations targeting up to opset 20. Limitations are listed when applicable.
| **Sin** |7 - * | | |
| **Sinh** |9 - * | | |
| **Size** |13 - * | | |
| **Slice** |13 - 18 |Axis must be a constant argument. |Add tests to slices, currently have none. |
| **Slice** |13 - * |Axis must be a constant argument. |Add tests to slices, currently have none. |
| **Softmax** |6 - * | | |
| **SoftmaxCrossEntropyLoss** |none | | | |
| **Softplus** |6 - * | | |
| **Softsign** |6 - * | | |
| **SpaceToDepth** |13 - * | |Example works, the other is imprecise. To investigate. |
| **Split** |6 - 18 |Does not support static and dynamic shape, zero size splits. |Temporally removed due to changes in onnx 1.8.1. |
| **Split** |6 - * |Does not support static and dynamic shape, zero size splits. |Temporarily removed due to changes in onnx 1.8.1. |
| **SplitToSequence** |none | | | |
| **Sqrt** |6 - * | | |
| **Squeeze** |6 - * |Does not support static and dynamic shape. |Temporarily removed due to changes in onnx 1.8.1. |
2 changes: 1 addition & 1 deletion docs/mnist_example/requirements.txt
@@ -1,4 +1,4 @@
numpy~=1.22.2
pillow~=10.0.1
pillow~=10.2.0
torch~=2.0.0
torchvision~=0.15.1
3 changes: 2 additions & 1 deletion src/Accelerators/NNPA/Compiler/NNPACompilerOptions.cpp
@@ -54,7 +54,8 @@ llvm::cl::opt<std::string> nnpaLoadDevicePlacementFile{
llvm::cl::desc(
"Load device placement configuration from a JSON file. To "
"have a template for the JSON file, use "
"-save-device-placement-file=cfg.json. Note that we can use regex for "
"--nnpa-save-device-placement-file=cfg.json. Note that we can use "
"regex for "
"string values in the JSON file to match operations. The compiler uses "
"C++ std::regex_match function for matching."),
llvm::cl::init(""), llvm::cl::cat(OnnxMlirOptions)};
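
For reference, a sketch of an invocation that exercises this option, combining the flags documented above (the model and config file names are placeholders):

```
onnx-mlir --maccel=NNPA --nnpa-load-device-placement-file=cfg.json model.onnx
```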
2 changes: 1 addition & 1 deletion src/Accelerators/NNPA/Runtime/CMakeLists.txt
@@ -21,6 +21,6 @@ set_target_properties(RuntimeNNPA
PROPERTIES
LANGUAGE C
POSITION_INDEPENDENT_CODE TRUE
COMPILE_OPTIONS "-O3;-fopenmp"
COMPILE_OPTIONS "-O3"
)
