[ blas/neon ] Add NEON fp16 function for sdot #26

Closed

Conversation

s-debadri

Commits to be reviewed in this PR

Add NEON fp16 function for sdot

Enable the NEON sdot function for Android (ARM) fp16 computation.
Add a unit test for the fp16 sdot function on Android (ARM).

Signed-off-by: s-debadri [email protected]
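For reference, below is a minimal sketch of how an fp16 dot product can be written with NEON intrinsics. It is not the code in this PR: the function name, the fp32 accumulation strategy, and the AArch64-only horizontal add (`vaddvq_f32`) are assumptions.

```cpp
// Hedged sketch only; not the PR's implementation. Assumes AArch64 with
// __fp16 support (e.g. -march=armv8.2-a+fp16).
#include <arm_neon.h>

static float sdot_fp16_sketch(unsigned int N, const __fp16 *X, const __fp16 *Y) {
  float32x4_t acc = vdupq_n_f32(0.0f);
  unsigned int i = 0;
  for (; i + 8 <= N; i += 8) {
    float16x8_t x = vld1q_f16(X + i); // load 8 half-precision values
    float16x8_t y = vld1q_f16(Y + i);
    // widen each half to fp32 and accumulate with fused multiply-add
    acc = vfmaq_f32(acc, vcvt_f32_f16(vget_low_f16(x)),
                    vcvt_f32_f16(vget_low_f16(y)));
    acc = vfmaq_f32(acc, vcvt_f32_f16(vget_high_f16(x)),
                    vcvt_f32_f16(vget_high_f16(y)));
  }
  float ret = vaddvq_f32(acc); // horizontal sum (AArch64 only)
  for (; i < N; ++i)           // scalar tail
    ret += static_cast<float>(X[i]) * static_cast<float>(Y[i]);
  return ret;
}
```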

jijoongmoon and others added 30 commits July 4, 2023 10:05
This PR enables the tensor type in the model property as
"tensor_type=NHWC" or "tensor_type=NCHW". This information is passed to
network_graph, the layer nodes, and the manager.

Then each layer can read the model tensor type and use it to request
tensors or to create temporary tensors.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR enables mixed precision computation.
- Add the data_type property in Tensor: FP16, FP32
- Memory_Data only handles void *
- In Tensor, several member functions are templated:
   getAddress<float>(), getData<__fp16>(), etc.
- The BLAS interface functions still need to be implemented

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
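The commit above describes a type-erased buffer with typed accessors. The following is an illustrative sketch of that pattern, not the actual nntrainer classes; every name here is hypothetical.

```cpp
// Memory_Data holds only a void* buffer; Tensor exposes typed access via a
// template member, roughly as described above. Names are illustrative.
#include <cstddef>
#include <memory>

enum class Tdatatype { FP16, FP32 };

inline size_t elem_size(Tdatatype t) {
  return t == Tdatatype::FP16 ? 2 : 4; // __fp16 is 2 bytes, float is 4
}

class MemoryDataSketch {
public:
  explicit MemoryDataSketch(size_t bytes) : buf(new unsigned char[bytes]()) {}
  void *getAddr() const { return buf.get(); } // only void *, no element type
private:
  std::unique_ptr<unsigned char[]> buf;
};

class TensorSketch {
public:
  TensorSketch(size_t len, Tdatatype dtype)
    : dtype(dtype), mem(len * elem_size(dtype)) {}

  // Typed view over the raw buffer; callers pick float or __fp16 (on ARM).
  template <typename T> T *getData() const {
    return static_cast<T *>(mem.getAddr());
  }

  Tdatatype getDataType() const { return dtype; }

private:
  Tdatatype dtype;
  MemoryDataSketch mem;
};
```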
This PR enables gtest for Android, in particular the half-precision
tests.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
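A minimal sketch of what such a half-precision test can look like; the `ENABLE_FP16` macro name, the test names, and the tolerance are assumptions, and the fp16 branch only compiles where `__fp16` exists.

```cpp
#include <gtest/gtest.h>

TEST(nntrainer_Tensor_fp16, add_sketch) {
#ifdef ENABLE_FP16
  __fp16 a = static_cast<__fp16>(1.5f);
  __fp16 b = static_cast<__fp16>(2.25f);
  // half precision is checked against an fp32 reference with a loose epsilon
  EXPECT_NEAR(static_cast<float>(a + b), 3.75f, 1e-3);
#else
  GTEST_SKIP() << "built without __fp16 support";
#endif
}
```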
Modification for Mixed Tensor Data Type

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
Add Gtest codes

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR includes changes to Tensor and TensorDim to support NHWC
computation for dot, add_strided, multiply_strided, cat, split,
and transpose. It also includes unit tests for evaluation.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Adwaith Anand <[email protected]>
Signed-off-by: Manohara HK <[email protected]>
Signed-off-by: jijoong.moon <[email protected]>
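The practical difference between the two layouts is the offset calculation. A small sketch, with dimensions batch (N), channel (C), height (H), width (W):

```cpp
#include <cstddef>

// NCHW: channel-major, the layout used by default
inline size_t index_nchw(size_t b, size_t c, size_t h, size_t w,
                         size_t C, size_t H, size_t W) {
  return ((b * C + c) * H + h) * W + w;
}

// NHWC: channel-last, the layout this commit adds support for
inline size_t index_nhwc(size_t b, size_t c, size_t h, size_t w,
                         size_t C, size_t H, size_t W) {
  return ((b * H + h) * W + w) * C + c;
}
```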
This PR enables the tensor type and tensor format in the model property as
"tensor_format=NHWC" or "tensor_type=FP16". This information is passed to
network_graph, the layer nodes, and the manager.

Then each layer can read the model tensor type and use it to request
tensors or to create temporary tensors.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
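A usage sketch of the properties named above with the nntrainer C++ API; whether both keys are accepted exactly in this form depends on the final implementation, and the header path is assumed.

```cpp
#include <model.h>

void configure_model_sketch() {
  auto model = ml::train::createModel(ml::train::ModelType::NEURAL_NET);
  model->setProperty({"tensor_format=NHWC", "tensor_type=FP16"});
}
```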
* add if-elseif code block to each Tensor member function
* (trivial) fix trivial typos
* TODO: check for missed functions

Signed-off-by: skykongkong8 <[email protected]>
* add if-elseif code block to each Tensor member function
* fix trivially missed functions

Signed-off-by: skykongkong8 <[email protected]>
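A self-contained sketch of the per-member-function dispatch described above, branching on the tensor's data type; the enum and function names are illustrative, not nntrainer's actual ones.

```cpp
#include <algorithm>
#include <cstddef>

enum class DataType { FP32, FP16 };

void set_zero_sketch(void *data, size_t len, DataType dtype) {
  if (dtype == DataType::FP32) {
    float *d = static_cast<float *>(data);
    std::fill(d, d + len, 0.0f);
  } else if (dtype == DataType::FP16) {
#ifdef ENABLE_FP16 // flag name is an assumption
    __fp16 *d = static_cast<__fp16 *>(data);
    std::fill(d, d + len, static_cast<__fp16>(0));
#endif
  }
}
```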
* Add __fp16 support with #ifdef, and parameter overloading
* (trivial) fix typo
* TODO: replace with valid __fp16 supporting functions

Signed-off-by: skykongkong8 <[email protected]>
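A sketch of parameter overloading guarded by a build flag, as the commit above describes; the flag name `ENABLE_FP16` and the function name are assumptions.

```cpp
void sscal_sketch(unsigned int N, float alpha, float *X) { // fp32 version
  for (unsigned int i = 0; i < N; ++i)
    X[i] *= alpha;
}

#ifdef ENABLE_FP16
// Same signature except for the data pointer type; the compiler picks the
// overload from the argument type, so call sites stay unchanged.
void sscal_sketch(unsigned int N, float alpha, __fp16 *X) {
  for (unsigned int i = 0; i < N; ++i)
    X[i] = static_cast<__fp16>(static_cast<float>(X[i]) * alpha);
}
#endif
```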
- add tensortype to avoid error in initialization

Signed-off-by: Donghyeon Jeong <[email protected]>
- replace Tformat & Tdatatype with TensorType
- include missing Tdatatype

Signed-off-by: Donghyeon Jeong <[email protected]>
- add Tdatatype to avoid errors
- the default data type is FP32
- Tformat & Tdatatype are used to create TensorType

Signed-off-by: Donghyeon Jeong <[email protected]>
- static_cast<__fp16> is needed to avoid narrowing conversion error

Signed-off-by: Donghyeon Jeong <[email protected]>
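A small sketch of the narrowing issue mentioned above (ARM toolchain with `__fp16` assumed): brace initialization from float to `__fp16` narrows, so an explicit cast is required.

```cpp
void narrowing_example(float f) {
  // __fp16 h{f};                    // error: narrowing conversion
  __fp16 h{static_cast<__fp16>(f)};  // OK with an explicit cast
  (void)h;
}
```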
* Uncomment __fp16 testcases, then verify & debug
* fix missing functions or variables in tensor and blas_interface
* TODO: lastly, fix the setDist function and find an erf function

Signed-off-by: skykongkong8 <[email protected]>
- Previously, memory access to tensor data was incorrect
- Changed to accessing data directly by index instead of calculating the index manually

Signed-off-by: Donghyeon Jeong <[email protected]>
This PR enables the FP32 unittest cases. It includes various fixes and
adds compiler preprocessor pragmas.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
- Match FP16 types to avoid the greater-conversion-rank error
- Replace functions deprecated in gcc-13
- Add an apply function for FP16 in Tensor

Signed-off-by: Donghyeon Jeong <[email protected]>
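One plausible illustration of the type-matching issue above (an assumption, not necessarily the exact error fixed here): templates such as `std::max` cannot deduce a single type when one argument is `__fp16` and the other is a float literal, so the literal has to be cast to match.

```cpp
#include <algorithm>

__fp16 relu_sketch(__fp16 v) {
  // return std::max(v, 0.0f);                    // error: conflicting deduced types
  return std::max(v, static_cast<__fp16>(0.0f));  // OK: both arguments are __fp16
}
```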
- divide in tensor now supports FP16
- ranged in test util supports FP16
- fix zoneout_rate from fp16 to float

Signed-off-by: Donghyeon Jeong <[email protected]>
This PR includes bug fixes for mixed tensor support.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR adds the tensor type (format, weight tensor data type,
activation tensor data type) in initContext.
- Remove the tensor type variables and their setter/getter member functions
in layer, layer_devel, loss layer, etc.
- Add a tensor type setter in initContext
- Set the var_grad (input & output) tensor type according to the model
tensor data type
- Add ModelTensorTypeInfo, e.g. FP16_FP16 (weight FP16, activation
FP16)

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
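A hedged sketch of the ModelTensorTypeInfo idea from the commit above: a single enum naming the weight and activation data types together. Only FP16_FP16 is confirmed by the commit message; the other enumerators are assumptions.

```cpp
enum class ModelTensorTypeInfo {
  FP16_FP16, // weight FP16, activation FP16 (named in the commit)
  FP16_FP32, // weight FP16, activation FP32 (assumed)
  FP32_FP16, // weight FP32, activation FP16 (assumed)
  FP32_FP32, // weight FP32, activation FP32 (assumed)
};
```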
Describe the commit content (up to 80 columns per line) in detail ASAP.

**Changes proposed in this PR:**
- Added TOC generator for README.md

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
In order to support gcc-13 & ndk-build, the apply member function
needs to be templatized. It also makes sense to define the apply
function this way.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
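A minimal sketch of templatizing an apply-style function so that one definition serves both float and `__fp16` builds; the free-function form and names are illustrative, not the Tensor member itself.

```cpp
#include <functional>
#include <vector>

template <typename T = float>
std::vector<T> apply_sketch(const std::vector<T> &in, std::function<T(T)> f) {
  std::vector<T> out;
  out.reserve(in.size());
  for (const T &v : in)
    out.push_back(f(v)); // apply the callable element-wise
  return out;
}
```

For example, `apply_sketch<float>(values, [](float x) { return x > 0 ? x : 0; })` instantiates the float path, while an fp16 build can instantiate the same template with `__fp16`.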
- Gradient tensor values were sometimes NaN because the memory was left uninitialized
- NaN values caused incorrect backwarding in the neural net
- Replacing malloc with calloc prevents allocating memory whose values read back as NaN

Signed-off-by: Donghyeon Jeong <[email protected]>
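A sketch of the difference this fix relies on: malloc leaves the buffer indeterminate (bit patterns may read back as NaN), while calloc zero-initializes every element. The function name is illustrative.

```cpp
#include <cstdlib>

float *alloc_gradient_sketch(std::size_t n) {
  // float *g = static_cast<float *>(malloc(n * sizeof(float))); // indeterminate values
  return static_cast<float *>(calloc(n, sizeof(float)));          // all elements 0.0f
}
```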
* Enable the NEON sgemv function for Android (ARM) __fp16 computation
* Note: this PR includes a significant part of PR #1981 of nnstreamer/nntrainer

Signed-off-by: skykongkong8 <[email protected]>
FP16 is separated from FP32 in the apply function.

Signed-off-by: Donghyeon Jeong <[email protected]>
The transfer_learning variable is set by the user and does not
change during execution, so its type was changed from bool to const bool.

Signed-off-by: SeoHyungjun <[email protected]>
lhs8928 and others added 18 commits August 4, 2023 13:54
 - Explicitly provide the parameter, since relying on the default parameter for the STL iterator is deprecated.

Signed-off-by: hyeonseok lee <[email protected]>
This patch includes gcc-13 compatible fixes.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
Fix some issues reported by Svace and Coverity.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
 - Remove warning flags, which helps compilation with gcc 13.
 - Remove the multiout testcase because it cannot guarantee the multiout layer order.

Signed-off-by: hyeonseok lee <[email protected]>
 - Remove unused variables

Signed-off-by: hyeonseok lee <[email protected]>
When we print the model architecture using the summarize method,
nntrainer prints the input dimension of each layer.

However, TensorFlow and PyTorch print the output dimension
of each layer in the summary, so it is inconvenient
to compare each layer with tf and torch models.

Thus, I suggest printing the output dimension of each layer
instead of the input dimension in the model summary.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
To export the TFLite format with fused ops, add some variables and functions.

1. Add getter, setter, and replace functions for weights
- for fused ops we need to adjust the weights after the OpNode is made

2. Add an isToBeRemove variable
- after the OpNode is made, check the condition and mark it as to be removed

3. Add additional_props
- for the BatchNormalization fused op we need additional props from nntrainer
- a vector<float> variable was made to save the additional data

Signed-off-by: DongHak Park <[email protected]>
Made the realization path for fused ops.

1. Check trainable
 - check whether a node is trainable or not for fusing
2. Conv + ReLU fusing
3. Batch normalization fusing

Signed-off-by: DongHak Park <[email protected]>
Add epsilon props to additional_props for fusing
- for fusing we need epsilon for batch norm
Add padding and stride props to props_vector
- for conv fusing we need to build a new BuiltinOption, and to build a new BuiltinOption with a FUSED activation we need padding and stride

Signed-off-by: DongHak Park <[email protected]>
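For context, the reason epsilon is needed when folding batch normalization into a preceding convolution is that the fused weight and bias are scaled by gamma / sqrt(var + eps). The sketch below shows the standard folding formula; it is not code taken from this PR.

```cpp
#include <cmath>
#include <vector>

void fold_batchnorm_sketch(std::vector<float> &w, float &b, float gamma,
                           float beta, float mean, float var, float eps) {
  const float scale = gamma / std::sqrt(var + eps);
  for (float &wi : w)
    wi *= scale;                 // w' = w * gamma / sqrt(var + eps)
  b = (b - mean) * scale + beta; // b' = (b - mean) * scale + beta
}
```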
 - Added Wno-maybe-uninitialized flag

Signed-off-by: hyeonseok lee <[email protected]>
Create multiout nodes with a given connection order when building the frequency map.

Signed-off-by: Donghyeon Jeong <[email protected]>
This patch checks whether the requested memory is a weight gradient; this
information will be used for planning.

Signed-off-by: Jiho Chu <[email protected]>
It adds a timeout option to adjust the meson test timeout.

Signed-off-by: Jiho Chu <[email protected]>
* generation : work with genLayerTests.py and use record_single_fp16
* data comparison : from sizeCheckedReadTensor, read with _FP16 memory size offset

Signed-off-by: skykongkong8 <[email protected]>
Enable neon saxpy function for Android (ARM) __fp16 computation

Signed-off-by: Debadri Samaddar <[email protected]>
Enables the gtest for half-precision NEON functions on Android (ARM).

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Debadri Samaddar <[email protected]>
Added conditions for handling function calls based on the USE__FP16 identifier.

Signed-off-by: Debadri Samaddar <[email protected]>
Enable the NEON sdot function for Android (ARM) fp16 computation.
Add a unit test for the fp16 sdot function on Android (ARM).

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Debadri Samaddar <[email protected]>
@jijoongmoon force-pushed the tensor_type_in_dim branch 4 times, most recently from 24d0d5b to d454dca on August 9, 2023 06:56
@s-debadri
Author

Opened another clean PR (#31) with the changes.

@s-debadri closed this Aug 10, 2023
@s-debadri deleted the tensor_type_in_dim branch August 10, 2023 11:45