[ blas/neon ] Add NEON fp16 function for sdot #26

Closed

Conversation

s-debadri

Commits to be reviewed in this PR

Add NEON fp16 function for sdot

Enable the NEON sdot function for Android (ARM) fp16 computation.
Add a unit test for the fp16 sdot function on Android (ARM).

Signed-off-by: s-debadri [email protected]
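For reference, below is a minimal sketch of how an fp16 dot product can be written with NEON intrinsics. It is not the code in this PR: the function name, the fp32 accumulation strategy, and the AArch64-only horizontal add (`vaddvq_f32`) are assumptions.

```cpp
// Hedged sketch only; not the PR's implementation. Assumes AArch64 with
// __fp16 support (e.g. -march=armv8.2-a+fp16).
#include <arm_neon.h>

static float sdot_fp16_sketch(unsigned int N, const __fp16 *X, const __fp16 *Y) {
  float32x4_t acc = vdupq_n_f32(0.0f);
  unsigned int i = 0;
  for (; i + 8 <= N; i += 8) {
    float16x8_t x = vld1q_f16(X + i); // load 8 half-precision values
    float16x8_t y = vld1q_f16(Y + i);
    // widen each half to fp32 and accumulate with fused multiply-add
    acc = vfmaq_f32(acc, vcvt_f32_f16(vget_low_f16(x)),
                    vcvt_f32_f16(vget_low_f16(y)));
    acc = vfmaq_f32(acc, vcvt_f32_f16(vget_high_f16(x)),
                    vcvt_f32_f16(vget_high_f16(y)));
  }
  float ret = vaddvq_f32(acc); // horizontal sum (AArch64 only)
  for (; i < N; ++i)           // scalar tail
    ret += static_cast<float>(X[i]) * static_cast<float>(Y[i]);
  return ret;
}
```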

jijoongmoon and others added 30 commits July 4, 2023 10:05
This PR enables the tensor type in the model property as
"tensor_type=NHWC" or "tensor_type=NCHW". This information is passed to
network_graph, the layer nodes, and the manager.

Then each layer can read the model tensor type and use it to request
tensors or to create temporary tensors.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR enables mixed precision computation.
- Add the data_type property in Tensor: FP16, FP32
- Memory_Data only handles void *
- In Tensor, several member functions are templated:
   getAddress<float>(), getData<__fp16>(), etc.
- The BLAS interface functions still need to be implemented

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
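The commit above describes a type-erased buffer with typed accessors. The following is an illustrative sketch of that pattern, not the actual nntrainer classes; every name here is hypothetical.

```cpp
// Memory_Data holds only a void* buffer; Tensor exposes typed access via a
// template member, roughly as described above. Names are illustrative.
#include <cstddef>
#include <memory>

enum class Tdatatype { FP16, FP32 };

inline size_t elem_size(Tdatatype t) {
  return t == Tdatatype::FP16 ? 2 : 4; // __fp16 is 2 bytes, float is 4
}

class MemoryDataSketch {
public:
  explicit MemoryDataSketch(size_t bytes) : buf(new unsigned char[bytes]()) {}
  void *getAddr() const { return buf.get(); } // only void *, no element type
private:
  std::unique_ptr<unsigned char[]> buf;
};

class TensorSketch {
public:
  TensorSketch(size_t len, Tdatatype dtype)
    : dtype(dtype), mem(len * elem_size(dtype)) {}

  // Typed view over the raw buffer; callers pick float or __fp16 (on ARM).
  template <typename T> T *getData() const {
    return static_cast<T *>(mem.getAddr());
  }

  Tdatatype getDataType() const { return dtype; }

private:
  Tdatatype dtype;
  MemoryDataSketch mem;
};
```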
This PR enables gtest for Android, in particular the half-precision
tests.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
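A minimal sketch of what such a half-precision test can look like; the `ENABLE_FP16` macro name, the test names, and the tolerance are assumptions, and the fp16 branch only compiles where `__fp16` exists.

```cpp
#include <gtest/gtest.h>

TEST(nntrainer_Tensor_fp16, add_sketch) {
#ifdef ENABLE_FP16
  __fp16 a = static_cast<__fp16>(1.5f);
  __fp16 b = static_cast<__fp16>(2.25f);
  // half precision is checked against an fp32 reference with a loose epsilon
  EXPECT_NEAR(static_cast<float>(a + b), 3.75f, 1e-3);
#else
  GTEST_SKIP() << "built without __fp16 support";
#endif
}
```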
Modification for Mixed Tensor Data Type

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
Add Gtest codes

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR includes changes to Tensor and TensorDim to support NHWC
computation for dot, add_strided, multiply_strided, cat, split,
and transpose. It also includes unit tests for evaluation.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Adwaith Anand <[email protected]>
Signed-off-by: Manohara HK <[email protected]>
Signed-off-by: jijoong.moon <[email protected]>
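The practical difference between the two layouts is the offset calculation. A small sketch, with dimensions batch (N), channel (C), height (H), width (W):

```cpp
#include <cstddef>

// NCHW: channel-major, the layout used by default
inline size_t index_nchw(size_t b, size_t c, size_t h, size_t w,
                         size_t C, size_t H, size_t W) {
  return ((b * C + c) * H + h) * W + w;
}

// NHWC: channel-last, the layout this commit adds support for
inline size_t index_nhwc(size_t b, size_t c, size_t h, size_t w,
                         size_t C, size_t H, size_t W) {
  return ((b * H + h) * W + w) * C + c;
}
```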
This PR enables the tensor type and tensor format in the model property as
"tensor_format=NHWC" or "tensor_type=FP16". This information is passed to
network_graph, the layer nodes, and the manager.

Then each layer can read the model tensor type and use it to request
tensors or to create temporary tensors.

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
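A usage sketch of the properties named above with the nntrainer C++ API; whether both keys are accepted exactly in this form depends on the final implementation, and the header path is assumed.

```cpp
#include <model.h>

void configure_model_sketch() {
  auto model = ml::train::createModel(ml::train::ModelType::NEURAL_NET);
  model->setProperty({"tensor_format=NHWC", "tensor_type=FP16"});
}
```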
* add if-elseif code block to each Tensor member function
* (trivial) fix trivial typos
* TODO: check for missed functions

Signed-off-by: skykongkong8 <[email protected]>
* add if-elseif code block to each Tensor member function
* fix trivially missed functions

Signed-off-by: skykongkong8 <[email protected]>
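A self-contained sketch of the per-member-function dispatch described above, branching on the tensor's data type; the enum and function names are illustrative, not nntrainer's actual ones.

```cpp
#include <algorithm>
#include <cstddef>

enum class DataType { FP32, FP16 };

void set_zero_sketch(void *data, size_t len, DataType dtype) {
  if (dtype == DataType::FP32) {
    float *d = static_cast<float *>(data);
    std::fill(d, d + len, 0.0f);
  } else if (dtype == DataType::FP16) {
#ifdef ENABLE_FP16 // flag name is an assumption
    __fp16 *d = static_cast<__fp16 *>(data);
    std::fill(d, d + len, static_cast<__fp16>(0));
#endif
  }
}
```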
* Add __fp16 support with #ifdef, and parameter overloading
* (trivial) fix typo
* TODO: replace with valid __fp16 supporting functions

Signed-off-by: skykongkong8 <[email protected]>
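A sketch of parameter overloading guarded by a build flag, as the commit above describes; the flag name `ENABLE_FP16` and the function name are assumptions.

```cpp
void sscal_sketch(unsigned int N, float alpha, float *X) { // fp32 version
  for (unsigned int i = 0; i < N; ++i)
    X[i] *= alpha;
}

#ifdef ENABLE_FP16
// Same signature except for the data pointer type; the compiler picks the
// overload from the argument type, so call sites stay unchanged.
void sscal_sketch(unsigned int N, float alpha, __fp16 *X) {
  for (unsigned int i = 0; i < N; ++i)
    X[i] = static_cast<__fp16>(static_cast<float>(X[i]) * alpha);
}
#endif
```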
- add tensortype to avoid error in initialization

Signed-off-by: Donghyeon Jeong <[email protected]>
- replace Tformat & Tdatatype with TensorType
- include missing Tdatatype

Signed-off-by: Donghyeon Jeong <[email protected]>
- add Tdatatype to avoid errors
- the default data type is FP32
- Tformat & Tdatatype are used to create TensorType

Signed-off-by: Donghyeon Jeong <[email protected]>
- static_cast<__fp16> is needed to avoid narrowing conversion error

Signed-off-by: Donghyeon Jeong <[email protected]>
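A small sketch of the narrowing issue mentioned above (ARM toolchain with `__fp16` assumed): brace initialization from float to `__fp16` narrows, so an explicit cast is required.

```cpp
void narrowing_example(float f) {
  // __fp16 h{f};                    // error: narrowing conversion
  __fp16 h{static_cast<__fp16>(f)};  // OK with an explicit cast
  (void)h;
}
```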
* Uncomment __fp16 testcases, then verify & debug
* fix missing functions or variables in tensor and blas_interface
* TODO: lastly, fix the setDist function and find an erf function

Signed-off-by: skykongkong8 <[email protected]>
- Previously, memory access to tensor data was incorrect
- Changed to accessing data directly by index instead of calculating the index manually

Signed-off-by: Donghyeon Jeong <[email protected]>
This PR enables the FP32 unittest cases. It includes various fixes and
adds compiler preprocessor pragmas.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
- Match FP16 types to avoid the greater-conversion-rank error
- Replace functions deprecated in gcc-13
- Add an apply function for FP16 in Tensor

Signed-off-by: Donghyeon Jeong <[email protected]>
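One plausible illustration of the type-matching issue above (an assumption, not necessarily the exact error fixed here): templates such as `std::max` cannot deduce a single type when one argument is `__fp16` and the other is a float literal, so the literal has to be cast to match.

```cpp
#include <algorithm>

__fp16 relu_sketch(__fp16 v) {
  // return std::max(v, 0.0f);                    // error: conflicting deduced types
  return std::max(v, static_cast<__fp16>(0.0f));  // OK: both arguments are __fp16
}
```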
- divide in tensor now supports FP16
- ranged in test util supports FP16
- fix zoneout_rate from fp16 to float

Signed-off-by: Donghyeon Jeong <[email protected]>
This PR includes bug fixes for mixed tensor support.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
This PR adds the tensor type (format, weight tensor data type,
activation tensor data type) in initContext.
- Remove the tensor type variables and their setter/getter member functions
in layer, layer_devel, loss layer, etc.
- Add a tensor type setter in initContext
- Set the var_grad (input & output) tensor type according to the model
tensor data type
- Add ModelTensorTypeInfo, e.g. FP16_FP16 (weight FP16, activation
FP16)

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
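A hedged sketch of the ModelTensorTypeInfo idea from the commit above: a single enum naming the weight and activation data types together. Only FP16_FP16 is confirmed by the commit message; the other enumerators are assumptions.

```cpp
enum class ModelTensorTypeInfo {
  FP16_FP16, // weight FP16, activation FP16 (named in the commit)
  FP16_FP32, // weight FP16, activation FP32 (assumed)
  FP32_FP16, // weight FP32, activation FP16 (assumed)
  FP32_FP32, // weight FP32, activation FP32 (assumed)
};
```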
Describe the commit content (up to 80 columns per line) in detail ASAP.

**Changes proposed in this PR:**
- Added TOC generator for README.md

Resolves:

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
In order to support gcc-13 & ndk-build, the apply member function
needs to be templatized. It also makes sense to define the apply
function this way.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
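A minimal sketch of templatizing an apply-style function so that one definition serves both float and `__fp16` builds; the free-function form and names are illustrative, not the Tensor member itself.

```cpp
#include <functional>
#include <vector>

template <typename T = float>
std::vector<T> apply_sketch(const std::vector<T> &in, std::function<T(T)> f) {
  std::vector<T> out;
  out.reserve(in.size());
  for (const T &v : in)
    out.push_back(f(v)); // apply the callable element-wise
  return out;
}
```

For example, `apply_sketch<float>(values, [](float x) { return x > 0 ? x : 0; })` instantiates the float path, while an fp16 build can instantiate the same template with `__fp16`.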
- Gradient tensor values were sometimes NaN because the memory was left uninitialized
- NaN values caused incorrect backwarding in the neural net
- Replacing malloc with calloc prevents allocating memory whose values read back as NaN

Signed-off-by: Donghyeon Jeong <[email protected]>
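A sketch of the difference this fix relies on: malloc leaves the buffer indeterminate (bit patterns may read back as NaN), while calloc zero-initializes every element. The function name is illustrative.

```cpp
#include <cstdlib>

float *alloc_gradient_sketch(std::size_t n) {
  // float *g = static_cast<float *>(malloc(n * sizeof(float))); // indeterminate values
  return static_cast<float *>(calloc(n, sizeof(float)));          // all elements 0.0f
}
```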
* Enable the NEON sgemv function for Android (ARM) __fp16 computation
* Note: this PR includes a significant part of PR #1981 of nnstreamer/nntrainer

Signed-off-by: skykongkong8 <[email protected]>
FP16 is separated from FP32 in the apply function.

Signed-off-by: Donghyeon Jeong <[email protected]>
The transfer_learning variable is set by the user and does not
change during execution, so its type was changed from bool to const bool.

Signed-off-by: SeoHyungjun <[email protected]>
lhs8928 and others added 18 commits August 4, 2023 13:54
 - Explicitly provide the parameter, since relying on the default parameter for the STL iterator is deprecated.

Signed-off-by: hyeonseok lee <[email protected]>
This patch includes gcc-13 compatible fixes.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
Fix some issues reported by Svace and Coverity.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
 - Remove warning flags, which helps compilation with gcc 13.
 - Remove the multiout testcase because it cannot guarantee the multiout layer order.

Signed-off-by: hyeonseok lee <[email protected]>
 - Remove unused variables

Signed-off-by: hyeonseok lee <[email protected]>
When we print the model architecture using the summarize method,
nntrainer prints the input dimension of each layer.

However, TensorFlow and PyTorch print the output dimension
of each layer in the summary, so it is inconvenient
to compare each layer with tf and torch models.

Thus, I suggest printing the output dimension of each layer
instead of the input dimension in the model summary.

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <[email protected]>
To export the TFLite format with fused ops, add some variables and functions.

1. Add getter, setter, and replace functions for weights
- for fused ops we need to adjust the weights after the OpNode is made

2. Add an isToBeRemove variable
- after the OpNode is made, check the condition and mark it as to be removed

3. Add additional_props
- for the BatchNormalization fused op we need additional props from nntrainer
- a vector<float> variable was made to save the additional data

Signed-off-by: DongHak Park <[email protected]>
Made the realization path for fused ops.

1. Check trainable
 - check whether a node is trainable or not for fusing
2. Conv + ReLU fusing
3. Batch normalization fusing

Signed-off-by: DongHak Park <[email protected]>
Add epsilon props to additional_props for fusing
- for fusing we need epsilon for batch norm
Add padding and stride props to props_vector
- for conv fusing we need to build a new BuiltinOption, and to build a new BuiltinOption with a FUSED activation we need padding and stride

Signed-off-by: DongHak Park <[email protected]>
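For context, the reason epsilon is needed when folding batch normalization into a preceding convolution is that the fused weight and bias are scaled by gamma / sqrt(var + eps). The sketch below shows the standard folding formula; it is not code taken from this PR.

```cpp
#include <cmath>
#include <vector>

void fold_batchnorm_sketch(std::vector<float> &w, float &b, float gamma,
                           float beta, float mean, float var, float eps) {
  const float scale = gamma / std::sqrt(var + eps);
  for (float &wi : w)
    wi *= scale;                 // w' = w * gamma / sqrt(var + eps)
  b = (b - mean) * scale + beta; // b' = (b - mean) * scale + beta
}
```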
 - Added Wno-maybe-uninitialized flag

Signed-off-by: hyeonseok lee <[email protected]>
Create multiout nodes with a given connection order when building the frequency map.

Signed-off-by: Donghyeon Jeong <[email protected]>
This patch checks whether the requested memory is a weight gradient; this
information will be used for planning.

Signed-off-by: Jiho Chu <[email protected]>
It adds a timeout option to adjust the meson test timeout.

Signed-off-by: Jiho Chu <[email protected]>
* generation : work with genLayerTests.py and use record_single_fp16
* data comparison : from sizeCheckedReadTensor, read with _FP16 memory size offset

Signed-off-by: skykongkong8 <[email protected]>
Enable neon saxpy function for Android (ARM) __fp16 computation

Signed-off-by: Debadri Samaddar <[email protected]>
Enables the gtest for half-precision NEON functions on Android (ARM).

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Debadri Samaddar <[email protected]>
Added conditions for handling function calls based on the USE__FP16 identifier.

Signed-off-by: Debadri Samaddar <[email protected]>
Enable the NEON sdot function for Android (ARM) fp16 computation.
Add a unit test for the fp16 sdot function on Android (ARM).

**Self evaluation:**
1. Build test:	 [X]Passed [ ]Failed [ ]Skipped
2. Run test:	 [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Debadri Samaddar <[email protected]>
@jijoongmoon force-pushed the tensor_type_in_dim branch 4 times, most recently from 24d0d5b to d454dca on August 9, 2023 06:56
@s-debadri
Author

Opened another clean PR (#31) with the changes.

@s-debadri closed this Aug 10, 2023
@s-debadri deleted the tensor_type_in_dim branch August 10, 2023 11:45