forked from nnstreamer/nntrainer
[ blas/neon ] Add NEON fp16 function for sdot #26

Status: Closed

s-debadri wants to merge 48 commits into jijoongmoon:tensor_type_in_dim from s-debadri:tensor_type_in_dim
Conversation
This PR enables the tensor type in the model property as "tensor_type=NHWC" or "tensor_type=NCHW". This information goes to network_graph, layer node, and manager. Then each layer can get the model tensor type information and use it to request a tensor or just use a temporary tensor. Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
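As a rough illustration of how such a model-level property could be passed from the ccapi side (the property string comes from the commit message above; the header name and call pattern are assumptions, not taken from this PR):

```cpp
#include <model.h> // ml::train::createModel / setProperty (ccapi); header name assumed

int main() {
  // Create a model and set the model-level tensor type property described
  // above; the value is then propagated to network_graph, layer node, and manager.
  auto model = ml::train::createModel(ml::train::ModelType::NEURAL_NET);
  model->setProperty({"tensor_type=NHWC"});
  return 0;
}
```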
This PR enables Mixed Precision computation. - Add the data_type property in Tensor: FP16, FP32 - Memory_Data only handles void * - In Tensor, there are several templated member functions: getAddress<float>(), getData<__fp16>(), etc. - Need to implement Blas Interface functions **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
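A minimal sketch of the idea behind this change, using a simplified stand-in class (SimpleTensor and the member names below are illustrative, not the PR's actual types): the buffer is held as a raw void * and typed access goes through templated accessors.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>

// Data type tag carried by the tensor at runtime.
enum class Tdatatype { FP16, FP32 };

class SimpleTensor {
public:
  SimpleTensor(std::size_t len, Tdatatype dtype) : len(len), dtype(dtype) {
    std::size_t elem = (dtype == Tdatatype::FP32) ? 4 : 2;
    buf = std::calloc(len, elem); // zero-initialized raw storage
  }
  ~SimpleTensor() { std::free(buf); }

  template <typename T> T *getData() {
    // Callers pick the element type, e.g. getData<float>() or getData<__fp16>();
    // a real implementation would verify that T matches `dtype`.
    return static_cast<T *>(buf);
  }

  template <typename T> T *getAddress(std::size_t i) {
    if (i >= len)
      throw std::out_of_range("index out of range");
    return getData<T>() + i;
  }

private:
  void *buf;
  std::size_t len;
  Tdatatype dtype;
};
```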
This PR enables the gtest for Android, especially the half precision tests. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
Modification for Mixed Tensor Data Type **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
Add Gtest codes **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
This PR includes changes of Tensor and TensorDim to support NHWC computation for dot, add_strided, multiply_strided, cat, split, and transpose. It also includes unittests to evaluate. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: Adwaith Anand <[email protected]> Signed-off-by: Manohara HK <[email protected]> Signed-off-by: jijoong.moon <[email protected]>
This PR enables the tensor type and tensor format in the model property as "tensor_format=NHWC" or "tensor_type=FP16". This information goes to network_graph, layer node, and manager. Then each layer can get the model tensor type information and use it to request a tensor or just use a temporary tensor. Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
* add if-elseif code block to each Tensor member function * (trivial) fix trivial typos * TODO: check for missed functions Signed-off-by: skykongkong8 <[email protected]>
* add if-elseif code block to each Tensor member function * fix trivially missed functions Signed-off-by: skykongkong8 <[email protected]>
* Add __fp16 support with #ifdef, and parameter overloading * (trivial) fix typo * TODO: replace with valid __fp16 supporting functions Signed-off-by: skykongkong8 <[email protected]>
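A minimal sketch of the overloading pattern described above (the ENABLE_FP16 guard and function names are placeholders, not necessarily the macro used in the tree): the __fp16 overload only exists when half-precision support is compiled in.

```cpp
// FP32 path is always available.
void scale(const float *in, float *out, float alpha, unsigned int len) {
  for (unsigned int i = 0; i < len; ++i)
    out[i] = alpha * in[i];
}

#ifdef ENABLE_FP16
// FP16 overload, compiled only when half-precision support is enabled.
void scale(const __fp16 *in, __fp16 *out, __fp16 alpha, unsigned int len) {
  for (unsigned int i = 0; i < len; ++i)
    out[i] = alpha * in[i];
}
#endif
```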
- add tensortype to avoid error in initialization Signed-off-by: Donghyeon Jeong <[email protected]>
- replace Tformat & Tdatatype with TensorType - include missing Tdatatype Signed-off-by: Donghyeon Jeong <[email protected]>
- add Tdatatype to avoid error - default data type is FP32 - Tformat & Tdatatype are used to create TensorType Signed-off-by: Donghyeon Jeong <[email protected]>
- static_cast<__fp16> is needed to avoid narrowing conversion error Signed-off-by: Donghyeon Jeong <[email protected]>
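A minimal illustration of the kind of fix this refers to:

```cpp
float f = 1.234f;
// __fp16 a{f};                     // error: narrowing conversion in list-initialization
__fp16 a = static_cast<__fp16>(f);  // explicit cast, no narrowing error
```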
* Uncomment __fp16 testcases, then verify & debug * fix missing functions or variables in tensor and blas_interface * TODO: do the last, fix setDist function, find erf function Signed-off-by: skykongkong8 <[email protected]>
Signed-off-by: Donghyeon Jeong <[email protected]>
- Previously memory access to tensor data was incorrect - Change to direct access to data with index instead of calculating the index Signed-off-by: Donghyeon Jeong <[email protected]>
This PR enables the FP32 unittest cases. It includes various fixes and adding compiler preprocessor pragmas. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
- Match FP16 types to avoid greater conversion rank error - Replace deprecated functions in gcc-13 - Add apply function for FP16 in Tensor Signed-off-by: Donghyeon Jeong <[email protected]>
Signed-off-by: Donghyeon Jeong <[email protected]>
- divide in tensor now supports FP16 - ranged in test util supports FP16 - fix zoneout_rate from fp16 to float Signed-off-by: Donghyeon Jeong <[email protected]>
This PR includes bug fixes for mixed tensor support **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
This PR adds the tensor type (Format, Weight Tensor DataType, Activation Tensor DataType) in initContext. - Remove the tensor type variables and setter/getter member functions in layer, layer_devel, loss layer, etc. - add tensor type setter in initContext - set the var_grad (input & output) Tensor Type according to the model Tensor Data Type. - Add ModelTensorTypeInfo: e.g. FP16_FP16 (Weight FP16, Activation FP16) **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
**Changes proposed in this PR:** - Added TOC generator for README.md Resolves: **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
In order to support gcc-13 & ndk-build, the apply member function needs to be templatized. It also makes sense to define the apply function this way. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
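A sketch of what a templated apply can look like (names are illustrative, not the exact member function in the tree): one definition serves both float and __fp16 element types instead of duplicating per-type functions.

```cpp
#include <functional>

// Apply a unary function to every element in place; T is float or __fp16.
template <typename T>
void apply_inplace(T *data, unsigned int len, std::function<T(T)> f) {
  for (unsigned int i = 0; i < len; ++i)
    data[i] = f(data[i]);
}

// usage, e.g. a ReLU over an FP32 buffer:
//   apply_inplace<float>(buf, n, [](float x) { return x > 0 ? x : 0.0f; });
```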
- Gradient tensor values are inconsistently set to NaN - NaN values caused incorrect backwarding in Neural Net - Replacing malloc with calloc prevents the allocated memory from containing NaN values Signed-off-by: Donghyeon Jeong <[email protected]>
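A minimal sketch of the allocation change being described (the function name is illustrative):

```cpp
#include <cstdlib>

// malloc leaves the buffer uninitialized, so stale bit patterns can read back
// as NaN and break backpropagation; calloc zero-fills the same block.
float *alloc_gradient(std::size_t n) {
  return static_cast<float *>(std::calloc(n, sizeof(float))); // all zeros
  // previously: static_cast<float *>(std::malloc(n * sizeof(float)));
}
```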
* Enable neon sgemv function in Android (ARM) __fp16 computation * note: this PR includes a significant part of PR#1981 of nnstreamer/nntrainer Signed-off-by: skykongkong8 <[email protected]>
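A rough sketch of a half-precision NEON matrix-vector multiply in the spirit of this change (not the merged sgemv code; requires an ARMv8.2-A target with FP16 vector arithmetic):

```cpp
#include <arm_neon.h>

// y = A * x, A is rows x cols in row-major order, all buffers are __fp16.
void sgemv_fp16(const __fp16 *A, const __fp16 *x, __fp16 *y,
                unsigned int rows, unsigned int cols) {
  for (unsigned int r = 0; r < rows; ++r) {
    float16x8_t acc = vdupq_n_f16(0);
    unsigned int c = 0;
    for (; c + 8 <= cols; c += 8) {
      float16x8_t a = vld1q_f16(&A[r * cols + c]);
      float16x8_t v = vld1q_f16(&x[c]);
      acc = vfmaq_f16(acc, a, v); // acc += a * v, 8 lanes at a time
    }
    // widen the 8 half-precision partial sums to FP32 and reduce
    float32x4_t lo = vcvt_f32_f16(vget_low_f16(acc));
    float32x4_t hi = vcvt_f32_f16(vget_high_f16(acc));
    float sum = vaddvq_f32(vaddq_f32(lo, hi));
    for (; c < cols; ++c) // scalar tail
      sum += static_cast<float>(A[r * cols + c]) * static_cast<float>(x[c]);
    y[r] = static_cast<__fp16>(sum);
  }
}
```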
FP16 is separated from FP32 in the apply function. Signed-off-by: Donghyeon Jeong <[email protected]>
The transfer_learning variable is set by the user and does not change during execution. Changed bool to const bool. Signed-off-by: SeoHyungjun <[email protected]>
This PR includes changes of Tensor and TensorDim to support NHWC computation for dot, add_strided, multiply_strided, cat, split, and transpose. It also includes unittests to evaluate. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: Adwaith Anand <[email protected]> Signed-off-by: Manohara HK <[email protected]> Signed-off-by: jijoong.moon <[email protected]>
- Explicitly provide the parameter since relying on the default parameter for an STL iterator is deprecated. Signed-off-by: hyeonseok lee <[email protected]>
This patch includes gcc-13 compatible fixes. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <[email protected]>
Fix some issues reported by svace and coverity. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: Seungbaek Hong <[email protected]>
- Remove warning flags, which helps compiling with gcc 13. - Remove the multiout testcase because it cannot guarantee the multiout layer order Signed-off-by: hyeonseok lee <[email protected]>
- Remove unused variables Signed-off-by: hyeonseok lee <[email protected]>
When we print the model architecture using the summarize method, nntrainer prints the input dimension of each layer. But TensorFlow and PyTorch print the output dimension of each layer in the summary, so it is inconvenient to compare each layer with tf and torch models. Thus, I suggest printing the output dimension of each layer instead of the input dimension in the model summary. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: Seungbaek Hong <[email protected]>
To export the Tflite format with fused ops, add some variables and functions. 1. Add getter, setter, replace for weights - for a fused op we need to adjust weights after the Opnode is made 2. Add isToBeRemove variable - after the Opnode is made, check the condition and mark it as to be removed 3. Add additional_props - for the BatchNormalization fused op we need additional props from nntrainer - made a vector<float> variable to save additional data Signed-off-by: DongHak Park <[email protected]>
Made the realization path for fused ops. 1. Check Trainable - check whether the node is trainable or not for fusing 2. Conv + ReLU fusing 3. Batch Normalization fusing Signed-off-by: DongHak Park <[email protected]>
Add Epsilon props to additional_props for fusing - for fusing we need Epsilon for batch norm. Add padding, stride props to props_vector - for Conv fusing we need to make a new BuiltinOption, and to build the new BuiltinOption with FUSED activation we need padding and stride Signed-off-by: DongHak Park <[email protected]>
- Added Wno-maybe-uninitialized flag Signed-off-by: hyeonseok lee <[email protected]>
Create multiout nodes with a given connection order in building a frequency map. Signed-off-by: Donghyeon Jeong <[email protected]>
This patch checks whether the requested memory is a weight gradient; this information will be used for planning. Signed-off-by: Jiho Chu <[email protected]>
It adds a timeout option to adjust the meson test timeout. Signed-off-by: Jiho Chu <[email protected]>
* generation : work with genLayerTests.py and use record_single_fp16 * data comparison : from sizeCheckedReadTensor, read with _FP16 memory size offset Signed-off-by: skykongkong8 <[email protected]>
Enable neon saxpy function for Android (ARM) __fp16 computation Signed-off-by: Debadri Samaddar <[email protected]>
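A rough sketch of a half-precision NEON axpy in the spirit of this change (not the merged code; assumes FP16 vector arithmetic support):

```cpp
#include <arm_neon.h>

// y = alpha * x + y over n half-precision elements.
void saxpy_fp16(unsigned int n, __fp16 alpha, const __fp16 *x, __fp16 *y) {
  float16x8_t va = vdupq_n_f16(alpha);
  unsigned int i = 0;
  for (; i + 8 <= n; i += 8) {
    float16x8_t vx = vld1q_f16(&x[i]);
    float16x8_t vy = vld1q_f16(&y[i]);
    vst1q_f16(&y[i], vfmaq_f16(vy, va, vx)); // y += alpha * x, 8 lanes
  }
  for (; i < n; ++i) // scalar tail
    y[i] = static_cast<__fp16>(alpha * x[i] + y[i]);
}
```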
Enables the gtest for half precision NEON functions in Android(ARM). **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: Debadri Samaddar <[email protected]>
Added conditions for handling function calls based on the USE__FP16 identifier. Signed-off-by: Debadri Samaddar <[email protected]>
Enable neon sdot function for Android (ARM) fp16 computation. Add unit test for fp16 sdot function in Android(ARM). **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: Debadri Samaddar <[email protected]>
jijoongmoon force-pushed the tensor_type_in_dim branch 4 times, most recently from 24d0d5b to d454dca on August 9, 2023 06:56
Opened another clean PR (#31) with the changes.
Commits to be reviewed in this PR
Add NEON fp16 function for sdot
Enable neon sdot function for Android (ARM) fp16 computation.
Add unit test for fp16 sdot function in Android (ARM).
Signed-off-by: s-debadri [email protected]
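For context, a minimal sketch of what a half-precision NEON dot product of this kind can look like (accumulating in FP32 to limit rounding error; illustrative, not the code added by this PR):

```cpp
#include <arm_neon.h>

// Dot product of two __fp16 vectors of length n, accumulated in FP32.
float sdot_fp16(unsigned int n, const __fp16 *x, const __fp16 *y) {
  float32x4_t acc = vdupq_n_f32(0.0f);
  unsigned int i = 0;
  for (; i + 8 <= n; i += 8) {
    float16x8_t vx = vld1q_f16(&x[i]);
    float16x8_t vy = vld1q_f16(&y[i]);
    float16x8_t prod = vmulq_f16(vx, vy); // elementwise x * y, 8 lanes
    acc = vaddq_f32(acc, vcvt_f32_f16(vget_low_f16(prod)));
    acc = vaddq_f32(acc, vcvt_f32_f16(vget_high_f16(prod)));
  }
  float sum = vaddvq_f32(acc);
  for (; i < n; ++i) // scalar tail
    sum += static_cast<float>(x[i]) * static_cast<float>(y[i]);
  return sum;
}
```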