[neon] Modify neon sgemv_fp16 #46
Conversation
- Previously, sgemv_fp16 depended on two conditions: 1. the column or row count had to be divisible by 8, and 2. the computation ran entirely in fp16 (which might raise accuracy issues).
- In this commit, we expect sgemv to: 1. support every column length (with adaptive-compute optimization), 2. use a temporary fp32 array to contain cumulative rounding error in large-scale Tensors, and 3. accelerate fp32-to-fp16 copies (and vice versa) with NEON to improve time performance.
- Some trivial typo fixes are included.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
Force-pushed from 0fc814a to cf32826
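The following is a minimal sketch of the approach described in the commit message above, not the actual nntrainer implementation: the fp16 GEMV row is accumulated in fp32, a scalar tail handles column counts that are not multiples of 8, and the result is converted back to fp16 only at the end. The function name `sgemv_fp16_sketch` and all variable names are illustrative assumptions, and the snippet assumes an AArch64 target with `__fp16`/NEON fp16 support.

```cpp
#include <arm_neon.h>
#include <cstdint>

// Sketch: y[i] = dot(A[i, :], x) with fp32 accumulation of fp16 inputs.
void sgemv_fp16_sketch(const __fp16 *A, const __fp16 *x, __fp16 *y,
                       uint32_t rows, uint32_t cols) {
  for (uint32_t i = 0; i < rows; ++i) {
    const __fp16 *row = A + i * cols;
    float32x4_t acc = vdupq_n_f32(0.f); // fp32 accumulator avoids fp16 drift
    uint32_t j = 0;
    for (; j + 8 <= cols; j += 8) {     // main 8-wide loop
      float16x8_t a = vld1q_f16(row + j);
      float16x8_t b = vld1q_f16(x + j);
      // widen each fp16 half to fp32 before the multiply-accumulate
      acc = vfmaq_f32(acc, vcvt_f32_f16(vget_low_f16(a)),
                           vcvt_f32_f16(vget_low_f16(b)));
      acc = vfmaq_f32(acc, vcvt_f32_f16(vget_high_f16(a)),
                           vcvt_f32_f16(vget_high_f16(b)));
    }
    float sum = vaddvq_f32(acc);
    for (; j < cols; ++j)               // scalar tail: any column length works
      sum += static_cast<float>(row[j]) * static_cast<float>(x[j]);
    y[i] = static_cast<__fp16>(sum);    // convert back to fp16 once, at the end
  }
}
```

Accumulating in fp32 is what bounds the cumulative rounding error mentioned in the commit message, at the cost of two extra widening conversions per 8 elements.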
LGTM
- Instead of explicitly declaring a float16x4_t and then converting it into a float32x4_t, it is better to inline the load-and-convert, considering the number of registers on the device and memory consumption.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
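To illustrate the register-pressure point in the commit message (a hypothetical snippet, not taken from the diff): the first helper keeps an explicitly named float16x4_t temporary alive, while the second folds the load and conversion into one expression. The names B, k, N, and n mirror the quoted diff below; the helper functions themselves are assumptions.

```cpp
#include <arm_neon.h>
#include <cstdint>

// Style 1: explicit fp16 temporary, as the commit moves away from.
float32x4_t load_b_explicit(const __fp16 *B, uint32_t k, uint32_t N, uint32_t n) {
  float16x4_t b_half = vld1_f16(&B[k * N + n]); // named fp16 temporary
  return vcvt_f32_f16(b_half);
}

// Style 2: load and widen in a single expression, as preferred in this commit.
float32x4_t load_b_inline(const __fp16 *B, uint32_t k, uint32_t N, uint32_t n) {
  return vcvt_f32_f16(vld1_f16(&B[k * N + n]));
}
```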
nntrainer/tensor/blas_neon.cpp
Outdated
float32x4_t b0_7_2_low = vcvt_f32_f16(vget_low_f16(b0_7_2));
float32x4_t b0_7_2_high = vcvt_f32_f16(vget_high_f16(b0_7_2));
float32x4_t b0_7_0_low = vcvt_f32_f16(vld1_f16(&B[k * N + n]));
float32x4_t b0_7_0_high = vcvt_f32_f16(vld1_f16(&B[k * N + n]));
This should be float32x4_t b0_7_0_high = vcvt_f32_f16(vld1_f16(&B[k * N + n + 4]));
Otherwise it will load the lower four elements again.
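A small sketch of the corrected pattern the reviewer is describing (names follow the quoted diff; the helper itself is hypothetical): the high fp32 half must be widened from the next four fp16 elements at offset n + 4, otherwise both conversions read the same lower four values.

```cpp
#include <arm_neon.h>

// Widen 8 consecutive fp16 values of B into two fp32 vectors.
inline void load_b_widened(const __fp16 *B, unsigned k, unsigned N, unsigned n,
                           float32x4_t &lo, float32x4_t &hi) {
  lo = vcvt_f32_f16(vld1_f16(&B[k * N + n]));     // elements n .. n+3
  hi = vcvt_f32_f16(vld1_f16(&B[k * N + n + 4])); // elements n+4 .. n+7
}
```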
Fixed. Thanks!
- Instead of explicitly declaring a float16x4_t and then converting it into a float32x4_t, it is better to inline the load-and-convert, considering the number of registers on the device and memory consumption.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
Force-pushed from ca7ca69 to 4f55742
It seems the changes to tensor.h and tensor.cpp are not related to this PR. Could you remove those files from this PR so that we can merge it directly?
Also, you can open this PR against upstream.
- Previously, we used full-fp16 variables in the sgemv and sgemm loop code.
- However, such practice might cause an accumulation error that exceeds our expected epsilon.
- Now, an intermediate fp32 value is used to preserve accuracy and avoid precision loss.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
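As a standalone illustration of the accumulation issue this commit addresses (not nntrainer code; requires a compiler and target with `__fp16` support, e.g. AArch64 GCC/Clang): summing many values directly in fp16 stalls once the running total exceeds the fp16 mantissa range, while accumulating in fp32 and converting once at the end gives the exact result.

```cpp
#include <cstdio>

int main() {
  __fp16 acc_fp16 = 0;
  float acc_fp32 = 0.f;
  for (int i = 0; i < 4096; ++i) {
    // round back to fp16 after every addition, as a pure-fp16 loop would
    acc_fp16 = static_cast<__fp16>(acc_fp16 + static_cast<__fp16>(1.0f));
    acc_fp32 += 1.0f;
  }
  // fp16 has an 11-bit significand, so adding 1.0 stops changing the running
  // sum around 2048 (spacing between representable values becomes 2);
  // the fp32 accumulator reaches the exact value.
  std::printf("fp16 accumulate: %f\n", static_cast<float>(acc_fp16)); // ~2048
  std::printf("fp32 accumulate: %f\n", acc_fp32);                     // 4096
  return 0;
}
```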
Already merged to upstream.
Self evaluation:
Signed-off-by: ss.kong [email protected]