[neon] Modify neon sgemv_fp16 #46

Closed
wants to merge 4 commits into from

Commits on Sep 26, 2023

  1. [neon] Modify neon sgemv_fp16

    - Previously, sgemv_fp16 depended on two conditions:
    	1. the column or row count had to be a multiple of 8
    	2. all computation was done in fp16 (which could cause accuracy issues)
    - With this commit, sgemv is expected to:
    	1. support any column length (with adaptive-compute optimization)
    	2. use a temporary fp32 array to limit cumulative rounding error in large tensors
    	3. use NEON to accelerate fp32-to-fp16 copies (and vice versa) for better runtime performance
    - Includes some minor typo fixes
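The three goals above can be illustrated with a minimal, portable sketch. It assumes a transposed sgemv (y = A^T * x over a row-major M x N matrix), where the output length equals the column count. Columns are processed in blocks of 8, then at most one block of 4, then a scalar tail, and all partial sums live in a temporary fp32 array until a final copy-out. The function name `sgemv_transposed_fp32_acc` is hypothetical and is not the kernel from this PR. Plain `float` stands in for `__fp16` so the sketch runs on any host, and comments mark where NEON loads and conversions would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch (not the PR's kernel): y = A^T * x, where A is a
 * row-major M x N matrix. In a NEON version A, x and y would be __fp16
 * buffers; plain float keeps this example portable. */
int sgemv_transposed_fp32_acc(const float *A, const float *x, float *y,
                              size_t M, size_t N) {
    /* Temporary fp32 array holding one partial sum per output column. */
    float *acc = calloc(N, sizeof *acc);
    if (acc == NULL && N > 0)
        return -1;

    for (size_t i = 0; i < M; ++i) {
        const float *row = A + i * N;
        const float xi = x[i];
        size_t j = 0;
        /* Blocks of 8 columns (one float16x8_t load on NEON). */
        for (; j + 8 <= N; j += 8)
            for (size_t k = 0; k < 8; ++k)
                acc[j + k] += xi * row[j + k];
        /* At most one block of 4 columns (one float16x4_t load on NEON). */
        if (j + 4 <= N) {
            for (size_t k = 0; k < 4; ++k)
                acc[j + k] += xi * row[j + k];
            j += 4;
        }
        /* Scalar tail, so N need not be a multiple of 8. */
        for (; j < N; ++j)
            acc[j] += xi * row[j];
    }

    /* Copy the fp32 results to the output (the fp32 -> fp16 copy on NEON). */
    for (size_t j = 0; j < N; ++j)
        y[j] = acc[j];
    free(acc);
    return 0;
}
```

With N = 13, one call exercises all three paths: one 8-wide block, one 4-wide block, and a single scalar column.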
    
    **Self evaluation:**
    1. Build test:     [X]Passed [ ]Failed [ ]Skipped
    2. Run test:     [X]Passed [ ]Failed [ ]Skipped
    
    Signed-off-by: skykongkong8 <[email protected]>
    skykongkong8 committed Sep 26, 2023
    cf32826

Commits on Sep 27, 2023

  1. [neon] Memory optimization for fp32-array-using sgemv_fp16

    - Instead of explicitly declaring a float16x4_t variable and then converting it to float32x4_t, perform the conversion inline. This is preferable given the limited number of registers on the device and the memory consumption.
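A minimal sketch of the inline style follows. It uses a dot product with fp32 accumulation, and each `vld1_f16` result is passed directly to `vcvt_f32_f16` without first being stored in a named `float16x4_t`. The helper name `dot_fp16_fp32acc` and the `half_t` typedef are hypothetical. The NEON path is compiled only on AArch64, and other hosts use `float` as a stand-in with the scalar loop. Whether the inline form actually lowers register pressure depends on the compiler. With optimization enabled, both forms often compile to the same instructions, so the generated assembly is worth checking.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
typedef __fp16 half_t;  /* native fp16 storage on AArch64 */
#else
typedef float half_t;   /* stand-in so the sketch also builds on other hosts */
#endif

/* Hypothetical helper: dot product of two fp16 vectors with fp32 accumulation. */
float dot_fp16_fp32acc(const half_t *a, const half_t *b, size_t n) {
    float sum = 0.0f;
    size_t i = 0;
#if defined(__aarch64__)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        /* Inline conversion: each vld1_f16 result goes straight into
         * vcvt_f32_f16 instead of being kept in a named float16x4_t. */
        acc = vfmaq_f32(acc, vcvt_f32_f16(vld1_f16(a + i)),
                        vcvt_f32_f16(vld1_f16(b + i)));
    }
    sum = vaddvq_f32(acc);  /* horizontal add of the four fp32 lanes */
#endif
    /* Scalar tail on AArch64; the whole loop on other hosts. */
    for (; i < n; ++i)
        sum += (float)a[i] * (float)b[i];
    return sum;
}
```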
    
    Resolves:
    
    **Self evaluation:**
    1. Build test:     [X]Passed [ ]Failed [ ]Skipped
    2. Run test:     [X]Passed [ ]Failed [ ]Skipped
    
    Signed-off-by: skykongkong8 <[email protected]>
    skykongkong8 committed Sep 27, 2023
    2a73d01
  2. [neon] Use inline code in sgemm_fp16 when converting into fp32

    - Instead of explicitly declaring a float16x4_t variable and then converting it to float32x4_t, perform the conversion inline. This is preferable given the limited number of registers on the device and the memory consumption.
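For sgemm, the same idea can be sketched as a row-wise kernel. Each output row is accumulated in a temporary fp32 buffer through an axpy-style update, rows of B are widened from fp16 inline, and the row is rounded back to fp16 only once at the end. The names `axpy_fp16_into_fp32`, `sgemm_fp16_fp32acc`, and `half_t` are hypothetical, and the loop order is an illustrative choice rather than the PR's actual sgemm structure. As above, the NEON path is built only on AArch64, and `float` stands in for `__fp16` elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(__aarch64__)
#include <arm_neon.h>
typedef __fp16 half_t;  /* native fp16 storage on AArch64 */
#else
typedef float half_t;   /* stand-in so the sketch also builds on other hosts */
#endif

/* acc[0..n) += alpha * b[0..n), with b stored as fp16 and acc as fp32. */
static void axpy_fp16_into_fp32(float *acc, float alpha, const half_t *b,
                                size_t n) {
    size_t j = 0;
#if defined(__aarch64__)
    const float32x4_t va = vdupq_n_f32(alpha);
    for (; j + 4 <= n; j += 4) {
        /* b is widened inline; no named float16x4_t temporary. */
        vst1q_f32(acc + j, vfmaq_f32(vld1q_f32(acc + j), va,
                                     vcvt_f32_f16(vld1_f16(b + j))));
    }
#endif
    for (; j < n; ++j)
        acc[j] += alpha * (float)b[j];
}

/* Hypothetical sketch: C = A * B for row-major A (M x K), B (K x N), C (M x N). */
int sgemm_fp16_fp32acc(const half_t *A, const half_t *B, half_t *C,
                       size_t M, size_t N, size_t K) {
    float *acc = malloc(N * sizeof *acc);  /* one fp32 output row */
    if (acc == NULL && N > 0)
        return -1;
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j)
            acc[j] = 0.0f;
        for (size_t k = 0; k < K; ++k)
            axpy_fp16_into_fp32(acc, (float)A[i * K + k], B + k * N, N);
        for (size_t j = 0; j < N; ++j)
            C[i * N + j] = (half_t)acc[j];  /* round back to fp16 once per element */
    }
    free(acc);
    return 0;
}
```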
    
    **Self evaluation:**
    1. Build test:     [X]Passed [ ]Failed [ ]Skipped
    2. Run test:     [X]Passed [ ]Failed [ ]Skipped
    
    Signed-off-by: skykongkong8 <[email protected]>
    skykongkong8 committed Sep 27, 2023
    4f55742

Commits on Oct 5, 2023

  1. [blas] Fix sgemv and sgemm loop for fp16

    - Previously, the sgemv and sgemm loops used fp16 variables throughout.
    - However, this could cause accumulation error that exceeds the expected epsilon.
    - The loops now use intermediate fp32 values to preserve accuracy and avoid precision loss.
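The accumulation problem can be reproduced without fp16 hardware by emulating fp16 rounding. fp16 has an 11-bit significand, so once a running sum reaches 256 the spacing between representable values is 0.25. At that point, adding a value near 0.1 rounds away entirely and the sum stalls. The sketch below is an illustration, not code from this PR. It assumes all values stay in fp16's normal range (overflow and subnormals are not modeled) and emulates rounding by clearing the low 13 mantissa bits of a float with round-to-nearest-even. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest value with an 11-bit significand (fp16
 * precision), ties to even. Assumption: |x| stays inside the normal fp16
 * range, so fp16 overflow and subnormals are not modeled. */
float round_to_fp16_precision(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    uint32_t kept_lsb = (u >> 13) & 1u;  /* lowest mantissa bit that survives */
    u += 0x0FFFu + kept_lsb;             /* round-to-nearest-even on 13 dropped bits */
    u &= ~(uint32_t)0x1FFFu;             /* clear the dropped bits */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Sum n copies of v, emulating an fp16 accumulator (rounded after every add). */
float sum_fp16_acc(float v, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc = round_to_fp16_precision(acc + v);
    return acc;
}

/* Same sum with an fp32 accumulator; v itself is still an fp16 value. */
float sum_fp32_acc(float v, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += v;
    return acc;
}
```

With `v = round_to_fp16_precision(0.1f)` (exactly 0.0999755859375), 4096 additions give 409.5 with the fp32 accumulator. The emulated fp16 accumulator stops at 256.0, far outside any reasonable epsilon.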
    
    **Self evaluation:**
    1. Build test:     [X]Passed [ ]Failed [ ]Skipped
    2. Run test:     [X]Passed [ ]Failed [ ]Skipped
    
    Signed-off-by: skykongkong8 <[email protected]>
    skykongkong8 committed Oct 5, 2023
    8702b79