Optimize `ttnn.round` with a direct implementation #13385

jdh8 · 2024-10-02T20:05:08Z

Rounding is only supported on Wormhole, and Wormhole has float_to_int16, which computes exactly this function when the value is in range.
https://github.com/tenstorrent/tt-metal/blob/main/docs/source/tt-metalium/tt_metal/apis/kernel_apis/sfpu/llk.rst#wormhole-only

However, ttnn.round is currently implemented as a composite of ttnn.floor, ttnn.add, ttnn.where, and other element-wise ops.

tt-metal/ttnn/cpp/ttnn/operations/eltwise/unary/device/unary_composite_op.cpp

Lines 640 to 653 in 679b8d5

    
    } else {  // Bankers' Rounding
        Tensor rounded_non_half = ttnn::floor(
            ttnn::add(
                input,
                ttnn::where(ttnn::logical_and(ttnn::ge(input, 0.4), ttnn::le(input, 0.5)), 0.4f, 0.5f, output_mem_config.value()),
                std::nullopt,
                output_mem_config),
            output_mem_config.value());
        Tensor fractional_part = ttnn::subtract(input, floor_res, std::nullopt, output_mem_config);
        Tensor is_half = ttnn::eq(fractional_part, 0.5, std::nullopt, output_mem_config);
        Tensor rounded_half =
            ttnn::add(floor_res, is_odd(floor_res, output_mem_config), std::nullopt, output_mem_config);
        return ttnn::where(is_half, rounded_half, rounded_non_half, output_mem_config.value());
    }
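To make the data flow of the composite easier to follow, here is a scalar C++ model of the banker's-rounding branch, one element at a time. It is a sketch, not the ttnn code. `floor_res` is defined outside the quoted lines; the model assumes it is `floor(input)`. `is_odd_model` is a hypothetical stand-in for ttnn's `is_odd`. The 0.4 offset for inputs in [0.4, 0.5] appears to guard against `0.49999997f + 0.5f` rounding up to exactly `1.0f` in float arithmetic, which would make `floor` return 1.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for ttnn's is_odd: 1.0f for odd integers, else 0.0f.
float is_odd_model(float v) {
    return std::fmod(v, 2.0f) != 0.0f ? 1.0f : 0.0f;
}

// Scalar model of the composite banker's-rounding branch quoted above.
// Assumption: floor_res == floor(input), as defined before the excerpt.
float composite_round(float input) {
    float floor_res = std::floor(input);
    // floor(x + 0.5) rounds half up; using 0.4 for x in [0.4, 0.5] keeps
    // x = 0.49999997f from being rounded up to 1.0f by the float addition.
    float offset = (input >= 0.4f && input <= 0.5f) ? 0.4f : 0.5f;
    float rounded_non_half = std::floor(input + offset);
    float fractional_part = input - floor_res;
    bool is_half = fractional_part == 0.5f;
    // Exact halves go to the even neighbour: add 1 only when floor is odd.
    float rounded_half = floor_res + is_odd_model(floor_res);
    return is_half ? rounded_half : rounded_non_half;
}
```

With this model, `composite_round(94.5f)` yields 94 and `composite_round(95.5f)` yields 96, i.e. the composite already matches torch.round on ties; the cost is the number of separate tensor ops it launches.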

In turn, ttnn.floor calls kernel functions that effectively compute ttnn.round, i.e. rounding to the nearest integer:

tt-metal/tt_metal/hw/ckernels/wormhole_b0/metal/llk_api/llk_sfpu/ckernel_sfpu_floor.h

Lines 25 to 26 in 679b8d5

    
    vInt tmp = float_to_int16(result, 0); //TODO: Replace float_to_int16 to float_to_int32 once it is available
    result = int32_to_float(tmp, 0);
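The two quoted lines perform only the rounding step. A floor can be recovered from any round-to-nearest primitive by stepping down once when the rounded value overshoots the input. The scalar sketch below assumes that is what the rest of the floor kernel does (the correction is not part of the excerpt); `std::round` stands in for `float_to_int16`, and the int16 range check mirrors the TODO comment. Whichever way the primitive breaks ties, the correction yields the same floor.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Floor built from a round-to-nearest primitive (scalar model).
// std::round stands in for float_to_int16; its tie direction is an assumption,
// but the result below does not depend on it.
float floor_via_round(float x) {
    assert(std::fabs(x) <= 32767.0f && "outside int16 range");
    auto tmp = static_cast<std::int16_t>(std::round(x));  // models float_to_int16
    float result = static_cast<float>(tmp);               // models int32_to_float
    // Rounding may have gone up by one; step down to get the floor.
    if (result > x) {
        result -= 1.0f;
    }
    return result;
}
```

This is why a direct round is cheaper than the current composite: the hardware primitive already rounds, and floor is the derived operation, not the other way around.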

Rounding to the nearest integer is extremely useful for argument reduction, so a direct implementation could be reused in other mathematical functions (mostly elementary functions), such as:

Exponential functions
Trigonometric functions
ttnn.pow
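As an illustration of the argument-reduction use case, here is a minimal scalar sketch of exp (not a proposed kernel). With k = round(x / ln 2), the reduced argument r = x - k·ln 2 satisfies |r| <= ln 2 / 2, so a short polynomial suffices and the factor 2^k is applied exactly with `std::ldexp`. The degree-12 Taylor polynomial is an arbitrary choice for this sketch; a real implementation would typically use a minimax polynomial and split ln 2 into high and low parts.

```cpp
#include <cassert>
#include <cmath>

// exp(x) = 2^k * exp(r), with k = round(x / ln 2) and |r| <= ln(2) / 2.
double exp_reduced(double x) {
    assert(std::fabs(x) < 700.0 && "sketch only handles moderate inputs");
    const double ln2 = 0.69314718055994530942;
    double k = std::nearbyint(x / ln2);  // round to the nearest integer
    double r = x - k * ln2;              // reduced argument, |r| <= ~0.347
    // Degree-12 Taylor polynomial for exp(r), evaluated in Horner form.
    double p = 1.0;
    for (int i = 12; i >= 1; --i) {
        p = 1.0 + r * p / i;
    }
    return std::ldexp(p, static_cast<int>(k));  // scale by 2^k exactly
}
```

The same pattern (round, subtract, evaluate a small polynomial, rescale) underlies typical implementations of sin/cos (reduction by multiples of pi/2) and pow (via exp and log), which is why a fast native round pays off beyond ttnn.round itself.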


mouliraj-mcw · 2024-10-17T07:37:05Z

Hi @jdh8 ,
I examined your approach and found that it doesn't address rounding to a specific number of decimal places (e.g., 2 or 3 decimal places).
Could you please share your thoughts on how this could be managed?

jdh8 · 2024-10-17T09:27:03Z

Thanks for pointing it out! I missed the `decimals` parameter.

It can be managed with multiplication by 10ⁿ. To be specific,

round(x, n) = 10**-n * round(10**n * x)
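A minimal scalar sketch of that identity, assuming round-half-to-even (`std::nearbyint` under the default rounding mode) as the inner rounding so that ties match torch.round. It divides by 10^n instead of multiplying by 10^-n, because 10^-n is not exactly representable for n > 0, and it also accepts negative n (rounding to tens, hundreds, ...). The scaling x·10^n can itself be inexact, so inputs whose decimal expansion sits on a tie (such as 2.345) may not round the way decimal arithmetic would.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// round(x, n) = round(x * 10^n) / 10^n, with ties to even.
double round_decimals(double x, int n) {
    assert(n >= -22 && n <= 22 && "10^|n| must be exact in double");
    double p = 1.0;
    for (int i = 0; i < std::abs(n); ++i) {
        p *= 10.0;  // 10^|n|, exact for |n| <= 22
    }
    if (n >= 0) {
        return std::nearbyint(x * p) / p;  // divide: 10^-n would be inexact
    }
    return std::nearbyint(x / p) * p;      // negative n: round to 10^|n|
}
```

For example, `round_decimals(0.125, 2)` gives 0.12 (12.5 ties to the even 12), while `round_decimals(0.375, 2)` gives 0.38.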

jdh8 · 2024-10-17T09:31:26Z

I have two proposals:

1. Implement a native roundeven(x), which is conceptually round(x, 0), and then build round(x, n) on top of roundeven. (Named after C23 roundeven.)
2. Implement a direct, native round(x, n).
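For the first proposal, roundeven does not strictly need a dedicated rounding instruction. One well-known software technique (not necessarily what a Tensix kernel would use) adds and then subtracts 2^23: for |x| < 2^23, |x| + 2^23 lands in [2^23, 2^24), where consecutive floats are exactly 1 apart, so the addition itself rounds to an integer with ties to even. The sketch assumes IEEE-754 binary32 arithmetic in the default rounding mode, evaluated in float precision (FLT_EVAL_METHOD == 0) and compiled without -ffast-math; `roundeven_f` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// roundeven for binary32 using only add, subtract, abs and a compare.
float roundeven_f(float x) {
    const float two23 = 8388608.0f;  // 2^23
    float a = std::fabs(x);
    if (!(a < two23)) {
        return x;  // already an integer, or NaN / infinity
    }
    // The addition rounds a to the nearest integer, ties to even.
    float r = (a + two23) - two23;
    return std::copysign(r, x);  // restore the sign (keeps -0.0f)
}
```

round(x, n) can then be layered on top as in the scaling identity above, at the cost of the extra multiply and divide per element.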

Which approach looks better?

mouliraj-mcw · 2024-10-18T09:35:22Z

I think approach two would be more suitable, as it has a straightforward structure.

umadevimcw · 2025-01-07T11:34:27Z

@jdh8 We tested rounding on the jdh8/direct-rounding branch, with reference to this comment #13851 (review), and observed the following.

torch.round uses banker's rounding (round half to even). For example, for the input 94.5, Torch returns 94, whereas TT returns 95, which makes the test case fail.

94.5 is exactly halfway between 94 and 95, so it is rounded to the nearest even integer, 94. This needs to be handled in our TT implementation.

[Image: output comparison; red is TT's output, green is Torch's output]
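One possible fix, sketched under the assumption that the kernel's rounding primitive breaks ties away from zero (which is what the 94.5 → 95 result suggests): detect exact halves and, when the rounded result is odd, step back one unit toward zero. Here `std::round` stands in for that primitive and `std::nearbyint` (ties to even under the default rounding mode) plays the role of torch.round; `round_half_even_fix` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical post-correction: turn a round-half-away-from-zero result
// into round-half-to-even, matching torch.round on ties.
float round_half_even_fix(float x) {
    float r = std::round(x);  // stand-in for the kernel: ties away from zero
    bool is_half = std::fabs(x - std::trunc(x)) == 0.5f;  // exact tie?
    bool r_is_odd = std::fmod(r, 2.0f) != 0.0f;
    if (is_half && r_is_odd) {
        r -= std::copysign(1.0f, x);  // step toward zero, to the even neighbour
    }
    return r;
}
```

The extra work is a truncation, a compare and a conditional subtract per element, which should still be far cheaper than the current multi-op composite.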

jdh8 added the feature-request label Oct 2, 2024

github-actions bot added the community label Oct 2, 2024

jdh8 added the op_cat: eltwise and perf labels and removed the community label Oct 15, 2024

eyonland mentioned this issue Oct 15, 2024

Eltwise Master Tracking #13795

Open

jdh8 added a commit that referenced this issue Oct 16, 2024

#13385: Direct kernel function for ttnn.round

a9f8676

jdh8 linked a pull request Oct 16, 2024 that will close this issue

#13385: Direct kernel function for ttnn.round #13851

Open


jdh8 added a commit that referenced this issue Oct 16, 2024

#13385: Consolidate nested namespace as suggested by clang-tidy

0a0cf76

jdh8 added a commit that referenced this issue Oct 16, 2024

#13385: Direct kernel function for ttnn.round

bc25616

jdh8 added a commit that referenced this issue Oct 16, 2024

#13385: Consolidate nested namespace as suggested by clang-tidy

6426559

jdh8 added a commit that referenced this issue Oct 16, 2024

#13385: Remove current implementation of ttnn::round

9e28b02

jdh8 added a commit that referenced this issue Oct 16, 2024

#13385: Binding for ttnn.round

ddc42a1

jdh8 added a commit that referenced this issue Oct 16, 2024

#13385: Low-level interface for ttnn.round on Wormhole

7ec2b44

jdh8 added a commit that referenced this issue Oct 17, 2024

#13385: Avoid using APPROX, which is possibly a reserved word

2362fd4

jdh8 added a commit that referenced this issue Oct 19, 2024

#13385: Direct kernel function for ttnn.round

a0b7be9

jdh8 added a commit that referenced this issue Oct 19, 2024

#13385: Consolidate nested namespace as suggested by clang-tidy

e16a06a

jdh8 added a commit that referenced this issue Oct 19, 2024

#13385: Remove current implementation of ttnn::round

96e0361

jdh8 added a commit that referenced this issue Oct 19, 2024

#13385: Binding for ttnn.round

f50a2c3

jdh8 added a commit that referenced this issue Oct 19, 2024

#13385: Low-level interface for ttnn.round on Wormhole

1471c96

jdh8 added a commit that referenced this issue Oct 19, 2024

#13385: Avoid using APPROX, which is possibly a reserved word

9053c04

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Direct kernel function for ttnn.round

d2b14a9

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Consolidate nested namespace as suggested by clang-tidy

3732aa0

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Remove current implementation of ttnn::round

da69555

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Binding for ttnn.round

603fb53

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Low-level interface for ttnn.round on Wormhole

60d363a

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Avoid using APPROX, which is possibly a reserved word

cfb162a

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Add decimals: int parameter to ttnn.round

ae03f5d

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Direct kernel function for ttnn.round

ac30df5

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Consolidate nested namespace as suggested by clang-tidy

268aace

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Remove current implementation of ttnn::round

c11ae9a

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Binding for ttnn.round

73680bd

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Low-level interface for ttnn.round on Wormhole

eb6cc3b

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Avoid using APPROX, which is possibly a reserved word

a272ecb

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Add decimals: int parameter to ttnn.round

fe86a67

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Try fixing registration of ttnn.round

e28ea65

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Fix API for ttnn.round

f713ec5

jdh8 added a commit that referenced this issue Oct 24, 2024

#13385: Complete kernel of ttnn.round

cd424ac

jdh8 added a commit that referenced this issue Oct 25, 2024

#13385: Try rounding to decimal places in the kernel

86381b5

jdh8 added a commit that referenced this issue Oct 25, 2024

#13385: Find 10^x with LUT

32e76c9

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Direct kernel function for ttnn.round

3bbd3bd

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Consolidate nested namespace as suggested by clang-tidy

7f3da9a

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Remove current implementation of ttnn::round

d587b15

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Binding for ttnn.round

521d36a

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Low-level interface for ttnn.round on Wormhole

144a5bd

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Avoid using APPROX, which is possibly a reserved word

b0d1ef7

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Add decimals: int parameter to ttnn.round

af777cf

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Try fixing registration of ttnn.round

e9b3593

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Fix API for ttnn.round

8a5be87

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Complete kernel of ttnn.round

5e5f867

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Try rounding to decimal places in the kernel

b4a4d6e

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Find 10^x with LUT

20bc7ec

jdh8 added a commit that referenced this issue Oct 27, 2024

#13385: Implement ttnn.round for Blackhole as well

c15ea55

eyonland added the MCW label Dec 20, 2024

eyonland assigned umadevimcw Dec 20, 2024
