Add benchmarks for InitBlock/CopyBlock operations #2154

echesakov · 2021-11-24T01:51:44Z

This adds benchmarks for the GT_STORE_BLK node in the JIT, which is responsible for two operations in .NET: InitBlock and CopyBlock.

echesakov · 2021-11-24T01:59:13Z

@DrewScoggins @adamsitnik PTAL
cc @dotnet/jit-contrib

adamsitnik

The benchmarks look good overall, but I am afraid that they are mostly testing the same code path. We should reduce the number of benchmarks. PTAL at my comments and the links that I've provided.

@echesakovMSFT thank you!

src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt

echesakov

@adamsitnik Thank you for your review! As I mentioned in the replies, at the moment there is no difference between the heap and localAddr code paths for InitBlock and CopyBlock. However, I am trying to implement an optimization to see if we can utilize the 16-byte alignment of sp (and the known alignment of fp) in the JIT on Arm64 and make use of the appropriate store operations.

For context: according to the Arm Cortex-A76 Software Optimization Guide (and similar guides for other microarchitectures), store operations that cross a 16-byte boundary can incur additional latency on Arm64. Hence, we should use them only when the JIT can prove that the store won't cross such a boundary. The JIT won't be able to do this in the general case, but it can do this for locals.

If you prefer, I can hold off on the benchmark PR until after dotnet/runtime#61030 is merged.
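To make the 16-byte boundary condition described above concrete, here is a minimal sketch in C++ (the language the JIT is written in). The helper names `CrossesBoundary16` and `LocalStoreCrossesBoundary16` are hypothetical and are not taken from dotnet/runtime; the sketch only illustrates the arithmetic, under the assumption that a local lives at a known byte offset from a 16-byte-aligned base such as sp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the JIT's code: returns true when a store of
// `size` bytes starting at byte address `addr` touches two different
// 16-byte-aligned chunks, i.e. it crosses a 16-byte boundary.
inline bool CrossesBoundary16(std::uint64_t addr, std::size_t size)
{
    assert(size > 0 && size <= 16);
    const std::uint64_t first = addr / 16;              // chunk holding the first byte
    const std::uint64_t last  = (addr + size - 1) / 16; // chunk holding the last byte
    return first != last;
}

// Assumption: the base register is 16-byte aligned. Then the offset alone
// decides the answer, whatever value the base has at run time.
inline bool LocalStoreCrossesBoundary16(std::uint64_t offsetFromAlignedBase, std::size_t size)
{
    return CrossesBoundary16(offsetFromAlignedBase % 16, size);
}
```

For an arbitrary heap address the low four bits are unknown at compile time, so the same question cannot be answered statically. That is why the reasoning applies to locals but not to the general case.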

src/benchmarks/micro/runtime/StoreBlock/StoreBlock.tt

adamsitnik · 2021-11-25T09:09:52Z

> If you prefer, I can hold off on the benchmark PR until after dotnet/runtime#61030 is merged.

@echesakovMSFT thank you for the detailed answers! In that case it would be better to merge the benchmarks before dotnet/runtime#61030, so that our Reporting System can start gathering data. Once that PR is merged, we can see how it affects performance for all configs.

DrewScoggins · 2021-11-30T17:27:38Z

I am a little confused by the last line of the .tt file. We don't use it when generating the tests; we use the loop for that. So what is it there for? I assume the plan is that, instead of walking by eight bytes in the for loop, we iterate over the array and use its values to decide the sizes for the copying. If not, we should reduce the number of test cases that we generate across the different sizes and only use sizes that we believe will give us meaningful differences.

echesakov · 2021-12-03T02:09:50Z

> I am a little confused by the last line of the .tt file. We don't use it when generating the tests; we use the loop for that. So what is it there for? I assume the plan is that, instead of walking by eight bytes in the for loop, we iterate over the array and use its values to decide the sizes for the copying. If not, we should reduce the number of test cases that we generate across the different sizes and only use sizes that we believe will give us meaningful differences.

You are right. I was previously using the array on the last line, but then I wanted to do some extra testing locally and forgot to revert the change. I have updated the template and reduced the number of tests.

echesakov · 2021-12-03T02:10:46Z

@adamsitnik @DrewScoggins PTAL one more time - I believe I addressed all the suggestions.

adamsitnik

LGTM, thank you @echesakovMSFT !

Add benchmarks for InitBlock/CopyBlock operations

93479ca

echesakov marked this pull request as ready for review November 24, 2021 01:58

adamsitnik reviewed Nov 24, 2021

echesakov commented Nov 24, 2021

Address feedback

2e80844

Extend the byteCounts array to include sizes up to 128 bytes (the upper limit used on x86/x64)

e8e8f14

echesakov force-pushed the InitBlock-CopyBlock-Benchmarks branch from 6e11a0e to e8e8f14 Compare December 3, 2021 02:26

echesakov mentioned this pull request Dec 3, 2021

Use SIMD operations in InitBlkUnroll/CopyBlkUnroll and increase unroll limit up to 128 bytes dotnet/runtime#61030

Merged

adamsitnik approved these changes Dec 3, 2021

adamsitnik merged commit 9e42179 into dotnet:main Dec 3, 2021

echesakov deleted the InitBlock-CopyBlock-Benchmarks branch December 3, 2021 15:53
