Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in ByteMark.BenchIDEAEncryption #41677

Closed
adamsitnik opened this issue Sep 1, 2020 · 15 comments · Fixed by #41838
Closed

Performance regression in ByteMark.BenchIDEAEncryption #41677

adamsitnik opened this issue Sep 1, 2020 · 15 comments · Fixed by #41838
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@adamsitnik
Copy link
Member

adamsitnik commented Sep 1, 2020

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --filter ByteMark.BenchIDEAEncryption

By looking at the full historical data it looks that this particular regression has been introduced in the last few weeks

obraz

And it's not related to memory or GC:

Method Runtime Mean Error StdDev Median Min Max Ratio Gen 0 Gen 1 Gen 2 Allocated
BenchIDEAEncryption .NET Core 3.1 949.9 ms 3.77 ms 3.53 ms 949.2 ms 942.4 ms 955.5 ms 1.00 - - - 610.2 KB
BenchIDEAEncryption .NET Core 5.0 1,452.8 ms 3.59 ms 3.36 ms 1,452.8 ms 1,446.5 ms 1,458.9 ms 1.53 - - - 610.2 KB

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@DrewScoggins this regression did not get detected by the bot. I am not sure why, perhaps it's too fresh?

cc @AndyAyersMS it might be some interesting low-level regression I guess?

@adamsitnik adamsitnik added the tenet-performance Performance related issue label Sep 1, 2020
@adamsitnik adamsitnik added this to the 5.0.0 milestone Sep 1, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Security untriaged New issue has not been triaged by the area owner labels Sep 1, 2020
@ghost
Copy link

ghost commented Sep 1, 2020

Tagging subscribers to this area: @bartonjs, @vcsjones, @krwq
See info in area-owners.md if you want to be subscribed.

@danmoseley
Copy link
Member

I assume this is Windows specific @adamsitnik ?

@vcsjones
Copy link
Member

vcsjones commented Sep 1, 2020

Note this isn't benchmarking anything in System.Security - its benchmarking a hand-rolled implementation of the IDEA algorithm.

@adamsitnik
Copy link
Member Author

I assume this is Windows specific @adamsitnik ?

@danmosemsft It looks like it's not Windows-specific

The Ubuntu historical data

obraz

@danmoseley
Copy link
Member

It looks like it's not Windows-specific

Ah, good info, I guess that means we can exclude machine configuration/the crypto work itself.

@bartonjs
Copy link
Member

bartonjs commented Sep 1, 2020

I guess that means we can exclude ... the crypto work itself.

It's not using the crypto stack, it's a custom algorithm implementation inside the performance repository just for use as a benchmark. So it's just low level things like array allocation, array read/write, field access, and GC.

https://github.com/dotnet/performance/blob/454476401e17ed7f4d8b899ecf7661eb6cd63bad/src/benchmarks/micro/runtime/Bytemark/ByteMark.cs

https://github.com/dotnet/performance/blob/8aed638c9ee65c034fe0cca4ea2bdc3a68d2a6b5/src/benchmarks/micro/runtime/Bytemark/idea.cs

@danmoseley
Copy link
Member

Disregard, I missed your comment @vcsjones

@AndyAyersMS AndyAyersMS added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Security labels Sep 2, 2020
@AndyAyersMS
Copy link
Member

This is a jit specific benchmark, will relabel as codegen.

@AndyAyersMS
Copy link
Member

History strongly implicates #40535. Wonder if #40871 has any impact here?

cc @briansull @echesakovMSFT @dotnet/jit-contrib

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Sep 2, 2020
@briansull briansull self-assigned this Sep 2, 2020
@briansull
Copy link
Contributor

This is known and due to #40535

@briansull
Copy link
Contributor

briansull commented Sep 2, 2020

From comments in #40535

it has many unsafe casts and is a awkward port of the C code:
The regression occurs in
private static void cipher_idea(byte[] xin, byte[] xout, int offset, char[] Z)
Where we inline the MUL and low16 methods, into the inner loop.
This forces x1, x4, t2 and t1 to become GT_LCL_FLD and thus memory accesses.

            MUL(ref x1, Z[idx++]);
            MUL(ref x4, Z[idx++]);
            MUL(ref t2, Z[idx++]);
            MUL(ref t1, Z[idx++]);

// These were macros in the original C code

/* #define low16(x) ((x) & 0x0FFFF) */
private static char low16(int x)
{
    return (char)((x) & 0x0FFFF);
}

/* #define MUL(x,y) (x=mul(low16(x),y)) */
private static void MUL(ref char x, char y)
{
    x = mul(low16(x), y);
}

@briansull
Copy link
Contributor

briansull commented Sep 4, 2020

This regression is due to a necessary correctness fix for Issue #620

This item regressed with the first fix #40535
And the second fix #40871 also has the same regression
In both cases the only code in that has a regression is the ByteMark IDEA benchmark
Since 5.0 is locked down we will investigate this further and fix it in .Net 6.0

@echesakov
Copy link
Contributor

echesakov commented Sep 5, 2020

This regression is due to a necessary correctness fix for Issue #620

This item regressed with the first fix #40535
And the second fix #40871 also has the same regression

@briansull I don't think your conclusion was correct. I re-ran ByteMark.BenchIDEAEncryption benchmark with the following versions of .NET Core:

  1. 3.1.303
  2. 5.0.100-rc.1.20454.5 that includes Minimal fix for Issue 620 #40535
  3. 6.0.0-alpha.1.20453.28 that is based on 09c9d10 and includes Alternative fix for folding of *(typ*)&lclVar for small types #40607 #40871
  4. Private build of [release/5.0-rc2] Fix for folding of *(typ*)&lclVar for small types #41838 merged onto a916def

The results show that the performance with my fix in #40871 is no worse than 3.1.303. In fact, the performance with the fix is slightly better.

3.1.303

BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.450 (2004/May2020Update/20H1)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.303
  [Host]     : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT
  Job-UVDUWY : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Runtime=.NET Core 3.1  Arguments=/p:DebugType=portable
Toolchain=netcoreapp3.1  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1
Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
BenchIDEAEncryption 952.8 ms 11.77 ms 10.43 ms 955.8 ms 928.0 ms 966.4 ms - - - 610.2 KB

5.0.100-rc.1.20454.5

BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.450 (2004/May2020Update/20H1)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-rc.1.20454.5
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.45114, CoreFX 5.0.20.45114), X64 RyuJIT
  Job-HDRRWB : .NET Core 5.0.0 (CoreCLR 5.0.20.45114, CoreFX 5.0.20.45114), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Runtime=.NET Core 5.0  Arguments=/p:DebugType=portable
Toolchain=netcoreapp5.0  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1
Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
BenchIDEAEncryption 1.445 s 0.0110 s 0.0098 s 1.444 s 1.425 s 1.459 s - - - 610.2 KB

6.0.0-alpha.1.20453.28

BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.450 (2004/May2020Update/20H1)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=6.0.100-alpha.1.20454.7
  [Host]     : .NET Core 6.0.0 (CoreCLR 6.0.20.45328, CoreFX 6.0.20.45328), X64 RyuJIT
  Job-YABALU : .NET Core 6.0.0 (CoreCLR 6.0.20.45328, CoreFX 6.0.20.45328), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Runtime=.NET Core 5.0  Arguments=/p:DebugType=portable
Toolchain=netcoreapp5.0  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1
Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
BenchIDEAEncryption 905.3 ms 6.77 ms 5.65 ms 905.5 ms 894.4 ms 917.6 ms - - - 610.2 KB

#41838 merged onto release/5.0-rc2

BenchmarkDotNet=v0.12.1.1405-nightly, OS=Windows 10.0.19041.450 (2004/May2020Update/20H1)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=6.0.100-alpha.1.20454.7
  [Host]     : .NET Core 6.0.0 (CoreCLR 6.0.20.45328, CoreFX 6.0.20.45328), X64 RyuJIT
  Job-OWYMAW : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable  Toolchain=CoreRun
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15
WarmupCount=1
Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
BenchIDEAEncryption 924.3 ms 13.43 ms 12.57 ms 921.3 ms 900.6 ms 946.2 ms - - - 611.04 KB

If #41838 is merged it will close this issue as well.

@echesakov
Copy link
Contributor

Closed by #41838

cc @JulieLeeMSFT

@JulieLeeMSFT
Copy link
Member

Great that we fixed this issue with 41848.

@adamsitnik adamsitnik removed this from the 6.0.0 milestone Sep 17, 2020
@adamsitnik adamsitnik added this to the 5.0.0 milestone Sep 17, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants