# .NET 5.0 Microbenchmarks Performance Study Report #41871

## Comments
Great work @adamsitnik. I am pleased that our systems have improved since the last cycle, and that you and @DrewScoggins will be using this exercise to improve them so that more issues are found earlier rather than at the end of the cycle.
I thought the perf lab did not cover Mac and only had one type of CPU. How can we catch such regressions before the end of the cycle?
You are right. I meant that in my study I gathered macOS and different-CPU-family results but did not find any important regressions (specific to macOS, etc.), so adding macOS or more CPU types to the perf lab is not needed.
Could an effort like this (a script run over manually collected submissions) be made cheap enough that we could do it occasionally during the cycle?
@adamsitnik Great job! Based on the data out of this exercise, we will add Alpine to the .NET perf lab.
Is the collected data internal only or available publicly as well?
We talked about this offline, but sharing the comment here too... We'll be putting effort into that between now and the first 6.0 previews, with the goal of completing targeted manual runs for each of the 6.0 Preview/RC releases.
Just for clarity here: we will have a brand new batch of AMD hardware this calendar year. So we will not be covering macOS in the lab, but we will have both Intel and AMD hardware for coverage.
@DrewScoggins, do we know what the hardware specs are? |
I do not, @billwert did the work of speccing out the machines. |
@adamsitnik -- that sharepoint link didn't work for me. Can you just publish the data to GH? |
My mistake. I was using the wrong browser profile. |
## Goals

The main goal of my study was to ensure that we ship .NET 5.0 without any performance regressions and to validate whether, in the near future, we can fully rely on the regression auto-filing bot written by @DrewScoggins.

My other goal was to get .NET Libraries Team members involved and keep growing the performance culture.

**tl;dr** The bot is doing a great job at detecting regressions. Most serious regressions have already been fixed; however, a few investigations are still in progress.
## Methodology (and how it evolved)
In 2018 I had the pleasure of reviewing @AndreyAkinshin's "Pro .NET Benchmarking" book. The "Statistics for Performance Engineers" and "Performance Analysis and Performance Testing" chapters inspired me to implement a small tool called Results Comparer. The tool uses the Mann-Whitney U statistical test to detect performance regressions in results exported by BenchmarkDotNet. It's being used (or at least it should be) as part of our benchmarking workflow to prevent introducing regressions to .NET.
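To make the statistical idea concrete, here is a minimal comparison sketch in Python. It is not ResultsComparer's actual source; SciPy, the 2% threshold, and the sample values are all assumptions of this illustration.

```python
# A minimal sketch of this kind of comparison (NOT ResultsComparer's actual source).
# Assumptions: SciPy is available, timings are in nanoseconds, and a 2% threshold
# plus alpha = 0.05 are reasonable defaults.
from statistics import median

from scipy.stats import mannwhitneyu  # pip install scipy


def compare(base_ns, diff_ns, threshold=0.02, alpha=0.05):
    """Compare two sets of BenchmarkDotNet iteration timings (in nanoseconds)."""
    _, p_value = mannwhitneyu(base_ns, diff_ns, alternative="two-sided")
    ratio = median(diff_ns) / median(base_ns)
    # Report "same" unless the difference is statistically significant AND big enough.
    if p_value >= alpha or abs(ratio - 1.0) < threshold:
        return "same"
    return "slower" if ratio > 1.0 else "faster"


# The second build is ~15% slower and the samples barely overlap -> "slower".
print(compare([105, 102, 104, 103, 106] * 3, [121, 119, 124, 120, 122] * 3))
```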
In 2019 I was asked by @danmosemsft to verify .NET Core 3.0 performance. Initially, I ran all the microbenchmarks from the dotnet/performance repository on a single machine dual-booting Windows 10 and Ubuntu 18.04 x64 and used the Results Comparer to find regressions. It very quickly turned out that such a sample was way too small to make sure that we didn't have any regressions: some benchmarks were simply unstable, architectures like ARM and ARM64 were not covered, and neither were other Linux distros and CPU families.
Then I ran the benchmarks on all the PCs, laptops, and VMs that I could access, but I was still missing AMD and ARM results, so I asked @tannergooding and @BruceForstall for help. @tannergooding ran the benchmarks on all his AMD machines. @BruceForstall gave me access to a document that explains how to use the ARM machines owned by the JIT Team. This turned out to be invaluable help, as I've used these machines many, many times, including this year during the 5.0 investigation.
After having enough samples to cover our matrix of supported OSes and architectures, I built a simple console app on top of `ResultsComparer` (source code available here). The tool uses the very same statistical test to detect regressions, aggregates the results from all the different configurations, and sorts them from the biggest regression to the biggest improvement (see the sketch after the list below). Such an approach allows for very quick identification of regressions of all kinds:
- `System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IOrderedEnumerable)`
- `System.Globalization.Tests.StringSearch.IsPrefix_DifferentFirstChar(Options: (en-US, IgnoreSymbols, False))`
- `System.Threading.Tests.Perf_CancellationToken.Cancel`
- `System.Buffers.Text.Tests.Base64EncodeDecodeInPlaceTests.Base64EncodeInPlace(NumberOfBytes: 200000000)`
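The sketch below illustrates the aggregation idea under stated assumptions: per-configuration comparison files in a hypothetical `results/<config>/comparison.json` layout, with assumed `name`/`base_ns`/`diff_ns` fields, are merged and sorted so that the worst regression lands at the top. The real tool's input format and source code differ.

```python
# A rough sketch of the aggregation step (NOT the real tool). The directory layout,
# file names, and JSON fields ("name", "base_ns", "diff_ns") are assumptions made
# for illustration only.
import glob
import json
from statistics import median

rows = []
for path in glob.glob("results/*/comparison.json"):  # hypothetical per-config exports
    config = path.split("/")[-2]                      # e.g. "ubuntu-18.04-x64"
    with open(path) as f:
        for bench in json.load(f):                    # assumed: a list of benchmark entries
            ratio = median(bench["diff_ns"]) / median(bench["base_ns"])
            rows.append((ratio, config, bench["name"]))

# Biggest regression (highest diff/base ratio) first, biggest improvement last.
for ratio, config, name in sorted(rows, reverse=True):
    print(f"{ratio:5.2f}x  {config:20}  {name}")
```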
Using the tool had one major flaw: it was not automated, and hence we were finding out about regressions only when we actively searched for them.
This was recognized and a new project was started: in 2020 @DrewScoggins began implementing a GitHub bot that uses the data gathered from the performance lab (a set of machines owned by the .NET Performance Team) microbenchmark runs to detect and auto-file regressions. So far the bot has been reporting new issues in a dedicated repository, and once a week the workgroup led by @DrewScoggins, consisting of @AndyAyersMS, @kunalspathak, @tannergooding, and myself, was going through the list and triaging the issues. Issues that seemed to be actual regressions were labeled as Needs Transfer and later moved by @DrewScoggins to the runtime repo.
A few weeks ago, as we were getting close to the "code freeze" for .NET 5, I asked myself a question: are we sure that the bot has reported all possible regressions for all the supported OS versions?

The bot uses different statistical methods to detect regressions, and so far it has been enabled only for Windows 10 x64, Ubuntu 18.04 x64, and Windows 10 x86. So I decided to spend some time and use the old tool that I wrote to verify it. To increase the sample size and get other .NET Libraries Team members involved, I simply asked the Team to run the benchmarks and share the results with me.
Running the performance repo microbenchmarks against the latest .NET Core SDK is super easy thanks to a Python script implemented by @jorive. The script downloads the right SDK and starts benchmarking with cleared environment variables.
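As a hedged illustration only (the script path and flags are assumptions based on the dotnet/performance repository docs, not an excerpt from them), such a run could be scripted roughly like this:

```python
# A minimal sketch, NOT the official workflow: driving the benchmark run from Python.
# The script path and flags are assumptions based on the dotnet/performance repository
# docs; consult the repo's benchmarking guide for the authoritative invocation.
import subprocess

subprocess.run(
    [
        "python3", "scripts/benchmarks_ci.py",  # assumed location inside dotnet/performance
        "-f", "net5.0",                         # target framework; the script fetches the matching SDK
        "--filter", "System.Collections*",      # optional: limit the run to a subset of benchmarks
    ],
    check=True,                                 # fail loudly if the benchmark run fails
)
```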
## Data

The data I received from the .NET Libraries Team members allowed me to cover a big part of the entire matrix of supported configurations.
Everyone interested can download the data from here. The full report generated by the tool is available here.
Moreover, the full historical data turned out to be extremely useful. I used it every time I was not sure whether something was a regression or just an unstable/multimodal benchmark.
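As a small, hypothetical illustration of why history matters when judging a suspicious result (the numbers below are made up):

```python
# A tiny illustration with made-up numbers: if a benchmark's history already jumps
# between two modes, a "slow" result is more likely noise than a regression. Here we
# simply check whether the new median falls inside the historical range.
from statistics import median

history_ns = [102, 104, 101, 210, 103, 208, 102, 211]  # bimodal history: ~100 ns and ~210 ns
new_run_ns = [207, 209, 206, 210, 208]

new_median = median(new_run_ns)
if min(history_ns) <= new_median <= max(history_ns):
    print(f"{new_median} ns is within the historical range: likely unstable/multimodal, not a regression")
else:
    print(f"{new_median} ns is outside the historical range: worth investigating as a regression")
```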
## Regressions

### Already fixed

- `System.Collections.Contains*`, `System.Memory.SequenceReader.TryReadTo`, `System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN` (`x86` and `ARM`)
- `System.Collections.CtorGivenSize<Int32>.Array(Size: 512)`
- `System.Numerics.Tests.Perf_Quaternion.Conjugate` and `System.Numerics.Tests.Perf_Quaternion.Negat*`
- `Directory.EnumerateFiles`
- `ByteMark.BenchIDEAEncryption`
- `System.Text.Perf_Utf8Encoding`
### Investigation in progress

- `System.Memory.Slice`
- `PerfLabTests.CastingPerf2.CastingPerf.IntObj`
### By design or Acceptable

- `System.Globalization.Tests.StringSearch`: detected by the bot, reported in [Perf -1,796%] System.Globalization.Tests.StringSearch (33) #37819
- `System.Memory.ReadOnlySpan.IndexOfString`: detected by the bot, reported in [Perf -14%] System.Memory.ReadOnlySpan.IndexOfString (2) #39724
- `System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: ja)`: detected by the bot, reported in [Perf - 10-20x regression] System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse in ja #37807
- `System.Globalization.Tests.StringEquality`: detected by the bot, reported in [Perf -97%] System.Globalization.Tests.StringEquality (8) #39038; `OrdinalIgnoreCase` has been optimized in Port Ordinal Ignore Case Optimization changes #40962
- `System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IOrderedEnumerable)`: the `O(N log N)` cost of the `OrderBy`, see [Perf -492%] System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches #39032 (comment)
- `System.Collections.Tests.Perf_BitArray.*(Size: 4)`
- `System.Threading.Tests.Perf_Thread.GetCurrentProcessorId`
- `PerfLabTests.CastingPerf.CheckIsInstAnyIsInterfaceNo`, `PerfLabTests.CastingPerf.CheckObjIsInterfaceNo`
- `System.Net.NetworkInformation.Tests.PhysicalAddressTests.PAShort`
- `System.Numerics.Tests.Perf_Vector*.GetHashCodeBenchmark`
- `System.Net.Primitives.Tests.CredentialCacheTests.ForEach(uriCount: 0, hostPortCount: 0)`
### Moved to 6.0

- `System.Tests.Perf_Char.GetUnicodeCategory(c: '?')`
- `PerfLabTests.StackWalk.Walk`
- `System.Tests.Perf_String.Replace_Char(text: "Hello", oldChar: 'l', newChar: '!')`
- `System.Text.Perf_Utf8String.IsAscii(Input: EnglishAllAscii)`: `Utf8String` is still only experimental
- `System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf8`
### Unstable or multimodal benchmarks

There were, of course, more of them; here are the ones that I've noted down to use as Contract Tests in the near future (to reduce the noise produced by the bot):

- `System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer`
- `System.Memory.ReadOnlySequence.Slice_Repeat_StartPosition_And_EndPosition(Segment: Multiple)`
- `PerfLabTests.BlockCopyPerf.CallBlockCopy`
- `System.Tests.Perf_String.Trim_CharArr(s: "Test", c: [' ', ' '])`
- `System.Threading.Tests.Perf_Interlocked.CompareExchange_long`: `10ns`, but sometimes `x100` that, and only for `x86`. I need logs to verify whether it's a BDN bug or not.
- `System.Memory.Span<Int32>.IndexOfValue(Size: 512)`
- `Benchstone.BenchI.Fib.Test`
## Summary

- Running the benchmarks only on `GNU libc`-based Linux distros like Ubuntu is not enough to detect `musl libc`-specific regressions. We should consider adding Alpine runs to the perf lab.
- No important regressions specific to `macOS` or different CPU families were discovered. It has proven that the perf lab has good hardware coverage.

Big thanks to everyone involved!