-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port CoreClr benchmarks to BenchmarkDotNet #36
Conversation
…ve the Id which is exported for BenchView purpose
return fullPath; | ||
} | ||
|
||
internal static int GetFileLength(string filePath) => (int) new FileInfo(filePath).Length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(int) [](start = 62, length = 5)
Does it need to be int
? Can we keep long
instead? The API user can cast the result when needed. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing, I added the cast here because it was hardcoded to an int
previously
src/benchmarks/Program.cs
Outdated
var baseJob = Job.ShortRun; // let's use the Short Run for better first user experience ;) | ||
var baseJob = Job.Default | ||
.WithWarmupCount(1) // 1 warmup is enough for our purpose | ||
.WithMaxTargetIterationCount(20); // we don't want to run more that 20 iterations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
20 [](start = 45, length = 2)
Can we override this at runtime? Maybe it should be a command line option with default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no as of today, but I can add it to command line
@@ -58,6 +61,10 @@ private static IEnumerable<Job> GetJobs(Options options, Job baseJob) | |||
|
|||
if (options.RunClr) | |||
yield return baseJob.With(Runtime.Clr); | |||
if (options.RunLegacyJitX64) | |||
yield return baseJob.With(Runtime.Clr).With(Jit.LegacyJit).With(Platform.X64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LegacyJit [](start = 64, length = 9)
Wasn't legacy removed just recently from coreclr repo? Is it still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but this is for Legacy Jit for .NET (not Core), so you can easily compare both Jits (surprisingly the old one can be better in some benchmarks)
|
||
namespace Benchstone.BenchF | ||
{ | ||
public class DMath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Classes cannot be static? Just wonder #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, BDN derives a type from the type with benchmarks to add all the boilerplate. So the class cannot be sealed either static.
However, if you want you can benchmark static type, but you need non-static type with benchmark to do that:
static class Sth
{
void Method() { }
}
public class SthBenchmarks
{
[Benchmark] public void Method() => Sth.Method();
}
#Resolved
@adamsitnik BenchView handles any metric you throw at it. It is a matter of passing the right units when you upload. Currently, we pass all results in milliseconds, and I would prefer that benchmarks run for longer than nano-seconds. |
@adamsitnik As long as we are avoiding duplication and not losing coverage of a particular scenario, I am all for trimming redundancy. |
…up benchmark, it allows to preserve ID
@jorive To avoid any scaling issues, I reverted the changes in PerfLabTests, so it's 1:1 |
…e it after we port all the benchmarks
…per fix has been applied to BDN
…vs no affinity and the permutation of it
…enchmarks more stable
…imized to an empty loop
…opy paste it to an excel with some formulas
…rite about default value, command line parser does it out of the box
I removed the workaround and added proper implementation of |
@jorive Ok, I believe that the port is COMPLETE ;) I have removed all previous comments to keep the issue more clean. I am going to post summary in a few minutes |
SummaryTimeBenchmarkDotNet needs 42 minutes to run all 304 CoreCLR benchmarks, xunit-performance needed 38 minutes (on my box of course). Initially BDN needed 4 hours, I had to improve few things to get it so close and I started the preparations to this port long time ago, more details here dotnet/BenchmarkDotNet#550 Quality of the resultsThe quality of the results has improved, the majority of the benchmarks have better (more narrow) distribution. Some of the benchmarks have much better distribution. Two benchmarks out of 304 have worse distribution ( Unstable benchmarksWe have at least two multimodal benchmarks: BinaryTrees_5 (#39) and SpectralNorm_3 (#41) BenchmarkDotNet prints some nice, user-friendly histograms and gives a very clear warning about the problem.
Result differences
A very good example of the difference is
But when executed with many other benchmarks in same process with xunit was reporting 50% more time:
Huge differences: Loop Aligment dependent benchmarksFew of the benchmarks are very heavy dependent on loop alignment: dotnet run -c Release -f netcoreapp2.1 -- --join --class IniArray --testAlignment
dotnet run -c Release -f netcoreapp2.1 -- --join --method Log10DoubleBenchmark --testAlignment
dotnet run -c Release -f netcoreapp2.1 -- --join --method SinhSingleBenchmark --testAlignment
Huge differences: GC dependent benchmarks:Single execution of
Removed benchmarksI have not ported the serializer benchmarks because the benchmarks repo contains a lot of them and there was no need to duplicate it. IdsAll the ids except of removed benchmarks have been preserved, they can still be used in BenchView to keep the track of the performance Bugs found and fixedWhen I was comparing the results I found few huge differences which turned out to be a bugs in existing benchmarks Bug: empty loopsFollowing benchmarks have been optimized by JIT to empty loops (the loop body was constant):
I have fixed them, now they measure the right thing. Bug: growing multicast delegateThe delegate (md) was growing with every inner iteration of public void MulticastDelegateCombineInvoke()
{
MultiDelegate md = null; // THIS was growing in every iteration
Object obj = new Object();
foreach (var iteration in Benchmark.Iterations)
{
MultiDelegate md1 = new MultiDelegate(this.Invocable2);
// more code removed for brevity
using (iteration.StartMeasurement())
{
for (int i = 0; i < Benchmark.InnerIterationCount; i++)
{
md = (MultiDelegate)Delegate.Combine(md1, md);
md = (MultiDelegate)Delegate.Combine(md2, md);
md = (MultiDelegate)Delegate.Combine(md3, md);
md = (MultiDelegate)Delegate.Combine(md4, md);
md = (MultiDelegate)Delegate.Combine(md5, md);
md = (MultiDelegate)Delegate.Combine(md6, md);
md = (MultiDelegate)Delegate.Combine(md7, md);
md = (MultiDelegate)Delegate.Combine(md8, md);
md = (MultiDelegate)Delegate.Combine(md9, md);
md = (MultiDelegate)Delegate.Combine(md10, md);
}
}
}
md(obj, 100, 100);
} Bugs found and not fixedhttps://github.com/dotnet/coreclr/issues/18560 - |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
benchmarks
folder and then ported to BDN