Wildly different results from simply swapping the order of benchmarks #2004
@space-alien thanks for the report. Could you please share the source code of these benchmarks? |
Apologies for the delay; I have been investigating further with a view to producing a minimal repro. However, I think I have identified the issue. My benchmark class has a static member whose type has a static constructor that performs non-trivial initialization. That static constructor is being invoked during the benchmarking runs, skewing the results. This can be solved by taking control of when the static member is initialized, e.g. by adding the following to the benchmark class:

```csharp
[GlobalSetup]
public void Warmup()
{
    Type myClass = typeof(ClassWithSlowStaticConstructor);
    System.Runtime.CompilerServices.RuntimeHelpers.RunClassConstructor(myClass.TypeHandle);
}
```

In my case, using a static member on my benchmark class was unnecessary. Not sure whether this kind of scenario might warrant a warning in the docs? |
Static constructors are only run once, so only the first warm-up iteration should measure the impact; subsequent iterations should not. Are you measuring a cold start? A regular run should not include static constructors in the results. |
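The once-only semantics described above can be seen in a minimal console sketch (hypothetical type names, not from the repro — the CLR runs a type's static constructor at most once, on first use of the type):

```csharp
using System;

class HasStaticCtor
{
    public static int Counter;

    // Runs at most once, before the type is first used.
    static HasStaticCtor()
    {
        Console.WriteLine("static ctor runs");
        Counter = 0;
    }
}

class Program
{
    static void Main()
    {
        // Only the FIRST access below triggers the static constructor;
        // later accesses reuse the already-initialized type.
        for (int i = 0; i < 3; i++)
        {
            HasStaticCtor.Counter++;
        }
        Console.WriteLine(HasStaticCtor.Counter); // "static ctor runs" was printed exactly once
    }
}
```

This is why, in a regular BenchmarkDotNet run, only the very first invocation should pay the static-constructor cost.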
That's what I would have expected, and it was confusing me too. I fiddled around a bit more and I now have a repro, although I may have inadvertently stumbled into a different issue...

A slow static constructor seems to break the automatic selection of the number of iterations in the pilot phase. In the code below, a microbenchmark only gets benchmarked over a single op, and the results are of course useless. A delay of >10s seems to be the point at which the pilot phase switches to just a single op for the benchmark runs.

Perhaps the pilot phase could handle the possibility of a slow static constructor when determining the optimal number of iterations to run. For example, by ignoring the time taken by the first invocation, or by detecting that subsequent invocations take vastly less time and increasing the number of ops.

Now... while this is interesting, I'm pretty sure this is NOT the cause of the issue I originally experienced - I would have noticed if my method was only being invoked once, and the static constructor in my original code certainly did not take so long to run. I will continue to investigate.

```csharp
using BenchmarkDotNet.Attributes;

namespace BenchmarkDotNetRepro2;

[DisassemblyDiagnoser]
public class MicroBenchmark
{
    // Instance member:
    ulong _instanceCount;

    // Static member:
    // Initialized when first accessed, which occurs before the start of the second benchmark: StaticMember().
    // The initializer for the static field `_unusedStaticField` gets called at the same time,
    // which, in turn, invokes the SlowWarmupClass static constructor.
    static ulong _staticCount;

    // If this field is not static, it is instantiated along with the MicroBenchmark class,
    // and the slow static constructor will not affect benchmarking.
    static readonly SlowWarmupClass _unusedStaticField = new();

    [Benchmark]
    public ulong InstanceMember()
    {
        _instanceCount++;
        return _instanceCount;
    }

    [Benchmark]
    public ulong StaticMember()
    {
        _staticCount++;
        return _staticCount;
    }
}

public class SlowWarmupClass
{
    static SlowWarmupClass()
    {
        Console.WriteLine(" <<<< SlowWarmup static constructor >>>> ");
        Thread.Sleep(11000); // Simulate non-trivial initialization
    }
}
``` |
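For completeness, the `RunClassConstructor` workaround from the earlier comment could be applied to this repro so the slow static constructor runs before any measured iteration. This is a sketch under the assumption that the pilot-phase skew comes from the first invocation paying the initialization cost; `MicroBenchmarkWithWarmup` is a hypothetical name, and `SlowWarmupClass` is the type defined in the repro above:

```csharp
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

public class MicroBenchmarkWithWarmup
{
    static ulong _staticCount;
    static readonly SlowWarmupClass _unusedStaticField = new();

    // Force the static constructors (and static field initializers) to run
    // once, during GlobalSetup, rather than inside the first measured invocation.
    [GlobalSetup]
    public void Warmup()
    {
        RuntimeHelpers.RunClassConstructor(typeof(SlowWarmupClass).TypeHandle);
        RuntimeHelpers.RunClassConstructor(typeof(MicroBenchmarkWithWarmup).TypeHandle);
    }

    [Benchmark]
    public ulong StaticMember()
    {
        _staticCount++;
        return _staticCount;
    }
}
```

`[GlobalSetup]` runs once per benchmark case before warm-up, so the initialization cost is kept out of the measurements either way; whether it also fixes the pilot-phase op-count selection is exactly what this issue is about.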
PS: I updated to 0.13.1.1796 from the BDN nightly feed, and the threshold above which we only get one op appears to have increased to 10s, although the issue still remains. I've updated the repro above (and the comment above) accordingly. |
Seems related to #1210 |
Faced with seemingly inconsistent and inscrutable benchmark results, I followed a hunch and swapped the order of appearance of my two benchmark methods.
The following results are consistently repeatable.
A then B:
Swap the order:
Nothing has changed between these two runs, aside from swapping the order of the two benchmark methods on the benchmark class.
How should I diagnose what is going on here?
Environment info:
I'm running Windows 11, but the output reports Windows 10, so I have stripped that.