MinIterationTime, WarmupCount, and tiered JIT #1993

Open
AndreyAkinshin opened this issue Apr 15, 2022 · 4 comments

Comments

@AndreyAkinshin
Member

Currently, I'm working on reducing the default value for MinIterationTime (the current value is 500 ms). I believe that we could achieve the same level of accuracy with a lower value of MinIterationTime, which would allow us to significantly reduce the total benchmarking time. The only issue I've observed so far concerns tiered JIT. Let's consider the following benchmark:

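using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using Perfolizer.Horology;

[Config(typeof(MyConfig))]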
public class TieredJitIssue
{
    public class MyConfig : ManualConfig
    {
        public MyConfig()
        {
            var baseJob = Job.Default.WithMinIterationTime(TimeInterval.Millisecond).WithId("Base");
            AddJob(baseJob);
            foreach (var warmupCount in new[] {10, 200})
                AddJob(baseJob.WithWarmupCount(warmupCount).WithId("W" + warmupCount));
        }
    }
    
    [Params(1)]
    public int N { get; set; }
    
    [Benchmark]
    public int Foo() => DoIt(N);

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int DoIt(int m)
    {
        var sum = 0;
        for (var i = 0; i < m; i++)
            sum += i * i;
        return sum;
    }
}

Here are the results on my machine:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.200
  [Host] : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  Base   : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  W10    : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  W200   : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT

MinIterationTime=1.0000 ms  

| Method |  Job | WarmupCount | N |      Mean |     Error |    StdDev |    Median |
|------- |----- |------------ |-- |----------:|----------:|----------:|----------:|
|    Foo | Base |     Default | 1 | 2.2980 ns | 0.0672 ns | 0.1611 ns | 2.2162 ns |
|    Foo |  W10 |          10 | 1 | 2.4105 ns | 0.0257 ns | 0.0215 ns | 2.4178 ns |
|    Foo | W200 |         200 | 1 | 0.4492 ns | 0.0189 ns | 0.0252 ns | 0.4372 ns |

As we can see, increasing WarmupCount changes the result: with 200 warmup iterations, the mean drops from about 2.3 ns to about 0.45 ns. Now let's look at the logs:

WorkloadWarmup  53: 524288 op, 1891400.00 ns, 3.6076 ns/op
WorkloadWarmup  54: 524288 op, 1895000.00 ns, 3.6144 ns/op
WorkloadWarmup  55: 524288 op, 1898000.00 ns, 3.6201 ns/op
WorkloadWarmup  56: 524288 op, 1900800.00 ns, 3.6255 ns/op
WorkloadWarmup  57: 524288 op, 2079800.00 ns, 3.9669 ns/op
WorkloadWarmup  58: 524288 op, 979400.00 ns, 1.8681 ns/op
WorkloadWarmup  59: 524288 op, 851700.00 ns, 1.6245 ns/op
WorkloadWarmup  60: 524288 op, 847600.00 ns, 1.6167 ns/op
WorkloadWarmup  61: 524288 op, 847500.00 ns, 1.6165 ns/op
WorkloadWarmup  62: 524288 op, 847500.00 ns, 1.6165 ns/op
WorkloadWarmup  63: 524288 op, 847600.00 ns, 1.6167 ns/op
WorkloadWarmup  64: 524288 op, 851500.00 ns, 1.6241 ns/op
WorkloadWarmup  65: 524288 op, 852200.00 ns, 1.6254 ns/op
WorkloadWarmup  66: 524288 op, 852300.00 ns, 1.6256 ns/op

On iteration 58, the JIT promoted the benchmark to the next tier (we can observe an obvious speedup). Potentially, this issue could be relevant for some benchmarks even with the default MinIterationTime value (I wasn't able to come up with an example). While the issue could be considered a corner case for the current settings, it becomes a serious problem for lower MinIterationTime values.
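
To make the promotion point easier to spot in longer logs, here is a minimal sketch that parses `WorkloadWarmup` lines in the format shown above and reports the first iteration whose ns/op falls sharply relative to the previous one. The `WarmupLogAnalyzer` class, its method names, and the `dropRatio` threshold of 0.6 are all hypothetical choices for illustration; none of this is part of BenchmarkDotNet.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of BenchmarkDotNet.
public static class WarmupLogAnalyzer
{
    // Matches lines such as "WorkloadWarmup  58: 524288 op, 979400.00 ns, 1.8681 ns/op".
    private static readonly Regex LinePattern = new Regex(
        @"^WorkloadWarmup\s+(?<iter>\d+):\s+\d+ op,\s+[\d.]+ ns,\s+(?<nsPerOp>[\d.]+) ns/op$");

    // Extracts (iteration, ns/op) pairs; lines that do not match are skipped.
    public static List<(int Iteration, double NsPerOp)> Parse(IEnumerable<string> lines)
    {
        var result = new List<(int Iteration, double NsPerOp)>();
        foreach (var line in lines)
        {
            var match = LinePattern.Match(line.Trim());
            if (!match.Success) continue;
            result.Add((
                int.Parse(match.Groups["iter"].Value, CultureInfo.InvariantCulture),
                double.Parse(match.Groups["nsPerOp"].Value, CultureInfo.InvariantCulture)));
        }
        return result;
    }

    // Returns the first iteration whose ns/op is below dropRatio times the
    // previous iteration's ns/op, or null if no such drop exists.
    // The 0.6 default is an arbitrary assumption, not a tuned value.
    public static int? FindFirstDrop(
        IReadOnlyList<(int Iteration, double NsPerOp)> samples, double dropRatio = 0.6)
    {
        for (var i = 1; i < samples.Count; i++)
            if (samples[i].NsPerOp < samples[i - 1].NsPerOp * dropRatio)
                return samples[i].Iteration;
        return null;
    }
}
```

A single-step comparison like this is sensitive to noise, so it is only a rough heuristic for reading logs, not a reliable detector of tier promotion.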

Currently, I see three ways to resolve this problem:

  1. Force the runtime to use the latest JIT tier by default from the beginning. It may require some changes in the runtime.
  2. Get notified about the JIT events and continue the warmup stage until all relevant methods are promoted to the latest tier. It could be quite tricky to properly implement.
  3. Introduce a new parameter like MinTotalWarmupDuration so that BenchmarkDotNet will always spend enough time in the warmup stage regardless of the InvocationCount/MinIterationTime.

The last option looks the most promising because it's runtime-agnostic. However, it requires choosing a proper default value for MinTotalWarmupDuration.
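
As a rough illustration of the third option, the warmup loop could keep running until both a minimum iteration count and a minimum total duration are satisfied. This is only a sketch: `WarmupSketch`, `RunWarmup`, and the parameter names are hypothetical, and it does not reflect how the BenchmarkDotNet engine is actually structured.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch of the MinTotalWarmupDuration idea, not BenchmarkDotNet code.
public static class WarmupSketch
{
    // Runs warmup iterations until BOTH conditions hold:
    //   - at least minWarmupCount iterations were executed, and
    //   - at least minTotalWarmupDuration of wall-clock time has elapsed.
    // Returns the number of iterations that were executed.
    public static int RunWarmup(Action iteration, int minWarmupCount, TimeSpan minTotalWarmupDuration)
    {
        var stopwatch = Stopwatch.StartNew();
        var count = 0;
        while (count < minWarmupCount || stopwatch.Elapsed < minTotalWarmupDuration)
        {
            iteration();
            count++;
        }
        return count;
    }
}
```

With short iterations (for example, 1 ms each), the duration condition dominates and the loop runs many more iterations than `minWarmupCount`; with long iterations, the count condition dominates and the behavior matches a plain fixed-count warmup.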

Relevant issues: #1125, #1210, #1466, dotnet/runtime#13069

@AndreyAkinshin
Member Author

@adamsitnik, @EgorBo do you have any thoughts on this?

@adamsitnik
Member

> Force the runtime to use the latest JIT tier by default from the beginning. It may require some changes in the runtime.

Most likely there is a magic env var that allows for that already. The problem is that it would most likely promote all the methods, even those that don't meet the "promotion requirements" (invoked too few times)?

> Get notified about the JIT events and continue the warmup stage until all relevant methods are promoted to the latest tier. It could be quite tricky to properly implement.

I agree, it would be tricky to implement, and it could have some overhead (logging events).

> Introduce a new parameter like MinTotalWarmupDuration so that BenchmarkDotNet will always spend enough time in the warmup stage regardless of the InvocationCount/MinIterationTime.

I like this idea. It should work well as long as the JIT heuristic does not change.

@AndyAyersMS @kouvel is it still 30 invocations and 100 ms of no new methods being JITted?

@kouvel
Member

kouvel commented May 4, 2022

> is it still 30 invocations and 100 ms of no new methods being JITted?

After 100 ms of no new methods being called, call counting then begins and after 30 calls the method would be queued for promotion to tier 1 in the background.

> Force the runtime to use the latest JIT tier by default from the beginning. It may require some changes in the runtime.

There is an env var DOTNET_TC_AggressiveTiering=1 that removes the startup delay and reduces the call counting threshold for tier 1. Although the call counting threshold is set to 1, in practice it's about 3, so with that setting methods would get queued for promotion to tier 1 after about 3 calls.
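
For anyone who wants to try this from a benchmark config, a sketch along the following lines might work. It assumes that BenchmarkDotNet's `WithEnvironmentVariable(key, value)` job extension passes the variable to the benchmark process; the class name `AggressiveTieringConfig` and the job ids are made up for illustration, and whether the setting actually changes the results above has not been verified here.

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Hypothetical config: compares the default tiering behavior against
// a job that sets DOTNET_TC_AggressiveTiering=1 for the benchmark process.
public class AggressiveTieringConfig : ManualConfig
{
    public AggressiveTieringConfig()
    {
        // Baseline job with the runtime's default tiering settings.
        AddJob(Job.Default.WithId("DefaultTiering"));

        // Same job, but with the env var mentioned above set to 1.
        AddJob(Job.Default
            .WithEnvironmentVariable("DOTNET_TC_AggressiveTiering", "1")
            .WithId("AggressiveTiering"));
    }
}
```

Applying it would look like `[Config(typeof(AggressiveTieringConfig))]` on the benchmark class, in the same way `MyConfig` is applied in the original example.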

@MichalPetryka
Contributor

Unless you've manually enabled DOTNET_TC_QuickJitForLoops, this method doesn't have tiered codegen, since .NET 6 by default doesn't do tiered JIT on methods with loops.


4 participants