
Problems with default UnrollFactor in V0.11.0 #837

Closed
Tornhoof opened this issue Jul 23, 2018 · 12 comments · Fixed by #1573

@Tornhoof
Contributor

Tornhoof commented Jul 23, 2018

Description
I just upgraded SpanJson to v0.11 of BenchmarkDotNet, and the new UnrollFactor logic does not work for codegen-heavy benchmarks.
Actual Behaviour

// Benchmark: ModelBenchmark.DeserializeAccessTokenWithJilSerializer: DefaultJob
// *** Execute ***
// Launch: 1 / 1
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Core 2.1.1 (CoreCLR 4.6.26606.02, CoreFX 4.6.26606.05), 64bit RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob
OverheadJitting  1: 1 op, 327424.34 ns, 327.4243 us/op
WorkloadJitting  1: 1 op, 2934831805.27 ns, 2.9348 s/op
WorkloadWarmup   1: 1 op, 16384.02 ns, 16.3840 us/op
WorkloadWarmup   2: 1 op, 10496.01 ns, 10.4960 us/op
WorkloadWarmup   3: 1 op, 9728.01 ns, 9.7280 us/op
WorkloadWarmup   4: 1 op, 9216.01 ns, 9.2160 us/op
WorkloadWarmup   5: 1 op, 8960.01 ns, 8.9600 us/op
WorkloadWarmup   6: 1 op, 23040.02 ns, 23.0400 us/op
WorkloadWarmup   7: 1 op, 7936.01 ns, 7.9360 us/op
WorkloadWarmup   8: 1 op, 8704.01 ns, 8.7040 us/op
WorkloadWarmup   9: 1 op, 9472.01 ns, 9.4720 us/op
WorkloadWarmup  10: 1 op, 8448.01 ns, 8.4480 us/op
// BeforeActualRun
WorkloadActual   1: 1 op, 10240.01 ns, 10.2400 us/op
WorkloadActual   2: 1 op, 25088.03 ns, 25.0880 us/op

The WorkloadActual lines then repeat around 90 times, with quite a large variation in timings.
Expected Behaviour
I get a larger number of ops with more consistent timing; the 1 op benchmark is off by a very large margin.

// **************************
// Benchmark: ModelBenchmark.DeserializeAccessTokenWithJilSerializer: Job-GXPFMQ(UnrollFactor=2)
// *** Execute ***
// Launch: 1 / 1
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Core 2.1.1 (CoreCLR 4.6.26606.02, CoreFX 4.6.26606.05), 64bit RyuJIT
// GC=Concurrent Workstation
// Job: Job-CZJWNW(UnrollFactor=2)
OverheadJitting  1: 2 op, 352000.36 ns, 176.0002 us/op
WorkloadJitting  1: 2 op, 2947211977.95 ns, 1.4736 s/op
WorkloadPilot    1: 4 op, 13568.01 ns, 3.3920 us/op
WorkloadPilot    2: 8 op, 14848.02 ns, 1.8560 us/op
WorkloadPilot    3: 16 op, 15872.02 ns, 992.0010 ns/op
WorkloadPilot    4: 32 op, 25344.03 ns, 792.0008 ns/op
WorkloadPilot    5: 64 op, 44288.05 ns, 692.0007 ns/op
WorkloadPilot    6: 128 op, 84480.09 ns, 660.0007 ns/op
WorkloadPilot    7: 256 op, 152576.16 ns, 596.0006 ns/op
WorkloadPilot    8: 512 op, 300032.31 ns, 586.0006 ns/op
WorkloadPilot    9: 1024 op, 574208.59 ns, 560.7506 ns/op
WorkloadPilot   10: 2048 op, 1207041.24 ns, 589.3756 ns/op
WorkloadPilot   11: 4096 op, 2299650.35 ns, 561.4381 ns/op
WorkloadPilot   12: 8192 op, 4912389.03 ns, 599.6569 ns/op
WorkloadPilot   13: 16384 op, 9578249.81 ns, 584.6100 ns/op
WorkloadPilot   14: 32768 op, 18317842.76 ns, 559.0162 ns/op
WorkloadPilot   15: 65536 op, 36165925.03 ns, 551.8482 ns/op
WorkloadPilot   16: 131072 op, 72334922.07 ns, 551.8717 ns/op
WorkloadPilot   17: 262144 op, 143300498.74 ns, 546.6480 ns/op
WorkloadPilot   18: 524288 op, 285817892.68 ns, 545.1544 ns/op
WorkloadPilot   19: 1048576 op, 572262474.00 ns, 545.7520 ns/op
OverheadWarmup   1: 1048576 op, 3304963.38 ns, 3.1519 ns/op
OverheadWarmup   2: 1048576 op, 3340035.42 ns, 3.1853 ns/op
OverheadWarmup   3: 1048576 op, 3255811.33 ns, 3.1050 ns/op
OverheadWarmup   4: 1048576 op, 3256067.33 ns, 3.1052 ns/op
OverheadWarmup   5: 1048576 op, 3302403.38 ns, 3.1494 ns/op
OverheadWarmup   6: 1048576 op, 3288835.37 ns, 3.1365 ns/op
OverheadActual   1: 1048576 op, 3316483.40 ns, 3.1628 ns/op
OverheadActual   2: 1048576 op, 3276291.35 ns, 3.1245 ns/op
OverheadActual   3: 1048576 op, 3233283.31 ns, 3.0835 ns/op
OverheadActual   4: 1048576 op, 3339267.42 ns, 3.1846 ns/op
OverheadActual   5: 1048576 op, 3338755.42 ns, 3.1841 ns/op
OverheadActual   6: 1048576 op, 3284739.36 ns, 3.1326 ns/op
OverheadActual   7: 1048576 op, 3252483.33 ns, 3.1018 ns/op
OverheadActual   8: 1048576 op, 3229443.31 ns, 3.0798 ns/op
OverheadActual   9: 1048576 op, 3227139.30 ns, 3.0776 ns/op
OverheadActual  10: 1048576 op, 3257347.34 ns, 3.1064 ns/op
OverheadActual  11: 1048576 op, 3340035.42 ns, 3.1853 ns/op
OverheadActual  12: 1048576 op, 3311619.39 ns, 3.1582 ns/op
OverheadActual  13: 1048576 op, 3286019.36 ns, 3.1338 ns/op
OverheadActual  14: 1048576 op, 3352835.43 ns, 3.1975 ns/op
OverheadActual  15: 1048576 op, 3325187.40 ns, 3.1711 ns/op
WorkloadWarmup   1: 1048576 op, 571615561.33 ns, 545.1351 ns/op
WorkloadWarmup   2: 1048576 op, 571237192.95 ns, 544.7742 ns/op
WorkloadWarmup   3: 1048576 op, 572946506.70 ns, 546.4044 ns/op
WorkloadWarmup   4: 1048576 op, 571277128.99 ns, 544.8123 ns/op
WorkloadWarmup   5: 1048576 op, 571070280.78 ns, 544.6151 ns/op
WorkloadWarmup   6: 1048576 op, 571714377.44 ns, 545.2293 ns/op
WorkloadWarmup   7: 1048576 op, 571312713.02 ns, 544.8463 ns/op
// BeforeActualRun
WorkloadActual   1: 1048576 op, 571627337.35 ns, 545.1463 ns/op
WorkloadActual   2: 1048576 op, 571613257.33 ns, 545.1329 ns/op

I think that's because the WorkloadJitting is taking that long (yes, it's codegen-heavy); as soon as I change the UnrollFactor to 2, it works as expected (at least as far as I've run the benchmarks).
The benchmark is that codegen-heavy because I initialize ~480 serializers statically in that class (the whole benchmark class is one large T4 template, see https://github.com/Tornhoof/SpanJson/blob/master/SpanJson.Benchmarks/ModelBenchmark.cs)

Repro

  1. Check out the branch BenchmarkUpdates on https://github.com/Tornhoof/SpanJson
  2. cd SpanJson.Benchmarks
  3. dotnet run -c Release and select 2
@AndreyAkinshin
Member

@Tornhoof, thank you very much for the report.

You correctly described the problem: BenchmarkDotNet ignores the Pilot stage (where we should choose the best invocation count) because WorkloadJitting takes too much time. We did this "optimization" for heavy macrobenchmarks, but obviously it doesn't work for your case.

I think we should always run the Pilot stage if the jitting (cold start) took a reasonable amount of time (e.g., < 20 sec). It should automatically fix the invocation count and solve your problem. Currently, we compare the WorkloadJitting time with engineParameters.IterationTime, which is 500 ms by default. It probably makes sense to introduce an additional TimeInterval option for the WorkloadJitting threshold value.
We should discuss it with @adamsitnik, but he will be available only next week.

As a temporary workaround, you can manually set the invocation count or increase the IterationTime value.
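
For anyone hitting this before a fix lands, the workaround above can be expressed with BenchmarkDotNet's job API. The following is a sketch against the v0.11-era API; method and namespace names have shifted between versions (e.g. `Add` vs. `AddJob`, the Horology types), so treat the exact identifiers as approximate:

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Horology;
using BenchmarkDotNet.Jobs;

// Workaround sketch: steer the engine away from the single-invocation
// path, either by pinning the invocation counts (so the Pilot stage is
// not needed) or by raising IterationTime above the one-off jitting cost.
public class WorkaroundConfig : ManualConfig
{
    public WorkaroundConfig()
    {
        // Option A: pin the counts. Note that InvocationCount must be
        // a multiple of UnrollFactor.
        Add(Job.Default
            .WithUnrollFactor(2)
            .WithInvocationCount(1024));

        // Option B: raise IterationTime beyond the WorkloadJitting time.
        // Add(Job.Default.WithIterationTime(TimeInterval.FromSeconds(1)));
    }
}
```

Applying the `[InvocationCount(count, unrollFactor)]` attribute to the benchmark class should achieve roughly the same as Option A without a custom config.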

@Tornhoof
Contributor Author

Thank you for the workaround.

@adamsitnik
Member

@Tornhoof big thanks for the bug report and a clear repro case!

I will try to find a solution and fix it over the next weekend.

@AndreyAkinshin
Member

@adamsitnik have you made any progress on this issue?
I have another idea about how to solve the problem without additional settings. Let's check that we do enough operations during the WorkloadWarmup stage. If it turns out that the number of operations is not enough and our WorkloadJitting heuristics produced a wrong hypothesis about the warmed operation duration, we should restart iterations from the Pilot stage.
What do you think?
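
That check could look roughly like the following. This is a hypothetical sketch of the proposed safeguard, not BenchmarkDotNet's actual engine code; the type and method names are invented for illustration, and the restart threshold is arbitrary:

```csharp
using System;

// Hypothetical sketch: after WorkloadWarmup, verify that the
// jitting-based hypothesis scheduled enough operations per iteration.
// If the warmed-up per-op time implies the engine could run orders of
// magnitude more invocations within the target iteration time than it
// actually scheduled, the hypothesis was wrong and the engine should
// restart from the Pilot stage.
public static class PilotRestartHeuristic
{
    public static bool ShouldRestartPilot(
        TimeSpan warmupIterationTime,   // measured WorkloadWarmup iteration time
        long scheduledInvocations,      // invocation count derived from jitting
        TimeSpan targetIterationTime)   // e.g. the default 500 ms IterationTime
    {
        // Warmed-up cost of a single operation, in nanoseconds.
        double perOpNs =
            warmupIterationTime.TotalMilliseconds * 1e6 / scheduledInvocations;

        // How many invocations would fill the target iteration time?
        double desiredInvocations =
            targetIterationTime.TotalMilliseconds * 1e6 / perOpNs;

        // Arbitrary factor: restart if we are off by 100x or more.
        return desiredInvocations >= scheduledInvocations * 100;
    }
}
```

In the log above this fires naturally: the warmed benchmark runs in ~10 µs per op, so a 500 ms iteration could hold tens of thousands of invocations instead of the single one the jitting heuristic chose.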

@adamsitnik
Member

@AndreyAkinshin Great idea!

So far I have observed that this issue happens only for benchmarks that rely heavily on Reflection.ILEmit and my only idea was to check if the benchmark relies on this assembly and if it does, run it one more time. But I think that your solution is much more universal and clean!

@AndreyAkinshin
Member

I have observed that this issue happens only for benchmarks that rely heavily on Reflection.ILEmit

In fact, there are too many factors that can slow down the first iteration (from heavy assembly loading to custom user logic).

@adamsitnik
Member

In fact, there are too many factors that can slow down the first iteration (from heavy assembly loading to custom user logic).

Totally agree! Would you like to implement it?

@AndreyAkinshin
Member

Would you like to implement it?

Yes.

Turnerj added a commit to TurnerSoftware/InfinityCrawler that referenced this issue Jan 3, 2019
Need to perform a warmup so the workload jitting doesn't take too long
dotnet/BenchmarkDotNet#837
AArnott added a commit to AArnott/vs-streamjsonrpc that referenced this issue May 18, 2019
ycrumeyrolle added a commit to uruk-project/Jwt that referenced this issue Jul 14, 2019
ycrumeyrolle added a commit to uruk-project/Jwt that referenced this issue Jul 14, 2019
@kevingosse

Hello, any update on this?

I just ran into this issue (and given the list of mentions, I guess this is widespread). Any way I could help?

adamsitnik added a commit that referenced this issue Oct 24, 2020
… magic number based on observations of the reported bug)

we call it one more time to see if Jitting itself has not dominated the first invocation
if it did, it should NOT be a single invocation engine (fixes #837, fixes #1337 and fixes #1338)
@adamsitnik
Member

Please excuse me for the delay. I've sent a fix: #1573

AndreyAkinshin pushed a commit that referenced this issue Oct 26, 2020
…n lasts longer than IterationTime (#1573)

* add the test

* if the Jitting took more than IterationTime but still less than 1s (a magic number based on observations of the reported bug)

we call it one more time to see if Jitting itself has not dominated the first invocation
if it did, it should NOT be a single invocation engine (fixes #837, fixes #1337 and fixes #1338)
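
The decision described in the commit message can be sketched as follows. This is a hypothetical illustration of the #1573 logic, not the actual engine code; the names are invented, and the 1 s bound is the "magic number" mentioned in the commit message:

```csharp
using System;

// Hypothetical sketch of the #1573 decision: when the first (jitted)
// invocation is slow but not extremely slow, run one more invocation to
// separate JIT cost from steady-state cost before committing to a
// single-invocation engine.
public static class SingleInvocationDecision
{
    // "Magic number based on observations of the reported bug".
    static readonly TimeSpan MagicUpperBound = TimeSpan.FromSeconds(1);

    public static bool UseSingleInvocationEngine(
        TimeSpan firstInvocation,          // WorkloadJitting time
        TimeSpan iterationTime,            // e.g. the default 500 ms
        Func<TimeSpan> runOneMoreInvocation)
    {
        if (firstInvocation <= iterationTime)
            return false; // fast benchmark: use the normal Pilot-driven engine

        if (firstInvocation >= MagicUpperBound)
            return true;  // genuinely heavy workload: one invocation per iteration

        // Ambiguous zone: the slowness might just be jitting. Measure again.
        TimeSpan second = runOneMoreInvocation();
        return second > iterationTime; // still slow => truly heavy
    }
}
```

This handles the original report (a 2.9 s first invocation that drops to ~10 µs once jitted) only when the jitting cost falls under the bound, which is why the problem later resurfaced for PowerShell's much slower cold start.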
@adamsitnik adamsitnik added this to the v0.13.0 milestone Nov 9, 2020
MarkPflug pushed a commit to MarkPflug/Benchmarks that referenced this issue Jan 12, 2021
* Update Cursively benchmark code.
- VisitPartialFieldContents is required for correctness
- Use the original byte array, since we have one, instead of forcing a stream around it
- Set iteration time to 1 second, to work around dotnet/BenchmarkDotNet#837 while we wait for a version of this package that includes dotnet/BenchmarkDotNet#1573

* Change the order of this check

Whether or not this is the optimal order is almost certain to depend on the data: data sets that would almost never use the pool would likely hurt more by doing it this way, especially if this were a real-world application since this would likely push a line out of the CPU cache for no reason.

Then again, a real-world application probably wouldn't use the pool conditionally like this, so there's not much of a reason NOT to either.
@adamsitnik
Member

I thought I had solved this problem in #1573, but today I ran into it again while benchmarking PowerShell (PowerShell/PowerShell#15242)

We should most probably implement the approach described by @AndreyAkinshin:

I have another idea about how to solve the problem without additional settings. Let's check that we do enough operations during the WorkloadWarmup stage. If it turns out that the number of operations is not enough and our WorkloadJitting heuristics produced a wrong hypothesis about the warmed operation duration, we should restart iterations from the Pilot stage.

Having said that, I am going to reopen the issue.

cc @daxian-dbw @adityapatwardhan

@adamsitnik
Member

It was most probably fixed with #1949

@adamsitnik adamsitnik modified the milestones: v0.13.x, v0.13.2 Oct 5, 2022