[macOS] Potential regression in delegates invocation #59152

adamsitnik · 2021-09-15T13:11:47Z

It seems to be affecting only macOS (cc @Lxiamail @jeffhandley)

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net5.0 net6.0 --filter PerfLabTests.DelegatePerf.DelegateInvoke

PerfLabTests.DelegatePerf.DelegateInvoke

Result	Ratio	Operating System
Same	1.01	Windows 10.0.19043.1165
Same	0.99	Windows 10.0.20348
Same	1.00	Windows 10.0.20348
Same	1.00	Windows 10.0.18363.1621
Same	0.97	Windows 8.1
Same	1.00	Windows 10.0.19042.685
Same	1.00	Windows 10.0.19043.1165
Same	0.99	Windows 10.0.22454
Same	0.99	Windows 10.0.22451
Same	1.00	Windows 10.0.19042.1165
Slower	0.29	Windows 7 SP1
Same	0.99	centos 8
Same	1.00	debian 10
Same	0.99	rhel 7
Same	1.02	sles 15
Same	1.01	opensuse-leap 15.3
Same	1.00	ubuntu 18.04
Same	1.00	ubuntu 18.04
Same	1.00	alpine 3.13
Same	1.00	ubuntu 16.04
Faster	1.33	Windows 10.0.19043.1165
Faster	1.43	Windows 10.0.22000
Same	1.00	Windows 10.0.19043.1165
Same	1.00	Windows 10.0.18363.1621
Same	0.99	Windows 10.0.19043.1165
Slower	0.89	macOS Big Sur 11.5.2
Slower	0.72	macOS Big Sur 11.5.2
Slower	0.78	macOS Big Sur 11.4

cc @AndyAyersMS @kunalspathak @EgorBo

ghost · 2021-09-15T13:11:50Z

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Author:	adamsitnik
Assignees:	-
Labels:	`area-System.Runtime`, `tenet-performance`
Milestone:	-

jeffhandley · 2021-09-30T21:56:25Z

@AndyAyersMS @kunalspathak @EgorBo What would you recommend as the first step of investigating this?

kunalspathak · 2021-09-30T21:58:51Z

@AndyAyersMS - Is this the example involving struct argument passing, where we hit some struct promotion limit, that you were referring to a few days back?

AndyAyersMS · 2021-09-30T22:17:16Z

I don't think so -- the test doesn't look like it involves structs.

https://github.com/dotnet/performance/blob/d7dac8a7ca12a28d099192f8a901cf8e30361384/src/benchmarks/micro/runtime/perflab/DelegatePerf.cs#L32-L44

As far as investigating goes, we should see if we can repro locally, then we can drill in (perhaps start with jitted codegen for the benchmark method).

What HW was this...?

jeffhandley · 2021-10-07T19:19:03Z

@adamsitnik I need to redirect this back to you for initial investigation per Andy's comments.

adamsitnik · 2021-10-11T14:20:28Z

> What HW was this...?

@AndyAyersMS it was reproducible on all three x64 MacBooks that we used (mine is a 4-year-old MacBook Pro; I'm not sure about the laptops of @jeffhandley or @carlossanlop, who provided the other results).

> I need to redirect this back to you for initial investigation per Andy's comments.

I am able to reproduce it on my MacBook. The problem is that there is no easy way to get disassembly on macOS, so I can't just share it. That is why it would be better if someone from the JIT team took a look at this.

jeffhandley · 2021-10-11T16:20:13Z

My MacBook Pro is a mid-2014 13" retina with 2.6GHz dual-core Intel Core i5 and 8GB RAM.

jeffhandley · 2021-10-18T06:23:01Z

I'm moving this to 7.0.0, but we'll still want to get to the root cause to make sure we understand the impact and tradeoff.

AndyAyersMS · 2021-10-19T02:40:15Z

I can see if this repros on my Mac Mini, but it may take me a while to get around to it...

AndyAyersMS · 2021-10-19T18:47:47Z

I can repro this. Will see what I can uncover...

BenchmarkDotNet=v0.13.1.1611-nightly, OS=macOS High Sierra 10.13.6 (17G14042) [Darwin 17.7.0]
Intel Core i5-4278U CPU 2.60GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.100-rc.2.21505.57
  [Host]     : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
  Job-IIHWUZ : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT
  Job-SZCDQB : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  

|         Method |        Job |  Runtime | Toolchain |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio | Allocated |
|--------------- |----------- |--------- |---------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|----------:|
| DelegateInvoke | Job-IIHWUZ | .NET 5.0 |    net5.0 | 458.4 us | 1.22 us | 1.08 us | 458.3 us | 457.1 us | 461.2 us |  1.00 |       1 B |
| DelegateInvoke | Job-SZCDQB | .NET 6.0 |    net6.0 | 523.7 us | 1.02 us | 0.85 us | 523.8 us | 522.3 us | 524.8 us |  1.14 |       2 B |

jeffhandley · 2021-10-19T19:02:47Z

Thanks, @AndyAyersMS. Unless you think this should be considered for 6.0.1 (first servicing release) as a potential issue to address, it's not urgent. Since it can be repro'd, I'll remove the needs further triage label.

AndyAyersMS · 2021-10-25T18:56:30Z

From what I can tell the jitted codegen for 5.0 and 6.0 is identical. So the perf issue is either in the native part of the runtime (seems unlikely) or some stub I can't yet see, or...?

Going to try to get a more general profile, but I am not that familiar with how to do this on macOS.

AndyAyersMS · 2021-10-25T20:17:04Z

OK, I think this is related to loop alignment.

In 5.0 we only 32-byte aligned Tier1 methods with loops. This was fixed in 6.0 (#42909) so that all optimized methods with loops are 32-byte aligned. We later went on to add alignment padding for loops in 6.0, but we bypass padding if a loop contains a call.

In this test the jitted codegen is identical in 5.0 and 6.0, and the key inner loop in DelegateInvoke is

G_M6345_IG03:              ;; offset=003EH
       488BC3               mov      rax, rbx
       488B7808             mov      rdi, gword ptr [rax+8]
       498BF6               mov      rsi, r14
       BA64000000           mov      edx, 100
       B964000000           mov      ecx, 100
       FF5018               call     qword ptr [rax+24]DelegateLong:Invoke(System.Object,long,long):long:this
       4C8BF8               mov      r15, rax
       41FFC4               inc      r12d
       48B88067BF0801000000 mov      rax, 0x108BF6780
       443B20               cmp      r12d, dword ptr [rax]
       7CD4                 jl       SHORT G_M6345_IG03
						;; bbWeight=4    PerfScore 39.00
G_M6345_IG04:              ;; offset=006AH

For 6.0 the method is 32 byte aligned, and so this loop body runs from 0x3E .. 0x6A and needs 3 fetch windows (plus presumably one more for the call).

For 5.0 the method ends up being 16-byte aligned. This is favorable for the key loop, which now spans (effectively) 0x2E..0x5A, or two fetch windows (plus perhaps one more for the call).
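The window counts above can be checked with a short calculation. This is a minimal sketch, assuming 32-byte instruction fetch windows (the size is implied by the counts in the comment rather than stated) and treating the loop as the half-open byte range from the start of IG03 to the start of IG04. `fetch_windows` is a hypothetical helper name, and the 5.0 method base is assumed to land at 16 modulo 32.

```python
def fetch_windows(method_base, start_offset, end_offset, window=32):
    """Count the aligned windows touched by code bytes [start, end)."""
    first = (method_base + start_offset) // window
    last = (method_base + end_offset - 1) // window
    return last - first + 1


# Loop body from the listing: IG03 starts at 0x3E, IG04 starts at 0x6A.
LOOP_START, LOOP_END = 0x3E, 0x6A

# 6.0: the method start is 32-byte aligned (base is 0 modulo 32).
windows_net6 = fetch_windows(0x00, LOOP_START, LOOP_END)

# 5.0: the method start is only 16-byte aligned (assumed: 16 modulo 32),
# which is equivalent to the "effective" 0x2E..0x5A span in the text.
windows_net5 = fetch_windows(0x10, LOOP_START, LOOP_END)

print(f"6.0 loop touches {windows_net6} windows, 5.0 loop touches {windows_net5}")
```

The count excludes whatever extra window the call target itself needs, which the comment above mentions separately.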

As a result 5.0 runs faster:

BenchmarkDotNet=v0.13.1.1611-nightly, OS=macOS Catalina 10.15.7 (19H1419) [Darwin 19.6.0]
Intel Core i5-4278U CPU 2.60GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.100-rc.2.21505.57
  [Host]     : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
  Job-AIDUIT : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT
  Job-NBIBBZ : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

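|         Method |        Job |  Runtime | Toolchain |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio | Allocated |
|--------------- |----------- |--------- |---------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|----------:|
| DelegateInvoke | Job-AIDUIT | .NET 5.0 |    net5.0 | 462.6 us | 2.25 us | 2.11 us | 462.6 us | 460.1 us | 467.9 us |  1.00 |       1 B |
| DelegateInvoke | Job-NBIBBZ | .NET 6.0 |    net6.0 | 530.0 us | 3.86 us | 3.61 us | 528.2 us | 525.8 us | 536.9 us |  1.15 |       2 B |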

But if I fix the "bug" in the 5.0 alignment code by setting COMPlus_TC_QuickJitForLoops=1 then 5.0 also gets 32 byte method alignment and so poor loop alignment, and the performance equalizes:

BenchmarkDotNet=v0.13.1.1611-nightly, OS=macOS Catalina 10.15.7 (19H1419) [Darwin 19.6.0]
Intel Core i5-4278U CPU 2.60GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.100-rc.2.21505.57
  [Host]     : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT
  Job-CFHBWL : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT
  Job-SQMUMX : .NET 6.0.0 (6.0.21.48005), X64 RyuJIT

EnvironmentVariables=COMPlus_TC_QuickJitForLoops=1  PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  
IterationTime=250.0000 ms  MaxIterationCount=20  MinIterationCount=15  
WarmupCount=1

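|         Method |        Job |  Runtime | Toolchain |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio | Allocated |
|--------------- |----------- |--------- |---------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|----------:|
| DelegateInvoke | Job-CFHBWL | .NET 5.0 |    net5.0 | 540.9 us | 5.78 us | 4.83 us | 540.5 us | 533.7 us | 550.2 us |  1.00 |         - |
| DelegateInvoke | Job-SQMUMX | .NET 6.0 |    net6.0 | 540.7 us | 2.75 us | 2.58 us | 541.0 us | 534.7 us | 544.4 us |  1.00 |       2 B |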
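For anyone wanting to repeat the experiment above, here is a minimal sketch of preparing the same repro command with the environment override. How the variable was passed in the runs above is not stated. This sketch assumes that setting it in the parent process is enough because child processes normally inherit the environment. `build_repro` is a hypothetical helper, and nothing is executed unless the commented-out line is enabled inside a dotnet/performance clone.

```python
import os


def build_repro(extra_env):
    """Return (command, environment) for the repro; nothing is executed."""
    command = [
        "python3", "./performance/scripts/benchmarks_ci.py",
        "-f", "net5.0", "net6.0",
        "--filter", "PerfLabTests.DelegatePerf.DelegateInvoke",
    ]
    env = dict(os.environ)   # copy, so the current process is left untouched
    env.update(extra_env)
    return command, env


command, env = build_repro({"COMPlus_TC_QuickJitForLoops": "1"})
print(" ".join(command))

# To actually run it (assumes the current directory contains the clone):
#   import subprocess; subprocess.run(command, env=env, check=True)
```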

If you modify the benchmark to use a param instead of a static for the loop limit (InnerIterationCount, 200000), we see similar swings in perf, as removing the class init check modifies the loop alignment.

At any rate, here's a case where aligning a loop with a call seems to have a noticeable impact on perf, because the callee is trivial.
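To make the alignment arithmetic concrete, the sketch below computes how much padding would move the loop head in the listing to the next 32-byte boundary, and how many windows the loop would touch before and after. It assumes 32-byte windows and a 32-byte-aligned method start. It is only arithmetic on the offsets from the listing, not the JIT's padding heuristic, and it says nothing about whether such padding would actually help.

```python
WINDOW = 32


def padding_to_boundary(offset, boundary=WINDOW):
    """Bytes needed to advance offset to the next multiple of boundary."""
    return -offset % boundary


def windows_touched(start, end, window=WINDOW):
    """Aligned windows touched by the half-open byte range [start, end)."""
    return (end - 1) // window - start // window + 1


loop_start, loop_end = 0x3E, 0x6A       # IG03 .. IG04 from the listing
loop_size = loop_end - loop_start       # 44 bytes

pad = padding_to_boundary(loop_start)
aligned_start = loop_start + pad
aligned_end = aligned_start + loop_size

before = windows_touched(loop_start, loop_end)
after = windows_touched(aligned_start, aligned_end)
print(f"pad {pad} bytes: {before} windows -> {after} windows")
```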

cc @kunalspathak

AndyAyersMS · 2022-04-25T21:08:28Z

We should revisit this now that OSR is enabled and see what that does (will wait for the 7P4 perf runs...)

AndyAyersMS · 2022-06-16T22:03:54Z

Current data with 7.0 looks better but is still not quite as good as 5.0:

BenchmarkDotNet=v0.13.1.1786-nightly, OS=macOS Monterey 12.4 (21F79) [Darwin 21.5.0]
Intel Core i5-4278U CPU 2.60GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host]     : .NET 6.0.6 (6.0.622.26707), X64 RyuJIT
  Job-TXZCIP : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT
  Job-NIIKGA : .NET 6.0.6 (6.0.622.26707), X64 RyuJIT
  Job-HAVFNP : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT

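|         Method |        Job |  Runtime | Toolchain |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio | Allocated | Alloc Ratio |
|--------------- |----------- |--------- |---------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|----------:|------------:|
| DelegateInvoke | Job-KVFMUW | .NET 5.0 |    net5.0 | 452.8 us | 0.22 us | 0.20 us | 452.8 us | 452.5 us | 453.2 us |  1.00 |       1 B |        1.00 |
| DelegateInvoke | Job-HRZMKX | .NET 6.0 |    net6.0 | 613.1 us | 6.84 us | 6.40 us | 612.8 us | 604.2 us | 624.9 us |  1.35 |       2 B |        2.00 |
| DelegateInvoke | Job-PUBOIJ | .NET 7.0 |    net7.0 | 518.0 us | 0.85 us | 0.75 us | 518.2 us | 517.0 us | 519.5 us |  1.14 |       2 B |        2.00 |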

AndyAyersMS · 2022-07-13T18:24:44Z

Procrastination not paying off here. 7p6 is not any faster.

BenchmarkDotNet=v0.13.1.1786-nightly, OS=macOS Monterey 12.4 (21F79) [Darwin 21.5.0]
Intel Core i5-4278U CPU 2.60GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
.NET SDK=7.0.100-preview.6.22352.1
[Host] : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT
Job-SVXENR : .NET 5.0.11 (5.0.1121.47308), X64 RyuJIT
Job-AZCEHU : .NET 6.0.6 (6.0.622.26707), X64 RyuJIT
Job-XJBOAY : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

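|         Method |        Job |  Runtime | Toolchain |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio | Allocated | Alloc Ratio |
|--------------- |----------- |--------- |---------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|----------:|------------:|
| DelegateInvoke | Job-SVXENR | .NET 5.0 |    net5.0 | 452.4 us | 0.49 us | 0.41 us | 452.2 us | 452.0 us | 453.3 us |  1.00 |       1 B |        1.00 |
| DelegateInvoke | Job-AZCEHU | .NET 6.0 |    net6.0 | 594.4 us | 4.51 us | 4.21 us | 595.7 us | 586.6 us | 601.1 us |  1.31 |       2 B |        2.00 |
| DelegateInvoke | Job-XJBOAY | .NET 7.0 |    net7.0 | 516.7 us | 0.14 us | 0.12 us | 516.6 us | 516.5 us | 516.9 us |  1.14 |       2 B |        2.00 |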

AndyAyersMS · 2022-07-13T18:38:30Z

FWIW TieredPGO does nicely here in .NET 7, thanks to @jakobbotsch

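|         Method |        Job |  Runtime | Toolchain |     Mean |   Error |  StdDev |   Median |       Min |      Max | Ratio | Allocated | Alloc Ratio |
|--------------- |----------- |--------- |---------- |---------:|--------:|--------:|---------:|----------:|---------:|------:|----------:|------------:|
| DelegateInvoke | Job-KIGZAI | .NET 5.0 |    net5.0 | 452.4 us | 0.56 us | 0.47 us | 452.2 us | 452.04 us | 453.5 us |  1.00 |         - |          NA |
| DelegateInvoke | Job-GEGJCA | .NET 6.0 |    net6.0 | 516.8 us | 0.21 us | 0.18 us | 516.8 us | 516.49 us | 517.1 us |  1.14 |       3 B |          NA |
| DelegateInvoke | Job-RJIHFR | .NET 7.0 |    net7.0 | 102.0 us | 1.50 us | 1.33 us | 102.1 us |  99.52 us | 104.1 us |  0.23 |      17 B |          NA |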

AndyAyersMS · 2022-07-13T19:15:13Z

Some non-OSX data on Intel CPUs

BenchmarkDotNet=v0.13.1.1786-nightly, OS=ubuntu 18.04
Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host]     : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  Job-ROQCLI : .NET 5.0.5 (5.0.521.16609), X64 RyuJIT
  Job-RENGIG : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT
  Job-STLHTD : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250.0000 ms  MaxIterationCount=20  
MinIterationCount=15  WarmupCount=1

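|         Method |        Job |  Runtime | Toolchain |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio | Allocated | Alloc Ratio |
|--------------- |----------- |--------- |---------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|----------:|------------:|
| DelegateInvoke | Job-ROQCLI | .NET 5.0 |    net5.0 | 551.5 us | 0.51 us | 0.45 us | 551.4 us | 550.7 us | 552.2 us |  1.00 |       1 B |        1.00 |
| DelegateInvoke | Job-RENGIG | .NET 6.0 |    net6.0 | 614.2 us | 2.18 us | 1.70 us | 613.5 us | 612.5 us | 618.0 us |  1.11 |       2 B |        2.00 |
| DelegateInvoke | Job-STLHTD | .NET 7.0 |    net7.0 | 552.0 us | 1.43 us | 1.19 us | 552.0 us | 550.4 us | 554.1 us |  1.00 |       2 B |        2.00 |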

BenchmarkDotNet=v0.13.1.1786-nightly, OS=Windows 10 (10.0.19044.1739/21H2/November2021Update)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100-preview.6.22352.1
  [Host]     : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT
  Job-TDISHI : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT
  Job-DIKFLT : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
  Job-AWWZXD : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

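|         Method |        Job |  Runtime | Toolchain |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio | RatioSD | Allocated | Alloc Ratio |
|--------------- |----------- |--------- |---------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|--------:|----------:|------------:|
| DelegateInvoke | Job-TDISHI | .NET 5.0 |    net5.0 | 319.3 us | 6.46 us | 6.91 us | 319.3 us | 309.5 us | 336.4 us |  1.00 |    0.00 |         - |          NA |
| DelegateInvoke | Job-DIKFLT | .NET 6.0 |    net6.0 | 323.0 us | 6.29 us | 6.99 us | 323.0 us | 310.2 us | 335.2 us |  1.01 |    0.03 |       1 B |          NA |
| DelegateInvoke | Job-AWWZXD | .NET 7.0 |    net7.0 | 432.0 us | 6.38 us | 5.33 us | 430.2 us | 422.6 us | 444.7 us |  1.35 |    0.03 |       1 B |          NA |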

BenchmarkDotNet=v0.13.1.1786-nightly, OS=Windows 11 (10.0.22000.795/21H2)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host]     : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  Job-COVRZU : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT
  Job-IADVQC : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
  Job-MJQVRZ : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

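|         Method |        Job |  Runtime | Toolchain |     Mean |    Error |    StdDev |   Median |      Min |        Max | Ratio | RatioSD | Allocated | Alloc Ratio |
|--------------- |----------- |--------- |---------- |---------:|---------:|----------:|---------:|---------:|-----------:|------:|--------:|----------:|------------:|
| DelegateInvoke | Job-COVRZU | .NET 5.0 |    net5.0 | 625.9 us | 75.56 us |  87.02 us | 587.0 us | 527.3 us |   750.7 us |  1.00 |    0.00 |         - |          NA |
| DelegateInvoke | Job-IADVQC | .NET 6.0 |    net6.0 | 813.8 us | 98.12 us | 104.99 us | 768.8 us | 701.2 us | 1,086.4 us |  1.29 |    0.15 |       1 B |          NA |
| DelegateInvoke | Job-MJQVRZ | .NET 7.0 |    net7.0 | 684.8 us | 64.32 us |  74.07 us | 660.4 us | 600.7 us |   851.8 us |  1.12 |    0.24 |       2 B |          NA |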

BenchmarkDotNet=v0.13.1.1786-nightly, OS=Windows 11 (10.0.22000.739/21H2)
Intel Core i9-9900T CPU 2.10GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.6.22352.1
  [Host]     : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT
  Job-AHPPXT : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT
  Job-AXCIJO : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
  Job-WWEVZP : .NET 7.0.0 (7.0.22.32404), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

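|         Method |        Job |  Runtime | Toolchain |     Mean |    Error |   StdDev |   Median |      Min |      Max | Ratio | RatioSD | Allocated | Alloc Ratio |
|--------------- |----------- |--------- |---------- |---------:|---------:|---------:|---------:|---------:|---------:|------:|--------:|----------:|------------:|
| DelegateInvoke | Job-AHPPXT | .NET 5.0 |    net5.0 | 325.9 us | 13.47 us | 15.51 us | 329.9 us | 304.1 us | 354.2 us |  1.00 |    0.00 |         - |          NA |
| DelegateInvoke | Job-AXCIJO | .NET 6.0 |    net6.0 | 452.2 us |  4.05 us |  3.79 us | 452.0 us | 444.9 us | 458.8 us |  1.40 |    0.07 |       1 B |          NA |
| DelegateInvoke | Job-WWEVZP | .NET 7.0 |    net7.0 | 385.0 us | 11.07 us | 12.75 us | 388.1 us | 361.1 us | 400.0 us |  1.18 |    0.05 |       1 B |          NA |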

BenchmarkDotNet=v0.13.1.1786-nightly, OS=Windows 11 (10.0.22000.795/21H2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.5.22307.18
  [Host]     : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT
  Job-MCKRNJ : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT
  Job-UAXSFU : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
  Job-MYYFBF : .NET 7.0.0 (7.0.22.30112), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

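|         Method |        Job |  Runtime | Toolchain |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio | RatioSD | Allocated | Alloc Ratio |
|--------------- |----------- |--------- |---------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|--------:|----------:|------------:|
| DelegateInvoke | Job-MCKRNJ | .NET 5.0 |    net5.0 | 269.4 us | 4.89 us | 4.57 us | 269.2 us | 263.6 us | 278.9 us |  1.00 |    0.00 |         - |          NA |
| DelegateInvoke | Job-UAXSFU | .NET 6.0 |    net6.0 | 268.4 us | 3.66 us | 3.43 us | 267.7 us | 263.5 us | 275.4 us |  1.00 |    0.01 |       1 B |          NA |
| DelegateInvoke | Job-MYYFBF | .NET 7.0 |    net7.0 | 313.1 us | 1.98 us | 1.85 us | 313.2 us | 310.0 us | 317.3 us |  1.16 |    0.02 |       1 B |          NA |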

AndyAyersMS · 2022-07-15T19:25:24Z

From the above we can see 5.0 is consistently fastest, with 6.0 or 7.0 sometimes similar to 5.0 and sometimes slower depending on the particular processor. 7.0 is generally faster than 6.0 though not always.
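The comparison above can be tabulated with a few lines of arithmetic. This is a rough sketch that copies the Mean values from the benchmark tables in this thread (using the 7p6 run for the macOS machine) and divides each by the 5.0 mean. The machine labels and the `ratios_vs_net5` helper are just illustrative names.

```python
# Mean times in microseconds from the tables above: (net5.0, net6.0, net7.0)
MEANS_US = {
    "macOS, i5-4278U (Haswell)": (452.4, 594.4, 516.7),
    "Ubuntu 18.04, i7-2720QM (Sandy Bridge)": (551.5, 614.2, 552.0),
    "Windows 10, i7-6700 (Skylake)": (319.3, 323.0, 432.0),
    "Windows 11, i7-8650U (Kaby Lake R)": (625.9, 813.8, 684.8),
    "Windows 11, i9-9900T": (325.9, 452.2, 385.0),
    "Windows 11, i7-8700 (Coffee Lake)": (269.4, 268.4, 313.1),
}


def ratios_vs_net5(means):
    """Divide each mean by the net5.0 mean and round to two decimals."""
    base = means[0]
    return tuple(round(m / base, 2) for m in means)


summary = {machine: ratios_vs_net5(m) for machine, m in MEANS_US.items()}
for machine, (_, r6, r7) in summary.items():
    print(f"{machine}: net6.0 = {r6:.2f}x, net7.0 = {r7:.2f}x of net5.0")
```

Ratios of means need not match the Ratio column in the tables exactly, particularly for the noisier runs such as the i7-8650U one.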

I modified the jit to align small (single block) loops with calls (as in the benchmark) and didn't see any improvement, so at this point I suspect the perf differences are related to the alignment of the delegate or its precode.

I think this sort of thing is only going to show up prominently when we have frequently executed delegates that do little or no computation. So, I'm going to close this as won't fix.

adamsitnik added area-System.Runtime tenet-performance Performance related issue labels Sep 15, 2021

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Sep 15, 2021

jeffhandley added this to the 6.0.0 milestone Sep 30, 2021

jeffhandley added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Sep 30, 2021

jeffhandley assigned adamsitnik Oct 7, 2021

adamsitnik removed their assignment Oct 11, 2021

jeffhandley modified the milestones: 6.0.0, 7.0.0 Oct 18, 2021

jeffhandley assigned AndyAyersMS Oct 19, 2021

jeffhandley removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Oct 19, 2021

jeffhandley mentioned this issue Feb 1, 2022

System.Runtime work planned for .NET 7 #64603

Closed

28 tasks

AndyAyersMS added the Regression label Jun 25, 2022

AndyAyersMS closed this as completed Jul 15, 2022

ghost locked as resolved and limited conversation to collaborators Aug 15, 2022

jeffhandley removed the Regression label Dec 28, 2022
