JIT: enable devirtualization/inlining of other array interface methods #109209
The JIT recently enabled devirtualization of `GetEnumerator`, but other methods were inhibited from devirtualization because the runtime was returning an instantiating stub instead of the actual method. This blocked inlining, and the JIT currently will not do guarded devirtualization (GDV) unless it can also inline.

So, for instance, `ICollection<T>.Count` would not devirtualize.

We think we know enough to pass the right inst parameter (the exact method desc), so enable this for the array case, at least for normal jitting.

For NAOT, array devirtualization happens via normal paths thanks to `Array<T>`, so these cases should already work. For R2R we don't do array interface devirt (yet).

There was an existing field on `CORINFO_DEVIRTUALIZATION_INFO` to record the need for an inst parameter, but it was unused, so I renamed it and use it for this case.

Contributes to dotnet#108913.
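To make the stub-versus-method distinction above concrete, here is a toy C++ model. It is only a sketch: every name in it (`ToyMethodDesc`, `ToyArray`, `SharedGetCount`, `InstantiatingStub`, `CountViaStub`, `CountDirect`) is hypothetical and does not correspond to runtime or JIT code. It assumes the usual shape of shared generic code, where one body serves many instantiations and takes its exact context as an extra argument.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: none of these names exist in the runtime or the JIT.

// Stand-in for the exact method desc that shared code receives as its
// hidden "inst parameter".
struct ToyMethodDesc {
    const char* exactInstantiation;
};

// Stand-in for an array object with a length field.
struct ToyArray {
    int32_t length;
};

// Shared body: one copy of code serves many element types, so the caller
// has to supply the exact context explicitly.
inline int32_t SharedGetCount(const ToyArray* array, const ToyMethodDesc* instParam) {
    (void)instParam;  // unused by this toy body
    return array->length;
}

static const ToyMethodDesc kStringInstantiation = {"ICollection<string>.Count on string[]"};

// Instantiating stub: a thunk that binds the inst parameter and forwards.
// A caller that only holds this entry point cannot see the body behind it.
int32_t InstantiatingStub(const ToyArray* array) {
    return SharedGetCount(array, &kStringInstantiation);
}

// Before: the caller is handed the stub and makes an opaque indirect call.
int32_t CountViaStub(const ToyArray* array, int32_t (*stub)(const ToyArray*)) {
    return stub(array);
}

// After: the caller is handed the real method plus the inst parameter to
// pass, so the body is visible and can be inlined down to a field load.
int32_t CountDirect(const ToyArray* array) {
    return SharedGetCount(array, &kStringInstantiation);
}
```

Both paths return the same value. The difference is only in what the caller can see: `CountViaStub` goes through a function pointer, while `CountDirect` names the body and its extra argument, which is the situation the description says this change sets up for arrays.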
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    static byte[] Data = new byte[512];

    [Benchmark]
    public int Test() => TestInner(Data);

    [MethodImpl(MethodImplOptions.NoInlining)]
    int TestInner(ICollection<byte> c) => c.Count;
}
EgorBot results: BenchmarkDotNet v0.14.0, Ubuntu 24.04 LTS (Noble Numbat)
When I run locally though: BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.2033) Runtime=.NET 9.0
Maybe my baselines are out of date...
@EgorBot -intel -arm64 -profiler --envvars DOTNET_JitDisasm:TestInner
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    static byte[] Data = new byte[512];

    [Benchmark]
    public int Test() => TestInner(Data);

    [MethodImpl(MethodImplOptions.NoInlining)]
    int TestInner(ICollection<byte> c) => c.Count;
}
Most of them ignore inst param, except runtime/src/coreclr/vm/array.cpp Lines 984 to 990 in 9228ccf
I think this approach may work for non-shared code, but it looks too simple to handle shared generic code properly.
@AndyAyersMS hm.. weird, it feels like it's already expanded in main according to bot...
@EgorBot -intel -arm64 -profiler --envvars DOTNET_JitDisasm:TestInner
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    static string[] Data = new string[512];

    [Benchmark]
    public int Test() => TestInner(Data);

    [MethodImpl(MethodImplOptions.NoInlining)]
    int TestInner(ICollection<string> c) => c.Count;
}
The issue was with array devirt for ref-type (or shared) array elements... updated the benchmark to use
Still need to update the JIT to expect a method handle instead of a class handle in a few places.
@EgorBot -intel -arm64 -profiler --envvars DOTNET_JitDisasm:TestInner
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    static string[] Data = new string[512];

    [Benchmark]
    public int Test() => TestInner(Data);

    [MethodImpl(MethodImplOptions.NoInlining)]
    int TestInner(ICollection<string> c) => c.Count;
}
src/coreclr/vm/jitinterface.cpp
Outdated
    THROWS;
    GC_TRIGGERS;
-    THROWS;
-    GC_TRIGGERS;
+    NOTHROW;
+    GC_NOTRIGGER;
`LEAF` JIT/EE interface methods should be nothrow/notrigger.
@EgorBot -intel -arm64 -profiler --envvars DOTNET_JitDisasm:Foreach Count
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    string[] s_ro_str_array = new string[512];

    [Benchmark]
    public int Foreach()
    {
        IEnumerable<string> e = s_ro_str_array;
        int sum = 0;
        foreach (string s in e) sum += s == null ? 0 : s.Length;
        return sum;
    }

    [Benchmark]
    public int Count() => CountInner(s_ro_str_array);

    [MethodImpl(MethodImplOptions.NoInlining)]
    int CountInner(ICollection<string> c) => c.Count;
}
The Foreach test should now devirtualize and stack allocate the enumerator (but not yet promote, needs #109237).
@EgorBot -intel -arm64
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    string[] s_ro_str_array = new string[512];

    [Benchmark]
    public int Foreach()
    {
        IEnumerable<string> e = s_ro_str_array;
        int sum = 0;
        foreach (string s in e) sum += s == null ? 0 : s.Length;
        return sum;
    }

    [Benchmark]
    public int Count() => CountInner(s_ro_str_array);

    [MethodImpl(MethodImplOptions.NoInlining)]
    int CountInner(ICollection<string> c) => c.Count;
}
@AndyAyersMS the bot didn't report anything because of I guess it has to be wrapped with quotes, but not sure it will work either
Locally I have intel as faster (at least on windows):
Loop code is similar too:
;;; base inner loop (heap allocated enumerator)
G_M47389_IG03: ;; offset=0x0026
mov edx, dword ptr [rcx+0x08]
;; size=3 bbWeight=0.91 PerfScore 1.82
G_M47389_IG04: ;; offset=0x0029
add ebx, edx
;; size=2 bbWeight=516.86 PerfScore 129.22
G_M47389_IG05: ;; offset=0x002B
mov ecx, dword ptr [rax+0x08]
inc ecx
mov edx, ecx
mov esi, dword ptr [rax+0x0C]
cmp edx, esi
jae SHORT G_M47389_IG08
;; size=14 bbWeight=517.86 PerfScore 2977.70
G_M47389_IG06: ;; offset=0x0039
mov dword ptr [rax+0x08], edx
cmp ecx, esi
jae SHORT G_M47389_IG10
mov rdx, gword ptr [rax+0x10]
cmp ecx, dword ptr [rdx+0x08]
jae SHORT G_M47389_IG11
mov ecx, ecx
mov rcx, gword ptr [rdx+8*rcx+0x10]
test rcx, rcx
jne SHORT G_M47389_IG03
;;; diff inner loop (stack allocated enumerator / not promoted)
G_M47389_IG03: ;; offset=0x0053
mov r8d, dword ptr [rcx+0x08]
;; size=4 bbWeight=0.76 PerfScore 1.52
G_M47389_IG04: ;; offset=0x0057
add eax, r8d
;; size=3 bbWeight=511.71 PerfScore 127.93
G_M47389_IG05: ;; offset=0x005A
mov ecx, dword ptr [rdx+0x08]
inc ecx
mov ebx, dword ptr [rdx+0x0C]
cmp ecx, ebx
jae SHORT G_M47389_IG08
;; size=12 bbWeight=512.71 PerfScore 2819.93
G_M47389_IG06: ;; offset=0x0066
mov dword ptr [rdx+0x08], ecx
mov ecx, dword ptr [rdx+0x08]
cmp ecx, dword ptr [rdx+0x0C]
jae SHORT G_M47389_IG10
mov r8, gword ptr [rdx+0x10]
cmp ecx, dword ptr [r8+0x08]
jae SHORT G_M47389_IG11
mov rcx, gword ptr [r8+8*rcx+0x10]
test rcx, rcx
jne SHORT G_M47389_IG03
Now that #109237 is merged, the perf of
@EgorBot -intel -arm64
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    string[] s_ro_str_array = new string[512];

    [Benchmark]
    public int Foreach()
    {
        IEnumerable<string> e = s_ro_str_array;
        int sum = 0;
        foreach (string s in e) sum += s == null ? 0 : s.Length;
        return sum;
    }

    [Benchmark]
    public int Count() => CountInner(s_ro_str_array);

    [MethodImpl(MethodImplOptions.NoInlining)]
    int CountInner(ICollection<string> c) => c.Count;
}
NAOT failure looks like #104340.
Updated diff inner loop for
jmp SHORT G_M47389_IG05
align [12 bytes for IG03]
;; size=34 bbWeight=1.84 PerfScore 8.74
G_M47389_IG03: ;; offset=0x0030
mov r10d, dword ptr [r8+0x08]
;; size=4 bbWeight=4.58 PerfScore 9.16
G_M47389_IG04: ;; offset=0x0034
add eax, r10d
;; size=3 bbWeight=944.53 PerfScore 236.13
G_M47389_IG05: ;; offset=0x0037
inc edx
cmp edx, 512
jae SHORT G_M47389_IG08
;; size=10 bbWeight=946.37 PerfScore 1419.55
G_M47389_IG06: ;; offset=0x0041
mov edx, edx
mov r8, gword ptr [rcx+8*rdx+0x10]
test r8, r8
jne SHORT G_M47389_IG03
@@ -14238,6 +14238,52 @@ CORINFO_CLASS_HANDLE Compiler::gtGetHelperArgClassHandle(GenTree* tree)
    return result;
}

//------------------------------------------------------------------------
// gtGetHelperArgMethodHandle: find the compile time method handle from
I presume the impl could be merged with gtGetHelperArgClassHandle to reduce copy-paste, but not sure it's worth it
{
    *methodArg = null;
    *classArg = null;
    return null;
Worth leaving a note on why it's a no-op for NAOT/R2R?
Yes, good point.
Also this should not be too hard to implement, but I don't know how to do it, and it won't get used. Currently R2R lacks the devirtualization support, and NAOT doesn't handle arrays the same way.
Should we kick off a PGO pipeline?
/azp run runtime-coreclr libraries-pgo, runtime-coreclr pgo, runtime-coreclr pgostress
Azure Pipelines successfully started running 3 pipeline(s).