Use RyuJIT as IL scanner in NativeAOT #83021

MichalStrehovsky · 2023-03-06T07:22:05Z

Definitely a Future milestone item. I just wanted to write down a few notes, since I spent a couple of hours looking at this today.

Optimized compilation in NativeAOT consists of two phases - scanning (builds whole program view) and compilation (generates code).

Scanning is currently implemented in C#: we have an IL importer that essentially simulates the things RyuJIT will need when compiling the method (e.g. a method call will require us to provide the method body of the callee).

We discussed doing this analysis with RyuJIT a couple of times, and there was even a prototype, but it didn't quite do what we need in NativeAOT: it collected a whole program view for RyuJIT's purposes, not ours. What we need is the list of relocs from a method body.
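
To make "the list of relocs from a method body" concrete, here is a minimal sketch of a scanner as a worklist over relocs. It is a toy model under stated assumptions: `ToyScanner`, `Scan`, and the string-keyed `relocsOf` map are hypothetical names for illustration and are not NativeAOT or JitInterface APIs.

```csharp
using System.Collections.Generic;

// Toy model only: the real scanner works on IL and a dependency graph,
// not on string names. Every name here is hypothetical.
public static class ToyScanner
{
    // relocsOf maps a method to the symbols its compiled body would reference.
    public static HashSet<string> Scan(
        string root,
        IReadOnlyDictionary<string, string[]> relocsOf)
    {
        var reachable = new HashSet<string>();
        var worklist = new Queue<string>();
        worklist.Enqueue(root);

        while (worklist.Count > 0)
        {
            string symbol = worklist.Dequeue();
            if (!reachable.Add(symbol))
                continue; // already scanned

            // Symbols without a recorded body contribute no further relocs.
            if (relocsOf.TryGetValue(symbol, out var relocs))
            {
                foreach (string target in relocs)
                    worklist.Enqueue(target);
            }
        }

        return reachable;
    }
}
```

In this model the whole program view is simply the set returned by `Scan`. Anything that never shows up as a reloc of a reachable body is left out of it.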

I put together a hack to let us run RyuJIT as a scanner: MichalStrehovsky@34f7a96

This hack does nothing to prevent codegen, so RyuJIT still does all the unnecessary work, such as register allocation and code generation, and then we throw it all away. The compile throughput impact is about 15%.

It is also a size regression, for both BasicMinimalApi (10%) and HelloWorld (5%). I expect it to regress working set as a side effect too.

Observations:

* We don't have a good way to model, through JitInterface, the optimizations we currently do in the scanner. For example, `someType.GetType() == typeof(X)` currently avoids requesting a MethodTable with a populated vtable in the C# scanner, but RyuJIT can't communicate that a limited MethodTable is sufficient, so we get a full one.
* runtimelab#1128 ("Fix VNs for method table fetched from newobj") shows up as a problem again, because we want to be able to give out MethodTable symbols without a vtable.
* RyuJIT may optimize away some things during importing around virtual method lookups or generic dictionary lookups, but then during compilation it's going to ask questions about those that we cannot answer. This falls out structurally from the JitInterface design, where `getCallInfo` provides some information that RyuJIT may optimize into something that doesn't need the thing JitInterface provided.

Not sure if all of the size regression can be attributed to the above. The above causes enough noise that it wasn't worth spending more time on it.
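
For reference, the `GetType()` pattern from the first observation looks like this in source form. This is only an illustrative sketch: `X`, `Y`, and `TypeCheckExample` are made-up names, and the comment about the vtable restates the observation above rather than documenting compiler behavior.

```csharp
using System;

public class X { }
public class Y { }

public static class TypeCheckExample
{
    // The check only compares type identity. Per the observation above, the
    // C# scanner can satisfy it with a MethodTable for X that has no populated
    // vtable, as long as nothing else needs the full one.
    public static bool IsExactlyX(object someType) => someType.GetType() == typeof(X);
}
```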


ghost · 2023-03-06T07:22:12Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Author:	MichalStrehovsky
Assignees:	-
Labels:	`area-NativeAOT-coreclr`
Milestone:	8.0.0

MichalStrehovsky · 2023-10-03T08:55:40Z

It might be beneficial to be able to scan methods one basic block at a time to fix issues like #92850.
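
A rough sketch of what scanning block by block could buy: if the branch condition of a block is statically known, the scanner follows only the live successor and never records relocs from the dead one. This is a toy control-flow model; `ToyBlock`, `ToyBlockScanner`, and their members are hypothetical and do not correspond to NativeAOT or RyuJIT data structures.

```csharp
using System.Collections.Generic;

// Hypothetical toy CFG node.
public sealed class ToyBlock
{
    public string[] Relocs = new string[0];

    // Branch condition if statically known; null means "unknown, follow both".
    public bool? KnownCondition;

    public int TakenTarget = -1;       // successor when the condition is true (-1 = none)
    public int FallThroughTarget = -1; // successor when the condition is false (-1 = none)
}

public static class ToyBlockScanner
{
    // Collects relocs only from blocks reachable from block 0.
    public static HashSet<string> Scan(IReadOnlyList<ToyBlock> blocks)
    {
        var relocs = new HashSet<string>();
        var visited = new HashSet<int>();
        var pending = new Stack<int>();
        pending.Push(0); // entry block

        while (pending.Count > 0)
        {
            int index = pending.Pop();
            if (index < 0 || index >= blocks.Count || !visited.Add(index))
                continue;

            ToyBlock block = blocks[index];
            relocs.UnionWith(block.Relocs);

            // Skip a successor only when the condition is known to rule it out.
            if (block.KnownCondition != false) pending.Push(block.TakenTarget);
            if (block.KnownCondition != true) pending.Push(block.FallThroughTarget);
        }

        return relocs;
    }
}
```

With `KnownCondition = false` on the entry block, relocs that appear only in the taken branch never enter the result, which is the effect this comment is after.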

@stephentoub found out that for the following code:

```csharp
using System.Buffers;

Foo<Bar>();

static T[] Foo<T>()
{
    if (typeof(T).IsValueType)
    {
        return ArrayPool<T>.Shared.Rent(42);
    }

    return null!;
}

class Bar {}
```

we end up generating `ArrayPool`s of `Bar` even though it's obviously never reachable.

The problem is architectural:

* We run a whole program analysis phase that tries to figure out things like generic dictionary layouts so that later, in the code generation phase, we can inline offsets into generic dictionaries into codegen.
* For the above code, whole program analysis decides that the dictionary layout of `Foo<__Canon>` needs a slot for `ArrayPool<!0>`.
* Codegen then optimizes out the `IsValueType` branch because `__Canon` is never a valuetype. But we already allocated the dictionary slot and will fill it out, even though it ends up unused due to the optimization.

We're going to run into issues like this until dotnet#83021 is addressed. Whole program analysis cannot currently assume a certain optimization happens because we don't know whether RyuJIT will do it. The only way we can "optimize" during whole program analysis is if we rewrite IL and give RyuJIT no say in whether to do an optimization or not. Rewriting the IL is not great because it e.g. causes PGO data to not match. I don't like doing it, but there's nothing else we can do.

This change extends dead block elimination to understand `typeof(X).IsValueType`. If we recognize a branch is reachable under this condition, we evaluate whether this is true or false and replace the basic block with nops.
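
As a rough illustration of "replace the basic block with nops", the sketch below overwrites a byte range of an IL body with the `nop` opcode (0x00). It is a simplified assumption of how such a rewrite might look: `ToyIlRewriter` and `NopOutRange` are hypothetical names, and a real rewrite also has to deal with branches, exception regions, and anything else that refers to the removed block.

```csharp
using System;

public static class ToyIlRewriter
{
    private const byte Nop = 0x00; // IL opcode for `nop`

    // Returns a copy of the IL with [start, start + length) replaced by nops.
    // The body keeps its length, so offsets of the remaining instructions do not move.
    public static byte[] NopOutRange(byte[] il, int start, int length)
    {
        if (il == null) throw new ArgumentNullException(nameof(il));
        if (start < 0 || length < 0 || start > il.Length - length)
            throw new ArgumentOutOfRangeException(nameof(start));

        byte[] rewritten = (byte[])il.Clone();
        Array.Fill(rewritten, Nop, start, length);
        return rewritten;
    }
}
```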

MichalStrehovsky added the area-NativeAOT-coreclr label Mar 6, 2023

MichalStrehovsky added this to the 8.0.0 milestone Mar 6, 2023

MichalStrehovsky modified the milestones: 8.0.0, Future Mar 6, 2023

agocke added this to AppModel Mar 6, 2023

MichalStrehovsky mentioned this issue Apr 2, 2023

Add constprop handling for typeof(T) == typeof(Foo) #84224

Merged

MichalStrehovsky mentioned this issue Aug 29, 2023

Trim XmlSerializers codegen if unsupported #88796

Closed

MichalStrehovsky mentioned this issue Oct 3, 2023

NativeAot isinst branch trimming #92850

Closed

MichalPetryka mentioned this issue Nov 20, 2023

[API Proposal]: RuntimeFieldHandle.Offset #94976

Open

MichalStrehovsky mentioned this issue Jan 17, 2024

Dead code elimination for if (typeof(T).IsValueType) #97080

Merged

AndyAyersMS mentioned this issue Oct 16, 2024

JIT: Interprocedural Analysis (IPA) in .NET 10 #108931

Open

EgorBo mentioned this issue Oct 17, 2024

JIT Focus Area for .NET 10 #108988

Open

13 tasks
