[NativeAOT-LLVM] Add a portable implementation of RhGetCodeTarget #2333
Conversation
@@ -116,39 +122,24 @@ public void InitializeDebugInfo(MethodDebugInformation debugInfo)
    public void InitializeLocalTypes(TypeDesc[] localTypes)
    {
    }
}

internal sealed class LlvmMethodBodyNode : LLVMMethodCodeNode, IMethodBodyNode
I deleted the LlvmMethodBodyNode/LLVMMethodCodeNode fork; it did not appear to serve any meaningful purpose. MethodCodeNode does not use such a pattern either.
public override IEnumerable<DependencyListEntry> GetStaticDependencies(NodeFactory factory)
public ISymbolNode GetUnboxingThunkTarget(NodeFactory factory)
The ISpecialUnboxThunkNode implementation is copied from MethodCodeNode.
@dotnet/nativeaot-llvm
This may be interesting to upstream to address dotnet/runtime#73906.
uint32_t instantiatingStubCount = allMappingsCount - simpleStubCount;

if (!initMap(m_unboxingStubTargetsMap, &allMappings->Data[0], simpleStubCount) ||
    !initMap(m_unboxingAndInstantiatingStubTargetsMap, &allMappings->Data[simpleStubCount], instantiatingStubCount))
The core runtime should not need this mapping for anything. Is that correct?
Would it make more sense to initialize this map lazily and do the whole thing in C#, like we do for other maps that are needed by reflection?
> The core runtime should not need this mapping for anything. Is that correct?

Yes. It is only consumed at the CoreLib layer and above.

> Would it make more sense to initialize this map lazily and do the whole thing in C#, like we do for other maps that are needed by reflection?

I considered it. The main problem is that the functionality is used by delegate equality, which seems low-level enough to be non-trivial to initialize robustly in managed code. Additionally, the SHash closed hashing table implementation is very nicely fitted to this use case; a managed dictionary would use 3x+ more memory.
Does it need to be used by delegate equality in the first place? It looks like a left-over from some flavor of multi-module mode that allowed multiple unboxing stubs to exist for a given method. That should not be a thing in current native AOT. You can try to upstream removing it from delegate equality.

Even if it is still needed for some reason, delegate equality should not be used by low-level runtime code where the lazy initialization would lead to problems.

A reverse map like this one is a common problem for reflection-like functionality. I think it would be fine to implement this reverse map in a similar way. We have a few of those; for example, look for FunctionPointersToOffsets. (That one is also more complicated than it needs to be due to support for multi-module flavors that we do not care about anymore.)

In fact, it looks like this reverse map was implemented in managed code at some point. There are unused UnboxingAndInstantiatingStubMap and UnboxingAndInstantiatingStubMapEntry definitions that one would need to implement reverse maps like this.
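To make the lazy-initialization suggestion concrete, here is a minimal C# sketch of the pattern commonly used for reflection-style maps that are built on first use. All names here (LazyReverseMap, BuildMapFromCompilerEmittedData) are illustrative assumptions, not the actual runtime code:

```csharp
using System;
using System.Threading;

// Hypothetical sketch only; names do not come from the actual runtime.
internal static class LazyReverseMap
{
    // Built on first use, so programs that never unwrap a stub pay nothing.
    private static (IntPtr Stub, IntPtr Target)[] s_map;

    public static IntPtr GetTargetOfStub(IntPtr stub)
    {
        // Linear scan for brevity; a real map would use a sorted array
        // with binary search, or a hash table.
        foreach ((IntPtr Stub, IntPtr Target) pair in GetMap())
        {
            if (pair.Stub == stub)
                return pair.Target;
        }
        return stub; // not a stub
    }

    private static (IntPtr Stub, IntPtr Target)[] GetMap()
    {
        (IntPtr Stub, IntPtr Target)[] map = Volatile.Read(ref s_map);
        if (map == null)
        {
            (IntPtr Stub, IntPtr Target)[] built = BuildMapFromCompilerEmittedData();
            // Benign race: if two threads build concurrently, one result wins.
            map = Interlocked.CompareExchange(ref s_map, built, null) ?? built;
        }
        return map;
    }

    private static (IntPtr Stub, IntPtr Target)[] BuildMapFromCompilerEmittedData()
    {
        // A real implementation would read the compiler-emitted mapping blob here.
        return Array.Empty<(IntPtr Stub, IntPtr Target)>();
    }
}
```

The key property of this pattern is that no static constructor or startup hook is required, which is why it is safe for maps consumed only by reflection-level code.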
> Does it need to be used by delegate equality in the first place? It looks like a left-over from some flavor of multi-module mode that allowed multiple unboxing stubs to exist for a given method. That should not be a thing in current native AOT. You can try to upstream removing it from delegate equality.

After doing some research, I agree it looks unnecessary. I will pursue its removal and a managed implementation for these maps.
Force-pushed from 5d120c4 to ec99e4e.
@dotnet/nativeaot-llvm PTAL
In this implementation:
1) The compiler generates an array of { Stub, Target } tuples. This is done for both simple and instantiating stubs.
2) At runtime startup, pointers into this array are stored in two hash tables, with the stubs as keys. This allows for more or less efficient target retrieval.

Costs:
1) The size of the array is ~5KB for the reflection test, which has ~500 simple unboxing stubs and ~100 instantiating ones. Thus, for a medium-sized program, overhead on the order of ~50KB is expected.
2) The hash table construction startup penalty should be on the order of 10us for the reflection test, scaling linearly for larger programs, which seems acceptable.
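For illustration, here is a minimal C# sketch of the closed-hashing (open-addressing) idea described above, assuming a compiler-emitted flat array of (Stub, Target) pairs. The type and member names are hypothetical; the actual runtime-side table in this revision was the native SHash. Storing only int indices into the flat pair array is what keeps this much smaller than a Dictionary<IntPtr, IntPtr>:

```csharp
using System;

// Hypothetical sketch; not the actual SHash-based runtime code.
internal sealed class StubTargetMap
{
    private readonly (IntPtr Stub, IntPtr Target)[] _pairs; // compiler-emitted data
    private readonly int[] _slots; // index into _pairs plus one; zero means empty

    public StubTargetMap((IntPtr Stub, IntPtr Target)[] pairs)
    {
        _pairs = pairs;
        _slots = new int[NextPowerOfTwo(pairs.Length * 2 + 1)]; // load factor <= 0.5
        for (int i = 0; i < pairs.Length; i++)
        {
            int slot = Hash(pairs[i].Stub);
            while (_slots[slot] != 0) // linear probing on collision
                slot = (slot + 1) & (_slots.Length - 1);
            _slots[slot] = i + 1;
        }
    }

    public bool TryGetTarget(IntPtr stub, out IntPtr target)
    {
        for (int slot = Hash(stub); _slots[slot] != 0; slot = (slot + 1) & (_slots.Length - 1))
        {
            ref (IntPtr Stub, IntPtr Target) pair = ref _pairs[_slots[slot] - 1];
            if (pair.Stub == stub)
            {
                target = pair.Target;
                return true;
            }
        }
        target = IntPtr.Zero;
        return false;
    }

    // Fibonacci-style multiplicative hash, masked to the power-of-two table size.
    private int Hash(IntPtr key) =>
        (int)(((ulong)(long)key * 2654435769UL) >> 16) & (_slots.Length - 1);

    private static int NextPowerOfTwo(int n)
    {
        int p = 1;
        while (p < n)
            p <<= 1;
        return p;
    }
}
```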
This reverts commit 82c5c95.
dotnet/corert@08d78ae
The original motivation for this was handling import stubs:
```
Function pointer equality comparison was not handling cross-module pointers correctly when optimizations were enabled (causes target pointers to be wrapped in jump stubs sometimes). The delegate equality comparison was hitting this bug.
```
We do not have import stubs anymore, and unwrapping unboxing stubs serves no purpose here. Microbenchmarks of delegate equality show ~3x improvement with this change:
```
Bench_DelegateEquality_Positive_OpenStatic<10000000>() took: 355 ms
Bench_DelegateEquality_Positive_ClosedStatic<10000000>() took: 367 ms
Bench_DelegateEquality_Positive_ClosedInstance<10000000>() took: 371 ms

Bench_DelegateEquality_Positive_OpenStatic<10000000>() took: 121 ms
Bench_DelegateEquality_Positive_ClosedStatic<10000000>() took: 120 ms
Bench_DelegateEquality_Positive_ClosedInstance<10000000>() took: 122 ms
```
Additionally, there is some desire to upstream changes for a portable RhGetCodeTarget implementation. Not having to deal with it at this relatively low-level layer will make things more robust.
In this implementation:
1) The compiler generates an array of { Stub, Target } tuples. This is done for both simple and instantiating stubs.
2) At runtime, this information is sorted, with lookups done via binary search (a sketch of this lookup follows below).

Costs:
1) The size of the array is ~5KB for the HelloWasm test, which has ~400 simple unboxing stubs and ~100 instantiating ones. For a medium-sized application with ~50K stubs, that is 400K on 32 bit (and double that on 64 bit).
2) Binary search is not "cheap" and, unfortunately, we will often hit it on the startup path because of ComputeLdftnReverseLookup. Fortunately, we should rarely hit it in steady state, as it will only be used for Delegate.MethodInfo and Delegate.CreateDelegate (the latter only in rare scenarios).
3) Managed code that supports all of this clocks in at about 5K, which is rather large, but that is mostly a reflection of how our codegen is not that great.

Code structure choices:
1) ILC: layered under MetadataManager, like the other maps used by reflection.
2) Runtime: layered in CoreLib for consistency with the existing (non-portable) implementation.
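A minimal C# sketch of the sorted-array lookup described above; the names here (UnboxingStubTargets, Initialize, GetCodeTarget) are illustrative assumptions rather than the actual runtime code:

```csharp
using System;

// Hypothetical sketch of the sorted-array approach: the compiler emits an
// array of (Stub, Target) pairs, the runtime sorts it by stub address once,
// and RhGetCodeTarget-style lookups are binary searches over that array.
internal static class UnboxingStubTargets
{
    private static (IntPtr Stub, IntPtr Target)[] s_map; // sorted by Stub

    public static void Initialize((IntPtr Stub, IntPtr Target)[] compilerEmittedMap)
    {
        Array.Sort(compilerEmittedMap, (x, y) =>
            ((ulong)(long)x.Stub).CompareTo((ulong)(long)y.Stub));
        s_map = compilerEmittedMap;
    }

    // Returns the unwrapped target if 'code' is an unboxing stub; otherwise 'code' itself.
    public static IntPtr GetCodeTarget(IntPtr code)
    {
        (IntPtr Stub, IntPtr Target)[] map = s_map;
        ulong probe = (ulong)(long)code;
        int lo = 0, hi = map.Length - 1;
        while (lo <= hi)
        {
            int mid = lo + ((hi - lo) >> 1);
            ulong key = (ulong)(long)map[mid].Stub;
            if (key == probe)
                return map[mid].Target;
            if (key < probe)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return code; // not a stub
    }
}
```

The sorted array trades lookup speed for footprint: no per-entry hash-table slots are needed beyond the pair array itself, at the cost of O(log n) probes per lookup.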
Force-pushed from ec99e4e to 42bf2dd.
Thank you!
Contributes to #2307.