Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure: Loader\\CollectibleAssemblies\\ResolvedFromDifferentContext\\ResolvedFromDifferentContext\\ResolvedFromDifferentContext.cmd #68112

Closed
BruceForstall opened this issue Apr 16, 2022 · 16 comments · Fixed by #68550

Comments

@BruceForstall
Copy link
Member

Assert failure(PID 5456 [0x00001550], Thread: 10976 [0x2ae0]): refRetVal->GetAt(i) != NULL

CORECLR! RuntimeTypeHandle::GetInterfaces + 0x4A8 (0x00007ffba633e958) SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007ffba5a12078)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007ffba5a0f9f8) SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007ffba5a0f634)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007ffba5546958) <no module>! <no symbol> + 0x0 (0x00007ffb46b42924)
File: D:\a_work\1\s\src\coreclr\vm\runtimehandles.cpp Line: 785
Image: D:\h\w\BB20096C\p\corerun.exe

Return code:      1
Raw output file:      D:\h\w\BB20096C\w\A3480908\uploads\Reports\Loader.CollectibleAssemblies\ResolvedFromDifferentContext\ResolvedFromDifferentContext\ResolvedFromDifferentContext.output.txt
Raw output:
BEGIN EXECUTION
"D:\h\w\BB20096C\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  ResolvedFromDifferentContext.dll
Running test case LoadOverride
Loading TestInterface by alc "Dependencies" System.Runtime.Loader.AssemblyLoadContext #1 for alc "Test2" TestAssemblyLoadContext #0
Load done, type TestClass.Class obtained
Expected: 100
Actual: -1073740286
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:

set CORE_ROOT=D:\h\w\BB20096C\p
D:\h\w\BB20096C\w\A3480908\e\Loader\CollectibleAssemblies\ResolvedFromDifferentContext\ResolvedFromDifferentContext\ResolvedFromDifferentContext.cmd
Expected: True
Actual:   False

https://dev.azure.com/dnceng/public/_build/results?buildId=1721802&view=ms.vss-test-web.build-test-results-tab&runId=46735808&resultId=108688&paneView=debug

Fails for:
coreclr windows arm64 Checked gcstress0x3
coreclr Linux arm Checked gcstress0x3
coreclr OSX arm64 Checked gcstress0x3
coreclr Linux arm64 Checked gcstress0x3
coreclr Linux arm64 Checked gcstress0xc
coreclr Linux x64 Checked gcstress0xc

Also fails with a different message:

Assert failure(PID 3348 [0x00000d14], Thread: 3084 [0x0c0c]): Postcondition failure: FAILED: GetWriteableData()->m_hExposedClassObject != 0

CORECLR! CHECK::Trigger + 0x2D1 (0x00007fffe310c1d1) CORECLR! MethodTable::GetManagedClassObject + 0x1B6 (0x00007fffe2b640d6)
CORECLR! RuntimeTypeHandle::GetInterfaces + 0x5BF (0x00007fffe2eae72f) SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fffe22c8ac0)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fffe22c6f04) SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fffe22c6c54)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fffe1f9309f) <no module>! <no symbol> + 0x0 (0x00007fff837f2686)
<no module>! <no symbol> + 0x0 (0x000001de5dca8d20) <no module>! <no symbol> + 0x0 (0x000001de56f0a4d0)
File: D:\a_work\1\s\src\coreclr\vm\methodtable.cpp Line: 3629
Image: C:\h\w\B611097B\p\corerun.exe

Return code:      1
Raw output file:      C:\h\w\B611097B\w\A0D70927\uploads\Reports\Loader.CollectibleAssemblies\ResolvedFromDifferentContext\ResolvedFromDifferentContext\ResolvedFromDifferentContext.output.txt
Raw output:
BEGIN EXECUTION
"C:\h\w\B611097B\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  ResolvedFromDifferentContext.dll
Running test case LoadOverride
Loading TestInterface by alc "Dependencies" System.Runtime.Loader.AssemblyLoadContext #1 for alc "Test2" TestAssemblyLoadContext #0
Load done, type TestClass.Class obtained
Expected: 100
Actual: -1073740286
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:

set CORE_ROOT=C:\h\w\B611097B\p
C:\h\w\B611097B\w\A0D70927\e\Loader\CollectibleAssemblies\ResolvedFromDifferentContext\ResolvedFromDifferentContext\ResolvedFromDifferentContext.cmd
Expected: True
Actual:   False

for:
coreclr windows x64 Checked gcstress0x3
coreclr windows x86 Checked gcstress0x3

@dotnet-issue-labeler dotnet-issue-labeler bot added area-AssemblyLoader-coreclr untriaged New issue has not been triaged by the area owner labels Apr 16, 2022
@ghost
Copy link

ghost commented Apr 16, 2022

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details
Assert failure(PID 5456 [0x00001550], Thread: 10976 [0x2ae0]): refRetVal->GetAt(i) != NULL

CORECLR! RuntimeTypeHandle::GetInterfaces + 0x4A8 (0x00007ffba633e958) SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007ffba5a12078)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007ffba5a0f9f8) SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007ffba5a0f634)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007ffba5546958) <no module>! <no symbol> + 0x0 (0x00007ffb46b42924)
File: D:\a_work\1\s\src\coreclr\vm\runtimehandles.cpp Line: 785
Image: D:\h\w\BB20096C\p\corerun.exe

Return code:      1
Raw output file:      D:\h\w\BB20096C\w\A3480908\uploads\Reports\Loader.CollectibleAssemblies\ResolvedFromDifferentContext\ResolvedFromDifferentContext\ResolvedFromDifferentContext.output.txt
Raw output:
BEGIN EXECUTION
"D:\h\w\BB20096C\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  ResolvedFromDifferentContext.dll
Running test case LoadOverride
Loading TestInterface by alc "Dependencies" System.Runtime.Loader.AssemblyLoadContext #1 for alc "Test2" TestAssemblyLoadContext #0
Load done, type TestClass.Class obtained
Expected: 100
Actual: -1073740286
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:

set CORE_ROOT=D:\h\w\BB20096C\p
D:\h\w\BB20096C\w\A3480908\e\Loader\CollectibleAssemblies\ResolvedFromDifferentContext\ResolvedFromDifferentContext\ResolvedFromDifferentContext.cmd
Expected: True
Actual:   False

https://dev.azure.com/dnceng/public/_build/results?buildId=1721802&view=ms.vss-test-web.build-test-results-tab&runId=46735808&resultId=108688&paneView=debug

Fails for:
coreclr windows arm64 Checked gcstress0x3
coreclr Linux arm Checked gcstress0x3
coreclr OSX arm64 Checked gcstress0x3
coreclr Linux arm64 Checked gcstress0x3
coreclr Linux arm64 Checked gcstress0xc
coreclr Linux x64 Checked gcstress0xc

Also fails with a different message:

Assert failure(PID 3348 [0x00000d14], Thread: 3084 [0x0c0c]): Postcondition failure: FAILED: GetWriteableData()->m_hExposedClassObject != 0

CORECLR! CHECK::Trigger + 0x2D1 (0x00007fffe310c1d1) CORECLR! MethodTable::GetManagedClassObject + 0x1B6 (0x00007fffe2b640d6)
CORECLR! RuntimeTypeHandle::GetInterfaces + 0x5BF (0x00007fffe2eae72f) SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fffe22c8ac0)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fffe22c6f04) SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fffe22c6c54)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fffe1f9309f) <no module>! <no symbol> + 0x0 (0x00007fff837f2686)
<no module>! <no symbol> + 0x0 (0x000001de5dca8d20) <no module>! <no symbol> + 0x0 (0x000001de56f0a4d0)
File: D:\a_work\1\s\src\coreclr\vm\methodtable.cpp Line: 3629
Image: C:\h\w\B611097B\p\corerun.exe

Return code:      1
Raw output file:      C:\h\w\B611097B\w\A0D70927\uploads\Reports\Loader.CollectibleAssemblies\ResolvedFromDifferentContext\ResolvedFromDifferentContext\ResolvedFromDifferentContext.output.txt
Raw output:
BEGIN EXECUTION
"C:\h\w\B611097B\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  ResolvedFromDifferentContext.dll
Running test case LoadOverride
Loading TestInterface by alc "Dependencies" System.Runtime.Loader.AssemblyLoadContext #1 for alc "Test2" TestAssemblyLoadContext #0
Load done, type TestClass.Class obtained
Expected: 100
Actual: -1073740286
END EXECUTION - FAILED
FAILED
Test Harness Exitcode is : 1
To run the test:

set CORE_ROOT=C:\h\w\B611097B\p
C:\h\w\B611097B\w\A0D70927\e\Loader\CollectibleAssemblies\ResolvedFromDifferentContext\ResolvedFromDifferentContext\ResolvedFromDifferentContext.cmd
Expected: True
Actual:   False

for:
coreclr windows x64 Checked gcstress0x3
coreclr windows x86 Checked gcstress0x3

Author: BruceForstall
Assignees: -
Labels:

GCStress, area-AssemblyLoader-coreclr, untriaged

Milestone: -

@BruceForstall
Copy link
Member Author

@janvorli Looks like it came from #67922

@VSadov
Copy link
Member

VSadov commented Apr 18, 2022

It looks like the fix in #67922 is sufficient for ResolvedFromDifferentContext testcase to pass normally, but is not sufficient under GC stress.

@janvorli
Copy link
Member

I have investigated the problem. It occurs when the managed LoaderAllocator gets collected before the RuntimeTypeHandle::GetInterfaces is called. To create the managed Type objects for the interfaces, we need the managed LoaderAllocators related to the ALCs where the assemblies of the interfaces were loaded, since we need to store it in the ReflectClassBaseObject::m_keepAlive in order to ensure that the created Type object keeps alive the loader allocator machinery.

It seems that a natural way to solve this issue would be to modify the GC get_class_object and the related GCToEEInterface::GetLoaderAllocatorObjectForGC and MethodTable::GetLoaderAllocatorObjectForGC to return an array of LoaderAllocator objects instead of just one and the element count. This array would contain references to LoaderAllocators of the type and also of all the interfaces it implements that are defined in collectible assemblies. And obviously, GC would need to be modified to walk this array.

I was also thinking that there might be a way to recreate the managed LoaderAllocator instead when creating the managed Type instances in the interface enumeration if it doesn't exist anymore. But I am not sure if that is possible at that stage.

@jkotas do you happen to have some other idea on how to fix the problem?

@jkotas
Copy link
Member

jkotas commented Apr 20, 2022

The unit of unloading is whole assembly. We do not support unloading of individual types from the assembly.

This should work based on assembly references. When an assembly picks up a reference to collectible assembly, this reference should be added to the graph of loader allocators. Interfaces that happen to get resolved through this reference should be kept alive by virtue of being contained in assemblies represented in the graph.

I do not think we want to be changing MethodTable::GetLoaderAllocatorObjectForGC to return multiple items.

@jkotas jkotas added this to the 7.0.0 milestone Apr 20, 2022
@janvorli
Copy link
Member

The references between the native loader allocators work fine, these were fixed by my previous change. The MethodTables of the interfaces are kept alive too. But those cannot keep the managed LoaderAllocator and LoaderAllocatorScout instances alive, as those are kept alive only by the GC and the interface types don't have instances. And without the managed LoaderAllocator, we cannot create the managed Type instance for the interfaces. Those are created on demand in the RuntimeTypeHandle::GetInterfaces. So we need to somehow keep the managed LoaderAllocator alive or to be able to recreate it after it is destroyed.

@VSadov
Copy link
Member

VSadov commented Apr 20, 2022

Is this only a problem with interfaces or it is more general problem of types materialized via reflection while not represented by any instances. I.E. can fetching type constraints of some type parameter also see these issues?

@janvorli
Copy link
Member

Actually, I've just realized it would not work even if we made the changes in the GC related stuff I've mentioned. The issue is really with managed Type materialization via reflection, so no instances of the actual target type may be present on the heap. It seems it will likely cause the same troubles in all cases of type materialization via reflection.

@janvorli
Copy link
Member

To be precise, it would happen in all cases when a reflection would return a type from a collectible ALC1 based on something from some ALC2. I am not sure what would be all the other cases besides the GetInterfaces. I'll take a look.

@jkotas
Copy link
Member

jkotas commented Apr 20, 2022

Did we have this fundamental problem since forever or was it introduced by #67160?

@janvorli
Copy link
Member

We did have a problem forever, but it would never be hit as the problem that #67160 fixed would hide it.

I have looked more into it. It looks like the culprit for this issue is actually at a bit different place. The issue is that when my newly added fix calls LoaderAllocator::EnsureReference, the passed in pOtherLA doesn't have the LoaderAllocator yet. If it existed, the LoaderAllocator::CheckAddReference_Unlocked that's called from the LoaderAllocator::EnsureReference would create a handle to that LoaderAllocator in the parent assembly LoaderAllocator and hold it alive and everything would work. But when the managed LoaderAllocator doesn't exist yet, the LoaderAllocator::CheckAddReference_Unlocked ends up silently ignoring it.

It seems that ensuring that the managed LoaderAllocator exists before calling the EnsureReference in my previous change would fix this issue.

@janvorli
Copy link
Member

the passed in pOtherLA doesn't have the LoaderAllocator yet.

Or anymore, which seems to be the case here - the LoaderAllocator::m_hLoaderAllocatorObjectHandle is non-null, but the handle value itself is null. So clearly, the handle was allocated and later released.

@jkotas
Copy link
Member

jkotas commented Apr 20, 2022

I would expect that once LoaderAllocator::m_hLoaderAllocatorObjectHandle becomes NULL, nothing alive should be holding a reference to the associated assembly and nothing should be able get a new reference to the associated assembly.

@janvorli
Copy link
Member

The handle becomes NULL once there are no managed objects from the ALC. However, in the problematic case, there was only the managed Assembly containing definitions of the interface in that ALC (and the managed LoaderAllocator). A type in another ALC just implements interfaces from the first ALC, so there are never any managed objects of types in the first ALC and the whole managed side goes away. Only the native part remains alive thanks to the references between the native LoaderAllocators.
And then code in the second ALC calls GetInterfaces on a type from the second ALC. Those interface types live in the first ALC. We try to create the managed Type objects for them and set their m_keepAlive to the managed LoaderAllocator of the first ALC. But it is already collected.

After looking more into it, it seems that a fix might be to also make AssemblyLoadContext reference the managed LoaderAllocator. That would ensure that the managed LoaderAllocator is alive when the AssemblyLoadContext is loading the interface assembly. And that in turn would ensure that the native LoaderAllocator for the second ALC would be able to keep handle to the managed LoaderAllocator of the first ALC. The new reference to managed LoaderAllocator from the AssemblyLoadContext would need to be cleared when unloading starts, otherwise the unload would never complete.

@janvorli
Copy link
Member

Hmm, it should actually work fine, while the AssemblyLoadContext is alive and unload has not started yet, the CustomAssemblyBinder should hold strong handle to the managed LoaderAllocator. I need to investigate what's going on that this doesn't work.

@janvorli
Copy link
Member

So I have a full understanding of how we get into the problematic state. The key thing is how the test loads the interface assembly. It creates the collectible ALC1 inside of the ALC.Load method of the other collectible ALC2, loads the interface assembly using that and returns the resulting assembly from the Load method. So when it returns, there are no more references to that ALC1. There is still a reference to the managed LoaderAllocator of that ALC1 from the managed Assembly instance, however, that managed assembly instances is only referenced in the RuntimeInvokeHostAssemblyResolver that invoked the managed code to load the assembly. Once we return from that function, the reference to that managed Assembly is gone too. At the time we return back to AppDomain::BindAssemblySpec and call AppDomain::AddFileToCache from there, we've already performed at least a couple of GC collections, so both the managed Assembly and the managed LoaderAllocator are already collected. And that's the point where we are adding reference between the native LoaderAllocators of the parent and resolved assembly and due to the sequence of events, we don't have the managed LoaderAllocator anymore.

One way to fix that issue may be to move adding the reference between the native LoaderAllocators up to the RuntimeInvokeHostAssemblyResolver function where we still have the reference to the just loaded managed Assembly. Interestingly enough, that's a place where I had it originally, but I've ended up moving it to the common place in AppDomain::AddFileToCache after consulting it with @vitek-karas. It looked like a good idea at that time.

@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 25, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 26, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 27, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants