-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CoreCLR initialization failures logging #78790
CoreCLR initialization failures logging #78790
Conversation
This change adds detecting and logging of failures during coreclr initialization. For logging, it uses a new host API `coreclr_set_error_writer` to register a callback to report the errors to the host. The hosts have support for optional usage of this API so that they can work with older runtime versions as well. The logging checks and reports failures with: * System.Private.CoreLib.dll * GC initialization * JIT initialization * libSystem.Native.so/dylib on Unix The logging messages should allow customers to self-diagnose the issues causing the failures.
Here are some examples of the error logs: Missing SPC fileFailed to load System.Private.CoreLib.dll (error code 0x80070057) Could not load file or assembly 'F:\git\runtime8\artifacts\tests\coreclr\windows.x64.Debug\Tests\Core_Root\System.Private.CoreLib.dll'. The parameter is incorrect. (0x80070057 (E_INVALIDARG)) Another assembly instead of SPCFailed to load System.Private.CoreLib.dll (error code 0x80131522) Could not load type 'System.Object' from assembly 'System.ObjectModel, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. Text file instead of SPCFailed to load System.Private.CoreLib.dll (error code 0x80131018) Could not load file or assembly 'F:\git\runtime8\artifacts\tests\coreclr\windows.x64.Debug\Tests\Core_Root\System.Private.CoreLib.dll'. The module was expected to contain an assembly manifest. (0x80131018 (COR_E_ASSEMBLYEXPECTED)) Wrong version of SPC (x86)Failed to load System.Private.CoreLib.dll (error code 0x8007000B) Could not load file or assembly 'F:\git\runtime8\artifacts\tests\coreclr\windows.x64.Debug\Tests\Core_Root\System.Private.CoreLib.dll'. An attempt was made to load a program with an incorrect format. (0x8007000B (COR_E_BADIMAGEFORMAT)) Wrong version of SPC (x64 from 3.1):Failed to load System.Private.CoreLib.dll (error code 0x80131522) Could not load type 'System.Runtime.CompilerServices.CastHelpers' from assembly 'System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e'. OOMFailed to load System.Private.CoreLib.dll (error code 0x8007000E) Wrong setting of GC log fileGC: Cannot create log file |
src/coreclr/vm/ceemain.cpp
Outdated
@@ -947,6 +950,21 @@ void EEStartupHelper() | |||
g_MiniMetaDataBuffMaxSize, MEM_COMMIT, PAGE_READWRITE); | |||
#endif // FEATURE_MINIMETADATA_IN_TRIAGEDUMPS | |||
|
|||
#ifdef TARGET_UNIX | |||
// Preload the libSystem.Native.so/dylib to detect possible problems with loading it early |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to work for singlefilehost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, singlefilehost should have been excluded here
Where is the invalid arg comming from in this case? Would it be feasible to change it to file not found? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the work here a lot.
It comes from here: runtime/src/coreclr/binder/utils.cpp Line 202 in 89cfbfd
When the S.P.C. is not found at the expected location, corerun tries to find it in the TPA list. To iterate over the list, it calls the runtime/src/coreclr/binder/assemblybindercommon.cpp Lines 297 to 315 in 89cfbfd
|
* revert the try_get_export to not to have the isOptional argument * Unify the error logging in JIT loading to single variadic macro * Also fixes Unix build, the code in gcenv.unix.cpp cannot use the GCToEEInterface
Yes, I agree. (A separate question is why we need to be looking for CoreLib in so many places.) |
src/coreclr/vm/ceemain.cpp
Outdated
@@ -947,6 +950,29 @@ void EEStartupHelper() | |||
g_MiniMetaDataBuffMaxSize, MEM_COMMIT, PAGE_READWRITE); | |||
#endif // FEATURE_MINIMETADATA_IN_TRIAGEDUMPS | |||
|
|||
#ifdef TARGET_UNIX | |||
// Check if the current code is executing in the single file host or in libcoreclr.so. The libSystem.Native is linked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to just check if (!g_coreclr_embedded)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking for a check like that and have failed to locate this one :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could also use the CORECLR_EMBEDDED
define?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice. I have not noticed that the ceemain.cpp is one of the few files that are compiled separately for single file host and libcoreclr.so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elinor-fung I have found that when S.P.C. is bundled, this NativeLibrary::LoadLibraryByName
call doesn't work as the S.P.C. doesn't exist in the filesystem, just in the bundle. This causes e.g. the
Microsoft.NET.HostModel.Tests.BundleAndRun.TestWithRelativePaths
test to fail.
Do I need to pass something else than SystemDomain::SystemAssembly()
to NativeLibrary::LoadLibraryByName
to make it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it seems I should pass in the DLLIMPORTSEARCHPATH_ASSEMBLYDIRECTORY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem was actually different, in the EEStartupHelper
, the native dll search paths from the properties passed by the host are not set yet. Moving the check to the end of the CreateAppDomainWithManager
resolved the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh, yet one more host test failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that might be a variation of #75387? The case that is failing uses an extraction path that has a backslash on Unix and loading native libraries in PAL_LoadLibraryDirect
goes through FILEDosToUnixPathA
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, running it locally with the FILEDosToUnixPathA
modified to be dummy fixed the issue. There was one more mono related issue that I've fixed, but now it seems fine.
@jkotas, @AaronRobinsonMSFT the CI tests have passed except for one known issue and one unrelated issue. I believe I've addressed your feedback. Can you please take another look? |
Could you please make the message for missing CoreLib better (file not found instead of invalid arg)? |
@jkotas I've fixed that in the last commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you!
// Parameters: | ||
// errorWriter - callback that will be called for each line of the error info | ||
// - passing in NULL removes a callback that was previously set | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add that the callback can be called from any thread and thus must be thread safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not the case at the moment, it is always called from the thread running the initialization. Do you mean that we should add that comment to make it possible if we ever needed that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't realize we don't need it currently. But I would add it anyway... it's really easy to forget about this and it could break in really weird ways (the current host implementation is thread safe fortunately).
The AllSubsets_Mono_LLVMFullAot_RuntimeTests llvmfullaot legs for x64 and arm64 are failing due to a sigsegv after I've pushed the fix for the error code. I don't see how that could have introduced such issue and I am unable to repro it on my local Ubuntu 21.04. Trying to repro in the centos 7 container that the lab uses. |
I think you're hitting #78580 (comment) and it's not your PR. |
@MichalStrehovsky ah, great, thank you for letting me know! |
This change adds detecting and logging of failures during coreclr initialization. For logging, it uses a new host API
coreclr_set_error_writer
to register a callback to report the errors to the host. The hosts have support for optional usage of this API so that they can work with older runtime versions as well.The logging checks and reports failures with:
The logging messages should allow customers to self-diagnose the issues causing the failures.