Segmentation fault linux since net 6 upgrade #69323

Closed
Martin-Molinero opened this issue May 13, 2022 · 10 comments
@Martin-Molinero

Martin-Molinero commented May 13, 2022

Description

We started getting a segmentation fault on Linux after upgrading to .NET 6 (with PGO enabled as well). It is not consistent, though: it happens roughly 1 time in 7, and I have seen it twice so far.

Call stack

(lldb)
* thread #1, name = 'price-equity-us', stop reason = signal SIGSEGV
  * frame #0: 0x00007f5950614ca8 libclrjit.so`Compiler::optFindLoops() at optimizer.cpp:1933:56
    frame #1: 0x00007f5950614ba1 libclrjit.so`Compiler::optFindLoops() [inlined] (anonymous namespace)::LoopSearch::FindLoop(this=0x00007f4a9bffdd10, head=<unavailable>, top=<unavailable>, bottom=<unavailable>) at optimizer.cpp:1692
    frame #2: 0x00007f5950614b37 libclrjit.so`Compiler::optFindLoops() at optimizer.cpp:2371
    frame #3: 0x00007f5950612197 libclrjit.so`Compiler::optFindLoops(this=<unavailable>) at optimizer.cpp:4641
    frame #4: 0x00007f59505461fa libclrjit.so`Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*) [inlined] Phase::Run(this=0x00007f4a9bffded0) at phase.cpp:61:26
    frame #5: 0x00007f59505461e5 libclrjit.so`Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*) [inlined] DoPhase(_compiler=0x00007f4a940432d8, _phase=PHASE_FIND_LOOPS, _action=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)()) at phase.h:136
    frame #6: 0x00007f59505461a4 libclrjit.so`Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*) [inlined] Compiler::compCompile(this=0x00007f4a940432d8, methodCodePtr=0x00007f4a9bffe328, methodCodeSize=0x00007f4a9bffe4cc, compileFlags=0x00007f4a9bffe340) at compiler.cpp:4944
    frame #7: 0x00007f5950545fac libclrjit.so`Compiler::compCompileHelper(this=0x00007f4a940432d8, classPtr=<unavailable>, compHnd=<unavailable>, methodInfo=<unavailable>, methodCodePtr=0x00007f4a9bffe328, methodCodeSize=0x00007f4a9bffe4cc, compileFlags=0x00007f4a9bffe340) at compiler.cpp:6409
    frame #8: 0x00007f595054970e libclrjit.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) at compiler.cpp:5686:28
    frame #9: 0x00007f59505496e6 libclrjit.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) at compiler.cpp:5705
    frame #10: 0x00007f5950549533 libclrjit.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) at compiler.cpp:7055
    frame #11: 0x00007f59505490c7 libclrjit.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) at compiler.cpp:7080
    frame #12: 0x00007f59505490c3 libclrjit.so`jitNativeCode(methodHnd=0x00007f58dfd3e330, classPtr=0x00007f58de534000, compHnd=0x00007f4a9bffe668, methodInfo=<unavailable>, methodCodePtr=0x00007f4a9bffe328, methodCodeSize=0x00007f4a9bffe4cc, compileFlags=0x00007f4a9bffe340, inlineInfoPtr=0x0000000000000000) at compiler.cpp:7082
    frame #13: 0x00007f595054ee56 libclrjit.so`CILJit::compileMethod(this=<unavailable>, compHnd=0x00007f4a9bffe668, methodInfo=0x00007f4a9bffe508, flags=<unavailable>, entryAddress=0x00007f4a9bffe638, nativeSizeOfCode=<unavailable>) at ee_il_dll.cpp:276:14
    frame #14: 0x00007f5958205a6c libcoreclr.so`invokeCompileMethodHelper(jitMgr=<unavailable>, comp=0x00007f4a9bffe668, info=<unavailable>, jitFlags=CORJIT_FLAGS @ 0x00007f4a9bffe3e0, nativeEntry=<unavailable>, nativeSizeOfCode=<unavailable>) at jitinterface.cpp:12774:30
    frame #15: 0x00007f5958205b38 libcoreclr.so`invokeCompileMethod(jitMgr=0x0000557915e99810, comp=0x00007f4a9bffe668, info=0x00007f4a9bffe508, jitFlags=CORJIT_FLAGS @ 0x00007f4a9bffe460, nativeEntry=0x00007f4a9bffe638, nativeSizeOfCode=0x00007f4a9bffe4cc) at jitinterface.cpp:12839:24
    frame #16: 0x00007f59582065dc libcoreclr.so`UnsafeJitFunction(config=<unavailable>, ILHeader=0x00007f4a9bffe800, flags=<unavailable>, pSizeOfCode=0x00007f4a9bffe92c) at jitinterface.cpp:13355:19
    frame #17: 0x00007f5958243dc6 libcoreclr.so`MethodDesc::JitCompileCodeLocked(this=<unavailable>, pConfig=0x00007f4a9bffeab8, pEntry=0x00007f4a940013e0, pSizeOfCode=0x00007f4a9bffe92c, pFlags=0x00007f4a9bffe8c0) at prestub.cpp:1051:17
    frame #18: 0x00007f5958243a78 libcoreclr.so`MethodDesc::JitCompileCodeLockedEventWrapper(this=0x00007f58dfd3e330, pConfig=0x00007f4a9bffeab8, pEntry=0x00007f4a940013e0) at prestub.cpp:920:17
    frame #19: 0x00007f59582431aa libcoreclr.so`MethodDesc::JitCompileCode(this=0x00007f58dfd3e330, pConfig=0x00007f4a9bffeab8) at prestub.cpp:860:20
    frame #20: 0x00007f5958242d60 libcoreclr.so`MethodDesc::PrepareILBasedCode(this=0x00007f58dfd3e330, pConfig=0x00007f4a9bffeab8) at prestub.cpp:439:17
    frame #21: 0x00007f59582723b1 libcoreclr.so`TieredCompilationManager::CompileCodeVersion(this=<unavailable>, nativeCodeVersion=NativeCodeVersion @ 0x00007f4a9bffebe8) at tieredcompilation.cpp:906:26
    frame #22: 0x00007f59582718d1 libcoreclr.so`TieredCompilationManager::DoBackgroundWork(unsigned long*, unsigned long, unsigned long) [inlined] TieredCompilationManager::OptimizeMethod(this=0x0000557915e977e8, nativeCodeVersion=<unavailable>) at tieredcompilation.cpp:883:9
    frame #23: 0x00007f59582718b6 libcoreclr.so`TieredCompilationManager::DoBackgroundWork(this=0x0000557915e977e8, workDurationTicksRef=0x00007f4a9bffec58, minWorkDurationTicks=32000000, maxWorkDurationTicks=50000000) at tieredcompilation.cpp:768
    frame #24: 0x00007f5958270ef4 libcoreclr.so`TieredCompilationManager::BackgroundWorkerStart(this=0x0000557915e977e8) at tieredcompilation.cpp:483:14
    frame #25: 0x00007f5958270d2c libcoreclr.so`TieredCompilationManager::BackgroundWorkerBootstrapper1((null)=<unavailable>) at tieredcompilation.cpp:431:52
    frame #26: 0x00007f595826d6ca libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchInner(ManagedThreadCallState*) at threads.cpp:7321:5
    frame #27: 0x00007f595826d6c8 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7365
    frame #28: 0x00007f595826d682 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchOuter(this=<unavailable>)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(Param*)::operator()(Param*) const at threads.cpp:7523
    frame #29: 0x00007f595826d682 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7525
    frame #30: 0x00007f595826d613 libcoreclr.so`ManagedThreadBase_DispatchOuter(pCallState=0x00007f4a9bffee00) at threads.cpp:7549
    frame #31: 0x00007f595826dccd libcoreclr.so`ManagedThreadBase::KickOff(void (*)(void*), void*) [inlined] ManagedThreadBase_FullTransition(pTarget=<unavailable>, args=<unavailable>, filterType=ManagedThread)(void*), void*, UnhandledExceptionLocation) at threads.cpp:7569:5
    frame #32: 0x00007f595826dcb5 libcoreclr.so`ManagedThreadBase::KickOff(pTarget=<unavailable>, args=<unavailable>)(void*), void*) at threads.cpp:7604
    frame #33: 0x00007f5958270c50 libcoreclr.so`TieredCompilationManager::BackgroundWorkerBootstrapper0(args=0x00007f4c100197a0) at tieredcompilation.cpp:412:5
    frame #34: 0x00007f595860653e libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x00007f4c10017090) at thread.cpp:1862:16
    frame #35: 0x00007f5958e77609 libpthread.so.0`start_thread(arg=<unavailable>) at pthread_create.c:477:8
    frame #36: 0x00007f5958a12163 libc.so.6`umount2 at umount2.S:8

bt_all.txt

Reproduction Steps

Sorry it's a complex project in a private repository. I can say there is a lot of action going on, high CPU/ram usage, file read/write.

Expected behavior

No segmentation fault

Actual behavior

Segmentation fault

Regression?

.NET 5 worked.

Known Workarounds

No response

Configuration

# dotnet --version
6.0.202
# uname -a
Linux 9c8a01c29b3b 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

Other information

ENV DOTNET_TieredPGO=1
ENV DOTNET_ReadyToRun=0
ENV DOTNET_TC_QuickJitForLoops=1
@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 13, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner labels May 13, 2022
@ghost

ghost commented May 13, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Member

FYI @BruceForstall
cc @dotnet/jit-contrib

Faulting code is

// This blocks is lexically between TOP and BOTTOM, but it does not
// participate in the flow cycle. Check for a run of consecutive
// such blocks.
BasicBlock* lastNonLoopBlock = block;
BasicBlock* nextLoopBlock = block->bbNext;
while (!loopBlocks.IsMember(nextLoopBlock->bbNum))

Presumably nextLoopBlock is nullptr, despite the code believing that it will always encounter bottom and exit before it will reach the end of the block chain. I don't think we've seen this behavior before.
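
To illustrate the failure mode described above, here is a minimal C++ sketch. It is not the JIT's actual code: `BasicBlock` is reduced to two fields, a `std::set<unsigned>` stands in for `loopBlocks`, and `FindNextLoopBlock` is a hypothetical helper. The only point is the extra `nullptr` test, which lets the walk report failure instead of dereferencing past the end of the bbNext chain.

```cpp
#include <cassert>
#include <set>

// Simplified stand-in for the JIT's BasicBlock; only the fields used here.
struct BasicBlock
{
    unsigned    bbNum;
    BasicBlock* bbNext;
};

// Hypothetical helper (not part of the runtime): starting after `block`,
// walk the bbNext chain until a block that belongs to the loop is found.
// Returns nullptr if the chain ends first, rather than reading through a
// null pointer as an unguarded `while (!IsMember(next->bbNum))` would.
BasicBlock* FindNextLoopBlock(BasicBlock* block, const std::set<unsigned>& loopBlocks)
{
    assert(block != nullptr);

    BasicBlock* next = block->bbNext;
    while ((next != nullptr) && (loopBlocks.count(next->bbNum) == 0))
    {
        next = next->bbNext;
    }
    return next;
}
```

A caller would treat a `nullptr` result as "the expected loop block was never found" and handle that case explicitly.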

@Martin-Molinero you may be able to work around this by annotating the method that is being jitted here with either [MethodImpl(MethodImplOptions.NoOptimization)] or perhaps [MethodImpl(MethodImplOptions.AggressiveOptimization)].

If you can (privately) share a core dump we can try and track down how we're hitting this issue. Also if you're not sure which method was being jitted we can probably figure this out from the core dump too.

@BruceForstall
Member

If you are able to debug under lldb, and can use SOS (https://docs.microsoft.com/en-us/dotnet/core/diagnostics/debug-linux-dumps, https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-sos), and can find a MethodDesc pointer on the stack, using the DumpMD command (https://docs.microsoft.com/en-us/dotnet/core/diagnostics/sos-debugging-extension) should identify the function.

There are likely some very complex or unusual loop structures. Perhaps if you can't share the code or a core dump, you could try to extract a sample of the same function that still exhibits the problem.

@Martin-Molinero
Author

If you are able to debug under lldb, and can use SOS

Yes, but even after installing SOS, its commands are not available in lldb, although I understand they should be.

I've shared the core dump with @AndyAyersMS.

@AndyAyersMS
Member

Thanks, I'll take a look.

@AndyAyersMS
Member

Based on the data you sent me offline, it seems like the JIT is compiling System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart which is not something you are going to be able to fix via annotation.

A workaround for now is to stop setting DOTNET_TC_QuickJitForLoops=1 as the problem arises when a certain pattern of dynamic PGO data intersects with the JIT's loop recognition. Using this workaround may reduce the performance gains you see (if any) with PGO.

I can reproduce what look like similar issues in .NET 6.0 by feeding the JIT randomized profile data, but I have yet to confirm whether this is the same problem or a related one.

@AndyAyersMS
Member

I think I understand roughly what happens. We have a flow graph with more than one loop. The first loop we find has some non-loop code in its extent, including two different stretches of code belonging to a second loop (because PGO has moved blocks). We manage to move the first extent with MakeCompactAndFindExits but ultimately, we fail to recognize the loop. But in the process, we've scrambled the bbNext order for the second loop and now its bottom block (which was in the second extent) ends up earlier in the block list than its top (which was in the first extent). We get to MakeCompactAndFindExits for the second loop. There are also non-loop blocks in this loop and so we start walking the bbNext chain at the first non-loop block to find the end of that segment of blocks, expecting we'll hit another loop block before we reach the end of the method. But we never do find one. So, we AV walking off the end of the bbNext chain.

The fix for .NET 6 is likely to just avoid the AV by giving up on recognizing the second loop. Hopefully we can find a more robust fix in .NET 7 that avoids this sort of scrambled order.
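
As a rough picture of "giving up on recognizing the loop", the sketch below models the scrambled ordering in plain C++. It is an illustration under simplifying assumptions, not the change that went into the runtime: `BasicBlock` has only two fields, a `std::set<unsigned>` stands in for the loop's block set, and `TryWalkLoopExtent` is a hypothetical name. The function walks from the loop's top toward its bottom, skips runs of non-loop blocks, and returns `false` if the chain ends before another loop block turns up.

```cpp
#include <cassert>
#include <set>

// Simplified stand-in for the JIT's BasicBlock; only the fields used here.
struct BasicBlock
{
    unsigned    bbNum;
    BasicBlock* bbNext;
};

// Hypothetical helper (not part of the runtime): walk bbNext from `top`
// until `bottom` is reached, skipping runs of non-loop blocks on the way.
// Returns false when the chain ends before the next loop block is found,
// which is the signal to abandon recognition of this loop.
bool TryWalkLoopExtent(BasicBlock* top, BasicBlock* bottom, const std::set<unsigned>& loopBlocks)
{
    assert((top != nullptr) && (bottom != nullptr));

    BasicBlock* block = top;
    while (block != bottom)
    {
        BasicBlock* next = block->bbNext;

        // Skip a run of consecutive blocks that are not part of the loop.
        while ((next != nullptr) && (loopBlocks.count(next->bbNum) == 0))
        {
            next = next->bbNext;
        }

        if (next == nullptr)
        {
            // Walked off the end of the chain (for example because bottom
            // now sits before top in bbNext order): give up on this loop.
            return false;
        }

        block = next;
    }
    return true;
}
```

With the chain ordered top, non-loop block, bottom, the walk succeeds. With bottom placed ahead of top, as in the scrambled case described above, it returns `false` and the caller can simply leave the loop unrecognized.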

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue May 18, 2022
…t chain

In dotnet#69323 the 6.0.4 jit caused an AV because it walked off the end of the
bbNext chain during `optFindNaturalLoops`.

Analysis of a customer-provided dump showed that `MakeCompactAndFindExits`
might fail to find an expected loop block and so walk the entire bbNext chain
and then fall off the end. Details from the dump suggested that this happened
because a prior call to `MakeCompactAndFindExits` had moved most but not all of
a loop's blocks later in bbNext order, leaving that loop's bottom block earlier
in the bbNext chain than its top. This ordering was unexpected.

I cannot repro this failure. The customer was using PGO and it's likely that
earlier PGO-driven block reordering contributed to this problem by interleaving
the blocks from two loops. We can recover the root method PGO schema from the
dump, but applying this is insufficient to cause the problem. This method does
quite a bit of inlining so it's likely that some inlinee PGO data must also be
a contributing factor.

At any rate, we can guard against this case easily enough, and simply abandon
recognition of any loop where we fail to find an expected loop block during
the bbNext chain walk.
AndyAyersMS added a commit that referenced this issue May 18, 2022
…t chain (#69503)

github-actions bot pushed a commit that referenced this issue May 18, 2022
…t chain

@AndyAyersMS AndyAyersMS self-assigned this May 20, 2022
@JulieLeeMSFT JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label May 20, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone May 23, 2022
@JulieLeeMSFT JulieLeeMSFT removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label May 23, 2022
@AndyAyersMS
Member

This is fixed in main / 7.0, will close this once we've serviced it in 6.x

@AndyAyersMS AndyAyersMS modified the milestones: 7.0.0, 6.0.x Jun 3, 2022
carlossanlop pushed a commit that referenced this issue Jun 9, 2022
…t chain (#69525)

Co-authored-by: Andy Ayers <[email protected]>
@AndyAyersMS
Member

@Martin-Molinero 6.0.7 is now out and hopefully fixes this and other issues you ran into with PGO: https://devblogs.microsoft.com/dotnet/july-2022-updates/

Let me know if you get a chance to try it out.

@AndyAyersMS
Member

I'm going to close this. Feel free to re-open it if needed once you get around to trying 6.0.7.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 21, 2022