Segmentation fault linux since net 6 upgrade #69323

Martin-Molinero · 2022-05-13T17:53:03Z

Description

Started getting a segmentation fault on Linux after upgrading to .NET 6 (with PGO enabled too). It's not consistent, though: it happens about 1 in 7 times. I've seen it happen twice already.

Call stack

(lldb)
* thread #1, name = 'price-equity-us', stop reason = signal SIGSEGV
  * frame #0: 0x00007f5950614ca8 libclrjit.so`Compiler::optFindLoops() at optimizer.cpp:1933:56
    frame #1: 0x00007f5950614ba1 libclrjit.so`Compiler::optFindLoops() [inlined] (anonymous namespace)::LoopSearch::FindLoop(this=0x00007f4a9bffdd10, head=<unavailable>, top=<unavailable>, bottom=<unavailable>) at optimizer.cpp:1692
    frame #2: 0x00007f5950614b37 libclrjit.so`Compiler::optFindLoops() at optimizer.cpp:2371
    frame #3: 0x00007f5950612197 libclrjit.so`Compiler::optFindLoops(this=<unavailable>) at optimizer.cpp:4641
    frame #4: 0x00007f59505461fa libclrjit.so`Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*) [inlined] Phase::Run(this=0x00007f4a9bffded0) at phase.cpp:61:26
    frame #5: 0x00007f59505461e5 libclrjit.so`Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*) [inlined] DoPhase(_compiler=0x00007f4a940432d8, _phase=PHASE_FIND_LOOPS, _action=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00)()) at phase.h:136
    frame #6: 0x00007f59505461a4 libclrjit.so`Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*) [inlined] Compiler::compCompile(this=0x00007f4a940432d8, methodCodePtr=0x00007f4a9bffe328, methodCodeSize=0x00007f4a9bffe4cc, compileFlags=0x00007f4a9bffe340) at compiler.cpp:4944
    frame #7: 0x00007f5950545fac libclrjit.so`Compiler::compCompileHelper(this=0x00007f4a940432d8, classPtr=<unavailable>, compHnd=<unavailable>, methodInfo=<unavailable>, methodCodePtr=0x00007f4a9bffe328, methodCodeSize=0x00007f4a9bffe4cc, compileFlags=0x00007f4a9bffe340) at compiler.cpp:6409
    frame #8: 0x00007f595054970e libclrjit.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) at compiler.cpp:5686:28
    frame #9: 0x00007f59505496e6 libclrjit.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) at compiler.cpp:5705
    frame #10: 0x00007f5950549533 libclrjit.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) at compiler.cpp:7055
    frame #11: 0x00007f59505490c7 libclrjit.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) at compiler.cpp:7080
    frame #12: 0x00007f59505490c3 libclrjit.so`jitNativeCode(methodHnd=0x00007f58dfd3e330, classPtr=0x00007f58de534000, compHnd=0x00007f4a9bffe668, methodInfo=<unavailable>, methodCodePtr=0x00007f4a9bffe328, methodCodeSize=0x00007f4a9bffe4cc, compileFlags=0x00007f4a9bffe340, inlineInfoPtr=0x0000000000000000) at compiler.cpp:7082
    frame #13: 0x00007f595054ee56 libclrjit.so`CILJit::compileMethod(this=<unavailable>, compHnd=0x00007f4a9bffe668, methodInfo=0x00007f4a9bffe508, flags=<unavailable>, entryAddress=0x00007f4a9bffe638, nativeSizeOfCode=<unavailable>) at ee_il_dll.cpp:276:14
    frame #14: 0x00007f5958205a6c libcoreclr.so`invokeCompileMethodHelper(jitMgr=<unavailable>, comp=0x00007f4a9bffe668, info=<unavailable>, jitFlags=CORJIT_FLAGS @ 0x00007f4a9bffe3e0, nativeEntry=<unavailable>, nativeSizeOfCode=<unavailable>) at jitinterface.cpp:12774:30
    frame #15: 0x00007f5958205b38 libcoreclr.so`invokeCompileMethod(jitMgr=0x0000557915e99810, comp=0x00007f4a9bffe668, info=0x00007f4a9bffe508, jitFlags=CORJIT_FLAGS @ 0x00007f4a9bffe460, nativeEntry=0x00007f4a9bffe638, nativeSizeOfCode=0x00007f4a9bffe4cc) at jitinterface.cpp:12839:24
    frame #16: 0x00007f59582065dc libcoreclr.so`UnsafeJitFunction(config=<unavailable>, ILHeader=0x00007f4a9bffe800, flags=<unavailable>, pSizeOfCode=0x00007f4a9bffe92c) at jitinterface.cpp:13355:19
    frame #17: 0x00007f5958243dc6 libcoreclr.so`MethodDesc::JitCompileCodeLocked(this=<unavailable>, pConfig=0x00007f4a9bffeab8, pEntry=0x00007f4a940013e0, pSizeOfCode=0x00007f4a9bffe92c, pFlags=0x00007f4a9bffe8c0) at prestub.cpp:1051:17
    frame #18: 0x00007f5958243a78 libcoreclr.so`MethodDesc::JitCompileCodeLockedEventWrapper(this=0x00007f58dfd3e330, pConfig=0x00007f4a9bffeab8, pEntry=0x00007f4a940013e0) at prestub.cpp:920:17
    frame #19: 0x00007f59582431aa libcoreclr.so`MethodDesc::JitCompileCode(this=0x00007f58dfd3e330, pConfig=0x00007f4a9bffeab8) at prestub.cpp:860:20
    frame #20: 0x00007f5958242d60 libcoreclr.so`MethodDesc::PrepareILBasedCode(this=0x00007f58dfd3e330, pConfig=0x00007f4a9bffeab8) at prestub.cpp:439:17
    frame #21: 0x00007f59582723b1 libcoreclr.so`TieredCompilationManager::CompileCodeVersion(this=<unavailable>, nativeCodeVersion=NativeCodeVersion @ 0x00007f4a9bffebe8) at tieredcompilation.cpp:906:26
    frame #22: 0x00007f59582718d1 libcoreclr.so`TieredCompilationManager::DoBackgroundWork(unsigned long*, unsigned long, unsigned long) [inlined] TieredCompilationManager::OptimizeMethod(this=0x0000557915e977e8, nativeCodeVersion=<unavailable>) at tieredcompilation.cpp:883:9
    frame #23: 0x00007f59582718b6 libcoreclr.so`TieredCompilationManager::DoBackgroundWork(this=0x0000557915e977e8, workDurationTicksRef=0x00007f4a9bffec58, minWorkDurationTicks=32000000, maxWorkDurationTicks=50000000) at tieredcompilation.cpp:768
    frame #24: 0x00007f5958270ef4 libcoreclr.so`TieredCompilationManager::BackgroundWorkerStart(this=0x0000557915e977e8) at tieredcompilation.cpp:483:14
    frame #25: 0x00007f5958270d2c libcoreclr.so`TieredCompilationManager::BackgroundWorkerBootstrapper1((null)=<unavailable>) at tieredcompilation.cpp:431:52
    frame #26: 0x00007f595826d6ca libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchInner(ManagedThreadCallState*) at threads.cpp:7321:5
    frame #27: 0x00007f595826d6c8 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7365
    frame #28: 0x00007f595826d682 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchOuter(this=<unavailable>)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(Param*)::operator()(Param*) const at threads.cpp:7523
    frame #29: 0x00007f595826d682 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7525
    frame #30: 0x00007f595826d613 libcoreclr.so`ManagedThreadBase_DispatchOuter(pCallState=0x00007f4a9bffee00) at threads.cpp:7549
    frame #31: 0x00007f595826dccd libcoreclr.so`ManagedThreadBase::KickOff(void (*)(void*), void*) [inlined] ManagedThreadBase_FullTransition(pTarget=<unavailable>, args=<unavailable>, filterType=ManagedThread)(void*), void*, UnhandledExceptionLocation) at threads.cpp:7569:5
    frame #32: 0x00007f595826dcb5 libcoreclr.so`ManagedThreadBase::KickOff(pTarget=<unavailable>, args=<unavailable>)(void*), void*) at threads.cpp:7604
    frame #33: 0x00007f5958270c50 libcoreclr.so`TieredCompilationManager::BackgroundWorkerBootstrapper0(args=0x00007f4c100197a0) at tieredcompilation.cpp:412:5
    frame #34: 0x00007f595860653e libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x00007f4c10017090) at thread.cpp:1862:16
    frame #35: 0x00007f5958e77609 libpthread.so.0`start_thread(arg=<unavailable>) at pthread_create.c:477:8
    frame #36: 0x00007f5958a12163 libc.so.6`umount2 at umount2.S:8

bt_all.txt

Reproduction Steps

Sorry, it's a complex project in a private repository. I can say there is a lot going on: high CPU/RAM usage and file reads/writes.

Expected behavior

No segmentation fault

Actual behavior

Segmentation fault

Regression?

.NET 5 worked.

Known Workarounds

No response

Configuration

# dotnet --version
6.0.202

# uname -a
Linux 9c8a01c29b3b 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

Other information

ENV DOTNET_TieredPGO=1
ENV DOTNET_ReadyToRun=0
ENV DOTNET_TC_QuickJitForLoops=1


ghost · 2022-05-13T17:53:08Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author:	Martin-Molinero
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2022-05-13T19:32:23Z

FYI @BruceForstall
cc @dotnet/jit-contrib

The faulting code is:

runtime/src/coreclr/jit/optimizer.cpp

Lines 1928 to 1933 in ef4ed6d

    // This blocks is lexically between TOP and BOTTOM, but it does not
    // participate in the flow cycle.  Check for a run of consecutive
    // such blocks.
    BasicBlock* lastNonLoopBlock = block;
    BasicBlock* nextLoopBlock    = block->bbNext;
    while (!loopBlocks.IsMember(nextLoopBlock->bbNum))

Presumably nextLoopBlock is nullptr, even though the code assumes it will always encounter bottom and exit before reaching the end of the block chain. I don't think we've seen this behavior before.
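A minimal sketch of the walk that faults, using a hypothetical, simplified model rather than the JIT's real types: `Block` stands in for `BasicBlock` (keeping only `bbNum` and `bbNext`), `std::set` stands in for the `loopBlocks` bit set, and `FindNextLoopBlock` is an illustrative name. The quoted `while` loop dereferences `nextLoopBlock->bbNum` unconditionally, so if no loop member follows the run of non-loop blocks, it reads through a null `bbNext`. This version checks for the end of the chain and returns `nullptr` instead.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the JIT's BasicBlock; only the fields used here.
struct Block {
    unsigned bbNum;
    Block*   bbNext;
};

// Starting at a block that is lexically inside the loop but not part of the
// flow cycle, skip the run of consecutive non-loop blocks and return the next
// loop member. Returns nullptr if the bbNext chain ends first, which is the
// case the quoted code assumes can never happen.
Block* FindNextLoopBlock(const Block* firstNonLoopBlock, const std::set<unsigned>& loopBlocks) {
    assert(firstNonLoopBlock != nullptr);
    Block* nextLoopBlock = firstNonLoopBlock->bbNext;
    while (nextLoopBlock != nullptr && loopBlocks.count(nextLoopBlock->bbNum) == 0) {
        nextLoopBlock = nextLoopBlock->bbNext;
    }
    return nextLoopBlock;
}
```

For example, with a chain 1 -> 2 -> 3 -> 4 and loop members {1, 2}, starting at block 3 returns nullptr, which is exactly the point where the unguarded loop would fault.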

@Martin-Molinero you may be able to work around this by annotating the method that is being jitted here with either [MethodImpl(MethodImplOptions.NoOptimization)] or perhaps [MethodImpl(MethodImplOptions.AggressiveOptimization)].

If you can (privately) share a core dump we can try and track down how we're hitting this issue. Also if you're not sure which method was being jitted we can probably figure this out from the core dump too.

BruceForstall · 2022-05-13T20:01:45Z

If you are able to debug under lldb, and can use SOS (https://docs.microsoft.com/en-us/dotnet/core/diagnostics/debug-linux-dumps, https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-sos), and can find a MethodDesc pointer on the stack, using the DumpMD command (https://docs.microsoft.com/en-us/dotnet/core/diagnostics/sos-debugging-extension) should identify the function.

There are likely some very complex or unusual loop structures in that method. If you can't share the code or a core dump, perhaps you could extract a sample of the same function that still exhibits the problem.

Martin-Molinero · 2022-05-13T20:14:00Z

If you are able to debug under lldb, and can use SOS

Yes, but even after installing SOS, the commands are not available in lldb, although as I understand it they should be.

I've shared the core dump with @AndyAyersMS.

AndyAyersMS · 2022-05-13T20:28:57Z

Thanks, I'll take a look.

AndyAyersMS · 2022-05-15T18:25:35Z

Based on the data you sent me offline, it seems like the JIT is compiling System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart, which is not something you will be able to fix via annotation.

A workaround for now is to stop setting DOTNET_TC_QuickJitForLoops=1, since the problem arises when a certain pattern of dynamic PGO data intersects with the JIT's loop recognition. Using this workaround may reduce the performance gains you see (if any) with PGO.

I can reproduce what look like similar issues in .NET 6.0 by feeding the JIT randomized profile data, but I have yet to confirm whether this is indeed the same problem or a related one.

AndyAyersMS · 2022-05-17T03:04:58Z

I think I understand roughly what happens. We have a flow graph with more than one loop. The first loop we find has some non-loop code in its extent, including two different stretches of code belonging to a second loop (because PGO has moved blocks). We manage to move the first extent with MakeCompactAndFindExits, but ultimately we fail to recognize the loop. In the process, though, we've scrambled the bbNext order for the second loop: its bottom block (which was in the second extent) now ends up earlier in the block list than its top (which was in the first extent). We then get to MakeCompactAndFindExits for the second loop. This loop also has non-loop blocks, so we start walking the bbNext chain at the first non-loop block to find the end of that run of blocks, expecting to hit another loop block before we reach the end of the method. But we never find one, so we AV walking off the end of the bbNext chain.
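The ordering problem described above can be illustrated with another hypothetical, simplified model (not the JIT's actual code; `TopPrecedesBottom` is an illustrative name). Loop recognition expects a loop's top to precede its bottom in bbNext order, so following `bbNext` from top should reach bottom. Once the second loop's bottom sat earlier in the chain than its top, this expectation no longer held.

```cpp
#include <cassert>

// Hypothetical stand-in for BasicBlock; only the fields needed here.
struct Block {
    unsigned bbNum;
    Block*   bbNext;
};

// Returns true if `bottom` can be reached from `top` by following bbNext,
// i.e. `top` lexically precedes (or is) `bottom`. A loop whose blocks were
// partially moved, leaving bottom before top, fails this check.
bool TopPrecedesBottom(const Block* top, const Block* bottom) {
    assert(top != nullptr && bottom != nullptr);
    for (const Block* b = top; b != nullptr; b = b->bbNext) {
        if (b == bottom) {
            return true;
        }
    }
    return false;
}
```

In a chain 1 -> 2 -> 3 -> 4, a scrambled loop with top = block 4 and bottom = block 2 returns false: walking forward from top runs off the end of the chain without ever meeting bottom.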

The fix for .NET 6 is likely to just avoid the AV by giving up on recognizing the second loop. Hopefully we can find a more robust fix in .NET 7 that avoids this sort of scrambled order.

…t chain
In dotnet#69323 the 6.0.4 JIT caused an AV because it walked off the end of the bbNext chain during `optFindNaturalLoops`. Analysis of a customer-provided dump showed that `MakeCompactAndFindExits` might fail to find an expected loop block and so walk the entire bbNext chain and then fall off the end. Details from the dump suggested that this happened because a prior call to `MakeCompactAndFindExits` had moved most but not all of a loop's blocks later in bbNext order, leaving that loop's bottom block earlier in the bbNext chain than its top. This ordering was unexpected. I cannot repro this failure. The customer was using PGO, and it's likely that earlier PGO-driven block reordering contributed to this problem by interleaving the blocks from two loops. We can recover the root method PGO schema from the dump, but applying it is insufficient to cause the problem. This method does quite a bit of inlining, so it's likely that some inlinee PGO data is also a contributing factor. At any rate, we can guard against this case easily enough, and simply abandon recognition of any loop where we fail to find an expected loop block during the bbNext chain walk.
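The guard described in this commit message can be sketched as follows, again with a hypothetical, simplified model; the names `LoopCandidate`, `TryWalkLoopExtent`, and `CountRecognizedLoops` are illustrative, not the JIT's. Each candidate's lexical extent is walked from top to bottom, skipping runs of non-loop blocks. If the bbNext chain ends before an expected loop block is found, that candidate is abandoned and the search moves on to the remaining loops instead of dereferencing null.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified model; not the JIT's real BasicBlock or LoopSearch.
struct Block {
    unsigned bbNum;
    Block*   bbNext;
};

struct LoopCandidate {
    Block*             top;
    Block*             bottom;
    std::set<unsigned> members; // bbNums of blocks that participate in the cycle
};

// Walk the extent from top to bottom. Runs of non-member blocks are skipped
// the way the quoted code does it, but we return false (abandon this loop)
// if the bbNext chain ends first instead of dereferencing nullptr.
bool TryWalkLoopExtent(const LoopCandidate& loop) {
    assert(loop.top != nullptr && loop.bottom != nullptr);
    Block* block = loop.top;
    while (block != loop.bottom) {
        Block* next = block->bbNext;
        if (next == nullptr) {
            return false; // ran off the end without reaching bottom
        }
        if (loop.members.count(next->bbNum) == 0) {
            // `next` is lexically inside the extent but not in the flow cycle.
            Block* nextLoopBlock = next->bbNext;
            while (nextLoopBlock != nullptr && loop.members.count(nextLoopBlock->bbNum) == 0) {
                nextLoopBlock = nextLoopBlock->bbNext;
            }
            if (nextLoopBlock == nullptr) {
                return false; // expected loop block never found: abandon
            }
            next = nextLoopBlock;
        }
        block = next;
    }
    return true;
}

// Abandon any candidate that fails the walk and keep going with the rest.
std::size_t CountRecognizedLoops(const std::vector<LoopCandidate>& candidates) {
    std::size_t recognized = 0;
    for (const LoopCandidate& loop : candidates) {
        if (TryWalkLoopExtent(loop)) {
            ++recognized;
        }
    }
    return recognized;
}
```

The design choice mirrors the commit's stated trade-off: giving up on one loop only costs loop optimizations for that loop, which is preferable to crashing the process during a background tier-1 compile.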

AndyAyersMS · 2022-06-03T01:10:06Z

This is fixed in main / 7.0; I'll close this once we've serviced it in 6.x.

AndyAyersMS · 2022-07-12T18:49:33Z

@Martin-Molinero 6.0.7 is now out and hopefully fixes this and other issues you ran into with PGO: https://devblogs.microsoft.com/dotnet/july-2022-updates/

Let me know if you get a chance to try it out.

AndyAyersMS · 2022-07-21T20:49:53Z

I'm going to close this; feel free to re-open it if needed once you get around to trying 6.0.7.

ghost added the untriaged label May 13, 2022

dotnet-issue-labeler bot added the area-CodeGen-coreclr label and removed the untriaged label May 13, 2022

AndyAyersMS mentioned this issue May 18, 2022

JIT: Abandon loop search if we reach the end of the bbNext chain #69503

Merged

AndyAyersMS mentioned this issue May 18, 2022

[release/6.0] JIT: Abandon loop search if we reach the end of the bbNext chain #69525

Merged

AndyAyersMS self-assigned this May 20, 2022

JulieLeeMSFT added the needs-further-triage label May 20, 2022

JulieLeeMSFT added this to the 7.0.0 milestone May 23, 2022

JulieLeeMSFT removed the needs-further-triage label May 23, 2022

AndyAyersMS modified the milestones: 7.0.0, 6.0.x Jun 3, 2022

AndyAyersMS closed this as completed Jul 21, 2022

ghost locked as resolved and limited conversation to collaborators Aug 21, 2022
