dotnet build intermittently crashes with segfault on Ubuntu 18.04 #48411
Comments
Dotnet gets installed on the agent by using the installer task:
|
Is a bit odd. @marcwittke could you run |
@vitek-karas does it ring a bell? |
sure:
well, a cleanup wouldn't be bad... Is it safe to delete the _tool folder? |
I'm experiencing this on a self-hosted Azure DevOps build agent, which fails randomly on dotnet commands in .NET Core 3.1 projects.
The build succeeds, but because the process returns exit code null, the build pipeline fails. |
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @vitek-karas, @agocke
|
AspNetCore is hitting an issue that looks very similar to this. We run some tests then call We have a crash dump at https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-aspnetcore-refs-heads-main-bd6750238a114336b0/Microsoft.AspNetCore.Localization.Tests--net6.0/core.1000.9653?sv=2019-07-07&se=2021-04-13T17%3A14%3A17Z&sr=c&sp=rl&sig=2cdAaIh4bXj5NtvyeG%2FSxJtayROazUADEGUgmDPsOJM%3D Both show the thread that segfaulted at an address that looks like it is in the address space of the libpthread-2.27.so module. The dumps will be around for a week or 2. |
I'll take a look at the dumps. |
@BrennanConroy what is the distro that the dumps came from? |
Helix queue ubuntu.1804.amd64.open For the first link: |
What I can see in the dump is that the main thread has already exited and the crashing secondary thread is attempting to run some OpenSSL code and a lock address inside of libcrypto passed to CRYPTO_THREAD_write_lock is set to NULL. This sounds like the same issue as #34231. Only that this time, it doesn't stem from the ERR_reason_error_string like in that issue, but from the following:
cc: @bartonjs |
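The failure mode described above (library cleanup runs at exit while another thread still needs a library lock) can be pictured with a minimal sketch in C. Every name here (`g_lock`, `library_cleanup`, `background_work`) is hypothetical and does not come from libcrypto. The sketch is single-threaded and checks for NULL so the outcome is observable; the real code in the dump dereferences the NULL lock and segfaults.

```c
#include <stddef.h>

/* Hypothetical stand-in for a library-global lock like the one libcrypto keeps. */
static int g_lock_storage;
static int *g_lock = &g_lock_storage;

/* Stand-in for the cleanup a library registers to run at process exit:
   it tears down the global lock. */
void library_cleanup(void)
{
    g_lock = NULL;
}

/* Stand-in for work on a background thread that still needs the lock.
   Returns 1 if the lock is usable, 0 if it is already gone
   (the point where the real code would crash). */
int background_work(void)
{
    if (g_lock == NULL)
        return 0;
    return 1;
}
```

Calling `background_work()` before `library_cleanup()` succeeds; calling it afterwards hits the torn-down state, which is the window a still-running secondary thread falls into once the main thread has exited.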
Given that Ubuntu 18.04 has explicitly removed support for NO_ATEXIT, I worry we'll end up just finding one intermittent problem after another. The previous fix assumed that everything other than the string table was graceful about post-exit calls, but apparently calls into the RNG hit a failure while trying to reinitialize it. Feels like our choices are:
|
Is it feasible/useful to offer a change to OpenSSL? Although perhaps this is a problem others might have to solve when interopping with a different native library that has similar expectations. |
OpenSSL supports the scenario, and we opt into it (OPENSSL_INIT_NO_ATEXIT): runtime/src/libraries/Native/Unix/System.Security.Cryptography.Native/openssl.c Lines 1287 to 1301 in 400311b
The Ubuntu 18.04 build.... somewhere that I found before that I didn't write down and am having trouble finding again... explicitly removes support for that option. |
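As a rough illustration of what opting in looks like, here is a sketch in C of composing an options mask that includes the NO_ATEXIT bit. The numeric values are the ones published in the OpenSSL 1.1.1 `<openssl/crypto.h>` header, redefined under hypothetical `SKETCH_` names so the sketch compiles without OpenSSL. The particular combination of flags is an assumption; the shim code referenced above is not reproduced in this thread.

```c
#include <stdint.h>

/* Values as in the OpenSSL 1.1.1 header; SKETCH_ names are local to this example. */
#define SKETCH_INIT_LOAD_CRYPTO_STRINGS 0x00000002UL
#define SKETCH_INIT_ADD_ALL_CIPHERS     0x00000004UL
#define SKETCH_INIT_ADD_ALL_DIGESTS     0x00000008UL
#define SKETCH_INIT_NO_ATEXIT           0x00080000UL

/* Hypothetical helper: build the mask a caller might pass as the first
   argument of OPENSSL_init_crypto(). The flag selection is an assumption. */
uint64_t sketch_init_options(void)
{
    return SKETCH_INIT_LOAD_CRYPTO_STRINGS
         | SKETCH_INIT_ADD_ALL_CIPHERS
         | SKETCH_INIT_ADD_ALL_DIGESTS
         | SKETCH_INIT_NO_ATEXIT; /* ask libcrypto not to register exit-time cleanup */
}

/* Returns 1 if the NO_ATEXIT bit is present in an options mask, 0 otherwise. */
int sketch_requests_no_atexit(uint64_t opts)
{
    return (opts & SKETCH_INIT_NO_ATEXIT) != 0;
}
```

A library build that predates the flag never tests the 0x80000 bit, so setting it has no effect there.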
Ah got it. And later versions - 20.04 etc? |
This seems like the only reasonable possibility. It seems like the next critical thing to know is whether this also affects 20.04+. That would make it more important to fix since presumably 20.04 or later is an option for many 18.04 customers. @bartonjs we know how to find that out? Here's what I have on my 20.04 machine with
From the above info I'm unsure how to determine. |
Those SHAs aren't in the OpenSSL repo, and it's not clear where in https://launchpad.net/ubuntu to find the sources Ubuntu used. Anyway, I don't know what to look for. |
https://packages.ubuntu.com/source/focal/openssl says that Focal is based on OpenSSL 1.1.1f (plus servicing patches), and in 1.1.1f the source looked like
So if you do something like
$ lldb /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(lldb) target create "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1"
Current executable set to '/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1' (x86_64).
(lldb) dis -n OPENSSL_init_crypto
libcrypto.so.1.1`OPENSSL_init_crypto:
libcrypto.so.1.1[0x176bc0] <+0>: pushq %rbp
libcrypto.so.1.1[0x176bc1] <+1>: pushq %rbx
libcrypto.so.1.1[0x176bc2] <+2>: movq %rdi, %rbx
libcrypto.so.1.1[0x176bc5] <+5>: subq $0x8, %rsp
libcrypto.so.1.1[0x176bc9] <+9>: movl 0x352c59(%rip), %eax
libcrypto.so.1.1[0x176bcf] <+15>: testl %eax, %eax
libcrypto.so.1.1[0x176bd1] <+17>: je 0x176c18 ; <+88>
libcrypto.so.1.1[0x176bd3] <+19>: testl $0x40000, %edi ; imm = 0x40000
libcrypto.so.1.1[0x176bd9] <+25>: je 0x176bf0 ; <+48>
libcrypto.so.1.1[0x176bdb] <+27>: xorl %ebp, %ebp
libcrypto.so.1.1[0x176bdd] <+29>: addq $0x8, %rsp
libcrypto.so.1.1[0x176be1] <+33>: movl %ebp, %eax
libcrypto.so.1.1[0x176be3] <+35>: popq %rbx
libcrypto.so.1.1[0x176be4] <+36>: popq %rbp
libcrypto.so.1.1[0x176be5] <+37>: retq
libcrypto.so.1.1[0x176be6] <+38>: nopw %cs:(%rax,%rax)
libcrypto.so.1.1[0x176bf0] <+48>: leaq 0xc23cc(%rip), %rcx
libcrypto.so.1.1[0x176bf7] <+55>: movl $0x252, %r8d ; imm = 0x252
libcrypto.so.1.1[0x176bfd] <+61>: movl $0x46, %edx
libcrypto.so.1.1[0x176c02] <+66>: movl $0x74, %esi
libcrypto.so.1.1[0x176c07] <+71>: movl $0xf, %edi
libcrypto.so.1.1[0x176c0c] <+76>: callq 0x1580e0 ; ERR_put_error
libcrypto.so.1.1[0x176c11] <+81>: jmp 0x176bdb ; <+27>
libcrypto.so.1.1[0x176c13] <+83>: nopl (%rax,%rax)
libcrypto.so.1.1[0x176c18] <+88>: movq %rsi, %rbp
libcrypto.so.1.1[0x176c1b] <+91>: leaq 0x352bee(%rip), %rdi
libcrypto.so.1.1[0x176c22] <+98>: leaq -0x2e9(%rip), %rsi ; ___lldb_unnamed_symbol1395$$libcrypto.so.1.1
libcrypto.so.1.1[0x176c29] <+105>: callq 0x1df9f0 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x176c2e] <+110>: testl %eax, %eax
libcrypto.so.1.1[0x176c30] <+112>: je 0x176bdb ; <+27>
libcrypto.so.1.1[0x176c32] <+114>: movl 0x352bd0(%rip), %eax
libcrypto.so.1.1[0x176c38] <+120>: testl %eax, %eax
libcrypto.so.1.1[0x176c3a] <+122>: je 0x176bdb ; <+27>
libcrypto.so.1.1[0x176c3c] <+124>: testl $0x40000, %ebx ; imm = 0x40000
libcrypto.so.1.1[0x176c42] <+130>: je 0x176d70 ; <+432>
libcrypto.so.1.1[0x176c48] <+136>: testb $0x1, %bl
libcrypto.so.1.1[0x176c4b] <+139>: jne 0x176d10 ; <+336>
libcrypto.so.1.1[0x176c51] <+145>: testb $0x2, %bl
libcrypto.so.1.1[0x176c54] <+148>: jne 0x176d40 ; <+384>
libcrypto.so.1.1[0x176c5a] <+154>: testb $0x10, %bl
libcrypto.so.1.1[0x176c5d] <+157>: jne 0x176da0 ; <+480>
libcrypto.so.1.1[0x176c63] <+163>: testb $0x4, %bl
libcrypto.so.1.1[0x176c66] <+166>: jne 0x176dd0 ; <+528>
libcrypto.so.1.1[0x176c6c] <+172>: testb $0x20, %bl
libcrypto.so.1.1[0x176c6f] <+175>: jne 0x176e00 ; <+576>
libcrypto.so.1.1[0x176c75] <+181>: testb $0x8, %bl
libcrypto.so.1.1[0x176c78] <+184>: jne 0x176e2e ; <+622>
libcrypto.so.1.1[0x176c7e] <+190>: testl $0x20000, %ebx ; imm = 0x20000
(from Ubuntu 18.04) hopefully there'll be something that looks like it's doing a test for 0x80000. If so, the problem is just gone on 20.04.
I've previously said that Ubuntu "removed" the support. Looking again, I don't see a patch that removes the support... but I also don't see one that adds it. The OPENSSL_INIT_NO_ATEXIT support was backported for OpenSSL 1.1.1b. It looks like Ubuntu 18.04 is 1.1.1 (RTM) plus servicing, and their servicing did something other than "catch up to 1.1.1-stable". |
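The check suggested above (look for a test against 0x80000 in the disassembly of OPENSSL_init_crypto) can be done by eye, or roughly automated. Below is a sketch in C that scans saved disassembly text for a `test` instruction whose first operand is a given immediate, assuming AT&T syntax as printed by lldb in this thread. The function name is hypothetical, and the text match is only a heuristic.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the disassembly text contains a test instruction (testb, testl, ...)
   whose first operand is the given immediate, e.g. "testl $0x80000, %ebx". */
int disasm_tests_immediate(const char *disasm, unsigned long imm)
{
    char needle[32];
    /* Trailing comma so that $0x8 does not match inside $0x80000. */
    snprintf(needle, sizeof needle, "$0x%lx,", imm);
    size_t n = strlen(needle);

    for (const char *p = strstr(disasm, "test"); p != NULL; p = strstr(p + 1, "test")) {
        const char *q = p + 4;
        /* Skip the operand-size suffix, then any whitespace before the operand. */
        while (*q == 'b' || *q == 'w' || *q == 'l' || *q == 'q')
            q++;
        while (*q == ' ' || *q == '\t')
            q++;
        if (strncmp(q, needle, n) == 0)
            return 1;
    }
    return 0;
}
```

Fed the 18.04 listing above, a search for 0x80000 finds nothing, while 0x40000 is found.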
Ubuntu 20.04 output
(lldb) target create "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1"
Current executable set to '/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1' (x86_64).
(lldb) dis -n OPENSSL_init_crypto
libcrypto.so.1.1`OPENSSL_init_crypto:
libcrypto.so.1.1[0x177590] <+0>: endbr64
libcrypto.so.1.1[0x177594] <+4>: movl 0x15d296(%rip), %eax
libcrypto.so.1.1[0x17759a] <+10>: pushq %r12
libcrypto.so.1.1[0x17759c] <+12>: pushq %rbp
libcrypto.so.1.1[0x17759d] <+13>: pushq %rbx
libcrypto.so.1.1[0x17759e] <+14>: movq %rdi, %rbx
libcrypto.so.1.1[0x1775a1] <+17>: testl %eax, %eax
libcrypto.so.1.1[0x1775a3] <+19>: je 0x1775c0 ; <+48>
libcrypto.so.1.1[0x1775a5] <+21>: testl $0x40000, %edi ; imm = 0x40000
libcrypto.so.1.1[0x1775ab] <+27>: je 0x1776e8 ; <+344>
libcrypto.so.1.1[0x1775b1] <+33>: xorl %r12d, %r12d
libcrypto.so.1.1[0x1775b4] <+36>: movl %r12d, %eax
libcrypto.so.1.1[0x1775b7] <+39>: popq %rbx
libcrypto.so.1.1[0x1775b8] <+40>: popq %rbp
libcrypto.so.1.1[0x1775b9] <+41>: popq %r12
libcrypto.so.1.1[0x1775bb] <+43>: retq
libcrypto.so.1.1[0x1775bc] <+44>: nopl (%rax)
libcrypto.so.1.1[0x1775c0] <+48>: movq %rsi, %rbp
libcrypto.so.1.1[0x1775c3] <+51>: leaq 0x15d24e(%rip), %rdi
libcrypto.so.1.1[0x1775ca] <+58>: leaq -0x2d1(%rip), %rsi ; ___lldb_unnamed_symbol1509$$libcrypto.so.1.1
libcrypto.so.1.1[0x1775d1] <+65>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1775d6] <+70>: testl %eax, %eax
libcrypto.so.1.1[0x1775d8] <+72>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1775da] <+74>: movl 0x15d230(%rip), %eax
libcrypto.so.1.1[0x1775e0] <+80>: testl %eax, %eax
libcrypto.so.1.1[0x1775e2] <+82>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1775e4] <+84>: movl $0x1, %r12d
libcrypto.so.1.1[0x1775ea] <+90>: testl $0x40000, %ebx ; imm = 0x40000
libcrypto.so.1.1[0x1775f0] <+96>: jne 0x1775b4 ; <+36>
libcrypto.so.1.1[0x1775f2] <+98>: testl $0x80000, %ebx ; imm = 0x80000
libcrypto.so.1.1[0x1775f8] <+104>: je 0x177718 ; <+392>
libcrypto.so.1.1[0x1775fe] <+110>: leaq -0x565(%rip), %rsi ; ___lldb_unnamed_symbol1491$$libcrypto.so.1.1
libcrypto.so.1.1[0x177605] <+117>: leaq 0x15d200(%rip), %rdi
libcrypto.so.1.1[0x17760c] <+124>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177611] <+129>: testl %eax, %eax
libcrypto.so.1.1[0x177613] <+131>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177615] <+133>: movl 0x15d1ec(%rip), %r12d
libcrypto.so.1.1[0x17761c] <+140>: testl %r12d, %r12d
libcrypto.so.1.1[0x17761f] <+143>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177621] <+145>: leaq -0x578(%rip), %rsi ; ___lldb_unnamed_symbol1492$$libcrypto.so.1.1
libcrypto.so.1.1[0x177628] <+152>: leaq 0x15d1d5(%rip), %rdi
libcrypto.so.1.1[0x17762f] <+159>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177634] <+164>: testl %eax, %eax
libcrypto.so.1.1[0x177636] <+166>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x17763c] <+172>: movl 0x15d1bd(%rip), %r11d
libcrypto.so.1.1[0x177643] <+179>: testl %r11d, %r11d
libcrypto.so.1.1[0x177646] <+182>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x17764c] <+188>: testb $0x1, %bl
libcrypto.so.1.1[0x17764f] <+191>: jne 0x177738 ; <+424>
libcrypto.so.1.1[0x177655] <+197>: testb $0x2, %bl
libcrypto.so.1.1[0x177658] <+200>: jne 0x177768 ; <+472>
libcrypto.so.1.1[0x17765e] <+206>: testb $0x10, %bl
libcrypto.so.1.1[0x177661] <+209>: jne 0x177798 ; <+520>
libcrypto.so.1.1[0x177667] <+215>: testb $0x4, %bl
libcrypto.so.1.1[0x17766a] <+218>: jne 0x1777c8 ; <+568>
libcrypto.so.1.1[0x177670] <+224>: testb $0x20, %bl
libcrypto.so.1.1[0x177673] <+227>: jne 0x1777f6 ; <+614>
libcrypto.so.1.1[0x177679] <+233>: testb $0x8, %bl
libcrypto.so.1.1[0x17767c] <+236>: jne 0x177824 ; <+660>
libcrypto.so.1.1[0x177682] <+242>: testl $0x20000, %ebx ; imm = 0x20000
libcrypto.so.1.1[0x177688] <+248>: jne 0x177852 ; <+706>
libcrypto.so.1.1[0x17768e] <+254>: testb $-0x80, %bl
libcrypto.so.1.1[0x177691] <+257>: jne 0x177864 ; <+724>
libcrypto.so.1.1[0x177697] <+263>: testb $0x40, %bl
libcrypto.so.1.1[0x17769a] <+266>: jne 0x177892 ; <+770>
libcrypto.so.1.1[0x1776a0] <+272>: testb $0x1, %bh
libcrypto.so.1.1[0x1776a3] <+275>: jne 0x1778df ; <+847>
libcrypto.so.1.1[0x1776a9] <+281>: testb $0x8, %bh
libcrypto.so.1.1[0x1776ac] <+284>: jne 0x17790d ; <+893>
libcrypto.so.1.1[0x1776b2] <+290>: testb $0x2, %bh
libcrypto.so.1.1[0x1776b5] <+293>: jne 0x17793a ; <+938>
libcrypto.so.1.1[0x1776bb] <+299>: testb $0x4, %bh
libcrypto.so.1.1[0x1776be] <+302>: jne 0x177991 ; <+1025>
libcrypto.so.1.1[0x1776c4] <+308>: testb $-0x2, %bh
libcrypto.so.1.1[0x1776c7] <+311>: jne 0x1779eb ; <+1115>
libcrypto.so.1.1[0x1776cd] <+317>: testl $0x10000, %ebx ; imm = 0x10000
libcrypto.so.1.1[0x1776d3] <+323>: jne 0x1779be ; <+1070>
libcrypto.so.1.1[0x1776d9] <+329>: movl $0x1, %r12d
libcrypto.so.1.1[0x1776df] <+335>: jmp 0x1775b4 ; <+36>
libcrypto.so.1.1[0x1776e4] <+340>: nopl (%rax)
libcrypto.so.1.1[0x1776e8] <+344>: xorl %r12d, %r12d
libcrypto.so.1.1[0x1776eb] <+347>: movl $0x270, %r8d ; imm = 0x270
libcrypto.so.1.1[0x1776f1] <+353>: movl $0x46, %edx
libcrypto.so.1.1[0x1776f6] <+358>: movl $0x74, %esi
libcrypto.so.1.1[0x1776fb] <+363>: leaq 0xc86d1(%rip), %rcx
libcrypto.so.1.1[0x177702] <+370>: movl $0xf, %edi
libcrypto.so.1.1[0x177707] <+375>: callq 0x157990 ; ERR_put_error
libcrypto.so.1.1[0x17770c] <+380>: movl %r12d, %eax
libcrypto.so.1.1[0x17770f] <+383>: popq %rbx
libcrypto.so.1.1[0x177710] <+384>: popq %rbp
libcrypto.so.1.1[0x177711] <+385>: popq %r12
libcrypto.so.1.1[0x177713] <+387>: retq
libcrypto.so.1.1[0x177714] <+388>: nopl (%rax)
libcrypto.so.1.1[0x177718] <+392>: leaq -0x44f(%rip), %rsi ; ___lldb_unnamed_symbol1508$$libcrypto.so.1.1
libcrypto.so.1.1[0x17771f] <+399>: leaq 0x15d0e6(%rip), %rdi
libcrypto.so.1.1[0x177726] <+406>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x17772b] <+411>: testl %eax, %eax
libcrypto.so.1.1[0x17772d] <+413>: jne 0x177615 ; <+133>
libcrypto.so.1.1[0x177733] <+419>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177738] <+424>: leaq -0x67f(%rip), %rsi ; ___lldb_unnamed_symbol1493$$libcrypto.so.1.1
libcrypto.so.1.1[0x17773f] <+431>: leaq 0x15d0b6(%rip), %rdi
libcrypto.so.1.1[0x177746] <+438>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x17774b] <+443>: testl %eax, %eax
libcrypto.so.1.1[0x17774d] <+445>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177753] <+451>: movl 0x15d09a(%rip), %r10d
libcrypto.so.1.1[0x17775a] <+458>: testl %r10d, %r10d
libcrypto.so.1.1[0x17775d] <+461>: jne 0x177655 ; <+197>
libcrypto.so.1.1[0x177763] <+467>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177768] <+472>: leaq -0x4cf(%rip), %rsi ; ___lldb_unnamed_symbol1507$$libcrypto.so.1.1
libcrypto.so.1.1[0x17776f] <+479>: leaq 0x15d086(%rip), %rdi
libcrypto.so.1.1[0x177776] <+486>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x17777b] <+491>: testl %eax, %eax
libcrypto.so.1.1[0x17777d] <+493>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177783] <+499>: movl 0x15d06a(%rip), %r9d
libcrypto.so.1.1[0x17778a] <+506>: testl %r9d, %r9d
libcrypto.so.1.1[0x17778d] <+509>: jne 0x17765e ; <+206>
libcrypto.so.1.1[0x177793] <+515>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177798] <+520>: leaq -0x6cf(%rip), %rsi ; ___lldb_unnamed_symbol1494$$libcrypto.so.1.1
libcrypto.so.1.1[0x17779f] <+527>: leaq 0x15d04a(%rip), %rdi
libcrypto.so.1.1[0x1777a6] <+534>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1777ab] <+539>: testl %eax, %eax
libcrypto.so.1.1[0x1777ad] <+541>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1777b3] <+547>: movl 0x15d032(%rip), %r8d
libcrypto.so.1.1[0x1777ba] <+554>: testl %r8d, %r8d
libcrypto.so.1.1[0x1777bd] <+557>: jne 0x177667 ; <+215>
libcrypto.so.1.1[0x1777c3] <+563>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1777c8] <+568>: leaq -0x54f(%rip), %rsi ; ___lldb_unnamed_symbol1506$$libcrypto.so.1.1
libcrypto.so.1.1[0x1777cf] <+575>: leaq 0x15d01a(%rip), %rdi
libcrypto.so.1.1[0x1777d6] <+582>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1777db] <+587>: testl %eax, %eax
libcrypto.so.1.1[0x1777dd] <+589>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1777e3] <+595>: movl 0x15d003(%rip), %edi
libcrypto.so.1.1[0x1777e9] <+601>: testl %edi, %edi
libcrypto.so.1.1[0x1777eb] <+603>: jne 0x177670 ; <+224>
libcrypto.so.1.1[0x1777f1] <+609>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1777f6] <+614>: leaq -0x71d(%rip), %rsi ; ___lldb_unnamed_symbol1495$$libcrypto.so.1.1
libcrypto.so.1.1[0x1777fd] <+621>: leaq 0x15cfe4(%rip), %rdi
libcrypto.so.1.1[0x177804] <+628>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177809] <+633>: testl %eax, %eax
libcrypto.so.1.1[0x17780b] <+635>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177811] <+641>: movl 0x15cfcd(%rip), %esi
libcrypto.so.1.1[0x177817] <+647>: testl %esi, %esi
libcrypto.so.1.1[0x177819] <+649>: jne 0x177679 ; <+233>
libcrypto.so.1.1[0x17781f] <+655>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177824] <+660>: leaq -0x5cb(%rip), %rsi ; ___lldb_unnamed_symbol1505$$libcrypto.so.1.1
libcrypto.so.1.1[0x17782b] <+667>: leaq 0x15cfb6(%rip), %rdi
libcrypto.so.1.1[0x177832] <+674>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177837] <+679>: testl %eax, %eax
libcrypto.so.1.1[0x177839] <+681>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x17783f] <+687>: movl 0x15cf9f(%rip), %ecx
libcrypto.so.1.1[0x177845] <+693>: testl %ecx, %ecx
libcrypto.so.1.1[0x177847] <+695>: jne 0x177682 ; <+242>
libcrypto.so.1.1[0x17784d] <+701>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177852] <+706>: callq 0x1e2850 ; ___lldb_unnamed_symbol1948$$libcrypto.so.1.1
libcrypto.so.1.1[0x177857] <+711>: testl %eax, %eax
libcrypto.so.1.1[0x177859] <+713>: jne 0x17768e ; <+254>
libcrypto.so.1.1[0x17785f] <+719>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177864] <+724>: leaq -0x62b(%rip), %rsi ; ___lldb_unnamed_symbol1504$$libcrypto.so.1.1
libcrypto.so.1.1[0x17786b] <+731>: leaq 0x15cf6e(%rip), %rdi
libcrypto.so.1.1[0x177872] <+738>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177877] <+743>: testl %eax, %eax
libcrypto.so.1.1[0x177879] <+745>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x17787f] <+751>: movl 0x15cf4b(%rip), %edx
libcrypto.so.1.1[0x177885] <+757>: testl %edx, %edx
libcrypto.so.1.1[0x177887] <+759>: jne 0x177697 ; <+263>
libcrypto.so.1.1[0x17788d] <+765>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177892] <+770>: movq 0x15cf87(%rip), %rdi
libcrypto.so.1.1[0x177899] <+777>: callq 0x1e2700 ; CRYPTO_THREAD_write_lock
libcrypto.so.1.1[0x17789e] <+782>: leaq -0x685(%rip), %rsi ; ___lldb_unnamed_symbol1503$$libcrypto.so.1.1
libcrypto.so.1.1[0x1778a5] <+789>: leaq 0x15cf34(%rip), %rdi
libcrypto.so.1.1[0x1778ac] <+796>: movq %rbp, 0x15cf25(%rip)
libcrypto.so.1.1[0x1778b3] <+803>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1778b8] <+808>: movl %eax, %r12d
libcrypto.so.1.1[0x1778bb] <+811>: testl %eax, %eax
libcrypto.so.1.1[0x1778bd] <+813>: jne 0x177967 ; <+983>
libcrypto.so.1.1[0x1778c3] <+819>: movq 0x15cf56(%rip), %rdi
libcrypto.so.1.1[0x1778ca] <+826>: movq $0x0, 0x15cf03(%rip)
libcrypto.so.1.1[0x1778d5] <+837>: callq 0x1e2720 ; CRYPTO_THREAD_unlock
libcrypto.so.1.1[0x1778da] <+842>: jmp 0x1775b4 ; <+36>
libcrypto.so.1.1[0x1778df] <+847>: leaq -0x6f6(%rip), %rsi ; ___lldb_unnamed_symbol1502$$libcrypto.so.1.1
libcrypto.so.1.1[0x1778e6] <+854>: leaq 0x15cedf(%rip), %rdi
libcrypto.so.1.1[0x1778ed] <+861>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1778f2] <+866>: testl %eax, %eax
libcrypto.so.1.1[0x1778f4] <+868>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1778fa] <+874>: movl 0x15cec4(%rip), %eax
libcrypto.so.1.1[0x177900] <+880>: testl %eax, %eax
libcrypto.so.1.1[0x177902] <+882>: jne 0x1776a9 ; <+281>
libcrypto.so.1.1[0x177908] <+888>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x17790d] <+893>: leaq -0x744(%rip), %rsi ; ___lldb_unnamed_symbol1501$$libcrypto.so.1.1
libcrypto.so.1.1[0x177914] <+900>: leaq 0x15cea5(%rip), %rdi
libcrypto.so.1.1[0x17791b] <+907>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177920] <+912>: testl %eax, %eax
libcrypto.so.1.1[0x177922] <+914>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177928] <+920>: cmpl $0x0, 0x15ce8d(%rip)
libcrypto.so.1.1[0x17792f] <+927>: jne 0x1776b2 ; <+290>
libcrypto.so.1.1[0x177935] <+933>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x17793a] <+938>: leaq -0x791(%rip), %rsi ; ___lldb_unnamed_symbol1500$$libcrypto.so.1.1
libcrypto.so.1.1[0x177941] <+945>: leaq 0x15ce70(%rip), %rdi
libcrypto.so.1.1[0x177948] <+952>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x17794d] <+957>: testl %eax, %eax
libcrypto.so.1.1[0x17794f] <+959>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177955] <+965>: cmpl $0x0, 0x15ce58(%rip)
libcrypto.so.1.1[0x17795c] <+972>: jne 0x1776bb ; <+299>
libcrypto.so.1.1[0x177962] <+978>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177967] <+983>: movl 0x15ce63(%rip), %ebp
libcrypto.so.1.1[0x17796d] <+989>: movq 0x15ceac(%rip), %rdi
libcrypto.so.1.1[0x177974] <+996>: movq $0x0, 0x15ce59(%rip)
libcrypto.so.1.1[0x17797f] <+1007>: callq 0x1e2720 ; CRYPTO_THREAD_unlock
libcrypto.so.1.1[0x177984] <+1012>: testl %ebp, %ebp
libcrypto.so.1.1[0x177986] <+1014>: jg 0x1776a0 ; <+272>
libcrypto.so.1.1[0x17798c] <+1020>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x177991] <+1025>: leaq -0x808(%rip), %rsi ; ___lldb_unnamed_symbol1499$$libcrypto.so.1.1
libcrypto.so.1.1[0x177998] <+1032>: leaq 0x15ce11(%rip), %rdi
libcrypto.so.1.1[0x17799f] <+1039>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1779a4] <+1044>: testl %eax, %eax
libcrypto.so.1.1[0x1779a6] <+1046>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1779ac] <+1052>: cmpl $0x0, 0x15cdf9(%rip)
libcrypto.so.1.1[0x1779b3] <+1059>: jne 0x1776c4 ; <+308>
libcrypto.so.1.1[0x1779b9] <+1065>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1779be] <+1070>: leaq -0x8d5(%rip), %rsi ; ___lldb_unnamed_symbol1496$$libcrypto.so.1.1
libcrypto.so.1.1[0x1779c5] <+1077>: leaq 0x15cddc(%rip), %rdi
libcrypto.so.1.1[0x1779cc] <+1084>: callq 0x1e2780 ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1779d1] <+1089>: testl %eax, %eax
libcrypto.so.1.1[0x1779d3] <+1091>: je 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1779d9] <+1097>: cmpl $0x0, 0x15cdc0(%rip)
libcrypto.so.1.1[0x1779e0] <+1104>: jne 0x1776d9 ; <+329>
libcrypto.so.1.1[0x1779e6] <+1110>: jmp 0x1775b1 ; <+33>
libcrypto.so.1.1[0x1779eb] <+1115>: callq 0x1535f0 ; ENGINE_register_all_complete
libcrypto.so.1.1[0x1779f0] <+1120>: jmp 0x1776cd ; <+317>
(lldb)
@bartonjs I see a test against 0x80000 ... |
::kermit arms:: Yaaaaaaay! Looks like we won't have a problem on 20.04. Hopefully that's enough to avoid needing to add our own locking/refcounting/whatever to literally every shim method. |
@marcwittke is it possible for you to try on Ubuntu 20.04 or later? We think that will fix it. It is an issue in the libcrypto on 18.04. |
I think so. We have two agents running right now on 18.04. I'll update one of them to 20.04 and let's see. Since it's intermittent, I think in a week I can tell you whether it helped or not. |
I'm not sure we'd arrived at consensus that we wouldn't take a fix here ...18.04 is supported until 2028 and we'll presumably support it in .NET 7. This is also causing our automated tests to crash periodically. I'll leave this open for other customers to comment on the impact. But the recommendation above remains to move to 20.04 if affected. |
As seen in dotnet/sdk#22872 (comment)
|
The only complete fix we could take would be to run literally every shim function to OpenSSL under the same mutex we use for loading exception strings, to work around applications doing work on background threads after the main thread has exited (because these crashes are only after The biggest offender seems to be SSL_do_handshake; so we /might/ be able to start the game of whack-a-mole by making TLS handshakes mutexed; but I don't think that the networking team would like that. (We could probably change our mutex to a rwlock so we don't utterly kill parallelism with TLS handshakes, but it's still not free) |
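To make the trade-off concrete, here is a minimal sketch in C of what guarding every shim with one mutex could look like. All names are hypothetical and nothing here is taken from the runtime or from OpenSSL; it only illustrates the idea of a shared lock plus a flag that is flipped on the exit path.

```c
#include <pthread.h>

/* Hypothetical global state shared by every shim function. */
static pthread_mutex_t g_shim_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_library_alive = 1;

/* Stand-in for a native call such as a TLS handshake; always succeeds here. */
static int fake_native_call(void)
{
    return 1;
}

/* A guarded shim: take the lock, and only call into the library if it has
   not been torn down. Returns the native result, or -1 after shutdown. */
int shim_guarded_call(void)
{
    int result = -1;
    pthread_mutex_lock(&g_shim_lock);
    if (g_library_alive)
        result = fake_native_call();
    pthread_mutex_unlock(&g_shim_lock);
    return result;
}

/* Called on the exit path: once this returns, no shim touches the library. */
void shim_mark_shutdown(void)
{
    pthread_mutex_lock(&g_shim_lock);
    g_library_alive = 0;
    pthread_mutex_unlock(&g_shim_lock);
}
```

With a plain mutex, every guarded call is serialized, which is the parallelism cost mentioned above. A reader/writer lock would let shims run concurrently and only block them against shutdown, but each call still pays for taking the lock.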
I've also not tried working with Canonical to get them to just patch in the support for OPENSSL_INIT_NO_ATEXIT. @richlander do you have any contacts there? |
I do. Hey @wiswaud -- can you get us a contact at Canonical who can help us with some OpenSSL issues on Ubuntu 18.04? |
I was given an official Canonical account to report issues via their tracker. That was quick. @bartonjs Can you write a succinct description of the issue that I can copy/paste into the Canonical tracker? |
@richlander How's this?
Bionic's OpenSSL 1.1.1 package (https://launchpad.net/ubuntu/bionic/+source/openssl) is the only version of OpenSSL 1.1.1 on any distro that we've encountered that does not have support for the OPENSSL_INIT_NO_ATEXIT functionality from 1.1.1b (openssl/openssl@c2b3db2). The threading model in .NET has the possibility that background threads are still running when
We feel that the stability of applications on Ubuntu 18.04 would be improved if the functionality of OPENSSL_INIT_NO_ATEXIT was merged into the bionic openssl 1.1.1 package, even if the constant isn't published into the header for the dev package. |
Perfect! Thanks much. |
I have been hitting this crash recently on my main devbox, which is Ubuntu 18.04. However, it started to happen relatively recently; up to about a month ago the build was stable. So maybe something has changed in msbuild that makes this occur much more frequently, or something like that. |
If you have a good repro, that would be useful. I am talking with Canonical now. |
Unfortunately I don't. It crashes on average once in a day or two when running ./build.sh script. |
That's OK. Let's see if we can get a fix and then maybe deploy some early fixes. |
Simple repro code for testing:
using System;
using System.Linq;
using System.Runtime.InteropServices;
using System.Security.Cryptography;

// Register a handler that runs during process exit.
atexit(AtExitHandler);

byte[] data = new byte[] { 0, 1, 2, 3, 4, 5 };
byte[] hashValue;
// Hash some data first so the crypto library is loaded before exit.
using (SHA256 sha256 = SHA256.Create())
{
    hashValue = sha256.ComputeHash(data);
}
Console.WriteLine($"hash: {ToHex(hashValue)}");

[DllImport("libc", EntryPoint = "__cxa_atexit", CallingConvention = CallingConvention.Cdecl)]
static extern int atexit(Action a);

// Calls into the RNG while exit handlers are running.
static void AtExitHandler()
{
    byte[] randomBytes = new byte[16];
    RandomNumberGenerator.Fill(randomBytes);
    Console.WriteLine($"random: {ToHex(randomBytes)}");
}

static string ToHex(byte[] bytes)
{
    return string.Join("", bytes.Select((b) => b.ToString("X2")));
}
In case your mangled name of
we should probably do |
We are in the late stages of getting Canonical to publish a fix in Ubuntu 18.04 via their ESM program. I believe the easiest way to access that is via Ubuntu Pro. |
The fix has been released in libssl package version Here are my repro steps to acquire that package: https://gist.github.com/richlander/47333cbf90ee0ee3f51bcb0dbbb3a76f?permalink_comment_id=4676592#gistcomment-4676592 |
Now and then our build agent produces broken builds. The Error message reads:
##[error]Error: The process '/home/agent/agent/_work/_tool/dotnet/dotnet' failed with exit code null
The project is a .NET Core 3.1 Web API solution with roughly 30 projects and no unmanaged code at all.
The root cause is a segfault, as seen in dmesg
Environment info:
Build agents are equipped with 2vCPU and 2GB memory.
dotnet --info is not available, as there is no runtime nor SDK installed. We're using the dotnet tool installer during build:
I have no idea how to debug this. I'd like to provide more info, but need assistance to do so.