dotnet build intermittently crashes with segfault on Ubuntu 18.04 #48411

marcwittke opened this issue Jan 13, 2021 · 44 comments
Labels
area-System.Security, tracking-external-issue (the issue is caused by an external problem, e.g. the OS; nothing we can do to fix it directly)

@marcwittke

Now and then our build agent produces broken builds. The error message reads:
##[error]Error: The process '/home/agent/agent/_work/_tool/dotnet/dotnet' failed with exit code null

The project is a .NET Core 3.1 Web API solution with around 30 projects and no unmanaged code at all.

The root cause is a segfault, as seen in dmesg:

$ dmesg | grep dotnet

[17426.781072] dotnet[36429]: segfault at 18 ip 00007f9d65e87892 sp 00007f9d5e083bb0 error 4 in libpthread-2.27.so[7f9d65e7b000+1a000]
[1418646.055501] dotnet[36089]: segfault at 18 ip 00007f345cea9892 sp 00007f33b9703eb0 error 4 in libpthread-2.27.so[7f345ce9d000+1a000]
[2246615.917135] dotnet[87465]: segfault at 18 ip 00007fd998396382 sp 00007fd98fd373a0 error 4 in libpthread-2.27.so[7fd99838a000+1a000]
[2362725.938722] dotnet[21158]: segfault at 18 ip 00007fe8ee98a892 sp 00007fe8e637ee00 error 4 in libpthread-2.27.so[7fe8ee97e000+1a000]
[2432991.847286] dotnet[48481]: segfault at 18 ip 00007f7ac18e8892 sp 00007f7a46173b00 error 4 in libpthread-2.27.so[7f7ac18dc000+1a000]
[2704555.425939] dotnet[88757]: segfault at 18 ip 00007fe0bc6bb892 sp 00007fe0b48b4ae0 error 4 in libpthread-2.27.so[7fe0bc6af000+1a000]
[2846996.143322] dotnet[107654]: segfault at 18 ip 00007fad287ea892 sp 00007facad075b00 error 4 in libpthread-2.27.so[7fad287de000+1a000]
[2853616.129105] dotnet[15803]: segfault at 18 ip 00007f72657db892 sp 00007f725d1cfb00 error 4 in libpthread-2.27.so[7f72657cf000+1a000]
[3496394.984178] dotnet[59923]: segfault at 18 ip 00007f5d8ffe7892 sp 00007f5d889e1b00 error 4 in libpthread-2.27.so[7f5d8ffdb000+1a000]
[3630179.291391] dotnet[98248]: segfault at 18 ip 00007f8d8079a892 sp 00007f8d78993e00 error 4 in libpthread-2.27.so[7f8d8078e000+1a000]
[3633549.092183] dotnet[101217]: segfault at 18 ip 00007f617d49a892 sp 00007f60d9ce7e00 error 4 in libpthread-2.27.so[7f617d48e000+1a000]
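
For readers unfamiliar with this dmesg format: each line gives the faulting address (`at 18`), the instruction pointer (`ip`), the page-fault error code, and the mapped module as `[base+size]`. Subtracting the base from the ip gives the offset inside libpthread, which stays comparable across runs even though the load address moves. A minimal C sketch of that arithmetic follows; `struct segv_info` and `parse_segv_line` are illustrative names, not part of any existing tool.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: the fields of one kernel "segfault at" line. */
struct segv_info {
    uint64_t fault_addr;   /* address whose access faulted ("at ...") */
    uint64_t ip;           /* instruction pointer at the time of the fault */
    uint64_t error_code;   /* x86 page-fault error code */
    uint64_t module_base;  /* load address of the module named after "in" */
    uint64_t module_off;   /* ip - module_base */
};

/* Parses a dmesg line containing "segfault at ...".
 * Returns 0 on success, -1 if the line does not match. */
static int parse_segv_line(const char *line, struct segv_info *out)
{
    const char *p = strstr(line, "segfault at ");
    const char *bracket;
    uint64_t sp; /* parsed but not kept */

    if (p == NULL)
        return -1;
    if (sscanf(p,
               "segfault at %" SCNx64 " ip %" SCNx64
               " sp %" SCNx64 " error %" SCNx64,
               &out->fault_addr, &out->ip, &sp, &out->error_code) != 4)
        return -1;

    /* The module range follows the module name as "[base+size]". */
    bracket = strchr(p, '[');
    if (bracket == NULL ||
        sscanf(bracket, "[%" SCNx64, &out->module_base) != 1)
        return -1;

    out->module_off = out->ip - out->module_base;
    return 0;
}
```

Applied to the first line above, this gives a module offset of 0xc892 (the third line gives 0xc382). Error code 4 is a user-mode read of a page that is not mapped, and a fault address as small as 0x18 is what a field access through a NULL pointer typically looks like.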

Environment info:

NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

The build agents have 2 vCPUs and 2 GB of memory.

dotnet --info is not available, as there is no runtime nor SDK installed. We're using the dotnet tool installer during build:

Tool to install: .NET Core sdk version 3.1.x.
Found version 3.1.405 in channel 3.1 for user specified version spec: 3.1.x
Version: 3.1.405 was found in cache.
Creating global tool path and pre-pending to PATH.

I have no idea how to debug this. I'd like to provide more info, but need assistance to do so.

@NecatiMeral

I'm experiencing this on a self-hosted Azure DevOps build agent, which randomly fails on dotnet commands with .NET 5.0:

[4562346.461844] .NET ThreadPool[870598]: segfault at 18 ip 00007f813e20c892 sp 00007f81281e5000 error 4 in libpthread-2.27.so[7f813e200000+1a000]
[4586429.064024] .NET ThreadPool[1032434]: segfault at 18 ip 00007f6a7b94f892 sp 00007f69ca7f8ba0 error 4 in libpthread-2.27.so[7f6a7b943000+1a000]
[4588177.547456] .NET ThreadPool[1063988]: segfault at 18 ip 00007f06d8288892 sp 00007f062cfaf9e0 error 4 in libpthread-2.27.so[7f06d827c000+1a000]

dotnet gets installed on the agent by the installer task:

2021-01-26T15:08:21.4116924Z Found version 5.0.100 in channel "5.0" for user specified version spec: 5.0.100
2021-01-26T15:08:21.5900281Z Getting URL to download .NET Core sdk version 5.0.100.
2021-01-26T15:08:21.5937280Z Detecting OS platform to find the correct download package for the OS.
2021-01-26T15:08:21.5958925Z [command]/azp/agent/_work/_tasks/UseDotNet_b0ce7256-7898-45d3-9cb5-176b752bfea6/2.169.2/externals/get-os-distro.sh
2021-01-26T15:08:21.5960531Z Primary:linux-x64
2021-01-26T15:08:21.5961709Z Legacy:ubuntu.18.04-x64
2021-01-26T15:08:21.5963010Z Detected platform (Primary): linux-x64
2021-01-26T15:08:21.5964368Z Detected platform (Legacy): ubuntu.18.04-x64
2021-01-26T15:08:21.5967575Z Version 5.0.100 was found in cache.
2021-01-26T15:08:21.5981248Z Creating global tool path and pre-pending to PATH.

@wli3

wli3 commented Feb 2, 2021

> dotnet --info is not available, as there is no runtime nor SDK installed. We're using the dotnet tool installer during build:

That is a bit odd. @marcwittke, could you run dotnet --info as part of the build, after the SDK is installed on the build agent?

@wli3

wli3 commented Feb 2, 2021

@vitek-karas does it ring a bell?

@wli3 wli3 removed their assignment Feb 2, 2021
@marcwittke
Author

sure:

.NET Core SDK (reflecting any global.json):
 Version:   3.1.405
 Commit:    65f9d75b1c

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  18.04
 OS Platform: Linux
 RID:         ubuntu.18.04-x64
 Base Path:   /home/agent/agent/_work/_tool/dotnet/sdk/3.1.405/

Host (useful for support):
  Version: 3.1.11
  Commit:  f5eceb8105

.NET Core SDKs installed:
  2.1.805 [/home/agent/agent/_work/_tool/dotnet/sdk]
  3.1.100 [/home/agent/agent/_work/_tool/dotnet/sdk]
  3.1.404 [/home/agent/agent/_work/_tool/dotnet/sdk]
  3.1.405 [/home/agent/agent/_work/_tool/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.17 [/home/agent/agent/_work/_tool/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.17 [/home/agent/agent/_work/_tool/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.0 [/home/agent/agent/_work/_tool/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.10 [/home/agent/agent/_work/_tool/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.11 [/home/agent/agent/_work/_tool/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.1.17 [/home/agent/agent/_work/_tool/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.0 [/home/agent/agent/_work/_tool/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.10 [/home/agent/agent/_work/_tool/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.11 [/home/agent/agent/_work/_tool/dotnet/shared/Microsoft.NETCore.App]

Well, a cleanup wouldn't hurt... Is it safe to delete the _tool folder?

@vitek-karas
Member

@wli3 Nope, I don't remember anything like this. Maybe @janvorli would know, or at least know who to send this to.
A crash dump would be ideal, but I don't know how to get one on Linux in an automated job.

@adam230594

I'm experiencing this on a self-hosted Azure DevOps build agent, which randomly fails on dotnet commands for .NET Core 3.1 projects:

/usr/bin/dotnet build /azp/agent/_work/1/s/src/SFA.DAS.EpaoRegister.UnitTests/SFA.DAS.EpaoRegister.UnitTests.csproj -dl:CentralLogger,"/azp/agent/_work/_tasks/DotNetCoreCLI_5541a522-603c-47ad-91fc-a4b1d163081b/2.181.0/dotnet-build-helpers/Microsoft.TeamFoundation.DistributedTask.MSBuild.Logger.dll"*ForwardingLogger,"/azp/agent/_work/_tasks/DotNetCoreCLI_5541a522-603c-47ad-91fc-a4b1d163081b/2.181.0/dotnet-build-helpers/Microsoft.TeamFoundation.DistributedTask.MSBuild.Logger.dll" --configuration release --no-restore
Microsoft (R) Build Engine version 16.7.2+b60ddb6f4 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  SFA.DAS.SharedOuterApi -> /azp/agent/_work/1/s/src/SFA.DAS.SharedOuterApi/bin/release/netcoreapp3.1/SFA.DAS.SharedOuterApi.dll
  SFA.DAS.EpaoRegister -> /azp/agent/_work/1/s/src/SFA.DAS.EpaoRegister/bin/release/netcoreapp3.1/SFA.DAS.EpaoRegister.dll
  SFA.DAS.EpaoRegister.UnitTests -> /azp/agent/_work/1/s/src/SFA.DAS.EpaoRegister.UnitTests/bin/release/netcoreapp3.1/SFA.DAS.EpaoRegister.UnitTests.dll

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:01.15
/usr/bin/dotnet build /azp/agent/_work/1/s/src/SFA.DAS.EpaoRegister/SFA.DAS.EpaoRegister.csproj -dl:CentralLogger,"/azp/agent/_work/_tasks/DotNetCoreCLI_5541a522-603c-47ad-91fc-a4b1d163081b/2.181.0/dotnet-build-helpers/Microsoft.TeamFoundation.DistributedTask.MSBuild.Logger.dll"*ForwardingLogger,"/azp/agent/_work/_tasks/DotNetCoreCLI_5541a522-603c-47ad-91fc-a4b1d163081b/2.181.0/dotnet-build-helpers/Microsoft.TeamFoundation.DistributedTask.MSBuild.Logger.dll" --configuration release --no-restore
Microsoft (R) Build Engine version 16.7.2+b60ddb6f4 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.

  SFA.DAS.SharedOuterApi -> /azp/agent/_work/1/s/src/SFA.DAS.SharedOuterApi/bin/release/netcoreapp3.1/SFA.DAS.SharedOuterApi.dll
  SFA.DAS.EpaoRegister -> /azp/agent/_work/1/s/src/SFA.DAS.EpaoRegister/bin/release/netcoreapp3.1/SFA.DAS.EpaoRegister.dll

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:00.78
##[error]Error: The process '/usr/bin/dotnet' failed with exit code null

dotnet --info

root@azure-pipelines-build-agent-75ddfbcc4d-4ntn5:/azp# dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   3.1.405
 Commit:    3fae16e62e

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  18.04
 OS Platform: Linux
 RID:         ubuntu.18.04-x64
 Base Path:   /usr/share/dotnet/sdk/3.1.405/

Host (useful for support):
  Version: 3.1.11
  Commit:  f5eceb8105

.NET Core SDKs installed:
  2.2.207 [/usr/share/dotnet/sdk]
  3.1.405 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.11 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.11 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

The build itself succeeds, but because the dotnet process then returns with exit code null, the pipeline step fails.
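
An "exit code null" is probably not a real exit status. A process killed by a signal such as SIGSEGV has no exit code at all, and a parent that only looks for one has nothing to report. That the pipeline task runner reports this case as null is an assumption here, not something the logs confirm. A small POSIX C sketch of how a parent tells the two cases apart; `describe_child` and `run_child` are made-up names for illustration.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: classify a wait status.
 * Returns the exit code (0..255) for a normal exit,
 * or -signo if the child was terminated by signal signo. */
static int describe_child(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000; /* stopped/continued: not expected here */
}

/* Hypothetical helper: fork a child that either exits with exit_code
 * or, if kill_with is non-zero, dies from that signal instead. */
static int run_child(int exit_code, int kill_with)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1000;
    if (pid == 0) {
        if (kill_with != 0) {
            signal(kill_with, SIG_DFL); /* ensure the default action applies */
            raise(kill_with);
        }
        _exit(exit_code);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1000;
    return describe_child(status);
}
```

With these helpers, `run_child(0, 0)` yields 0, while `run_child(0, SIGSEGV)` yields `-SIGSEGV`: the child did all its work and would have exited with 0, but the parent never sees that code because the process died from the signal first.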

@wli3 wli3 transferred this issue from dotnet/sdk Feb 17, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged (New issue has not been triaged by the area owner) label Feb 17, 2021
@ghost

ghost commented Feb 18, 2021

Tagging subscribers to this area: @vitek-karas, @agocke
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: marcwittke
Assignees: -
Labels:

area-Host, untriaged

Milestone: -

@BrennanConroy
Member

@janvorli
Member

I'll take a look at the dumps.

@janvorli
Member

@BrennanConroy what is the distro that the dumps came from?

@BrennanConroy
Member

BrennanConroy commented Mar 24, 2021

Helix queue ubuntu.1804.amd64.open

For the first link:
Runtime 6.0.0-preview.3.21167.1
Sdk 6.0.100-preview.3.21168.19

@janvorli
Member

What I can see in the dump is that the main thread has already exited, while the crashing secondary thread is still trying to run OpenSSL code. The lock address inside libcrypto that gets passed to CRYPTO_THREAD_write_lock is NULL. This looks like the same issue as #34231, except that this time it doesn't stem from ERR_reason_error_string as in that issue, but from the following:

(lldb) clrstack -f
OS Thread Id: 0x25be (1)
        Child SP               IP Call Site
00007FA9E95108C0 00007FA9F1249892 libpthread.so.0!__pthread_rwlock_wrlock + 18
00007FA9E9510900 00007FA975A91989 libcrypto.so.1.1!CRYPTO_THREAD_write_lock + 9
00007FA9E9510910 00007FA975A53013 libcrypto.so.1.1!RAND_get_rand_method + 51
00007FA9E9510930 00007FA975A5333E libcrypto.so.1.1!RAND_priv_bytes + 14
00007FA9E9510950 00007FA9759759BD libcrypto.so.1.1!___lldb_unnamed_symbol375$$libcrypto.so.1.1 + 413
00007FA9E95109C0 00007FA975975B96 libcrypto.so.1.1!___lldb_unnamed_symbol376$$libcrypto.so.1.1 + 166
00007FA9E9510A10 00007FA975A0095B libcrypto.so.1.1!___lldb_unnamed_symbol984$$libcrypto.so.1.1 + 91
00007FA9E9510A50 00007FA9759BF41A libcrypto.so.1.1!___lldb_unnamed_symbol795$$libcrypto.so.1.1 + 906
00007FA9E9510AC0 00007FA9759BFD5D libcrypto.so.1.1!___lldb_unnamed_symbol796$$libcrypto.so.1.1 + 1229
00007FA9E9510BA0 00007FA9759BEDA4 libcrypto.so.1.1!EC_POINTs_mul + 324
00007FA9E9510C00 00007FA9759BEE10 libcrypto.so.1.1!EC_POINT_mul + 64
00007FA9E9510C40 00007FA9759C24DF libcrypto.so.1.1!___lldb_unnamed_symbol811$$libcrypto.so.1.1 + 175
00007FA9E9510CA0 00007FA9759BCD49 libcrypto.so.1.1!ECDH_compute_key + 89
00007FA9E9510D00 00007FA9759C18BC libcrypto.so.1.1!___lldb_unnamed_symbol802$$libcrypto.so.1.1 + 76
00007FA9E9510D20 00007FA9759C1A35 libcrypto.so.1.1!___lldb_unnamed_symbol803$$libcrypto.so.1.1 + 245
00007FA9E9510D80 00007FA975DA9317 libssl.so.1.1!___lldb_unnamed_symbol195$$libssl.so.1.1 + 343
00007FA9E9510DC0 00007FA975DCB304 libssl.so.1.1!___lldb_unnamed_symbol509$$libssl.so.1.1 + 1028
00007FA9E9510E10 00007FA975DC9157 libssl.so.1.1!___lldb_unnamed_symbol488$$libssl.so.1.1 + 1383
00007FA9E9510EE0 00007FA975DB54C4 libssl.so.1.1!SSL_do_handshake + 84
00007FA9E9510EE0 00007FA975DB54C4 libssl.so.1.1!SSL_do_handshake + 84
00007FA9E9510F20 00007FA97A6BB20E
00007FA9E9510F30                  [InlinedCallFrame: 00007fa9e9510f30] System.Net.Security.dll!Interop+Ssl.SslDoHandshake(Microsoft.Win32.SafeHandles.SafeSslHandle)
00007FA9E9510F30                  [InlinedCallFrame: 00007fa9e9510f30] System.Net.Security.dll!Interop+Ssl.SslDoHandshake(Microsoft.Win32.SafeHandles.SafeSslHandle)
00007FA9E9510F20 00007FA97A6BB20E System.Diagnostics.Process.dll!ILStubClass.IL_STUB_PInvoke(Microsoft.Win32.SafeHandles.SafeSslHandle) + 142
00007FA9E9510FC0 00007FA978D39EF2 System.Net.Security.dll!Interop+OpenSsl.DoSslHandshake(Microsoft.Win32.SafeHandles.SafeSslHandle, System.ReadOnlySpan`1<Byte>, Byte[] ByRef, Int32 ByRef) + 130
00007FA9E9511020 00007FA978D39168 System.Net.Security.dll!System.Net.Security.SslStreamPal.HandshakeInternal(System.Net.Security.SafeFreeCredentials, System.Net.Security.SafeDeleteSslContext ByRef, System.ReadOnlySpan`1<Byte>, Byte[] ByRef, System.Net.Security.SslAuthenticationOptions) + 168
00007FA9E95110D0 00007FA978D3791A System.Net.Security.dll!System.Net.Security.SecureChannel.GenerateToken(System.ReadOnlySpan`1<Byte>, Byte[] ByRef) + 138
00007FA9E9511140 00007FA978D3770E System.Net.Security.dll!System.Net.Security.SecureChannel.NextMessage(System.ReadOnlySpan`1<Byte>) + 62
00007FA9E9511190 00007FA978D3ABA7 System.Net.Security.dll!System.Net.Security.SslStream.ProcessBlob(Int32) + 327
00007FA9E9511200 00007FA978D63E66 System.Net.Security.dll!System.Net.Security.SslStream+<ReceiveBlobAsync>d__172`1[[System.Net.Security.AsyncReadWriteAdapter, System.Net.Security]].MoveNext() + 2230
00007FA9E95113D0 00007FA97A6BF2C0 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.__Canon, System.Private.CoreLib],[System.Net.Security.SslStream+<ReceiveBlobAsync>d__172`1[[System.Net.Security.AsyncReadWriteAdapter, System.Net.Security]], System.Net.Security]].ExecutionContextCallback(System.Object) + 128 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs @ 287]
00007FA9E9511410 00007FA97A6D2DF5 System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) + 149 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 208]
00007FA9E9511460 00007FA97A6BF0E0 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.__Canon, System.Private.CoreLib],[System.Net.Security.SslStream+<ReceiveBlobAsync>d__172`1[[System.Net.Security.AsyncReadWriteAdapter, System.Net.Security]], System.Net.Security]].MoveNext(System.Threading.Thread) + 288 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs @ 336]
00007FA9E95114E0 00007FA97A6BEF99 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.__Canon, System.Private.CoreLib],[System.Net.Security.SslStream+<ReceiveBlobAsync>d__172`1[[System.Net.Security.AsyncReadWriteAdapter, System.Net.Security]], System.Net.Security]].MoveNext() + 25 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs @ 302]
00007FA9E9511500 00007FA97A6D2FC6 System.Private.CoreLib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Runtime.CompilerServices.IAsyncStateMachineBox, Boolean) + 214 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/TaskContinuation.cs @ 805]
00007FA9E9511540 00007FA97A6D2554 System.Private.CoreLib.dll!System.Threading.Tasks.Task.RunContinuations(System.Object) + 212 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 3472]
00007FA9E95115F0 00007FA9763E4970 System.Private.CoreLib.dll!System.Threading.Tasks.Task`1[[System.Int32, System.Private.CoreLib]].TrySetResult(Int32) + 144 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Future.cs @ 404]
00007FA9E9511620 00007FA9763E8BB6 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Int32, System.Private.CoreLib]].SetExistingTaskResult(System.Threading.Tasks.Task`1<Int32>, Int32) + 86 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs @ 443]
00007FA9E9511650 00007FA9763E8D24 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1[[System.Int32, System.Private.CoreLib]].SetResult(Int32) + 116 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncValueTaskMethodBuilderT.cs @ 67]
00007FA9E9511680 00007FA978D68248 System.Net.Security.dll!System.Net.Security.SslStream+<<FillHandshakeBufferAsync>g__InternalFillHandshakeBufferAsync|181_0>d`1[[System.Net.Security.AsyncReadWriteAdapter, System.Net.Security]].MoveNext() + 488
00007FA9E9511730 00007FA97A6BEF5E System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Int32, System.Private.CoreLib],[System.Net.Security.SslStream+<<FillHandshakeBufferAsync>g__InternalFillHandshakeBufferAsync|181_0>d`1[[System.Net.Security.AsyncReadWriteAdapter, System.Net.Security]], System.Net.Security]].ExecutionContextCallback(System.Object) + 62 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs @ 287]
00007FA9E9511750 00007FA97A6D2DF5 System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) + 149 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 208]
00007FA9E95117A0 00007FA97A6BEE29 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Int32, System.Private.CoreLib],[System.Net.Security.SslStream+<<FillHandshakeBufferAsync>g__InternalFillHandshakeBufferAsync|181_0>d`1[[System.Net.Security.AsyncReadWriteAdapter, System.Net.Security]], System.Net.Security]].MoveNext(System.Threading.Thread) + 217 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs @ 336]
00007FA9E95117F0 00007FA97A6BED29 System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Int32, System.Private.CoreLib],[System.Net.Security.SslStream+<<FillHandshakeBufferAsync>g__InternalFillHandshakeBufferAsync|181_0>d`1[[System.Net.Security.AsyncReadWriteAdapter, System.Net.Security]], System.Net.Security]].MoveNext() + 25 [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs @ 302]
00007FA9E9511810 00007FA9762BA852 System.Private.CoreLib.dll!System.Threading.ThreadPool+<>c.<.cctor>b__82_0(System.Object) + 34 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs @ 1055]
00007FA9E9511820 00007FA97A943A29 System.Net.Sockets.dll!System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.InvokeContinuation(System.Action`1<System.Object>, System.Object, Boolean, Boolean) + 361
00007FA9E9511870 00007FA97A9437E3 System.Net.Sockets.dll!System.Net.Sockets.Socket+AwaitableSocketAsyncEventArgs.OnCompleted(System.Net.Sockets.SocketAsyncEventArgs) + 179
00007FA9E95118D0 00007FA97A95A6C3 System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncEventArgs.OnCompletedInternal() + 83
00007FA9E95118F0 00007FA97A9446BE System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncEventArgs.FinishOperationAsyncSuccess(Int32, System.Net.Sockets.SocketFlags) + 46
00007FA9E9511910 00007FA97A945BB6 System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncEventArgs.TransferCompletionCallbackCore(Int32, Byte[], Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError) + 54
00007FA9E9511940 00007FA97A945AF4 System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncContext+BufferMemoryReceiveOperation.InvokeCallback(Boolean) + 132
00007FA9E9511990 00007FA97A964B2B System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncContext+OperationQueue`1[[System.__Canon, System.Private.CoreLib]].ProcessAsyncOperation(System.__Canon) + 91
00007FA9E95119C0 00007FA97A945917 System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncContext+ReadOperation.System.Threading.IThreadPoolWorkItem.Execute() + 39
00007FA9E95119D0 00007FA97A944588 System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncContext.HandleEvents(SocketEvents) + 120
00007FA9E9511A00 00007FA97A9444B1 System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute() + 129
00007FA9E9511A40 00007FA97A6D6EAC System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() + 364 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs @ 769]
00007FA9E9511AC0 00007FA9762CF8C8 System.Private.CoreLib.dll!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() + 264 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs @ 58]
00007FA9E9511B80 00007FA9762B6028 System.Private.CoreLib.dll!System.Threading.Thread.StartCallback() + 104 [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105]
00007FA9E9511BA0 00007FA9EFF60487 libcoreclr.so!___lldb_unnamed_symbol9589$$libcoreclr.so + 124
00007FA9E9511BC0 00007FA9EFDBF1CE libcoreclr.so!___lldb_unnamed_symbol4452$$libcoreclr.so + 254
00007FA9E9511C50 00007FA9EFDD0372 libcoreclr.so!___lldb_unnamed_symbol4638$$libcoreclr.so + 146
00007FA9E9511CA0 00007FA9EFD8680A libcoreclr.so!___lldb_unnamed_symbol3792$$libcoreclr.so + 330
00007FA9E9511CF0                  [DebuggerU2MCatchHandlerFrame: 00007fa9e9511cf0]
00007FA9E9511DC0 00007FA9EFD86E0D libcoreclr.so!___lldb_unnamed_symbol3793$$libcoreclr.so + 45
00007FA9E9511DF0 00007FA9EFDD044C libcoreclr.so!___lldb_unnamed_symbol4639$$libcoreclr.so + 188
00007FA9E9511E50 00007FA9F00F3B0E libcoreclr.so!___lldb_unnamed_symbol15450$$libcoreclr.so + 590
00007FA9E9511F00 00007FA9F12446DB libpthread.so.0!start_thread + 219
00007FA9E9511FC0 00007FA9F042A71F libc.so.6!__clone + 63

cc: @bartonjs

@bartonjs
Member

Given that Ubuntu 18.04 has explicitly removed support for NO_ATEXIT, I worry we'll end up just finding one intermittent problem after another. The previous fix assumed that everything other than the string table was graceful about post-exit calls, but apparently calls into the RNG hit a failure while trying to reinitialize it.

Feels like our choices are:

  • Change the runtime so that it suspends all threads before calling exit
  • Change the shim to guard every function with an if-shutting-down-exit without locking (still has race conditions)
  • Change the shim to guard every function with an if-shutting-down-exit while using something like interlocked increment/decrement to notify the atexit handler that we can release the library for further shutdown
  • Accept that we'll get occasional calls like this on Ubuntu, especially from applications that use background threads.
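
To make the third option concrete, here is a minimal sketch using C11 atomics. It is not the runtime's actual shim code: every name in it (`g_shutting_down`, `g_in_flight`, `shim_enter`, `shim_leave`, `shim_on_exit`, `guarded_call`, `sample_native_call`) is hypothetical. Each guarded call first increments an in-flight counter and then checks the shutting-down flag. The exit handler sets the flag and waits for the counter to drain before letting shutdown continue.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shim state; not taken from the runtime's sources. */
static atomic_bool g_shutting_down = false;
static atomic_int g_in_flight = 0;

/* Returns true if the caller may proceed into the native library.
 * The counter is bumped BEFORE the flag is checked, so the exit handler
 * either sees this call in flight or this call sees the flag. */
static bool shim_enter(void)
{
    atomic_fetch_add(&g_in_flight, 1);
    if (atomic_load(&g_shutting_down)) {
        atomic_fetch_sub(&g_in_flight, 1);
        return false;
    }
    return true;
}

/* Must be called exactly once after every successful shim_enter(). */
static void shim_leave(void)
{
    atomic_fetch_sub(&g_in_flight, 1);
}

/* Intended to be registered with atexit() so that it runs before the
 * library's own handler: block new calls, then wait for running ones. */
static void shim_on_exit(void)
{
    atomic_store(&g_shutting_down, true);
    while (atomic_load(&g_in_flight) != 0)
        sched_yield();
}

/* Stand-in for a real library function, for illustration only. */
static int sample_native_call(void)
{
    return 42;
}

/* Example guarded wrapper: returns fallback once shutdown has begun. */
static int guarded_call(int (*fn)(void), int fallback)
{
    int result;

    if (!shim_enter())
        return fallback;
    result = fn();
    shim_leave();
    return result;
}
```

Two caveats with this sketch. Because atexit handlers run in reverse order of registration, `shim_on_exit` would have to be registered after the library registers its own handler. And a guarded call that blocks for a long time would hold up process exit for as long as it runs.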

@danmoseley
Copy link
Member

Is it feasible/useful to offer a change to OpenSSL?

Although perhaps this is a problem others might have to solve when interopping with a different native library that has similar expectations.

@bartonjs
Copy link
Member

> Is it feasible/useful to offer a change to OpenSSL?

OpenSSL supports the scenario, and we opt into it (OPENSSL_INIT_NO_ATEXIT):

static int32_t EnsureOpenSsl11Initialized()
{
    // In OpenSSL 1.0 we call OPENSSL_add_all_algorithms_conf() and ERR_load_crypto_strings(),
    // so do the same for 1.1
    OPENSSL_init_ssl(
        // OPENSSL_add_all_algorithms_conf
        OPENSSL_INIT_ADD_ALL_CIPHERS |
        OPENSSL_INIT_ADD_ALL_DIGESTS |
        OPENSSL_INIT_LOAD_CONFIG |
        // Do not unload on process exit, as the CLR may still have threads running
        OPENSSL_INIT_NO_ATEXIT |
        // ERR_load_crypto_strings
        OPENSSL_INIT_LOAD_CRYPTO_STRINGS |
        OPENSSL_INIT_LOAD_SSL_STRINGS,
        NULL);

The Ubuntu 18.04 build explicitly removes support for that option. I found where it does so at some point, but I didn't write it down and am having trouble finding it again.

@danmoseley
Copy link
Member

Ah got it. And later versions - 20.04 etc?

@danmoseley
Copy link
Member

> Change the shim to guard every function with an if-shutting-down-exit while using something like interlocked increment/decrement to notify the atexit handler that we can release the library for further shutdown

This seems like the only reasonable possibility. It seems like the next critical thing to know is whether this also affects 20.04+. That would make it more important to fix since presumably 20.04 or later is an option for many 18.04 customers.

@bartonjs do we know how to find that out? Here's what I have on my 20.04 machine after running apt-get upgrade:

dan@LAPTOP-P6UJDVTA:/usr$ file ./lib/x86_64-linux-gnu/libcrypto.so.1.1
./lib/x86_64-linux-gnu/libcrypto.so.1.1: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=d30abd770d1215fff0f9a0fa9f12b1de5b50da29, stripped
dan@LAPTOP-P6UJDVTA:/usr$ file ./lib/x86_64-linux-gnu/libssl.so.1.1
./lib/x86_64-linux-gnu/libssl.so.1.1: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=4ef02cf97dd73cb0a88495e6dbf584dd6aa5aa22, stripped

From the above output I can't tell how to determine that.

@danmoseley
Member

Those SHAs aren't in the OpenSSL repo, and it's not clear where on https://launchpad.net/ubuntu to find the sources Ubuntu used.

Anyway I don't know what to look for.

@bartonjs
Member

https://packages.ubuntu.com/source/focal/openssl says that Focal is based on OpenSSL 1.1.1f (plus servicing patches), and in 1.1.1f the source looked like

https://github.com/openssl/openssl/blob/36eadf1f84daa965041cce410b4ff32cbda4ef08/crypto/init.c#L620-L656

So if you do something like

$ lldb /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
(lldb) target create "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1"
Current executable set to '/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1' (x86_64).
(lldb) dis -n OPENSSL_init_crypto
libcrypto.so.1.1`OPENSSL_init_crypto:
libcrypto.so.1.1[0x176bc0] <+0>:    pushq  %rbp
libcrypto.so.1.1[0x176bc1] <+1>:    pushq  %rbx
libcrypto.so.1.1[0x176bc2] <+2>:    movq   %rdi, %rbx
libcrypto.so.1.1[0x176bc5] <+5>:    subq   $0x8, %rsp
libcrypto.so.1.1[0x176bc9] <+9>:    movl   0x352c59(%rip), %eax
libcrypto.so.1.1[0x176bcf] <+15>:   testl  %eax, %eax
libcrypto.so.1.1[0x176bd1] <+17>:   je     0x176c18                  ; <+88>
libcrypto.so.1.1[0x176bd3] <+19>:   testl  $0x40000, %edi            ; imm = 0x40000
libcrypto.so.1.1[0x176bd9] <+25>:   je     0x176bf0                  ; <+48>
libcrypto.so.1.1[0x176bdb] <+27>:   xorl   %ebp, %ebp
libcrypto.so.1.1[0x176bdd] <+29>:   addq   $0x8, %rsp
libcrypto.so.1.1[0x176be1] <+33>:   movl   %ebp, %eax
libcrypto.so.1.1[0x176be3] <+35>:   popq   %rbx
libcrypto.so.1.1[0x176be4] <+36>:   popq   %rbp
libcrypto.so.1.1[0x176be5] <+37>:   retq
libcrypto.so.1.1[0x176be6] <+38>:   nopw   %cs:(%rax,%rax)
libcrypto.so.1.1[0x176bf0] <+48>:   leaq   0xc23cc(%rip), %rcx
libcrypto.so.1.1[0x176bf7] <+55>:   movl   $0x252, %r8d              ; imm = 0x252
libcrypto.so.1.1[0x176bfd] <+61>:   movl   $0x46, %edx
libcrypto.so.1.1[0x176c02] <+66>:   movl   $0x74, %esi
libcrypto.so.1.1[0x176c07] <+71>:   movl   $0xf, %edi
libcrypto.so.1.1[0x176c0c] <+76>:   callq  0x1580e0                  ; ERR_put_error
libcrypto.so.1.1[0x176c11] <+81>:   jmp    0x176bdb                  ; <+27>
libcrypto.so.1.1[0x176c13] <+83>:   nopl   (%rax,%rax)
libcrypto.so.1.1[0x176c18] <+88>:   movq   %rsi, %rbp
libcrypto.so.1.1[0x176c1b] <+91>:   leaq   0x352bee(%rip), %rdi
libcrypto.so.1.1[0x176c22] <+98>:   leaq   -0x2e9(%rip), %rsi        ; ___lldb_unnamed_symbol1395$$libcrypto.so.1.1
libcrypto.so.1.1[0x176c29] <+105>:  callq  0x1df9f0                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x176c2e] <+110>:  testl  %eax, %eax
libcrypto.so.1.1[0x176c30] <+112>:  je     0x176bdb                  ; <+27>
libcrypto.so.1.1[0x176c32] <+114>:  movl   0x352bd0(%rip), %eax
libcrypto.so.1.1[0x176c38] <+120>:  testl  %eax, %eax
libcrypto.so.1.1[0x176c3a] <+122>:  je     0x176bdb                  ; <+27>
libcrypto.so.1.1[0x176c3c] <+124>:  testl  $0x40000, %ebx            ; imm = 0x40000
libcrypto.so.1.1[0x176c42] <+130>:  je     0x176d70                  ; <+432>
libcrypto.so.1.1[0x176c48] <+136>:  testb  $0x1, %bl
libcrypto.so.1.1[0x176c4b] <+139>:  jne    0x176d10                  ; <+336>
libcrypto.so.1.1[0x176c51] <+145>:  testb  $0x2, %bl
libcrypto.so.1.1[0x176c54] <+148>:  jne    0x176d40                  ; <+384>
libcrypto.so.1.1[0x176c5a] <+154>:  testb  $0x10, %bl
libcrypto.so.1.1[0x176c5d] <+157>:  jne    0x176da0                  ; <+480>
libcrypto.so.1.1[0x176c63] <+163>:  testb  $0x4, %bl
libcrypto.so.1.1[0x176c66] <+166>:  jne    0x176dd0                  ; <+528>
libcrypto.so.1.1[0x176c6c] <+172>:  testb  $0x20, %bl
libcrypto.so.1.1[0x176c6f] <+175>:  jne    0x176e00                  ; <+576>
libcrypto.so.1.1[0x176c75] <+181>:  testb  $0x8, %bl
libcrypto.so.1.1[0x176c78] <+184>:  jne    0x176e2e                  ; <+622>
libcrypto.so.1.1[0x176c7e] <+190>:  testl  $0x20000, %ebx            ; imm = 0x20000

(from Ubuntu 18.04)

hopefully there'll be something that looks like it's doing a test for 0x80000. If so, the problem is just gone on 20.04.

I've previously said that Ubuntu "removed" the support. Looking again, I don't see a patch that removes the support... but I also don't see one that adds it. The OPENSSL_INIT_NO_ATEXIT support was backported for OpenSSL 1.1.1b. It looks like Ubuntu 18.04 is 1.1.1 (RTM) plus servicing, and their servicing did something other than "catch up to 1.1.1-stable".
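
The manual check described above can be scripted. This is a minimal sketch, not a definitive tool: the helper name has_no_atexit_test is hypothetical, and the usage line assumes gdb is installed and that libcrypto lives at the path used in this thread. It scans a disassembly of OPENSSL_init_crypto for an instruction that tests the 0x80000 bit (the value of OPENSSL_INIT_NO_ATEXIT).

```shell
#!/bin/sh
# Hypothetical helper: reads a disassembly of OPENSSL_init_crypto on stdin and
# prints "yes" if some instruction tests the 0x80000 bit, otherwise "no".
# The pattern accepts both "testl  $0x80000, %ebx" and "test   $0x80000,%ebx".
has_no_atexit_test() {
    if grep -Eq 'test[a-z]*[[:space:]]+\$0x80000,'; then
        echo yes
    else
        echo no
    fi
}

# Usage (assumes gdb is available; pasted lldb "dis -n" output works too):
#   gdb -batch -ex 'disassemble OPENSSL_init_crypto' \
#       /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 | has_no_atexit_test
```

A "yes" suggests the library recognizes the flag; a "no" suggests the flag is silently ignored, which is the situation suspected for 18.04.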

@danmoseley
Member

danmoseley commented Apr 29, 2021

Ubuntu 20.04 output
dan@LAPTOP-P6UJDVTA:/usr$ cat /etc/os-release | grep VERSION
VERSION="20.04.2 LTS (Focal Fossa)"
VERSION_ID="20.04"
VERSION_CODENAME=focal
(lldb) target create "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1"
Current executable set to '/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1' (x86_64).
(lldb) dis -n OPENSSL_init_crypto
libcrypto.so.1.1`OPENSSL_init_crypto:
libcrypto.so.1.1[0x177590] <+0>:    endbr64
libcrypto.so.1.1[0x177594] <+4>:    movl   0x15d296(%rip), %eax
libcrypto.so.1.1[0x17759a] <+10>:   pushq  %r12
libcrypto.so.1.1[0x17759c] <+12>:   pushq  %rbp
libcrypto.so.1.1[0x17759d] <+13>:   pushq  %rbx
libcrypto.so.1.1[0x17759e] <+14>:   movq   %rdi, %rbx
libcrypto.so.1.1[0x1775a1] <+17>:   testl  %eax, %eax
libcrypto.so.1.1[0x1775a3] <+19>:   je     0x1775c0                  ; <+48>
libcrypto.so.1.1[0x1775a5] <+21>:   testl  $0x40000, %edi            ; imm = 0x40000
libcrypto.so.1.1[0x1775ab] <+27>:   je     0x1776e8                  ; <+344>
libcrypto.so.1.1[0x1775b1] <+33>:   xorl   %r12d, %r12d
libcrypto.so.1.1[0x1775b4] <+36>:   movl   %r12d, %eax
libcrypto.so.1.1[0x1775b7] <+39>:   popq   %rbx
libcrypto.so.1.1[0x1775b8] <+40>:   popq   %rbp
libcrypto.so.1.1[0x1775b9] <+41>:   popq   %r12
libcrypto.so.1.1[0x1775bb] <+43>:   retq
libcrypto.so.1.1[0x1775bc] <+44>:   nopl   (%rax)
libcrypto.so.1.1[0x1775c0] <+48>:   movq   %rsi, %rbp
libcrypto.so.1.1[0x1775c3] <+51>:   leaq   0x15d24e(%rip), %rdi
libcrypto.so.1.1[0x1775ca] <+58>:   leaq   -0x2d1(%rip), %rsi        ; ___lldb_unnamed_symbol1509$$libcrypto.so.1.1
libcrypto.so.1.1[0x1775d1] <+65>:   callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1775d6] <+70>:   testl  %eax, %eax
libcrypto.so.1.1[0x1775d8] <+72>:   je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1775da] <+74>:   movl   0x15d230(%rip), %eax
libcrypto.so.1.1[0x1775e0] <+80>:   testl  %eax, %eax
libcrypto.so.1.1[0x1775e2] <+82>:   je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1775e4] <+84>:   movl   $0x1, %r12d
libcrypto.so.1.1[0x1775ea] <+90>:   testl  $0x40000, %ebx            ; imm = 0x40000
libcrypto.so.1.1[0x1775f0] <+96>:   jne    0x1775b4                  ; <+36>
libcrypto.so.1.1[0x1775f2] <+98>:   testl  $0x80000, %ebx            ; imm = 0x80000
libcrypto.so.1.1[0x1775f8] <+104>:  je     0x177718                  ; <+392>
libcrypto.so.1.1[0x1775fe] <+110>:  leaq   -0x565(%rip), %rsi        ; ___lldb_unnamed_symbol1491$$libcrypto.so.1.1
libcrypto.so.1.1[0x177605] <+117>:  leaq   0x15d200(%rip), %rdi
libcrypto.so.1.1[0x17760c] <+124>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177611] <+129>:  testl  %eax, %eax
libcrypto.so.1.1[0x177613] <+131>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177615] <+133>:  movl   0x15d1ec(%rip), %r12d
libcrypto.so.1.1[0x17761c] <+140>:  testl  %r12d, %r12d
libcrypto.so.1.1[0x17761f] <+143>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177621] <+145>:  leaq   -0x578(%rip), %rsi        ; ___lldb_unnamed_symbol1492$$libcrypto.so.1.1
libcrypto.so.1.1[0x177628] <+152>:  leaq   0x15d1d5(%rip), %rdi
libcrypto.so.1.1[0x17762f] <+159>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177634] <+164>:  testl  %eax, %eax
libcrypto.so.1.1[0x177636] <+166>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x17763c] <+172>:  movl   0x15d1bd(%rip), %r11d
libcrypto.so.1.1[0x177643] <+179>:  testl  %r11d, %r11d
libcrypto.so.1.1[0x177646] <+182>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x17764c] <+188>:  testb  $0x1, %bl
libcrypto.so.1.1[0x17764f] <+191>:  jne    0x177738                  ; <+424>
libcrypto.so.1.1[0x177655] <+197>:  testb  $0x2, %bl
libcrypto.so.1.1[0x177658] <+200>:  jne    0x177768                  ; <+472>
libcrypto.so.1.1[0x17765e] <+206>:  testb  $0x10, %bl
libcrypto.so.1.1[0x177661] <+209>:  jne    0x177798                  ; <+520>
libcrypto.so.1.1[0x177667] <+215>:  testb  $0x4, %bl
libcrypto.so.1.1[0x17766a] <+218>:  jne    0x1777c8                  ; <+568>
libcrypto.so.1.1[0x177670] <+224>:  testb  $0x20, %bl
libcrypto.so.1.1[0x177673] <+227>:  jne    0x1777f6                  ; <+614>
libcrypto.so.1.1[0x177679] <+233>:  testb  $0x8, %bl
libcrypto.so.1.1[0x17767c] <+236>:  jne    0x177824                  ; <+660>
libcrypto.so.1.1[0x177682] <+242>:  testl  $0x20000, %ebx            ; imm = 0x20000
libcrypto.so.1.1[0x177688] <+248>:  jne    0x177852                  ; <+706>
libcrypto.so.1.1[0x17768e] <+254>:  testb  $-0x80, %bl
libcrypto.so.1.1[0x177691] <+257>:  jne    0x177864                  ; <+724>
libcrypto.so.1.1[0x177697] <+263>:  testb  $0x40, %bl
libcrypto.so.1.1[0x17769a] <+266>:  jne    0x177892                  ; <+770>
libcrypto.so.1.1[0x1776a0] <+272>:  testb  $0x1, %bh
libcrypto.so.1.1[0x1776a3] <+275>:  jne    0x1778df                  ; <+847>
libcrypto.so.1.1[0x1776a9] <+281>:  testb  $0x8, %bh
libcrypto.so.1.1[0x1776ac] <+284>:  jne    0x17790d                  ; <+893>
libcrypto.so.1.1[0x1776b2] <+290>:  testb  $0x2, %bh
libcrypto.so.1.1[0x1776b5] <+293>:  jne    0x17793a                  ; <+938>
libcrypto.so.1.1[0x1776bb] <+299>:  testb  $0x4, %bh
libcrypto.so.1.1[0x1776be] <+302>:  jne    0x177991                  ; <+1025>
libcrypto.so.1.1[0x1776c4] <+308>:  testb  $-0x2, %bh
libcrypto.so.1.1[0x1776c7] <+311>:  jne    0x1779eb                  ; <+1115>
libcrypto.so.1.1[0x1776cd] <+317>:  testl  $0x10000, %ebx            ; imm = 0x10000
libcrypto.so.1.1[0x1776d3] <+323>:  jne    0x1779be                  ; <+1070>
libcrypto.so.1.1[0x1776d9] <+329>:  movl   $0x1, %r12d
libcrypto.so.1.1[0x1776df] <+335>:  jmp    0x1775b4                  ; <+36>
libcrypto.so.1.1[0x1776e4] <+340>:  nopl   (%rax)
libcrypto.so.1.1[0x1776e8] <+344>:  xorl   %r12d, %r12d
libcrypto.so.1.1[0x1776eb] <+347>:  movl   $0x270, %r8d              ; imm = 0x270
libcrypto.so.1.1[0x1776f1] <+353>:  movl   $0x46, %edx
libcrypto.so.1.1[0x1776f6] <+358>:  movl   $0x74, %esi
libcrypto.so.1.1[0x1776fb] <+363>:  leaq   0xc86d1(%rip), %rcx
libcrypto.so.1.1[0x177702] <+370>:  movl   $0xf, %edi
libcrypto.so.1.1[0x177707] <+375>:  callq  0x157990                  ; ERR_put_error
libcrypto.so.1.1[0x17770c] <+380>:  movl   %r12d, %eax
libcrypto.so.1.1[0x17770f] <+383>:  popq   %rbx
libcrypto.so.1.1[0x177710] <+384>:  popq   %rbp
libcrypto.so.1.1[0x177711] <+385>:  popq   %r12
libcrypto.so.1.1[0x177713] <+387>:  retq
libcrypto.so.1.1[0x177714] <+388>:  nopl   (%rax)
libcrypto.so.1.1[0x177718] <+392>:  leaq   -0x44f(%rip), %rsi        ; ___lldb_unnamed_symbol1508$$libcrypto.so.1.1
libcrypto.so.1.1[0x17771f] <+399>:  leaq   0x15d0e6(%rip), %rdi
libcrypto.so.1.1[0x177726] <+406>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x17772b] <+411>:  testl  %eax, %eax
libcrypto.so.1.1[0x17772d] <+413>:  jne    0x177615                  ; <+133>
libcrypto.so.1.1[0x177733] <+419>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177738] <+424>:  leaq   -0x67f(%rip), %rsi        ; ___lldb_unnamed_symbol1493$$libcrypto.so.1.1
libcrypto.so.1.1[0x17773f] <+431>:  leaq   0x15d0b6(%rip), %rdi
libcrypto.so.1.1[0x177746] <+438>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x17774b] <+443>:  testl  %eax, %eax
libcrypto.so.1.1[0x17774d] <+445>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177753] <+451>:  movl   0x15d09a(%rip), %r10d
libcrypto.so.1.1[0x17775a] <+458>:  testl  %r10d, %r10d
libcrypto.so.1.1[0x17775d] <+461>:  jne    0x177655                  ; <+197>
libcrypto.so.1.1[0x177763] <+467>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177768] <+472>:  leaq   -0x4cf(%rip), %rsi        ; ___lldb_unnamed_symbol1507$$libcrypto.so.1.1
libcrypto.so.1.1[0x17776f] <+479>:  leaq   0x15d086(%rip), %rdi
libcrypto.so.1.1[0x177776] <+486>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x17777b] <+491>:  testl  %eax, %eax
libcrypto.so.1.1[0x17777d] <+493>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177783] <+499>:  movl   0x15d06a(%rip), %r9d
libcrypto.so.1.1[0x17778a] <+506>:  testl  %r9d, %r9d
libcrypto.so.1.1[0x17778d] <+509>:  jne    0x17765e                  ; <+206>
libcrypto.so.1.1[0x177793] <+515>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177798] <+520>:  leaq   -0x6cf(%rip), %rsi        ; ___lldb_unnamed_symbol1494$$libcrypto.so.1.1
libcrypto.so.1.1[0x17779f] <+527>:  leaq   0x15d04a(%rip), %rdi
libcrypto.so.1.1[0x1777a6] <+534>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1777ab] <+539>:  testl  %eax, %eax
libcrypto.so.1.1[0x1777ad] <+541>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1777b3] <+547>:  movl   0x15d032(%rip), %r8d
libcrypto.so.1.1[0x1777ba] <+554>:  testl  %r8d, %r8d
libcrypto.so.1.1[0x1777bd] <+557>:  jne    0x177667                  ; <+215>
libcrypto.so.1.1[0x1777c3] <+563>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1777c8] <+568>:  leaq   -0x54f(%rip), %rsi        ; ___lldb_unnamed_symbol1506$$libcrypto.so.1.1
libcrypto.so.1.1[0x1777cf] <+575>:  leaq   0x15d01a(%rip), %rdi
libcrypto.so.1.1[0x1777d6] <+582>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1777db] <+587>:  testl  %eax, %eax
libcrypto.so.1.1[0x1777dd] <+589>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1777e3] <+595>:  movl   0x15d003(%rip), %edi
libcrypto.so.1.1[0x1777e9] <+601>:  testl  %edi, %edi
libcrypto.so.1.1[0x1777eb] <+603>:  jne    0x177670                  ; <+224>
libcrypto.so.1.1[0x1777f1] <+609>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1777f6] <+614>:  leaq   -0x71d(%rip), %rsi        ; ___lldb_unnamed_symbol1495$$libcrypto.so.1.1
libcrypto.so.1.1[0x1777fd] <+621>:  leaq   0x15cfe4(%rip), %rdi
libcrypto.so.1.1[0x177804] <+628>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177809] <+633>:  testl  %eax, %eax
libcrypto.so.1.1[0x17780b] <+635>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177811] <+641>:  movl   0x15cfcd(%rip), %esi
libcrypto.so.1.1[0x177817] <+647>:  testl  %esi, %esi
libcrypto.so.1.1[0x177819] <+649>:  jne    0x177679                  ; <+233>
libcrypto.so.1.1[0x17781f] <+655>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177824] <+660>:  leaq   -0x5cb(%rip), %rsi        ; ___lldb_unnamed_symbol1505$$libcrypto.so.1.1
libcrypto.so.1.1[0x17782b] <+667>:  leaq   0x15cfb6(%rip), %rdi
libcrypto.so.1.1[0x177832] <+674>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177837] <+679>:  testl  %eax, %eax
libcrypto.so.1.1[0x177839] <+681>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x17783f] <+687>:  movl   0x15cf9f(%rip), %ecx
libcrypto.so.1.1[0x177845] <+693>:  testl  %ecx, %ecx
libcrypto.so.1.1[0x177847] <+695>:  jne    0x177682                  ; <+242>
libcrypto.so.1.1[0x17784d] <+701>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177852] <+706>:  callq  0x1e2850                  ; ___lldb_unnamed_symbol1948$$libcrypto.so.1.1
libcrypto.so.1.1[0x177857] <+711>:  testl  %eax, %eax
libcrypto.so.1.1[0x177859] <+713>:  jne    0x17768e                  ; <+254>
libcrypto.so.1.1[0x17785f] <+719>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177864] <+724>:  leaq   -0x62b(%rip), %rsi        ; ___lldb_unnamed_symbol1504$$libcrypto.so.1.1
libcrypto.so.1.1[0x17786b] <+731>:  leaq   0x15cf6e(%rip), %rdi
libcrypto.so.1.1[0x177872] <+738>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177877] <+743>:  testl  %eax, %eax
libcrypto.so.1.1[0x177879] <+745>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x17787f] <+751>:  movl   0x15cf4b(%rip), %edx
libcrypto.so.1.1[0x177885] <+757>:  testl  %edx, %edx
libcrypto.so.1.1[0x177887] <+759>:  jne    0x177697                  ; <+263>
libcrypto.so.1.1[0x17788d] <+765>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177892] <+770>:  movq   0x15cf87(%rip), %rdi
libcrypto.so.1.1[0x177899] <+777>:  callq  0x1e2700                  ; CRYPTO_THREAD_write_lock
libcrypto.so.1.1[0x17789e] <+782>:  leaq   -0x685(%rip), %rsi        ; ___lldb_unnamed_symbol1503$$libcrypto.so.1.1
libcrypto.so.1.1[0x1778a5] <+789>:  leaq   0x15cf34(%rip), %rdi
libcrypto.so.1.1[0x1778ac] <+796>:  movq   %rbp, 0x15cf25(%rip)
libcrypto.so.1.1[0x1778b3] <+803>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1778b8] <+808>:  movl   %eax, %r12d
libcrypto.so.1.1[0x1778bb] <+811>:  testl  %eax, %eax
libcrypto.so.1.1[0x1778bd] <+813>:  jne    0x177967                  ; <+983>
libcrypto.so.1.1[0x1778c3] <+819>:  movq   0x15cf56(%rip), %rdi
libcrypto.so.1.1[0x1778ca] <+826>:  movq   $0x0, 0x15cf03(%rip)
libcrypto.so.1.1[0x1778d5] <+837>:  callq  0x1e2720                  ; CRYPTO_THREAD_unlock
libcrypto.so.1.1[0x1778da] <+842>:  jmp    0x1775b4                  ; <+36>
libcrypto.so.1.1[0x1778df] <+847>:  leaq   -0x6f6(%rip), %rsi        ; ___lldb_unnamed_symbol1502$$libcrypto.so.1.1
libcrypto.so.1.1[0x1778e6] <+854>:  leaq   0x15cedf(%rip), %rdi
libcrypto.so.1.1[0x1778ed] <+861>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1778f2] <+866>:  testl  %eax, %eax
libcrypto.so.1.1[0x1778f4] <+868>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1778fa] <+874>:  movl   0x15cec4(%rip), %eax
libcrypto.so.1.1[0x177900] <+880>:  testl  %eax, %eax
libcrypto.so.1.1[0x177902] <+882>:  jne    0x1776a9                  ; <+281>
libcrypto.so.1.1[0x177908] <+888>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x17790d] <+893>:  leaq   -0x744(%rip), %rsi        ; ___lldb_unnamed_symbol1501$$libcrypto.so.1.1
libcrypto.so.1.1[0x177914] <+900>:  leaq   0x15cea5(%rip), %rdi
libcrypto.so.1.1[0x17791b] <+907>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x177920] <+912>:  testl  %eax, %eax
libcrypto.so.1.1[0x177922] <+914>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177928] <+920>:  cmpl   $0x0, 0x15ce8d(%rip)
libcrypto.so.1.1[0x17792f] <+927>:  jne    0x1776b2                  ; <+290>
libcrypto.so.1.1[0x177935] <+933>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x17793a] <+938>:  leaq   -0x791(%rip), %rsi        ; ___lldb_unnamed_symbol1500$$libcrypto.so.1.1
libcrypto.so.1.1[0x177941] <+945>:  leaq   0x15ce70(%rip), %rdi
libcrypto.so.1.1[0x177948] <+952>:  callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x17794d] <+957>:  testl  %eax, %eax
libcrypto.so.1.1[0x17794f] <+959>:  je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177955] <+965>:  cmpl   $0x0, 0x15ce58(%rip)
libcrypto.so.1.1[0x17795c] <+972>:  jne    0x1776bb                  ; <+299>
libcrypto.so.1.1[0x177962] <+978>:  jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177967] <+983>:  movl   0x15ce63(%rip), %ebp
libcrypto.so.1.1[0x17796d] <+989>:  movq   0x15ceac(%rip), %rdi
libcrypto.so.1.1[0x177974] <+996>:  movq   $0x0, 0x15ce59(%rip)
libcrypto.so.1.1[0x17797f] <+1007>: callq  0x1e2720                  ; CRYPTO_THREAD_unlock
libcrypto.so.1.1[0x177984] <+1012>: testl  %ebp, %ebp
libcrypto.so.1.1[0x177986] <+1014>: jg     0x1776a0                  ; <+272>
libcrypto.so.1.1[0x17798c] <+1020>: jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x177991] <+1025>: leaq   -0x808(%rip), %rsi        ; ___lldb_unnamed_symbol1499$$libcrypto.so.1.1
libcrypto.so.1.1[0x177998] <+1032>: leaq   0x15ce11(%rip), %rdi
libcrypto.so.1.1[0x17799f] <+1039>: callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1779a4] <+1044>: testl  %eax, %eax
libcrypto.so.1.1[0x1779a6] <+1046>: je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1779ac] <+1052>: cmpl   $0x0, 0x15cdf9(%rip)
libcrypto.so.1.1[0x1779b3] <+1059>: jne    0x1776c4                  ; <+308>
libcrypto.so.1.1[0x1779b9] <+1065>: jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1779be] <+1070>: leaq   -0x8d5(%rip), %rsi        ; ___lldb_unnamed_symbol1496$$libcrypto.so.1.1
libcrypto.so.1.1[0x1779c5] <+1077>: leaq   0x15cddc(%rip), %rdi
libcrypto.so.1.1[0x1779cc] <+1084>: callq  0x1e2780                  ; CRYPTO_THREAD_run_once
libcrypto.so.1.1[0x1779d1] <+1089>: testl  %eax, %eax
libcrypto.so.1.1[0x1779d3] <+1091>: je     0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1779d9] <+1097>: cmpl   $0x0, 0x15cdc0(%rip)
libcrypto.so.1.1[0x1779e0] <+1104>: jne    0x1776d9                  ; <+329>
libcrypto.so.1.1[0x1779e6] <+1110>: jmp    0x1775b1                  ; <+33>
libcrypto.so.1.1[0x1779eb] <+1115>: callq  0x1535f0                  ; ENGINE_register_all_complete
libcrypto.so.1.1[0x1779f0] <+1120>: jmp    0x1776cd                  ; <+317>

(lldb)

@bartonjs I see a test against 0x80000 ...

@bartonjs
Member

::kermit arms:: Yaaaaaaay!

Looks like we won't have a problem on 20.04. Hopefully that's enough to avoid needing to add our own locking/refcounting/whatever to literally every shim method.

@danmoseley
Member

@marcwittke is it possible for you to try on Ubuntu 20.04 or later? We think that will fix it. It is an issue in the libcrypto on 18.04.

@marcwittke
Author

I think so. We have two agents running right now on 18.04. I'll update one of them to 20.04 and we'll see. Since the crash is intermittent, I think in about a week I can give you an indication of whether it helped or not.

@danmoseley
Member

I'm not sure we'd arrived at consensus that we wouldn't take a fix here ...18.04 is supported until 2028 and we'll presumably support it in .NET 7. This is also causing our automated tests to crash periodically.

I'll leave this open for other customers to comment on the impact. But the recommendation above remains: move to 20.04 if affected.

@danmoseley danmoseley reopened this Dec 8, 2021
@hoyosjs
Member

hoyosjs commented Dec 8, 2021

As seen in dotnet/sdk#22872 (comment)

This seems to be #48411 which happens on 18.04 as seen here.

The stack:

00 00007f0e`d15f7c10 00007f0f`305b5959     libpthread_2_27!pthread_rwlock_wrlock+0x12
01 00007f0e`d15f7c50 00007f0f`30577013     libcrypto_so_1!CRYPTO_THREAD_write_lock+0x9
02 00007f0e`d15f7c60 00007f0f`305772f0     libcrypto_so_1!RAND_get_rand_method+0x33
03 00007f0e`d15f7c80 00007f0f`3053449f     libcrypto_so_1!RAND_bytes+0x10
04 00007f0e`d15f7ca0 00007f0f`30542a97     libcrypto_so_1!EVP_MD_CTX_ctrl+0x132f
05 00007f0e`d15f7cd0 00007f0f`a31a5804     libcrypto_so_1!EVP_CIPHER_CTX_ctrl+0x17
06 00007f0e`d15f7ce0 00007f0f`a319761a     libssl_so_1!SSL_in_before+0x13bd4
07 00007f0e`d15f7e30 00007f0f`a3192006     libssl_so_1!SSL_in_before+0x59ea
08 00007f0e`d15f7e40 00007f0f`a317e4e4     libssl_so_1!SSL_in_before+0x3d6
09 00007f0e`d15f7f10 00007f0f`334590a0     libssl_so_1!SSL_do_handshake+0x54
0a 00007f0e`d15f7f50 00007f0f`334571b6     Interop+Ssl.<SslDoHandshake>g____PInvoke__|26_0(IntPtr)+0x40
0b 00007f0e`d15f7ff0 00007f0f`3345bdec     System_Net_Security!Interop+Ssl.SslDoHandshake(Microsoft.Win32.SafeHandles.SafeSslHandle)+0x56 [/_/src/libraries/System.Net.Security/src/Microsoft.Interop.DllImportGenerator/Microsoft.Interop.DllImportGenerator/GeneratedDllImports.g.cs @ 3487] 
0c 00007f0e`d15f8030 00007f0f`3347a3e5     System_Net_Security!Interop+OpenSsl.DoSslHandshake(Microsoft.Win32.SafeHandles.SafeSslHandle, System.ReadOnlySpan`1<Byte>, Byte[] ByRef, Int32 ByRef)+0x8c [/_/src/libraries/Common/src/Interop/Unix/System.Security.Cryptography.Native/Interop.OpenSsl.cs @ 338] 
0d 00007f0e`d15f80a0 00007f0f`33467c75     System_Net_Security!System.Net.Security.SslStreamPal.HandshakeInternal(System.Net.Security.SafeFreeCredentials, System.Net.Security.SafeDeleteSslContext ByRef, System.ReadOnlySpan`1<Byte>, Byte[] ByRef, System.Net.Security.SslAuthenticationOptions)+0xb5 [/_/src/libraries/System.Net.Security/src/System/Net/Security/SslStreamPal.Unix.cs @ 161] 
0e 00007f0e`d15f8170 00007f0f`33467989     System_Net_Security!System.Net.Security.SecureChannel.GenerateToken(System.ReadOnlySpan`1<Byte>, Byte[] ByRef)+0x155 [/_/src/libraries/System.Net.Security/src/System/Net/Security/SecureChannel.cs @ 803] 
0f 00007f0e`d15f8210 00007f0f`3346ced7     System_Net_Security!System.Net.Security.SecureChannel.NextMessage(System.ReadOnlySpan`1<Byte>)+0x39 [/_/src/libraries/System.Net.Security/src/System/Net/Security/SecureChannel.cs @ 725] 
10 00007f0e`d15f8280 00007f0f`3347e742     System_Net_Security!System.Net.Security.SslStream.ProcessBlob(Int32)+0x157 [/_/src/libraries/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs @ 593] 
11 00007f0e`d15f8310 00000000`00000000     System_Net_Security!System.Net.Security.SslStream+<ReceiveBlobAsync>d__174`1[[System.Net.Security.AsyncReadWriteAdapter, System.Net.Security]].MoveNext()+0x9a2 [/_/src/libraries/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs @ 555] 

pthread_rwlock_wrlock has the following disassembly from entry to faulting point:

libpthread_2_27!pthread_rwlock_wrlock:
00007f0f`ac2fc880 4157            push    r15
00007f0f`ac2fc882 4156            push    r14
00007f0f`ac2fc884 4155            push    r13
00007f0f`ac2fc886 4154            push    r12
00007f0f`ac2fc888 55              push    rbp
00007f0f`ac2fc889 53              push    rbx
00007f0f`ac2fc88a 4889fb          mov     rbx,rdi
00007f0f`ac2fc88d 4883ec08        sub     rsp,8
00007f0f`ac2fc891 90              nop
00007f0f`ac2fc892 8b5718          mov     edx,dword ptr [rdi+18h]

The segv is from reading RDI + 0x18 = 0x18. RBX and RDX are indeed 0. RDI in SysV is the first parameter passed, which is pthread_rwlock_t*. That's passed in from https://github.com/openssl/openssl/blob/b1553c89285cb05a28d185423bc3df9b505db92a/crypto/threads_pthread.c#L75-L86; called from RAND_get_rand_method with a C-static lock, rand_meth_lock, which doesn't support reinitialization in 18.04.
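
The same fault can be cross-checked against the dmesg output quoted at the top of this issue. A minimal sketch (the helper name fault_offset is hypothetical) that subtracts the mapping base from the instruction pointer in one of those kernel lines to get the faulting offset inside libpthread:

```shell
#!/bin/sh
# Hypothetical helper: given a kernel "segfault at ..." line, print the
# faulting instruction's offset inside the mapped library (ip - base).
fault_offset() {
    # Hex value that follows " ip ".
    ip=$(printf '%s\n' "$1" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    # Mapping base from the trailing "[base+size]".
    base=$(printf '%s\n' "$1" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*\]$/\1/p')
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}

# First dmesg line from the original report.
fault_offset '[17426.781072] dotnet[36429]: segfault at 18 ip 00007f9d65e87892 sp 00007f9d5e083bb0 error 4 in libpthread-2.27.so[7f9d65e7b000+1a000]'
# prints 0xc892
```

The low 12 bits of that offset (0x892) match the faulting instruction address ending in c892 in the disassembly above, and "segfault at 18" matches the read of RDI + 0x18 with a null RDI.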

@agocke agocke added area-System.Security untriaged New issue has not been triaged by the area owner and removed untriaged New issue has not been triaged by the area owner area-Host labels Feb 14, 2022
@bartonjs
Member

bartonjs commented Jul 7, 2022

I'm not sure we'd arrived at consensus that we wouldn't take a fix here

The only complete fix we could take would be to run literally every shim function to OpenSSL under the same mutex we use for loading exception strings, to work around applications doing work on background threads after the main thread has exited (because these crashes are only after exit() has been called / main() has exited).

The biggest offender seems to be SSL_do_handshake; so we /might/ be able to start the game of whack-a-mole by making TLS handshakes mutexed; but I don't think that the networking team would like that. (We could probably change our mutex to a rwlock so we don't utterly kill parallelism with TLS handshakes, but it's still not free)

@bartonjs
Member

bartonjs commented Jul 7, 2022

I've also not tried working with Canonical to get them to just patch in the support for OPENSSL_INIT_NO_ATEXIT. @richlander do you have any contacts there?

@richlander
Member

I do. Hey @wiswaud -- can you get us a contact at Canonical who can help us with some OpenSSL issues on Ubuntu 18.04?

@richlander
Member

I was given an official Canonical account to report issues via their tracker. That was quick.

@bartonjs Can you write a succinct description of the issue that I can copy/paste into the Canonical tracker?

@bartonjs
Member

bartonjs commented Jul 8, 2022

@richlander How's this?

Bionic's OpenSSL 1.1.1 package (https://launchpad.net/ubuntu/bionic/+source/openssl) is the only build of OpenSSL 1.1.1 on any distro we've encountered that does not support the OPENSSL_INIT_NO_ATEXIT functionality from 1.1.1b (openssl/openssl@c2b3db2).

The threading model in .NET has the possibility that background threads are still running when exit() is called, which can cause SIGSEGV if a background thread interacts with OpenSSL after/while it has unloaded. For that reason, we always initialize OpenSSL 1.1.1 with the OPENSSL_INIT_NO_ATEXIT flag (which, of all the distros we run on, has no effect only on Bionic).

We feel that the stability of applications on Ubuntu 18.04 would be improved if the functionality of OPENSSL_INIT_NO_ATEXIT was merged into the bionic openssl 1.1.1 package, even if the constant isn't published into the header for the dev package.

@richlander
Member

Perfect! Thanks much.

@janvorli
Member

janvorli commented Jul 8, 2022

I have been hitting this crash recently on my main devbox, which is Ubuntu 18.04. However, it started happening only recently; as recently as a month ago the build was stable. So maybe something has changed in msbuild that makes this occur much more frequently, or something like that.

@richlander
Member

If you have a good repro, that would be useful. I am talking with Canonical now.

@janvorli
Member

janvorli commented Jul 9, 2022

Unfortunately I don't. It crashes on average once every day or two when running the ./build.sh script.

@richlander
Member

That's OK. Let's see if we can get a fix and then maybe deploy some early fixes.

@jeffhandley jeffhandley added the tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly label Jul 11, 2022
@jeffhandley jeffhandley added this to the 7.0.0 milestone Jul 11, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jul 11, 2022
@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Jul 27, 2022
@krwq
Member

krwq commented Aug 30, 2023

Simple repro code for testing:

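using System;
using System.Linq; // needed for Select in ToHex
using System.Runtime.InteropServices;
using System.Security.Cryptography;

// Register a handler that uses the RNG while the process is exiting.
atexit(AtExitHandler);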

byte[] data = new byte[] { 0, 1, 2, 3, 4, 5 };
byte[] hashValue;

using (SHA256 sha256 = SHA256.Create())
{
    hashValue = sha256.ComputeHash(data);
}

Console.WriteLine($"hash: {ToHex(hashValue)}");

[DllImport("libc", EntryPoint = "__cxa_atexit", CallingConvention = CallingConvention.Cdecl)]
static extern int atexit(Action a);

static void AtExitHandler()
{
    byte[] randomBytes = new byte[16];
    RandomNumberGenerator.Fill(randomBytes);
    Console.WriteLine($"random: {ToHex(randomBytes)}");
}

static string ToHex(byte[] bytes)
{
    return string.Join("", bytes.Select((b) => b.ToString("X2")));
}

In case the mangled name of atexit differs on your system, this finds the correct one:

nm -D `ldd \`which echo\` | grep libc | cut '--delimiter= ' -f 3` | grep 'atexit\>' | cut '--delimiter= ' -f 3
# for me prints: __cxa_atexit

We should probably build a small .so that wraps atexit, since atexit is only source compatible and in some places it's documented as taking 2 extra args, but this is meant to be a simple demonstration of the issue. Adjust as needed.

@richlander
Member

richlander commented Aug 30, 2023

We are in the late stages of getting Canonical to publish a fix in Ubuntu 18.04 via their ESM program. I believe the easiest way to access that is via Ubuntu Pro.

@dotnet dotnet unlocked this conversation Aug 30, 2023
@richlander
Member

The fix has been released in libssl package version 1.1.1-1ubuntu2.1~18.04.23+esm1.

Here are my repro steps to acquire that package: https://gist.github.com/richlander/47333cbf90ee0ee3f51bcb0dbbb3a76f?permalink_comment_id=4676592#gistcomment-4676592
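
To see whether a machine already has the fixed package, the installed version can be compared against the one named above. A minimal sketch under stated assumptions: dpkg is available, the binary package is named libssl1.1 (an assumption; the comment above only says "libssl package"), and is_fixed_version is a hypothetical helper name.

```shell
#!/bin/sh
# First package version reported above as containing the fix.
FIXED='1.1.1-1ubuntu2.1~18.04.23+esm1'

# Hypothetical helper: succeeds if the given version is at least FIXED,
# using Debian version ordering via dpkg --compare-versions.
is_fixed_version() {
    dpkg --compare-versions "$1" ge "$FIXED"
}

# Usage (the package name libssl1.1 is an assumption):
#   installed=$(dpkg-query -W -f='${Version}' libssl1.1)
#   if is_fixed_version "$installed"; then echo "fix present"; else echo "fix missing"; fi
```

Using dpkg's own comparison avoids mis-ordering the "~" and "+esm1" parts of the version string, which plain string comparison would get wrong.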
