Bundled Tracer fails to load with .NET 8 and COMPlus_EnableDiagnostics=0 #5030

Closed
marcovr opened this issue Jan 8, 2024 · 11 comments

@marcovr

marcovr commented Jan 8, 2024

Describe the bug
While upgrading our applications to .NET 8, we noticed missing traces and APM metrics. We then found that the bundled tracer fails to load into (attach to) the .NET process when running on .NET 8 with the environment variable COMPlus_EnableDiagnostics=0 set.

To Reproduce
Steps to reproduce the behavior:

  1. Create a new dotnet webapp using .NET 8 with a reference to Datadog.Trace.Bundle
  2. Set env variables for Tracer as well as COMPlus_EnableDiagnostics=0
  3. Run application
  4. Tracer is not loaded - Traces are missing!
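
As a rough sketch, the environment for steps 2 and 3 could be set up as follows. The tracer values are copied from the diagnostic output posted further down in this issue; the `/app/datadog` path comes from that container and may differ elsewhere, and `MyWebApp.dll` is only a placeholder name.

```shell
# Tracer settings, copied from the diagnostic output in this issue.
# /app/datadog is the path used in the reporter's container; adjust as needed.
export CORECLR_ENABLE_PROFILING=1
export CORECLR_PROFILER='{846F5F1C-F9AE-4B07-969E-05C26BC060D8}'
export CORECLR_PROFILER_PATH=/app/datadog/linux-x64/Datadog.Trace.ClrProfiler.Native.so
export DD_DOTNET_TRACER_HOME=/app/datadog

# The setting that triggers the problem on .NET 8.
export COMPlus_EnableDiagnostics=0

# Then start the application from the same shell, for example:
#   dotnet MyWebApp.dll    # placeholder application name
```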

Have a look at my minimal reproduction repo.
As you can see in this run, the bundled tracer is successfully loaded with .NET 7 regardless of whether COMPlus_EnableDiagnostics=0 is set or not. But with .NET 8 it fails to load when COMPlus_EnableDiagnostics=0 is set:

[WARNING]: The native loader library is not loaded into the process

Note: the lines "[FAILURE]: Error connecting to Agent" can be ignored - That is expected because I'm not running an agent :-)

Expected behavior
The tracer works with COMPlus_EnableDiagnostics=0

Visualization

Bundled Tracer works?       | .NET 7 | .NET 8
COMPlus_EnableDiagnostics=1 | ✅     | ✅
COMPlus_EnableDiagnostics=0 | ✅     | ❌

Runtime environment (please complete the following information):

  • Instrumentation mode: automatic with Datadog.Trace.Bundle
  • Tracer version: 2.44.0
  • OS: Debian GNU/Linux 12 (bookworm)
  • CLR: .NET 8.0.0

Additional context
I only tested this in a containerized environment, but I assume that other environments are also affected.

@marcovr
Author

marcovr commented Jan 8, 2024

I just realized that the linked logs are apparently private, so here is what I get:

✅ .NET 7 COMPlus_EnableDiagnostics=0

Running checks on process 1
Process name: dotnet

---- STARTING TRACER SETUP CHECKS -----
Target process is running with .NET Core
1. Checking Modules Needed so the Tracer Loads:
 [SUCCESS]: The tracer version 2.44.0.0 is loaded into the process.
2. Checking DD_DOTNET_TRACER_HOME and related configuration value:
 [SUCCESS]: DD_DOTNET_TRACER_HOME is set to '/app/datadog' and the directory was
found correctly.
3. Checking CORECLR_PROFILER_PATH and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER_PATH is set to the correct
value of /app/datadog/linux-x64/Datadog.Trace.ClrProfiler.Native.so.
4. Checking CORECLR_PROFILER and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER is set to the correct 
value of {846F5F1C-F9AE-4B07-969E-05C26BC060D8}.
5. Checking CORECLR_ENABLE_PROFILING and related configuration value:
 [SUCCESS]: The environment variable CORECLR_ENABLE_PROFILING is set to the 
correct value of 1.

---- CONFIGURATION CHECKS -----
1. Checking if tracing is disabled using DD_TRACE_ENABLED.
 [INFO]: DD_TRACE_ENABLED is not set, the default value is true.
2. Checking if profiling is enabled using DD_PROFILING_ENABLED.
 [INFO]: DD_PROFILING_ENABLED is not set, the continuous profiler is disabled.

---- DATADOG AGENT CHECKS -----
Detected agent url: http://127.0.0.1:8126/. Note: this url may be incorrect if 
you configured the application through a configuration file.
Connecting to Agent at endpoint http://127.0.0.1:8126/ using HTTP
 [FAILURE]: Error connecting to Agent at http://127.0.0.1:8126/: Connection 
refused (127.0.0.1:8126)

✅ .NET 7 COMPlus_EnableDiagnostics unset

Running checks on process 1
Process name: dotnet

---- STARTING TRACER SETUP CHECKS -----
Target process is running with .NET Core
1. Checking Modules Needed so the Tracer Loads:
 [SUCCESS]: The tracer version 2.44.0.0 is loaded into the process.
2. Checking DD_DOTNET_TRACER_HOME and related configuration value:
 [SUCCESS]: DD_DOTNET_TRACER_HOME is set to '/app/datadog' and the directory was
found correctly.
3. Checking CORECLR_PROFILER_PATH and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER_PATH is set to the correct
value of /app/datadog/linux-x64/Datadog.Trace.ClrProfiler.Native.so.
4. Checking CORECLR_PROFILER and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER is set to the correct 
value of {846F5F1C-F9AE-4B07-969E-05C26BC060D8}.
5. Checking CORECLR_ENABLE_PROFILING and related configuration value:
 [SUCCESS]: The environment variable CORECLR_ENABLE_PROFILING is set to the 
correct value of 1.

---- CONFIGURATION CHECKS -----
1. Checking if tracing is disabled using DD_TRACE_ENABLED.
 [INFO]: DD_TRACE_ENABLED is not set, the default value is true.
2. Checking if profiling is enabled using DD_PROFILING_ENABLED.
 [INFO]: DD_PROFILING_ENABLED is not set, the continuous profiler is disabled.

---- DATADOG AGENT CHECKS -----
Detected agent url: http://127.0.0.1:8126/. Note: this url may be incorrect if 
you configured the application through a configuration file.
Connecting to Agent at endpoint http://127.0.0.1:8126/ using HTTP
 [FAILURE]: Error connecting to Agent at http://127.0.0.1:8126/: Connection 
refused (127.0.0.1:8126)

❌ .NET 8 COMPlus_EnableDiagnostics=0

Running checks on process 1
Process name: dotnet

---- STARTING TRACER SETUP CHECKS -----
Target process is running with .NET Core
1. Checking Modules Needed so the Tracer Loads:
 [WARNING]: The native loader library is not loaded into the process
 [WARNING]: The native tracer library is not loaded into the process
 [SUCCESS]: The tracer version 2.44.0.0 is loaded into the process.
2. Checking DD_DOTNET_TRACER_HOME and related configuration value:
 [SUCCESS]: DD_DOTNET_TRACER_HOME is set to '/app/datadog' and the directory was
found correctly.
3. Checking CORECLR_PROFILER_PATH and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER_PATH is set to the correct
value of /app/datadog/linux-x64/Datadog.Trace.ClrProfiler.Native.so.
4. Checking CORECLR_PROFILER and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER is set to the correct 
value of {846F5F1C-F9AE-4B07-969E-05C26BC060D8}.
5. Checking CORECLR_ENABLE_PROFILING and related configuration value:
 [SUCCESS]: The environment variable CORECLR_ENABLE_PROFILING is set to the 
correct value of 1.
6. Checking if process tracing configuration matches Installer or Bundler:
Installer related documentation: https://docs.datadoghq.com/tracing/trace_collection/dd_libraries/dotnet-core?tab=linux#install-the-tracer
 [FAILURE]: Error trying to check the Linux installer directory: Could not find 
a part of the path '/opt/datadog'.

Note the lines:

[WARNING]: The native loader library is not loaded into the process
[WARNING]: The native tracer library is not loaded into the process

✅ .NET 8 COMPlus_EnableDiagnostics unset

Running checks on process 1
Process name: dotnet

---- STARTING TRACER SETUP CHECKS -----
Target process is running with .NET Core
1. Checking Modules Needed so the Tracer Loads:
 [SUCCESS]: The tracer version 2.44.0.0 is loaded into the process.
2. Checking DD_DOTNET_TRACER_HOME and related configuration value:
 [SUCCESS]: DD_DOTNET_TRACER_HOME is set to '/app/datadog' and the directory was
found correctly.
3. Checking CORECLR_PROFILER_PATH and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER_PATH is set to the correct
value of /app/datadog/linux-x64/Datadog.Trace.ClrProfiler.Native.so.
4. Checking CORECLR_PROFILER and related configuration value:
 [SUCCESS]: The environment variable CORECLR_PROFILER is set to the correct 
value of {846F5F1C-F9AE-4B07-969E-05C26BC060D8}.
5. Checking CORECLR_ENABLE_PROFILING and related configuration value:
 [SUCCESS]: The environment variable CORECLR_ENABLE_PROFILING is set to the 
correct value of 1.

---- CONFIGURATION CHECKS -----
1. Checking if tracing is disabled using DD_TRACE_ENABLED.
 [INFO]: DD_TRACE_ENABLED is not set, the default value is true.
2. Checking if profiling is enabled using DD_PROFILING_ENABLED.
 [INFO]: DD_PROFILING_ENABLED is not set, the continuous profiler is disabled.

---- DATADOG AGENT CHECKS -----
Detected agent url: http://127.0.0.1:8126/. Note: this url may be incorrect if 
you configured the application through a configuration file.
Connecting to Agent at endpoint http://127.0.0.1:8126/ using HTTP
 [FAILURE]: Error connecting to Agent at http://127.0.0.1:8126/: Connection 
refused (127.0.0.1:8126)

@andrewlock
Member

Hi @marcovr, thanks for flagging this. It appears that this was a behavior change in .NET 8 which disables the profiling APIs we rely on when you set COMPlus_EnableDiagnostics=0.

Unfortunately, as it's in the runtime, there's nothing we can do about it. However, they suggest the following workaround:

To emulate previous behavior, I suggest setting the following to ensure the behavior is as intended:

DOTNET_EnableDiagnostics=1
DOTNET_EnableDiagnostics_IPC=0
DOTNET_EnableDiagnostics_Debugger=0
DOTNET_EnableDiagnostics_Profiler=1

Could you give that a try and make sure that fixes your issue? Thanks!

@marcovr
Author

marcovr commented Jan 8, 2024

Ohh I see. Not your fault then 😉
I can confirm that your suggestion works as expected.

But maybe it would be worth adding a note about this change to the README / setup documentation?
I spent quite a while trying to figure out what was causing the issue.

@andrewlock
Member

Yep, makes sense - will look at getting that added somewhere - thanks! 🙂

@nwesoccer

Setting DOTNET_EnableDiagnostics=1, though, prevents the use of a read-only root filesystem for .NET containers. Is there any workaround for having .NET, a read-only root filesystem, AND Datadog tracing all at the same time?

@marcovr
Author

marcovr commented Jan 25, 2024

@nwesoccer that was the reason why we had originally set it to 0 as well 😄

But if you set all of the following environment variables, it does indeed work with a read-only filesystem, because no IPC files are written:

DOTNET_EnableDiagnostics=1
DOTNET_EnableDiagnostics_IPC=0
DOTNET_EnableDiagnostics_Debugger=0
DOTNET_EnableDiagnostics_Profiler=1

See the corresponding docs

@nwesoccer

@marcovr I'm sorry, my test scenario was .NET 7, as we have projects on both 7 and 8. I suppose that means that for .NET 7 we'll need DOTNET_EnableDiagnostics=0 (since the list above doesn't work with .NET 7 on a read-only filesystem), while for .NET 8 the list above does work with a read-only filesystem?

@marcovr
Author

marcovr commented Jan 25, 2024

Yes, exactly. We solved this by building customized base images that set a different set of variables depending on the .NET version.
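
A minimal sketch of that per-version selection, assuming a shell step in the image build. The function name `diagnostics_env_for` is made up for illustration, and only the two versions discussed in this thread are handled; anything else is rejected rather than guessed.

```shell
# Print the diagnostics-related environment assignments for a .NET major version.
# Hypothetical helper: only .NET 7 and .NET 8 were discussed in this thread.
diagnostics_env_for() {
  case "$1" in
    7)
      # On .NET 7 the tracer still loads with diagnostics disabled,
      # and a read-only root filesystem needs this setting.
      printf '%s\n' 'DOTNET_EnableDiagnostics=0'
      ;;
    8)
      # On .NET 8, keep the profiler enabled but turn off IPC and the debugger.
      printf '%s\n' \
        'DOTNET_EnableDiagnostics=1' \
        'DOTNET_EnableDiagnostics_IPC=0' \
        'DOTNET_EnableDiagnostics_Debugger=0' \
        'DOTNET_EnableDiagnostics_Profiler=1'
      ;;
    *)
      echo "unsupported .NET major version: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `export $(diagnostics_env_for 8)` applies the .NET 8 set to the current shell, and the same output could be written into a base image's environment configuration.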

@nwesoccer

@marcovr Makes sense, Thanks!!

@andrewlock
Member

Just FYI, we've added detection of this scenario to the dd-dotnet diagnostic tool
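
This thread does not show how dd-dotnet performs that detection. Purely as an illustration of the idea, a pre-start check in a container entrypoint might look like the sketch below; the function name `profiler_disabled_by_env` and the warning text are invented here, and it simply flags any of the relevant variables being set to 0.

```shell
# Hypothetical check, not the dd-dotnet implementation.
# Succeeds (exit status 0) and prints a warning if a variable that can disable
# the profiling APIs on .NET 8 is set to 0; otherwise returns 1.
profiler_disabled_by_env() {
  for name in COMPlus_EnableDiagnostics DOTNET_EnableDiagnostics DOTNET_EnableDiagnostics_Profiler; do
    # Read the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name-}"
    if [ "$value" = "0" ]; then
      printf 'warning: %s=0 is set; on .NET 8 this can prevent the tracer from loading\n' "$name"
      return 0
    fi
  done
  return 1
}
```

An entrypoint script could call `profiler_disabled_by_env >&2` before starting the application so the warning shows up in the container logs.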

Out of interest though, @marcovr/@nwesoccer, why are you setting COMPlus_EnableDiagnostics=0/DOTNET_EnableDiagnostics=0? 🤔

@marcovr
Author

marcovr commented Feb 23, 2024

Cool, thanks 🙂

The reason we had set COMPlus_EnableDiagnostics=0 was to run our application in containers with a read-only file system.

When running any .NET application on a read-only file system without this variable set, the runtime fails to start and produces the following output:

Failed to create CoreCLR, HRESULT: 0x8007000E

I suppose this happens because the runtime fails to create the debug pipes.

Interestingly though, this appears to have been fixed with .NET 8.

@marcovr marcovr closed this as completed Apr 17, 2024