Add workaround for ASP.NET ConfigBuilder issue #6147

andrewlock · 2024-10-11T14:30:20Z

Summary of changes

Adds a check that IisPreStartInit has completed before we run any automatic instrumentations

Reason for change

We recently had a case where a customer was using AzureKeyVaultConfigBuilder with ASP.NET's web.config to load configuration into AppSettings, which was causing the application to deadlock on startup.

After some investigation, we isolated the problem to the case where two apps are running inside the same app pool:

- App 1 starts
  - The config builder populates and loads from Key Vault
  - We initialize the tracer, and set up instrumentation
  - App 1 works fine 👍
- App 2 starts
  - The config builder tries to populate and load from Key Vault
    - This requires making HTTP requests
    - Because the instrumentation has already run for App 1, we instrument the HTTP request
    - This invokes CallTargetInvoker, which tries to initialize the tracer
    - Initializing the tracer requires reading AppSettings so that we can populate configuration
      - The config builder tries to populate and load from Key Vault
      - ♻️ re-entry 💥

So the key thing is that we need to make sure we don't run our automatic instrumentation until after the IIS pre-init stage is complete, to avoid re-entry and recursion during setup.
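To make the re-entry concrete, here's a minimal Python model of the call chain above. Everything in it is hypothetical: `ConfigBuilder`, `start_app`, and `http_get` stand in for the Key Vault config builder, an app starting in the pool, and the CallTarget-instrumented HTTP client, and the re-entry check only mimics (it doesn't reproduce) the detection in the v3 config builders.

```python
class ReentryError(RuntimeError):
    """Raised when a config section is re-entered while it is still being built."""


class ConfigBuilder:
    """Hypothetical stand-in for a Key Vault-style builder that fetches values over HTTP."""

    def __init__(self, fetch):
        self._fetch = fetch        # performs the (possibly instrumented) HTTP request
        self._processing = set()   # sections currently being built

    def process(self, section):
        # Mimics the re-entry detection in the v3 config builders.
        if section in self._processing:
            raise ReentryError(f"recursively re-entered processing of the '{section}' section")
        self._processing.add(section)
        try:
            return {"DummyKey1": self._fetch("https://vault.example/DummyKey1")}
        finally:
            self._processing.discard(section)


def start_app(instrumentation_active):
    """Simulate one app starting in the pool; returns its built AppSettings."""
    tracer = {"initialized": False}
    builder = None

    def http_get(url):
        if instrumentation_active and not tracer["initialized"]:
            # CallTargetInvoker initializes the tracer, which reads AppSettings,
            # which runs the config builder again while it is still building.
            builder.process("appSettings")
            tracer["initialized"] = True
        return f"secret from {url}"

    builder = ConfigBuilder(http_get)
    settings = builder.process("appSettings")  # IIS pre-start init loads AppSettings
    tracer["initialized"] = True               # normal tracer startup afterwards
    return settings
```

`start_app(instrumentation_active=False)` plays the part of App 1 and succeeds; `start_app(instrumentation_active=True)` plays App 2 (the pool is already instrumented) and raises `ReentryError`.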

Implementation details

The implementation leverages work we did years ago to fix essentially the same problem: #1157. The problem back then was with LibLog and NLog, but we did all the work to inject flags for tracking when it is safe to make changes.

Given the native work already exists, we can piggy-back on those hooks and make sure we don't run any automatic instrumentation unless the domain has finished pre-initialization.
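The gating idea can be sketched as a small Python model. This is not the tracer's code: `AppDomainState`, `call_target_begin`, and the `pre_start_init_complete` flag are hypothetical names standing in for the per-domain state and the flags injected by the native hooks. The point is only that the instrumentation entry point bails out while pre-init is still running, so the config builder is never re-entered.

```python
class ReentryError(RuntimeError):
    """Raised when AppSettings are re-entered while they are still being built."""


class AppDomainState:
    """Hypothetical per-app-domain state; in the tracer the flag comes from native hooks."""

    def __init__(self, pre_start_init_complete=False):
        self.pre_start_init_complete = pre_start_init_complete
        self.tracer_initialized = False
        self.building = False


def load_app_settings(domain, fetch):
    """Stand-in for a config builder that detects re-entry."""
    if domain.building:
        raise ReentryError("recursively re-entered processing of the 'appSettings' section")
    domain.building = True
    try:
        return {"DummyKey1": fetch(domain)}
    finally:
        domain.building = False


def call_target_begin(domain):
    """Gate modeled on the fix: no automatic instrumentation until pre-init has finished."""
    if not domain.pre_start_init_complete:
        return False  # bail out: don't initialize the tracer, don't create a span
    if not domain.tracer_initialized:
        load_app_settings(domain, lambda d: "secret")  # tracer reads AppSettings
        domain.tracer_initialized = True
    return True


def instrumented_http_get(domain):
    call_target_begin(domain)  # injected because the app pool is already instrumented
    return "secret"


def start_second_app(gate_enabled=True):
    # Without the gate, instrumentation behaves as if pre-init were already complete.
    domain = AppDomainState(pre_start_init_complete=not gate_enabled)
    settings = load_app_settings(domain, instrumented_http_get)  # IIS pre-start init
    domain.pre_start_init_complete = True
    return settings, call_target_begin(domain)
```

With the gate, the second app loads its settings and instrumentation starts working once pre-init is done; with `gate_enabled=False` the same sequence raises `ReentryError`.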

This is still relatively fragile, as something like adding a static reference to IDatadogLogger would cause static initialization to happen too early, and ultimately deadlock or crash the app, so I added a bunch of comments to try to highlight the issue.
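As a rough analogy for why a static reference is dangerous, here's a Python sketch (not C#, and not the tracer's code). A class-level attribute is evaluated as soon as the class body runs, much like a C# static field initializer runs during type initialization, i.e. before any gate inside a method can bail out. `create_logger`, `EagerInvoker`, and `LazyInvoker` are hypothetical; `create_logger` stands in for building IDatadogLogger, which in the real tracer would read configuration.

```python
created = []  # records every time the (hypothetical) logger gets built


def create_logger():
    # Stand-in for building IDatadogLogger, which reads configuration
    # (and therefore, indirectly, AppSettings).
    created.append("logger")
    return "logger"


class EagerInvoker:
    # Like a static field initializer: evaluated when the class body runs,
    # before any method gets a chance to check the pre-init flag.
    log = create_logger()


class LazyInvoker:
    _log = None

    @classmethod
    def log(cls):
        # Deferred: only built on first use, i.e. after the caller's gate has passed.
        if cls._log is None:
            cls._log = create_logger()
        return cls._log
```

Merely defining `EagerInvoker` builds the logger; `LazyInvoker` builds it only on the first `log()` call, and reuses it afterwards.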

Test coverage

This is kind of a pain to test, because it requires a custom config builder plus two applications running in the same app pool. I tested manually by:

- Creating a custom ConfigBuilder based on the AzureKeyVaultConfigBuilder implementation. The custom builder simply makes a generic HTTP request when a value is requested
- Creating a generic .NET Framework 4.7.2 ASP.NET app
- Adding the builder to the web.config (see below)
- Publishing the app
- Creating two ASP.NET apps in IIS, using the same app pool
- Hitting the first app: all good 👍
- Hitting the second app: 💥 The configBuilder 'CustomBuilder' failed while processing the configuration section 'appSettings'.: The ConfigurationBuilder 'CustomBuilder[Microsoft.Configuration.ConfigurationBuilders.CustomConfigBuilder]' has recursively re-entered processing of the 'appSettings' section (I based my implementation on v3, which specifically detects re-entry)
- Applying the fix: now it works 🎉

web.config for my dummy test:

<configuration>
  <configSections>
    <section name="configBuilders"
      type="System.Configuration.ConfigurationBuildersSection, 
      System.Configuration, Version=4.0.0.0, Culture=neutral, 
      PublicKeyToken=b03f5f7f11d50a3a"
      restartOnExternalChanges="false" requirePermission="false" />
  </configSections>

  <configBuilders>
    <builders>
      <add name="CustomBuilder" preloadSecretNames="false" Uri="https://some-value-prd.vault.azure.net/"
      type="Microsoft.Configuration.ConfigurationBuilders.CustomConfigBuilder, CustomConfigBuilder, Version=1.0.0.0, Culture=neutral" />
    </builders>
  </configBuilders>

  <appSettings configBuilders="CustomBuilder">
    <add key="DummyKey1" value="DummyKey1 value from web.config" />
    <add key="DummyKey2" value="DummyKey2 value from web.config" />
  </appSettings>
</configuration>

Complete test solution is here:
ConfigBuilderIssueRepro.zip

Other details

Fixes: https://datadoghq.atlassian.net/browse/APMS-13426

datadog-ddstaging · 2024-10-11T14:52:30Z

Datadog Report

Branch report: andrew/fix-recurrsion-in-aspnet
Commit report: 8b4c8b6
Test service: dd-trace-dotnet

✅ 0 Failed, 363283 Passed, 2064 Skipped, 15h 8m 50.56s Total Time

andrewlock · 2024-10-11T15:04:55Z

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:

Welch test with a significance level of 5%
Only results indicating a difference greater than 5% and 5 ms are considered.
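The comparison rule above can be sketched as a small Python predicate. This is an illustration of the stated thresholds, not the report generator's code: it assumes the Welch p-value is computed elsewhere (for example with SciPy's `scipy.stats.ttest_ind(..., equal_var=False)`, which performs Welch's t-test), and the function names are hypothetical.

```python
def is_reportable(master_mean_ms, pr_mean_ms, p_value,
                  alpha=0.05, min_relative=0.05, min_absolute_ms=5.0):
    """True if a PR-vs-master execution-time difference clears every threshold."""
    delta_ms = abs(pr_mean_ms - master_mean_ms)
    return (
        p_value < alpha                                # Welch test significant at 5%
        and delta_ms > min_absolute_ms                 # more than 5 ms ...
        and delta_ms / master_mean_ms > min_relative   # ... and more than 5%
    )


def is_regression(master_mean_ms, pr_mean_ms, p_value):
    """A reportable difference where the PR is slower (the 'shown in red' case)."""
    return pr_mean_ms > master_mean_ms and is_reportable(master_mean_ms, pr_mean_ms, p_value)
```

For example, the FakeDbCommand (.NET Framework 4.6.2) baseline means below (70 ms vs 71 ms) fail the 5 ms threshold, so they would not be reported even if the difference were statistically significant.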

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6147) - mean (71ms)  : 68, 73
     .   : milestone, 71,
    master - mean (70ms)  : 68, 72
     .   : milestone, 70,

    section CallTarget+Inlining+NGEN
    This PR (6147) - mean (1,114ms)  : 1091, 1138
     .   : milestone, 1114,
    master - mean (1,111ms)  : 1094, 1128
     .   : milestone, 1111,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6147) - mean (110ms)  : 107, 113
     .   : milestone, 110,
    master - mean (109ms)  : 106, 112
     .   : milestone, 109,

    section CallTarget+Inlining+NGEN
    This PR (6147) - mean (773ms)  : 761, 786
     .   : milestone, 773,
    master - mean (768ms)  : 751, 784
     .   : milestone, 768,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6147) - mean (93ms)  : 89, 97
     .   : milestone, 93,
    master - mean (93ms)  : 89, 96
     .   : milestone, 93,

    section CallTarget+Inlining+NGEN
    This PR (6147) - mean (727ms)  : 708, 747
     .   : milestone, 727,
    master - mean (730ms)  : 711, 750
     .   : milestone, 730,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6147) - mean (190ms)  : 187, 192
     .   : milestone, 190,
    master - mean (191ms)  : 186, 195
     .   : milestone, 191,

    section CallTarget+Inlining+NGEN
    This PR (6147) - mean (1,202ms)  : 1178, 1227
     .   : milestone, 1202,
    master - mean (1,204ms)  : 1176, 1232
     .   : milestone, 1204,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6147) - mean (276ms)  : 272, 280
     .   : milestone, 276,
    master - mean (275ms)  : 271, 280
     .   : milestone, 275,

    section CallTarget+Inlining+NGEN
    This PR (6147) - mean (940ms)  : 923, 958
     .   : milestone, 940,
    master - mean (950ms)  : 935, 966
     .   : milestone, 950,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6147) - mean (264ms)  : 260, 268
     .   : milestone, 264,
    master - mean (265ms)  : 261, 268
     .   : milestone, 265,

    section CallTarget+Inlining+NGEN
    This PR (6147) - mean (922ms)  : 903, 940
     .   : milestone, 922,
    master - mean (928ms)  : 906, 950
     .   : milestone, 928,

andrewlock · 2024-10-11T16:27:17Z

Throughput/Crank Report ⚡

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where throughput for the PR is worse than latest master (a drop of 5% or more) are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!
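A minimal sketch of the "5% drop or greater" rule, using total request counts like the ones in the charts below. The helper names are hypothetical; the only assumption is that the change is measured relative to master.

```python
def throughput_change(pr_requests, master_requests):
    """Relative change in total requests (positive = the PR handled more requests)."""
    return (pr_requests - master_requests) / master_requests


def flagged_red(pr_requests, master_requests, threshold=0.05):
    """Flag a drop of 5% or more relative to master."""
    return throughput_change(pr_requests, master_requests) <= -threshold
```

For instance, Linux arm64 "Manual + Automatic" (6,031,910 vs 6,203,435 requests) is roughly a 2.8% drop, so under this rule it would not be flagged.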

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6147) (11.119M)   : 0, 11119397
    master (11.119M)   : 0, 11118546
    benchmarks/2.9.0 (11.081M)   : 0, 11080577

    section Automatic
    This PR (6147) (7.426M)   : 0, 7425613
    master (7.321M)   : 0, 7321152
    benchmarks/2.9.0 (7.732M)   : 0, 7732233

    section Trace stats
    master (7.757M)   : 0, 7756918

    section Manual
    master (11.210M)   : 0, 11209846

    section Manual + Automatic
    This PR (6147) (6.821M)   : 0, 6821227
    master (6.740M)   : 0, 6740459

    section DD_TRACE_ENABLED=0
    master (10.109M)   : 0, 10109349

gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6147) (9.609M)   : 0, 9608625
    master (9.561M)   : 0, 9560673
    benchmarks/2.9.0 (9.798M)   : 0, 9798067

    section Automatic
    This PR (6147) (6.520M)   : 0, 6520075
    master (6.578M)   : 0, 6578194

    section Trace stats
    master (6.868M)   : 0, 6867545

    section Manual
    master (9.568M)   : 0, 9568124

    section Manual + Automatic
    This PR (6147) (6.032M)   : 0, 6031910
    master (6.203M)   : 0, 6203435

    section DD_TRACE_ENABLED=0
    master (8.858M)   : 0, 8857806

gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6147) (10.313M)   : 0, 10313173
    master (9.889M)   : 0, 9889232
    benchmarks/2.9.0 (10.067M)   : 0, 10067315

    section Automatic
    This PR (6147) (6.686M)   : 0, 6685782
    master (6.396M)   : 0, 6396160
    benchmarks/2.9.0 (7.552M)   : 0, 7552193

    section Trace stats
    master (7.205M)   : 0, 7205159

    section Manual
    master (9.853M)   : 0, 9852595

    section Manual + Automatic
    This PR (6147) (6.263M)   : 0, 6262824
    master (5.939M)   : 0, 5939480

    section DD_TRACE_ENABLED=0
    master (9.222M)   : 0, 9221606

andrewlock · 2024-10-11T17:24:58Z

Benchmarks Report for tracer 🐌

Benchmarks for #6147 compared to master:

1 benchmark is faster, with geometric mean 1.148
1 benchmark has fewer allocations
1 benchmark has more allocations

The following thresholds were used for comparing the benchmark speeds:

Mann–Whitney U test with a significance level of 5%
Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.
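Here's a rough Python sketch of how these thresholds and the summary's geometric mean fit together. It's an assumption-laden illustration, not the report tool: the p-value is assumed to come from elsewhere (e.g. SciPy's `scipy.stats.mannwhitneyu`), and the speed ratio is taken as base/diff, matching the "base/diff" column in the GraphQL table.

```python
import math


def speed_changed(base_ns, diff_ns, p_value, alpha=0.05, min_relative=0.10, min_absolute_ns=0.3):
    """A speed difference counts only if significant, > 10% and > 0.3 ns."""
    delta = abs(diff_ns - base_ns)
    return p_value < alpha and delta > min_absolute_ns and delta / base_ns > min_relative


def allocation_changed(base_bytes, diff_bytes, min_relative=0.005):
    """Allocation changes below 0.5% are ignored."""
    return abs(diff_bytes - base_bytes) / base_bytes >= min_relative


def geometric_mean(ratios):
    """Geometric mean of base/diff speed ratios across the benchmarks that changed."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

With only one faster benchmark, the geometric mean is just that benchmark's ratio: 1,379.44 / 1,202.07 ≈ 1.148 for `GraphQLBenchmark.ExecuteAsync` on net6.0.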

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Gen 2	Allocated
master	`StartStopWithChild`	net6.0	7.8μs	43.4ns	310ns	0.0152	0.00758	0	5.43 KB
master	`StartStopWithChild`	netcoreapp3.1	9.86μs	54.5ns	318ns	0.0242	0.00968	0	5.61 KB
master	`StartStopWithChild`	net472	16.1μs	59ns	228ns	1.02	0.311	0.0957	6.06 KB
#6147	`StartStopWithChild`	net6.0	7.62μs	42.8ns	290ns	0.0214	0.0129	0.00429	5.42 KB
#6147	`StartStopWithChild`	netcoreapp3.1	9.96μs	54.9ns	329ns	0.0147	0.00489	0	5.62 KB
#6147	`StartStopWithChild`	net472	16μs	37.8ns	146ns	1.02	0.294	0.0955	6.07 KB

Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`WriteAndFlushEnrichedTraces`	net6.0	478μs	127ns	459ns	0	2.7 KB
master	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	653μs	493ns	1.91μs	0	2.7 KB
master	`WriteAndFlushEnrichedTraces`	net472	837μs	387ns	1.45μs	0.422	3.3 KB
#6147	`WriteAndFlushEnrichedTraces`	net6.0	473μs	138ns	517ns	0	2.7 KB
#6147	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	653μs	522ns	2.02μs	0	2.7 KB
#6147	`WriteAndFlushEnrichedTraces`	net472	837μs	603ns	2.34μs	0.419	3.3 KB

Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`SendRequest`	net6.0	206μs	1.55μs	15.5μs	0.212	18.45 KB
master	`SendRequest`	netcoreapp3.1	231μs	1.35μs	12.6μs	0.216	20.61 KB
master	`SendRequest`	net472	0.000903ns	0.000508ns	0.00197ns	0	0 b
#6147	`SendRequest`	net6.0	202μs	1.19μs	11.3μs	0.21	18.45 KB
#6147	`SendRequest`	netcoreapp3.1	239μs	1.52μs	15.1μs	0.232	20.61 KB
#6147	`SendRequest`	net472	0.000473ns	0.000331ns	0.00119ns	0	0 b

Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ More allocations ⚠️

More allocations ⚠️ in #6147

Benchmark	Base Allocated	Diff Allocated	Change	Change %
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces‑netcoreapp3.1	41.49 KB	41.71 KB	221 B	0.53%

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Gen 2	Allocated
master	`WriteAndFlushEnrichedTraces`	net6.0	598μs	3.4μs	23.6μs	0.563	0	0	41.7 KB
master	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	767μs	4.24μs	26.2μs	0.383	0	0	41.49 KB
master	`WriteAndFlushEnrichedTraces`	net472	875μs	2.76μs	10.3μs	8.33	2.31	0.463	53.34 KB
#6147	`WriteAndFlushEnrichedTraces`	net6.0	606μs	3.33μs	20.3μs	0.568	0	0	41.71 KB
#6147	`WriteAndFlushEnrichedTraces`	netcoreapp3.1	706μs	3.74μs	21.8μs	0.355	0	0	41.71 KB
#6147	`WriteAndFlushEnrichedTraces`	net472	872μs	4.15μs	17.6μs	8.45	2.53	0.422	53.36 KB

Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`ExecuteNonQuery`	net6.0	1.36μs	0.76ns	2.84ns	0.0143	0	1.02 KB
master	`ExecuteNonQuery`	netcoreapp3.1	1.79μs	1.54ns	5.76ns	0.0135	0	1.02 KB
master	`ExecuteNonQuery`	net472	2.12μs	1.71ns	6.61ns	0.157	0	987 B
#6147	`ExecuteNonQuery`	net6.0	1.36μs	0.817ns	2.95ns	0.0143	0	1.02 KB
#6147	`ExecuteNonQuery`	netcoreapp3.1	1.73μs	1.62ns	6.08ns	0.0138	0	1.02 KB
#6147	`ExecuteNonQuery`	net472	2.07μs	0.806ns	2.91ns	0.156	0.00104	987 B

Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`CallElasticsearch`	net6.0	1.19μs	2.91ns	11.3ns	0.0135	976 B
master	`CallElasticsearch`	netcoreapp3.1	1.59μs	0.604ns	2.26ns	0.0128	976 B
master	`CallElasticsearch`	net472	2.59μs	1.14ns	4.4ns	0.157	995 B
master	`CallElasticsearchAsync`	net6.0	1.3μs	0.557ns	2.16ns	0.0136	952 B
master	`CallElasticsearchAsync`	netcoreapp3.1	1.66μs	0.745ns	2.79ns	0.0141	1.02 KB
master	`CallElasticsearchAsync`	net472	2.47μs	1.53ns	5.94ns	0.167	1.05 KB
#6147	`CallElasticsearch`	net6.0	1.3μs	1.64ns	6.13ns	0.0133	976 B
#6147	`CallElasticsearch`	netcoreapp3.1	1.55μs	0.625ns	2.34ns	0.0132	976 B
#6147	`CallElasticsearch`	net472	2.62μs	2.17ns	8.39ns	0.158	995 B
#6147	`CallElasticsearchAsync`	net6.0	1.29μs	0.556ns	2.08ns	0.0136	952 B
#6147	`CallElasticsearchAsync`	netcoreapp3.1	1.57μs	0.701ns	2.72ns	0.014	1.02 KB
#6147	`CallElasticsearchAsync`	net472	2.52μs	1.39ns	5.4ns	0.166	1.05 KB

Benchmarks.Trace.GraphQLBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #6147

Benchmark	base/diff	Base Median (ns)	Diff Median (ns)	Modality
Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync‑net6.0	1.148	1,379.44	1,202.07

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`ExecuteAsync`	net6.0	1.38μs	1.03ns	3.99ns	0.0131	952 B
master	`ExecuteAsync`	netcoreapp3.1	1.67μs	1.8ns	6.74ns	0.0123	952 B
master	`ExecuteAsync`	net472	1.78μs	0.713ns	2.76ns	0.145	915 B
#6147	`ExecuteAsync`	net6.0	1.2μs	0.525ns	1.96ns	0.0132	952 B
#6147	`ExecuteAsync`	netcoreapp3.1	1.69μs	0.847ns	3.17ns	0.0129	952 B
#6147	`ExecuteAsync`	net472	1.78μs	0.595ns	2.3ns	0.145	915 B

Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #6147

Benchmark	Base Allocated	Diff Allocated	Change	Change %
Benchmarks.Trace.HttpClientBenchmark.SendAsync‑net472	3.15 KB	3.07 KB	-73 B	-2.32%

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`SendAsync`	net6.0	4.18μs	2.36ns	9.13ns	0.031	2.22 KB
master	`SendAsync`	netcoreapp3.1	5.01μs	1.48ns	5.54ns	0.0378	2.76 KB
master	`SendAsync`	net472	7.91μs	1.26ns	4.72ns	0.498	3.15 KB
#6147	`SendAsync`	net6.0	4.31μs	6.64ns	24.8ns	0.0315	2.22 KB
#6147	`SendAsync`	netcoreapp3.1	5.03μs	2.1ns	8.15ns	0.0378	2.76 KB
#6147	`SendAsync`	net472	7.37μs	1.18ns	4.58ns	0.488	3.07 KB

Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`EnrichedLog`	net6.0	1.51μs	0.841ns	3.03ns	0.0228	1.64 KB
master	`EnrichedLog`	netcoreapp3.1	2.2μs	1.81ns	6.75ns	0.0222	1.64 KB
master	`EnrichedLog`	net472	2.68μs	1.03ns	3.98ns	0.249	1.57 KB
#6147	`EnrichedLog`	net6.0	1.44μs	0.751ns	2.81ns	0.023	1.64 KB
#6147	`EnrichedLog`	netcoreapp3.1	2.26μs	1.06ns	3.98ns	0.0215	1.64 KB
#6147	`EnrichedLog`	net472	2.55μs	1.26ns	4.72ns	0.249	1.57 KB

Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`EnrichedLog`	net6.0	117μs	221ns	856ns	0	0	4.28 KB
master	`EnrichedLog`	netcoreapp3.1	121μs	273ns	1.02μs	0.0597	0	4.28 KB
master	`EnrichedLog`	net472	150μs	188ns	704ns	0.673	0.224	4.46 KB
#6147	`EnrichedLog`	net6.0	115μs	107ns	401ns	0.0586	0	4.28 KB
#6147	`EnrichedLog`	netcoreapp3.1	122μs	133ns	516ns	0	0	4.28 KB
#6147	`EnrichedLog`	net472	152μs	136ns	528ns	0.685	0.228	4.46 KB

Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`EnrichedLog`	net6.0	3.08μs	1.03ns	3.98ns	0.0313	2.2 KB
master	`EnrichedLog`	netcoreapp3.1	4.23μs	1.69ns	6.56ns	0.0297	2.2 KB
master	`EnrichedLog`	net472	4.7μs	1.57ns	6.09ns	0.32	2.02 KB
#6147	`EnrichedLog`	net6.0	3.14μs	1.09ns	4.21ns	0.0297	2.2 KB
#6147	`EnrichedLog`	netcoreapp3.1	4.26μs	1.4ns	5.42ns	0.0296	2.2 KB
#6147	`EnrichedLog`	net472	5.01μs	1.2ns	4.49ns	0.32	2.02 KB

Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Gen 1	Allocated
master	`SendReceive`	net6.0	1.4μs	0.902ns	3.49ns	0.0161	0	1.14 KB
master	`SendReceive`	netcoreapp3.1	1.77μs	2.36ns	9.13ns	0.015	0	1.14 KB
master	`SendReceive`	net472	2.15μs	0.688ns	2.48ns	0.184	0.00107	1.16 KB
#6147	`SendReceive`	net6.0	1.37μs	0.694ns	2.69ns	0.0156	0	1.14 KB
#6147	`SendReceive`	netcoreapp3.1	1.77μs	0.55ns	2.13ns	0.0149	0	1.14 KB
#6147	`SendReceive`	net472	2.15μs	0.939ns	3.64ns	0.183	0	1.16 KB

Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`EnrichedLog`	net6.0	2.55μs	0.732ns	2.83ns	0.0226	1.6 KB
master	`EnrichedLog`	netcoreapp3.1	3.94μs	1.57ns	5.87ns	0.022	1.65 KB
master	`EnrichedLog`	net472	4.36μs	1.31ns	4.89ns	0.322	2.04 KB
#6147	`EnrichedLog`	net6.0	2.84μs	1.23ns	4.62ns	0.0226	1.6 KB
#6147	`EnrichedLog`	netcoreapp3.1	3.82μs	2.55ns	9.89ns	0.021	1.65 KB
#6147	`EnrichedLog`	net472	4.49μs	2.58ns	9.67ns	0.324	2.04 KB

Benchmarks.Trace.SpanBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`StartFinishSpan`	net6.0	400ns	0.414ns	1.6ns	0.00801	576 B
master	`StartFinishSpan`	netcoreapp3.1	586ns	0.379ns	1.47ns	0.00756	576 B
master	`StartFinishSpan`	net472	720ns	0.739ns	2.86ns	0.0917	578 B
master	`StartFinishScope`	net6.0	486ns	0.378ns	1.46ns	0.00984	696 B
master	`StartFinishScope`	netcoreapp3.1	762ns	4.05ns	19.8ns	0.00946	696 B
master	`StartFinishScope`	net472	859ns	0.956ns	3.7ns	0.104	658 B
#6147	`StartFinishSpan`	net6.0	416ns	0.311ns	1.2ns	0.00809	576 B
#6147	`StartFinishSpan`	netcoreapp3.1	619ns	0.726ns	2.81ns	0.00768	576 B
#6147	`StartFinishSpan`	net472	725ns	0.761ns	2.95ns	0.0918	578 B
#6147	`StartFinishScope`	net6.0	493ns	0.749ns	2.9ns	0.00982	696 B
#6147	`StartFinishScope`	netcoreapp3.1	791ns	0.576ns	2.23ns	0.0091	696 B
#6147	`StartFinishScope`	net472	857ns	0.633ns	2.37ns	0.104	658 B

Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch	Method	Toolchain	Mean	StdError	StdDev	Gen 0	Allocated
master	`RunOnMethodBegin`	net6.0	659ns	1.67ns	6.47ns	0.00979	696 B
master	`RunOnMethodBegin`	netcoreapp3.1	942ns	1.28ns	4.44ns	0.00906	696 B
master	`RunOnMethodBegin`	net472	1.08μs	0.95ns	3.68ns	0.104	658 B
#6147	`RunOnMethodBegin`	net6.0	660ns	0.409ns	1.58ns	0.00997	696 B
#6147	`RunOnMethodBegin`	netcoreapp3.1	916ns	0.575ns	2.23ns	0.00931	696 B
#6147	`RunOnMethodBegin`	net472	1.1μs	0.651ns	2.52ns	0.104	658 B

tracer/src/Datadog.Trace/ClrProfiler/CallTarget/CallTargetInvoker.cs

zacharycmontoya

LGTM, just one question on a code comment for future readers. Great fix!

## Summary of changes

- Adds the process ID to the managed logs filename
- ~Enables buffering in .NET Core~ Reverted this change, as it was not required and caused test failures. Should be investigated later

## Reason for change

While investigating #6147, we discovered that if you have two app pools running under different users, the second app domain is unable to write managed logs. This is because it's unable to hold the shared mutex that's used to synchronize access to the file.

## Implementation details

Include the pID in the filename, so each process writes to a separate file.

<details><summary>I originally enabled buffering, but removed it</summary>

As a consequence, if we only have one tracer instance writing to a managed log file, then we no longer need to have a shared logger, and can enable buffering, which should improve performance by buffering writes to the file.

<ul>
<li>We can't assume this in .NET Framework, as we can have multiple app domains writing to the same file, so this is only enabled for .NET Core.</li>
<li>_Theoretically_ this is a problem in version-conflict scenarios, where we could have a v2 tracer loaded at the same time as a v3 tracer. However, as this PR changes the file name to differ from v2, that's not an issue, so we should still be safe.</li>
<li>For the buffering, we need to choose a flush interval. I plucked 1s out of the sky - any other suggestions?</li>
</ul>
</details>

However, I reverted this change as it broke the dynamic instrumentation tests, because they ended up reading partial lines. We _should_ fix this so that we can get the perf benefit of buffering etc, but it's tangential to the bug fix, so I removed it for now.

## Test coverage

Yeah... the existing integration tests obviously write a bunch of logs... any suggestions for additional tests we should add would be gratefully received...

## Other details

I also updated the file-rolling behaviour. Having daily file rolling _in addition_ to splitting by pid _and_ a file size limit seemed more confusing than anything else.

- For customers that have multi-day-long processes with high log volume, the results should be broadly the same, as they're limited by rolling file size
- For customers that have multi-day-long processes with low log volume, log volume overall may increase, as they won't roll _until_ they hit the log limit? I think? I'm struggling to visualize this one
- For customers with short-lived processes, they will have more _files_, but these will use the same amount of space.
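A minimal sketch of the per-process naming plus size-only rolling described above. The file-name pattern `dotnet-tracer-managed-{process}-{pid}.log` and both helper names are hypothetical, chosen for illustration; the real tracer's naming and limits may differ.

```python
import os


def managed_log_path(directory, process_name, pid=None):
    """Per-process log file name, so two processes never share a file (or its mutex)."""
    pid = os.getpid() if pid is None else pid
    return os.path.join(directory, f"dotnet-tracer-managed-{process_name}-{pid}.log")


def needs_roll(current_size_bytes, incoming_bytes, max_size_bytes):
    """Size-based rolling only (no daily roll): roll when the next write would exceed the limit."""
    return current_size_bytes + incoming_bytes > max_size_bytes
```

Because the pid is part of the name, two w3wp processes running under different users write to different files and never contend for the same mutex.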

bouwkast

Wow thanks a lot for all of the detailed descriptions here!

Wondering if we should somehow add this reproduction app to the repo? Can look at that later though.

andrewlock · 2024-10-22T16:39:13Z

Wondering if we should somehow add this reproduction app to the repo? Can look at that later though.

Absolutely @bouwkast, I have started looking at this, but it's a bit of a pain, and didn't want to block the fix going in given it's manually tested 😅 I will work on adding the repro + test subsequently ASAP though

## Summary of changes

Fixes the "LoaderOptimization" IIS tests

## Reason for change

While working on a separate issue, I discovered that the LoaderOptimization tests (in the `msi_integration_tests` stage) aren't testing anything, because the MSI isn't _actually_ being installed 🤦

## Implementation details

The "fix" is

- `dotnet_tracer_msi` => `DOTNET_TRACER_MSI` in the `docker compose build` step
- Add the msi to the `.dockerignore` so that it's actually added to the container 😬

Also took the chance to refactor this a little too (in preparation for a stacked PR)

- Map the logs directory inside the container into `build_data/logs`
- Rename the generic "IIS" name in the Dockerfile/docker-compose to flag that it's explicitly for testing the LoaderOptimization behaviour
- Added a check to the end of the smoke tests that makes sure we have some tracing logs. We could be more specific about this, run an agent inside the container etc, but this is better than nothing and is simple for now

## Test coverage

Discovered this wasn't working in a separate PR, so as long as the tests pass (and the artifacts for the msi stage confirm we _do_ have logs), we should be good

## Other details

Prerequisite for testing the behaviour [in this PR](#6147)
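The "make sure we have some tracing logs" check could look roughly like this Python sketch. The log-file glob pattern is an assumption made for illustration, and the actual smoke test may be implemented quite differently; `demo` just exercises the check against a temporary directory.

```python
import glob
import os
import tempfile


def has_tracer_logs(log_dir, pattern="dotnet-tracer-managed-*.log"):
    """True if log_dir contains at least one non-empty file matching the (assumed) pattern."""
    return any(os.path.getsize(p) > 0 for p in glob.glob(os.path.join(log_dir, pattern)))


def demo():
    """Show the check before and after a fake tracer log is written."""
    with tempfile.TemporaryDirectory() as log_dir:
        before = has_tracer_logs(log_dir)
        path = os.path.join(log_dir, "dotnet-tracer-managed-w3wp-1234.log")
        with open(path, "w") as f:
            f.write("tracer started\n")
        return before, has_tracer_logs(log_dir)
```

Pointed at the mapped `build_data/logs` directory, `has_tracer_logs` returning `False` would mean the MSI (and therefore the tracer) never ran.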

andrewlock · 2024-11-01T10:45:21Z

Wondering if we should somehow add this reproduction app to the repo? Can look at that later though.

FYI @bouwkast, I added this in:

Add smoke test for config builder instrumentation issue #6224

## Summary of changes

Adds a smoke test for the issue fixed in #6147

## Reason for change

The issue in #6147 is complex and subtle, with a high risk of regression, so we want to have smoke tests to catch it

## Implementation details

Add a sample app very similar to the one tested with in #6147. Without the fix, [the test fails with](https://dev.azure.com/datadoghq/dd-trace-dotnet/_build/results?buildId=166792&view=results)

```html
Description: An error occurred during the processing of a configuration file required to service this request. Please review the specific error details below and modify your configuration file appropriately.
Parser Error Message: The configBuilder 'CustomBuilder' failed while processing the configuration section 'appSettings'.: The ConfigurationBuilder 'CustomBuilder[Microsoft.Configuration.ConfigurationBuilders.CustomConfigBuilder]' has recursively re-entered processing of the 'appSettings' section.
```

After the fix (now merged), the tests pass

## Test coverage

This is primarily a smoke test, so we confirm

- The sites are running and responding to HTTP requests correctly
- We instrument the application (it has logs)
- There are no errors in the logs

## Other details

For some reason, I had massive issues getting the app to "recognize" the `DD_TRACE_AGENT_URL=http://test-agent.windows:8126` env var setting that's required to talk to the test agent. I have no idea why, and the only resolution I could find was to use PowerShell to set the variables inside the docker image

---------

Co-authored-by: Kevin Gosse
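The three smoke-test conditions can be summarized in a small Python sketch. It assumes each log line carries a bracketed three-letter level such as `[INF]` or `[ERR]`, and that "responding correctly" means HTTP 200; both are assumptions for illustration, not the actual test's implementation.

```python
import re

# Assumption: each line carries a bracketed three-letter level, e.g. [INF], [WRN], [ERR].
ERROR_LEVEL = re.compile(r"\[(ERR|FTL)\]")


def error_lines(log_lines):
    """Return the lines logged at error or fatal level."""
    return [line for line in log_lines if ERROR_LEVEL.search(line)]


def smoke_check(status_codes, log_lines):
    """Pass if every site answered 200, the tracer wrote logs, and none of them are errors."""
    return (
        all(code == 200 for code in status_codes)
        and len(log_lines) > 0
        and not error_lines(log_lines)
    )
```

Each condition maps to one bullet above: responses from both sites, the presence of logs, and the absence of error-level entries.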

@daniel-romano-DD

…trust (#6290)

## Summary of changes

Update the `CallTargetInvoker` to explicitly bail out if partial trust is detected.

## Reason for change

We don't support partial trust. In the native profiler, we have code that explicitly bails out if we're in partial trust.

Unfortunately... IIS. In IIS, we could have two apps running in the same app pool: one in full trust, one in partial trust. The full trust app will be instrumented as normal. Unfortunately, because they're running in the same app pool, _the partial trust app is effectively also instrumented_.

In this scenario, we'll likely create errors in the partial trust app (we have seen `MemberAccessException`s, for example), and we still can't _really_ work there.

## Implementation details

The problem is essentially the same as a recently fixed issue (#6147): multiple apps in the same pool causing oddities with instrumentation. The fix in this PR piggy-backs on the work there, essentially:

- We check whether the app domain is in partial trust.
- If it is, we bail out of running any CallTarget instrumentation.

This should solve the issue for CallTarget and for manual instrumentation (which is just CallTarget in v3). The issue remains for callsite instrumentation, unfortunately, and likely cannot easily be mitigated @daniel-romano-DD

## Test coverage

_sigh_ I spent all day trying to create an ASP.NET app that I could run in partial trust. All day. I failed. If anyone can produce one that will run in IIS under partial trust, then I have the setup prepared. Unfortunately, as I can't even pass that low bar, I can't even _manually_ test this 🙁 All I can confirm is that it doesn't affect "full trust" applications that we use in CI 😅

## Other details

Seriously, can _anyone_ get a partial trust app running? 🤷‍♂️
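Putting the two bail-outs together, here's a hedged Python sketch of the combined decision. `DomainState` and `should_run_call_target` are hypothetical names; in .NET Framework the trust level is exposed by the real `AppDomain.IsFullyTrusted` property, but whether the PR reads it that way is not stated here.

```python
from dataclasses import dataclass


@dataclass
class DomainState:
    """Hypothetical snapshot of the app-domain facts the gate looks at."""
    pre_start_init_complete: bool
    is_fully_trusted: bool


def should_run_call_target(domain: DomainState) -> bool:
    # Bail out while IIS pre-start init is still running (the #6147 fix)...
    if not domain.pre_start_init_complete:
        return False
    # ...and in partially trusted app domains, which we don't support (#6290).
    if not domain.is_fully_trusted:
        return False
    return True
```

Only a fully trusted domain that has finished pre-start init runs CallTarget instrumentation; every other combination bails out.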

andrewlock added type:bug area:automatic-instrumentation Automatic instrumentation managed C# code (Datadog.Trace.ClrProfiler.Managed) labels Oct 11, 2024

andrewlock requested a review from a team as a code owner October 11, 2024 14:30

andrewlock added 2 commits October 15, 2024 17:39

Add workaround for configbuilder issue

253692f

hacking some stuff

69ab96d

andrewlock force-pushed the andrew/fix-recurrsion-in-aspnet branch from 06a7a15 to 69ab96d October 15, 2024 16:41

zacharycmontoya reviewed Oct 16, 2024

tracer/src/Datadog.Trace/ClrProfiler/CallTarget/CallTargetInvoker.cs (outdated, resolved)

zacharycmontoya approved these changes Oct 16, 2024

andrewlock added the status:do-not-merge Work is done. Can review, but do not merge yet. label Oct 16, 2024

andrewlock mentioned this pull request Oct 17, 2024

Include pID in managed logs filename #6161

Merged

Add fix for multiple-apps per pool scenario

8b4c8b6

andrewlock requested review from a team as code owners October 22, 2024 12:38

bouwkast approved these changes Oct 22, 2024

lucaspimentel approved these changes Oct 22, 2024

kevingosse approved these changes Oct 23, 2024

andrewlock mentioned this pull request Oct 28, 2024

Fix IIS LoaderOptimisation tests not testing anything #6209

Merged

andrewlock merged commit e8cd262 into master Oct 31, 2024
62 checks passed

andrewlock deleted the andrew/fix-recurrsion-in-aspnet branch October 31, 2024 16:11

github-actions bot added this to the vNext-v3 milestone Oct 31, 2024

andrewlock added identified-by:customer and removed status:do-not-merge Work is done. Can review, but do not merge yet. labels Oct 31, 2024

andrewlock mentioned this pull request Oct 31, 2024

Add smoke test for config builder instrumentation issue #6224

Merged

andrewlock mentioned this pull request Nov 14, 2024

Ensure that we never run any call target instrumentations in partial trust #6290

Merged
