# Skip the mass transit test to see if it solves flake issues (#5861 -> v2) #5911
Conversation
## Summary of changes

Skip the MassTransit smoke test, as it seems to be causing a lot of flakiness.

## Reason for change

We've seen a lot of errors in the `CheckBuildLogsForErr` stage:

```
CheckBuildLogsForErr: 03:08:39 [Error] An error occurred while sending data to the agent at http://127.0.0.1:39573/v0.4/traces. If the error isn't transient, please check https://docs.datadoghq.com/tracing/troubleshooting/connection_errors/?code-lang=dotnet for guidance.
System.Net.Http.HttpRequestException: Error while copying content to a stream.
```

These errors seemed to get a lot worse after we disabled keep-alive, but that's anecdotal.

## Implementation details

It's not entirely clear whether the problem is only coincidentally related to the MassTransit test (i.e. it's a test-ordering issue) or whether it's actually something about the test itself. As a check, I tried skipping the test in this branch and did 4 full (all-TFM) integration test runs without seeing the issue again. It's all still anecdotal, but I'd rather trade off the flakiness here; if the problem reappears later, we can investigate further.

## Test coverage

Did 4 full runs and didn't see the issue again.

## Other details

Backport of #5861 (as we're still getting a lot of flake on the release/2.x branch).
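For context on the "Implementation details" section above, here is a minimal sketch of one way an individual xUnit test can be skipped. The class name, test name, and reason string are illustrative only; the actual MassTransit smoke test in this repo may use a different skip mechanism (for example, a custom skippable-fact attribute).

```csharp
using Xunit;

// Hypothetical test class, for illustration only.
public class MassTransitSmokeTests
{
    // xUnit's built-in Skip parameter prevents the test from running while still
    // reporting it as skipped in the test results, so it stays visible in CI.
    [Fact(Skip = "Skipped while investigating CheckBuildLogsForErr flakiness (see #5861)")]
    public void SubmitsTraces() // hypothetical test name
    {
        // Test body omitted from this sketch.
    }
}
```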
Datadog Report - Branch report: ✅ 0 Failed, 353745 Passed, 1797 Skipped, 14h 35m 6.14s Total Time
Execution-Time Benchmarks Report ⏱️ - Execution-time results for samples comparing the following branches/commits: Execution-time benchmarks measure the whole time it takes to execute a program and are intended to measure one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).
Execution time (ms) for this PR (5911), mean and p99 interval per sample and scenario:

| Sample | Scenario | Mean (ms) | p99 interval (ms) |
|---|---|---|---|
| FakeDbCommand (.NET Framework 4.6.2) | Baseline | 77 | 62–92 |
| FakeDbCommand (.NET Framework 4.6.2) | CallTarget+Inlining+NGEN | 1,072 | 1,050–1,093 |
| FakeDbCommand (.NET Core 3.1) | Baseline | 111 | 107–114 |
| FakeDbCommand (.NET Core 3.1) | CallTarget+Inlining+NGEN | 782 | 763–801 |
| FakeDbCommand (.NET 6) | Baseline | 94 | 92–97 |
| FakeDbCommand (.NET 6) | CallTarget+Inlining+NGEN | 723 | 705–742 |
| HttpMessageHandler (.NET Framework 4.6.2) | Baseline | 191 | 188–195 |
| HttpMessageHandler (.NET Framework 4.6.2) | CallTarget+Inlining+NGEN | 1,155 | 1,128–1,182 |
| HttpMessageHandler (.NET Core 3.1) | Baseline | 277 | 272–281 |
| HttpMessageHandler (.NET Core 3.1) | CallTarget+Inlining+NGEN | 947 | 926–969 |
| HttpMessageHandler (.NET 6) | Baseline | 265 | 260–270 |
| HttpMessageHandler (.NET 6) | CallTarget+Inlining+NGEN | 930 | 908–951 |
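The report does not show how the p99 interval is derived from the mean and StdDev, so the following is only a sketch under a normality assumption; the z multiplier and the example standard deviation are chosen for illustration, not taken from the report.

```csharp
// Minimal sketch (not the benchmark tooling's actual code): derive a p99-style
// interval from a mean and standard deviation, assuming normally distributed
// run times. The multiplier z ≈ 2.576 (two-sided 99% quantile of the standard
// normal distribution) is an assumption.
static class P99IntervalSketch
{
    private const double Z = 2.576;

    public static (double Low, double High) FromMeanAndStdDev(double mean, double stdDev)
        => (mean - Z * stdDev, mean + Z * stdDev);
}

// Example with a hypothetical standard deviation: a mean of 77 ms and a stddev of
// about 5.8 ms give roughly (62, 92), matching the shape of the intervals above.
```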
Benchmarks Report for tracer 🐌 - Benchmarks for #5911 compared to master:
The following thresholds were used for comparing the benchmark speeds:
Allocation changes below 0.5% are ignored.

Benchmark details:
- Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️
- Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️
Benchmarks.Trace.SpanBenchmark - Faster 🎉 Same allocations ✔️
| Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
|---|---|---|---|---|
| Benchmarks.Trace.SpanBenchmark.StartFinishScope‑net6.0 | 1.128 | 547.60 | 485.28 | |
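As the column name suggests, base/diff is the ratio of the base median to the PR median: 547.60 ns / 485.28 ns ≈ 1.128, i.e. StartFinishScope on net6.0 takes roughly 11% less time in this PR than on master.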
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | StartFinishSpan | net6.0 | 401ns | 0.224ns | 0.868ns | 0.00809 | 0 | 0 | 576 B |
| master | StartFinishSpan | netcoreapp3.1 | 567ns | 0.387ns | 1.5ns | 0.0079 | 0 | 0 | 576 B |
| master | StartFinishSpan | net472 | 598ns | 0.647ns | 2.5ns | 0.0916 | 0 | 0 | 578 B |
| master | StartFinishScope | net6.0 | 547ns | 0.227ns | 0.878ns | 0.00964 | 0 | 0 | 696 B |
| master | StartFinishScope | netcoreapp3.1 | 747ns | 0.654ns | 2.53ns | 0.0093 | 0 | 0 | 696 B |
| master | StartFinishScope | net472 | 889ns | 0.598ns | 2.32ns | 0.104 | 0 | 0 | 658 B |
| #5911 | StartFinishSpan | net6.0 | 399ns | 0.285ns | 1.07ns | 0.00802 | 0 | 0 | 576 B |
| #5911 | StartFinishSpan | netcoreapp3.1 | 554ns | 0.63ns | 2.27ns | 0.00768 | 0 | 0 | 576 B |
| #5911 | StartFinishSpan | net472 | 627ns | 0.515ns | 2ns | 0.0916 | 0 | 0 | 578 B |
| #5911 | StartFinishScope | net6.0 | 485ns | 0.385ns | 1.49ns | 0.00978 | 0 | 0 | 696 B |
| #5911 | StartFinishScope | netcoreapp3.1 | 702ns | 0.985ns | 3.81ns | 0.00919 | 0 | 0 | 696 B |
| #5911 | StartFinishScope | net472 | 835ns | 0.816ns | 3.16ns | 0.104 | 0 | 0 | 658 B |
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
| Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| master | RunOnMethodBegin | net6.0 | 652ns | 0.672ns | 2.6ns | 0.00986 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | netcoreapp3.1 | 929ns | 0.986ns | 3.82ns | 0.00924 | 0 | 0 | 696 B |
| master | RunOnMethodBegin | net472 | 1.06μs | 0.959ns | 3.71ns | 0.104 | 0 | 0 | 658 B |
| #5911 | RunOnMethodBegin | net6.0 | 589ns | 0.49ns | 1.9ns | 0.00971 | 0 | 0 | 696 B |
| #5911 | RunOnMethodBegin | netcoreapp3.1 | 900ns | 0.845ns | 3.27ns | 0.00947 | 0 | 0 | 696 B |
| #5911 | RunOnMethodBegin | net472 | 1.09μs | 1.38ns | 5.35ns | 0.104 | 0 | 0 | 658 B |