-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include extra tags in error logs #6364
base: master
Are you sure you want to change the base?
Conversation
Technically a no-op, because ci vis currently records them directly in the global config, but follows the existing pattern and sets us up. I considered just adding it to the `TelemetryProductType` - but it's not a valid option in telemetry intake so thought we better not.
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing the following branches/commits: Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6364) - mean (70ms) : 64, 76
. : milestone, 70,
master - mean (69ms) : 65, 74
. : milestone, 69,
section CallTarget+Inlining+NGEN
This PR (6364) - mean (1,023ms) : 935, 1112
. : milestone, 1023,
master - mean (976ms) : 953, 999
. : milestone, 976,
gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6364) - mean (118ms) : 110, 126
. : milestone, 118,
master - mean (108ms) : 106, 110
. : milestone, 108,
section CallTarget+Inlining+NGEN
This PR (6364) - mean (701ms) : 666, 736
. : milestone, 701,
master - mean (678ms) : 659, 697
. : milestone, 678,
gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6364) - mean (97ms) : 93, 101
. : milestone, 97,
master - mean (92ms) : 89, 94
. : milestone, 92,
section CallTarget+Inlining+NGEN
This PR (6364) - mean (654ms) : 632, 677
. : milestone, 654,
master - mean (633ms) : 614, 651
. : milestone, 633,
gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6364) - mean (190ms) : 186, 194
. : milestone, 190,
master - mean (190ms) : 186, 194
. : milestone, 190,
section CallTarget+Inlining+NGEN
This PR (6364) - mean (1,101ms) : 1071, 1131
. : milestone, 1101,
master - mean (1,092ms) : 1068, 1116
. : milestone, 1092,
gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6364) - mean (276ms) : 272, 280
. : milestone, 276,
master - mean (276ms) : 272, 280
. : milestone, 276,
section CallTarget+Inlining+NGEN
This PR (6364) - mean (870ms) : 840, 900
. : milestone, 870,
master - mean (870ms) : 835, 905
. : milestone, 870,
gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6364) - mean (265ms) : 262, 269
. : milestone, 265,
master - mean (265ms) : 261, 269
. : milestone, 265,
section CallTarget+Inlining+NGEN
This PR (6364) - mean (844ms) : 812, 875
. : milestone, 844,
master - mean (850ms) : 811, 888
. : milestone, 850,
|
Throughput/Crank Report ⚡Throughput results for AspNetCoreSimpleController comparing the following branches/commits: Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red. Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards! gantt
title Throughput Linux x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (6364) (11.237M) : 0, 11237278
master (10.997M) : 0, 10996550
benchmarks/2.9.0 (11.033M) : 0, 11032866
section Automatic
This PR (6364) (7.348M) : 0, 7347722
master (7.185M) : 0, 7184511
benchmarks/2.9.0 (7.786M) : 0, 7785853
section Trace stats
master (7.538M) : 0, 7538247
section Manual
master (11.069M) : 0, 11069430
section Manual + Automatic
This PR (6364) (6.835M) : 0, 6834721
master (6.738M) : 0, 6737616
section DD_TRACE_ENABLED=0
master (10.302M) : 0, 10301778
gantt
title Throughput Linux arm64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (6364) (9.401M) : 0, 9400857
master (9.415M) : 0, 9414535
benchmarks/2.9.0 (9.495M) : 0, 9494821
section Automatic
This PR (6364) (6.196M) : 0, 6196154
master (6.358M) : 0, 6358468
section Trace stats
master (6.815M) : 0, 6815014
section Manual
master (9.493M) : 0, 9493123
section Manual + Automatic
This PR (6364) (5.980M) : 0, 5979606
master (5.926M) : 0, 5926110
section DD_TRACE_ENABLED=0
master (8.735M) : 0, 8735109
gantt
title Throughput Windows x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (6364) (8.832M) : 0, 8831630
master (10.430M) : 0, 10430317
benchmarks/2.9.0 (10.020M) : 0, 10019592
section Automatic
This PR (6364) (5.853M) : crit ,0, 5852892
master (6.844M) : 0, 6843537
benchmarks/2.9.0 (7.255M) : 0, 7255257
section Trace stats
master (7.470M) : 0, 7470195
section Manual
master (10.484M) : 0, 10483521
section Manual + Automatic
This PR (6364) (5.466M) : crit ,0, 5465643
master (6.451M) : 0, 6451333
section DD_TRACE_ENABLED=0
master (9.895M) : 0, 9894875
|
Datadog ReportBranch report: ✅ 0 Failed, 454162 Passed, 2732 Skipped, 19h 13m 25s Total Time |
Benchmarks Report for tracer 🐌Benchmarks for #6364 compared to master:
The following thresholds were used for comparing the benchmark speeds:
Allocation changes below 0.5% are ignored. Benchmark detailsBenchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.SpanBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.TraceAnnotationsBenchmark - Slower
|
Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑net472 | 1.149 | 1,047.84 | 1,203.67 |
Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Benchmarks.Trace.TraceAnnotationsBenchmark.RunOnMethodBegin‑netcoreapp3.1 | 1.184 | 1,002.50 | 846.99 |
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | RunOnMethodBegin |
net6.0 | 646ns | 0.517ns | 1.93ns | 0.00975 | 0 | 0 | 696 B |
master | RunOnMethodBegin |
netcoreapp3.1 | 1μs | 0.992ns | 3.84ns | 0.00904 | 0 | 0 | 696 B |
master | RunOnMethodBegin |
net472 | 1.05μs | 0.503ns | 1.95ns | 0.104 | 0 | 0 | 658 B |
#6364 | RunOnMethodBegin |
net6.0 | 613ns | 0.211ns | 0.817ns | 0.00988 | 0 | 0 | 696 B |
#6364 | RunOnMethodBegin |
netcoreapp3.1 | 847ns | 1.08ns | 4.16ns | 0.00929 | 0 | 0 | 696 B |
#6364 | RunOnMethodBegin |
net472 | 1.2μs | 0.427ns | 1.65ns | 0.104 | 0 | 0 | 658 B |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice. We should do this in all languages.
Also do you need to allow list the tags somewhere (as we do for metrics) or are they automatically forwarded to the logs in the end?
Last, I think a good follow up would be to have all logs contain a product prefix, we could enforce it. Cause here, you solve the ci
use case easily, but for ASM you still need an extra search to route to them.
{ IsRunningInGCPFunctions: true } => ",gcp", | ||
{ LambdaMetadata.IsRunningInLambda: true } => ",aws", | ||
{ IsRunningMiniAgentInAzureFunctions: true } => ",azf", | ||
{ IsRunningInAzureAppService: true } => ",aas", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC correctly, we don't have that info in telemetry config, and I think it would be awesome. Actually it would be even better in metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC correctly, we don't have that info in telemetry config
I think we do actually 🤔 It's sent at a higher level via a header or something like that I believe... we also record the settings we use to determine that in telemetry config, but whether other languages do I wouldn't like to say...
I don't believe so. AFAICT (looking at the telemetry intake code) they're added directly to the destination logs (merged with standard tags)
The problem with this is that many logs won't be one product or another. or it'll be obvious anyway, because they're in the ASM namespace. Also, that will break all existing error tracking... but yeah, we could do it for the obvious ones 🤷♂️ |
Summary of changes
Includes details about products enabled in the telemetry error logs
Reason for change
Should make it easier to categorize errors knowing e.g. whether the error came from a ci-vis app etc.
Implementation details
The tags are written to every log message, so we need to keep the tags small as possible. I limited it to just 5 tags for now:
ci
- is CI Visibility enabledasm
- is ASM enabledprof
- is profiling enableddyn
- is dynamic instrumentation enabledgcp
/aws
/aas
/azf` - are we running in GCP/lambda/AAS/Azure functions (with mini agent)I think those cover the broadest categories for now. I opted for adding the enabled tag with a
1
/0
so we could explicitly see whether it was set or not (apart from the cloud functions which are only added when they apply).Alternatively, we could only add the tag if it's enabled (like I do for the "cloud" tag). That saves bytes, at the expense of some ambiguity: if the tags aren't there does that mean they definitely were none enabled, or the tagging didn't happen? Happy to switch though if people have a preference.
Test coverage
Unit tests. Will do a manual test to make sure there's no issues with the intake accepting them.
Other details
We currently weren't explicitly recording whether CI Vis was enabled, so I added a method to call to "record" the settings. This isn't really required given we can only get the enabled flag via a static property, but it potentially sets us up a bit more cleanly for refactoring later, and is closer to how other products work. I'm happy to be persuaded to remove that commit though