Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duration is not being counted for properly [.NET Framework 4.8 Metrics & Telemetry Module] #1431

Open
1 of 2 tasks
carlos-neto-maersk opened this issue Nov 10, 2023 · 9 comments
Labels
comp:instrumentation.aspnet.telemetryhttpmodule Things related to OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule

Comments

@carlos-neto-maersk
Copy link

carlos-neto-maersk commented Nov 10, 2023

Issue with OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule

List of all OpenTelemetry NuGet
packages
and version that you are
using (e.g. OpenTelemetry 1.3.2):

  <package id="OpenTelemetry.Api" version="1.6.0" targetFramework="net48" />
  <package id="OpenTelemetry.Api.ProviderBuilderExtensions" version="1.6.0" targetFramework="net48" />
  <package id="OpenTelemetry.Exporter.Console" version="1.6.0" targetFramework="net48" />
  <package id="OpenTelemetry.Exporter.OpenTelemetryProtocol" version="1.5.0-rc.1" targetFramework="net48" />
  <package id="OpenTelemetry.Exporter.OpenTelemetryProtocol.Logs" version="1.5.0-rc.1" targetFramework="net48" />
  <package id="OpenTelemetry.Extensions.Hosting" version="1.6.0" targetFramework="net48" />
  <package id="OpenTelemetry.Instrumentation.AspNet" version="1.6.0-beta.1.20" targetFramework="net48" />
  <package id="OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule" version="1.6.0-beta.1.22" targetFramework="net48" />
  <package id="OpenTelemetry.Instrumentation.Http" version="1.6.0-beta.2" targetFramework="net48" />
  <package id="OpenTelemetry.Instrumentation.SqlClient" version="1.5.1-beta.1" targetFramework="net48" />
  <package id="OpenTelemetry.PersistentStorage.Abstractions" version="1.0.0" targetFramework="net48" />
  <package id="OpenTelemetry.PersistentStorage.FileSystem" version="1.0.0" targetFramework="net48" />

OpenTelemetry.Instrumentation.AspNet and OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule is a packed by us version that contains the code in main + the merged code from #1425 and #1407 and #1388.

Runtime version (e.g. net462, net48, net6.0, net7.0 etc. You can
find this information from the *.csproj file):

  • net48

Is this a feature request or a bug?

  • Feature Request
  • Bug

What is the expected behavior?

What do you expect to see?
Request duration being properly logged and average, p95 times do not have strange pattern.
We are running unpublished but already merged code from #1425 and #1407 and #1388.
These were all done by the same author, whom I would like to leave the biggest word of gratitude because it is being so helpful.

What is the actual behavior?

What did you see instead?

We can see on Azure App service metrics that everything is okay, but App insights, fed by OpenTelemetry exporter, is reporting this pattern on the performance view. We also see a lot of requests being exported with a duration of ~ 1ms, with their span children having the correct duration traced.
Screenshot 2023-11-10 at 15 53 55
Screenshot 2023-11-10 at 15 54 20

Additional Context

I'm looking at the ActivityHelper.StopAspNetActivity as the possible root cause. If I'm able to find any way to fix it, I'll open a PR here for you guys. In the meanwhile, help would be very much appreciated!

@carlos-neto-maersk carlos-neto-maersk added the comp:instrumentation.aspnet.telemetryhttpmodule Things related to OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule label Nov 10, 2023
@carlos-neto-maersk
Copy link
Author

@qhris @Kielek fyi

@carlos-neto-maersk carlos-neto-maersk changed the title Duration is not being counted for properly Duration is not being counted for properly [.NET Framework 4.8 Metrics & Telemetry Module] Nov 14, 2023
@vishweshbankwar
Copy link
Member

@carlos-neto-maersk The change proposed in #1432 will reintroduce the issue fixed in #1388. Its not clear how activity duration is related to your issue. Will need a sample which repros the behavior you are noticing.

on the side note - are you enabling sampling by any chance?

@qhris
Copy link
Contributor

qhris commented Nov 14, 2023

Hey @carlos-neto-maersk, I tried to reproduce this locally but was unsuccessful (from #1429).
Any chance to provide us with a reduced code sample of how you configure the OpenTelemetry SDK in your application?

The activity is meant to be closed such that the exporters will run after the activity/metric has its attributes populated. Changing the order means that simple exporters can emit metrics without the attributes set.

@carlos-neto-maersk
Copy link
Author

I was looking for #1388 because I felt that the work there is very important for knowledge, particularly @qhris comment #1388 (comment) that allowed me to further understand more about what was happening.
#1388 also introduced the bug fixed on #1425, that led me into adding the code proposed here.

I will try to create the mvp to share with you guys ASAP.

on the side note - are you enabling sampling by any chance?

No! We haven't enabled sampling.

@carlos-neto-maersk
Copy link
Author

carlos-neto-maersk commented Nov 15, 2023

I've created a repo for the MRE.

Like I've stated before, we were not using the [email protected] but a version based on it with the code from #1407 (with the added tags similar to #1429 ) and #1425.

With the fix proposed in the PR, we got normal results of the load test with the load profile that we currently have (2,5M req/4 hours)

@qhris
Copy link
Contributor

qhris commented Nov 15, 2023

That link does not work for me to the repo, can you check it again?

It's great that you got it fixed on your end, it clearly means something is going sideways. There's some concerns I have with the PR but I'll leave those comments there.

@carlos-neto-maersk
Copy link
Author

Sorry @qhris , noob me made the repo private. Its public now.

@qhris
Copy link
Contributor

qhris commented Nov 16, 2023

Thanks for that! On a cursory glance that does look correct. I don't have time until later this week to check it out more thoroughly however.

@carlos-neto-maersk
Copy link
Author

I've been thinking about this, since the unit tests seem to cover the condition over which I'm proposing changes to.

Can it might be the ActivityHelper.RestoreContextIfNeeded code (here) that is not covering a high load/multiple processes (we are working with 18 instances of the app)?

I know it should be working... Because different processes have different heaps, and even if ASP.NET can handle multiple async requests it wouldn't await for anything on the Application_EndRequest cycle, so it wouldn't context switch to other request Application_OnExecuteRequestStep. But I think it is somewhere around here the root cause.

Anyway, like I said, I'm not an ASP.NET expert so some of this might be wrongly interpreted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:instrumentation.aspnet.telemetryhttpmodule Things related to OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule
Projects
None yet
3 participants