Create CLR metrics semantic convention #956

Closed
gregkalapos opened this issue Apr 24, 2024 · 17 comments · Fixed by #1035
Labels
area:system · enhancement (New feature or request) · experts needed (This issue or pull request is outside an area where general approvers feel they can approve)

Comments

@gregkalapos
Member

Area(s)

area:system

Is your change request related to a problem? Please describe.

The idea is to create SemConv CLR metrics for .NET, similar to JVM metrics.

The System and Process metrics already cover many useful things for .NET, but, as with Java, there are relevant metrics not covered by the currently defined general metrics. Examples:

  • GC metrics
  • Thread pool metrics
  • Metrics around loaded/unloaded assemblies
  • Exception related metrics

Describe the solution you'd like

Add a clr.* namespace and define metrics for the list above. Everything defined there should be general, applicable to most .NET applications, and collectible by instrumentations in most environments.
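To make the idea concrete, here is a rough sketch (not part of the proposal itself) of what instruments in such a namespace could look like using the System.Diagnostics.Metrics API. All metric names, the meter name, and attribute keys below are hypothetical placeholders, not agreed conventions:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Threading;

// Illustrative sketch only: metric names and attribute keys are hypothetical
// placeholders, not names defined by the semantic conventions.
var meter = new Meter("Clr.Runtime.Example");

// Thread pool metrics, e.g. the current number of thread pool threads.
meter.CreateObservableUpDownCounter(
    "clr.thread_pool.thread.count",
    () => (long)ThreadPool.ThreadCount,
    unit: "{thread}",
    description: "Number of thread pool threads that currently exist.");

// Exception-related metrics, e.g. a counter fed from first-chance exceptions.
var exceptions = meter.CreateCounter<long>(
    "clr.exceptions.count",
    unit: "{exception}",
    description: "Number of exceptions thrown.");

AppDomain.CurrentDomain.FirstChanceException += (_, args) =>
    exceptions.Add(1, new KeyValuePair<string, object?>(
        "error.type", args.Exception.GetType().FullName));
```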

Describe alternatives you've considered

An alternative could be to look at the already defined .NET metrics and try to fit the CLR metrics there. I believe that would not be ideal, since those metrics are specific to higher-level frameworks. The CLR metrics I propose here are purely runtime metrics, not tied to any framework. In this sense they are more like the JVM metrics for Java.

Additional context

There are already .NET metrics defined. Those are not really runtime metrics from the CLR (the runtime itself); rather, they are metrics for higher-level components (like ASP.NET Core, SignalR, etc.). I think the two can coexist: whatever higher-level frameworks emit can still live in their specific namespaces (like aspnetcore.*, signalr.*, http.client.*), while pure runtime-level metrics (GC, etc.; see the list above) would live in a clr.* namespace. So I suggest no change to the current .NET metrics, plus an additional clr.* namespace for pure CLR runtime metrics.

gregkalapos added the enhancement (New feature or request), experts needed (This issue or pull request is outside an area where general approvers feel they can approve), and triage:needs-triage labels on Apr 24, 2024
@stevejgordon
Contributor

I'd be interested in starting work on adding these.

@joaopgrassi
Member

@stevejgordon Awesome! I will assign this to you then, if that's fine. Let me know if you need any help working with the semconv tooling/things. :)

@lmolkova
Contributor

lmolkova commented May 1, 2024

/cc @open-telemetry/semconv-dotnet-approver

@antonfirsov

antonfirsov commented May 1, 2024

IMO standardization should happen in parallel with the implementation. This is what we did with the HTTP metrics, and we found many dragons. The runtime work is tracked by dotnet/runtime#85372, which seems to be on the 9.0 roadmap, but I don't see an owner assigned and there has been no activity on the issue for almost a year.

@pyohannes
Contributor

Be aware that .NET runtime metrics are already defined and shipped via one of the few stable .NET instrumentation packages (OpenTelemetry.Instrumentation.Runtime).

Any related work should be coordinated with @open-telemetry/dotnet-maintainers, who decided to make the specification of runtime metrics part of the instrumentation library instead of semantic conventions.

@stevejgordon
Contributor

@pyohannes If I've followed the Java story correctly, their instrumentation defined many of the metrics that were later added to the semantic conventions, and I saw a comment from @trask in the Go issue about planning to handle the switch using the OTEL_SEMCONV_STABILITY_OPT_IN=jvm configuration. Would that be the way forward here?

@stevejgordon
Contributor

My thought was to first review what existing metrics are available from the contrib instrumentation (not many, as most are event counters, I believe) and also to compare them with some we collect in Elastic APM. That research would be a starting point for identifying what, if anything, could or should be introduced into the semantic conventions.

@pyohannes
Contributor

@stevejgordon A viable first step would be to sync with the .NET SIG and align with their vision for runtime metrics.

@stevejgordon
Contributor

@pyohannes Yep, I agree. Unfortunately, the timing of those SIG meetings makes it very difficult for me to attend, but I'll try to get to one soon and put this on the agenda. The way I see it, we have a couple of challenges here.

a) The runtime isn't yet natively instrumented with metrics, and that needs some discussion, including whether any metrics that are added will comply with the semantic conventions format. It sounds like, right now, the intention is that they will not.
b) The stable contrib runtime instrumentation package has existing naming that differs from the semantic conventions recommendations that have come since.

Regarding naming, a potentially naive solution is to introduce an environment variable, as Java has, that any .NET runtime metrics and/or OTel contrib instrumentation would honour when setting up metric names. That way, users could opt in to semantic convention naming, while the existing names (or new .NET runtime names) remain the default.
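For illustration, a minimal sketch of what honoring such a variable could look like in an instrumentation's setup code. The opt-in token ("dotnet") and both metric names shown are assumptions for the example, not decisions made in this thread:

```csharp
using System;
using System.Linq;

// Sketch of the opt-in idea only: the token checked here and the metric names
// are hypothetical; the actual values would be whatever is agreed for .NET.
var optIn = Environment.GetEnvironmentVariable("OTEL_SEMCONV_STABILITY_OPT_IN") ?? string.Empty;
var useSemConvNames = optIn
    .Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries)
    .Contains("dotnet", StringComparer.OrdinalIgnoreCase);

var gcCollectionsMetricName = useSemConvNames
    ? "clr.gc.collections"                            // hypothetical semconv-aligned name
    : "process.runtime.dotnet.gc.collections.count";  // existing contrib-style name
```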

@lmolkova
Contributor

lmolkova commented May 3, 2024

Looking into https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.Runtime, we'd have to make breaking changes if we attempted to make them consistent with the general semantic conventions requirements:

  • metrics would be renamed (process.runtime.dotnet.* -> dotnet.*)
  • attribute names would need to be namespaced

I.e., I don't think there is a benefit in documenting semantic conventions for this package as it exists, and if the runtime is instrumented natively, this package would either need to be deprecated or come with a v2 compatible with the new metrics.

@stevejgordon
Contributor

@lmolkova Over on the runtime issue, I have the go-ahead to implement the metrics inside the runtime itself. This will likely be a port of the existing metrics from the instrumentation package, renamed to align with whatever naming we agree on here. I like @gregkalapos's proposal to prefix these with clr.*, since that describes the actual runtime. dotnet could be too general, as code could arguably be running on the CLR or on something like Mono for Blazor, etc.

My plan going forward is to create an initial PR with experimental metrics and attributes based on those from the runtime instrumentation library while also working to see if we can implement it in the runtime before .NET 9 is locked down.

I agree that once this all happens, the runtime instrumentation library could then likely be deprecated.
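As a rough sketch of what such a runtime-native port might look like, a GC collections metric could be exposed as an observable counter over GC.CollectionCount. The meter name, metric name, and attribute key below are hypothetical and would follow whatever naming is agreed in this issue:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter/metric/attribute names, for illustration only.
var meter = new Meter("Clr.Runtime.Example");

meter.CreateObservableCounter<long>(
    "clr.gc.collections",
    () =>
    {
        // Report the cumulative collection count per generation as separate measurements.
        var measurements = new List<Measurement<long>>();
        for (var gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            measurements.Add(new Measurement<long>(
                GC.CollectionCount(gen),
                new KeyValuePair<string, object?>("clr.gc.heap.generation", $"gen{gen}")));
        }
        return measurements;
    },
    unit: "{collection}",
    description: "Cumulative number of garbage collections per generation.");
```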

@AlexanderWert
Member

> Looking into https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.Runtime, we'd have to make breaking changes if we attempted to make them consistent with the general semantic conventions requirements:
>
>   • metrics would be renamed (process.runtime.dotnet.* -> dotnet.*)
>   • attribute names would need to be namespaced
>
> I.e., I don't think there is a benefit in documenting semantic conventions for this package as it exists, and if the runtime is instrumented natively, this package would either need to be deprecated or come with a v2 compatible with the new metrics.

@lmolkova Not sure I follow your point here. Are you saying we should not define semantic conventions when there is already existing instrumentation / implementation that emits data that is inconsistent with semantic convention guidelines because we would break the implementation?

In semantic conventions we have the JVM metrics. IMHO, for consistency and usability reasons, we should follow that pattern and define meaningful runtime metrics for other runtimes/languages in semantic conventions as well. SemConv should be the single point of specification/documentation that users and vendors can rely on (rather than docs spread over different implementations).

Or do I understand your comment wrong?

@pyohannes
Contributor

> Are you saying we should not define semantic conventions when there is already existing instrumentation / implementation that emits data that is inconsistent with semantic convention guidelines because we would break the implementation?

@AlexanderWert The main point here is that the OpenTelemetry.Instrumentation.Runtime package emits runtime metrics and is declared stable. If we now added semantic conventions for .NET runtime metrics that differ from those emitted by this package, .NET maintainers would have to deal with a stable instrumentation library that doesn't conform to the semantic conventions.

@stevejgordon
Contributor

I think it's reasonable to implement the metrics in the runtime to align with the conventions established for Java. Consumers can opt into those from the runtime or stick with the instrumentation package. At least then, we have semantic convention consistency for those being added.

@lmolkova
Contributor

lmolkova commented May 7, 2024

> My plan going forward is to create an initial PR with experimental metrics and attributes based on those from the runtime instrumentation library while also working to see if we can implement it in the runtime before .NET 9 is locked down.

@stevejgordon I think it would be a great way to start!

> I think it's reasonable to implement the metrics in the runtime to align with the conventions established for Java.

I believe there were a lot of discussions regarding consistency between runtimes, and the decision was to put metrics into individual namespaces and not target cross-runtime consistency. So the JVM metrics can be used as inspiration, but not as guidance.

//cc @trask if I got it wrong.

@lmolkova
Contributor

lmolkova commented May 7, 2024

> Are you saying we should not define semantic conventions when there is already existing instrumentation / implementation that emits data that is inconsistent with semantic convention guidelines because we would break the implementation?

@pyohannes explained my concern much better than I could: #956 (comment)

The only thing to add is that if we defined proper CLR semantic conventions right now, we would need to keep them experimental until the .NET runtime (hopefully) emits them natively, since a lot of nuances and changes usually come out of the native implementation process.

@stevejgordon
Contributor

@lmolkova Starting with these as experimental makes sense. I'm hoping to also contribute to the implementation in the runtime so we can move to stable once the metrics are merged and we know the release timeframe. Ideally, we'll get these in for .NET 9.
