Support Custom Metric Attributes Per Request #6281

krak3n · 2024-10-29T09:23:45Z

What

Adds a new WithMetricsAttributesFn option that allows custom metric attributes to be added on a per-request basis.

Why

So that the standard metrics provided by the gRPC instrumentation package can be annotated on a per-request basis. Our use case is to add attributes to the instruments depending on what the handler is doing with the request.
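For readers skimming the thread, here is a rough sketch of what such a per-request callback could look like. It only mirrors the call shape visible in the diff further down (`f(ctx, rs.Payload)` returning a slice of attributes). `KeyValue`, `listRequest` and `tenantTier` are hypothetical stand-ins so the example compiles with the standard library alone; they are not the PR's actual types.

```go
package main

import (
	"context"
	"fmt"
)

// KeyValue is a local stand-in for an OpenTelemetry attribute so the sketch
// compiles with the standard library only (assumption, not the real type).
type KeyValue struct {
	Key, Value string
}

// MetricAttributesFn follows the call shape seen in the diff: it receives the
// request context and payload and returns extra metric attributes.
type MetricAttributesFn func(ctx context.Context, payload any) []KeyValue

// listRequest is a hypothetical request message.
type listRequest struct {
	Tenant string
}

// tenantTier is a hypothetical callback. It maps the payload onto a small,
// bounded set of values instead of emitting the raw tenant name.
func tenantTier(_ context.Context, payload any) []KeyValue {
	req, ok := payload.(*listRequest)
	if !ok {
		return nil // unknown payload type: add nothing
	}
	tier := "standard"
	if req.Tenant == "acme" {
		tier = "enterprise"
	}
	return []KeyValue{{Key: "tenant.tier", Value: tier}}
}

func main() {
	var fn MetricAttributesFn = tenantTier
	fmt.Println(fn(context.Background(), &listRequest{Tenant: "acme"}))
}
```

Returning a bounded value such as a tier, rather than an identifier, is one way to keep the resulting attribute set small.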

dmathieu · 2024-10-29T09:38:18Z

See also #6092 (which is still a draft).

dmathieu · 2024-10-29T09:39:03Z

Could we follow the same pattern as we're doing for otelhttp?
https://github.com/open-telemetry/opentelemetry-go-contrib/pull/5876/files

krak3n · 2024-10-29T10:07:04Z

Hey @dmathieu

Ah, I wasn't aware of #6092. Yeah, we can use the same pattern. I'm happy to apply that on this PR and take over that work.

krak3n · 2024-10-29T12:51:46Z

@dmathieu pushed up changes.

Had problems getting the TestStatsHandler/Recorded/ClientMetrics tests to pass locally on the rpc.client.request.size metric. I could not figure out for the life of me why all the other scoped metrics worked but this one did not. I'm still stumped by it.

krak3n · 2024-10-29T16:28:54Z

@dmathieu would it also be acceptable to provide access to the gRPCContext so that we can append to its metricAttrs?

We have a use case where we only know which attributes to add to the metrics once we are in the handler, after DB / network calls.

dmathieu · 2024-10-29T17:04:56Z

I don't know if we should make metrics so permissive. By their nature, metrics must be low cardinality.
Your request looks like a good way to blow them up with high cardinality attributes.
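To make the cardinality concern concrete: in the worst case, the number of distinct time series for an instrument is the product of the number of distinct values of each attribute. The figures below (20 methods, 5 status codes, 10,000 users) are invented purely for illustration.

```go
package main

import "fmt"

// seriesCount returns the worst-case number of distinct time series for one
// instrument: the product of the distinct value counts of its attributes.
func seriesCount(distinctValues ...int) int {
	total := 1
	for _, n := range distinctValues {
		total *= n
	}
	return total
}

func main() {
	// Hypothetical baseline: 20 RPC methods and 5 status codes.
	base := seriesCount(20, 5)
	// The same instrument with a per-user attribute across 10,000 users.
	withUser := seriesCount(20, 5, 10000)
	fmt.Println(base, withUser) // prints "100 1000000"
}
```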

krak3n · 2024-10-29T18:18:59Z

Yeah that's fair @dmathieu. How's this PR looking in general?

dmathieu

I'd recommend running the tests locally to ensure everything passes and help with reviews 😸

dmathieu · 2024-10-30T08:36:50Z

instrumentation/google.golang.org/grpc/otelgrpc/stats_handler.go

+			if f := c.MetricAttributesFn; f != nil {
+				attrs := f(ctx, rs.Payload)
+
+				gctx.metricAttrs = append(gctx.metricAttrs, attrs...)


It doesn't seem like we need to modify gctx here?
This also causes a race problem.

#6092 also modified this. Otherwise the metric attributes won't be propagated to the *stats.End call, where the majority of the metrics are recorded. I don't think *stats.End has access to the Payload, although I will check that.

InPayload is the only stats.RPCStats implementation that has access to the request payload, so this is the only time we can call the MetricAttributesFn and append to gRPCContext.metricAttrs. Could I put a lock around metricAttrs to protect against races?

I'm very hesitant about updating a pointer to a struct that is stored as a context value. That really looks like a smell.
Unfortunately, we can't do things cleanly (provide a new context with the updated value), as we can't provide a new context as a return value.
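The constraint being discussed can be sketched as follows. In grpc-go's stats.Handler, TagRPC returns a context but HandleRPC does not, so anything learned in HandleRPC can only be recorded by mutating state reachable from the context. The sketch below is an illustration under that assumption, with hypothetical names (`rpcState`, `tagRPC`, `handleRPC`, `snapshot`) and plain strings in place of attributes; it shows the lock-guarded variant suggested above, not the PR's code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// rpcState plays the role of the per-RPC struct stored in the context.
type rpcState struct {
	mu    sync.Mutex
	attrs []string
}

type stateKey struct{}

// tagRPC can return a new context, like stats.Handler.TagRPC.
func tagRPC(ctx context.Context) context.Context {
	return context.WithValue(ctx, stateKey{}, &rpcState{})
}

// newRPCContext is a small helper that tags a fresh background context.
func newRPCContext() context.Context {
	return tagRPC(context.Background())
}

// handleRPC has no return value, like stats.Handler.HandleRPC, so it can only
// mutate the shared struct. The mutex guards against concurrent appends.
func handleRPC(ctx context.Context, attrs ...string) {
	st, ok := ctx.Value(stateKey{}).(*rpcState)
	if !ok {
		return
	}
	st.mu.Lock()
	defer st.mu.Unlock()
	st.attrs = append(st.attrs, attrs...)
}

// snapshot returns a copy of the attributes collected so far.
func snapshot(ctx context.Context) []string {
	st, ok := ctx.Value(stateKey{}).(*rpcState)
	if !ok {
		return nil
	}
	st.mu.Lock()
	defer st.mu.Unlock()
	return append([]string(nil), st.attrs...)
}

func main() {
	ctx := newRPCContext()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			handleRPC(ctx, fmt.Sprintf("k%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(len(snapshot(ctx)))
}
```

The lock removes the data race, but the design concern remains: the context value is still being used as a mutable side channel.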

Ok, this doesn't seem like it's going to be possible then due to the limitations in how the stats.Handler works.

@dmathieu any thoughts on if we should continue with this or not?

I'd like to have other opinions.

Ok, I'll leave it with you @dmathieu 👍

Any updates on this?

krak3n · 2024-10-30T08:59:43Z

I'd recommend running the tests locally to ensure everything passes and help with reviews 😸

I did, but I only had issues with one test as I mentioned here: #6281 (comment)

dmathieu · 2024-11-26T13:22:49Z

Coming back to this, I don't think it's a good thing to add. Metrics should be low cardinality, and this really opens a big window for folks to shoot themselves in the foot with high cardinality metrics.
If you wish to add request-specific data, tracing is probably what you need.

Unless there are objections, I will close this PR in 24 hours.

jsok · 2024-11-26T19:20:08Z

IMHO developers should decide if the increased cardinality is justified or not.

Maybe I'm doing it wrong but in my experience it's much cheaper to add an attribute to a metric than trace every request if you want a comprehensive view of your requests.

There's an issue and 2 PRs now trying to solve this so there is clearly demand for the capability. I've currently forked this statshandler in my code to add an extra attribute which is less than ideal.
Meanwhile my HTTP services can use the otelhttp handler and add extra attributes just fine.

dmathieu · 2024-11-26T20:23:00Z

What attributes are you adding?

jsok · 2024-11-26T22:01:17Z

What attributes are you adding?

These are domain/service specific attributes in the context.

In the case of our monitoring platform (Datadog), we still have the ability to exclude tags from being indexed, so the cardinality and cost increase can be managed quite easily. OTOH, enabling tracing for every request in order to use trace metrics is prohibitively expensive at scale, hence the preference to do this as a metric attribute.

pellared · 2024-11-26T22:14:40Z

These are domain/service specific attributes in the context.

If there are domain/service specific attributes, then maybe you can create domain/service specific metrics that use these attributes, instead of asking to add custom attributes to instrumentation libraries, which are supposed to follow the OpenTelemetry Semantic Conventions.

There's an issue and 2 PRs now trying to solve this so there is clearly demand for the capability.

There may be other ways of addressing your use case.

jsok · 2024-11-26T22:25:27Z

If there are domain/service specific attributes, then maybe you can create domain/service specific metrics that use these attributes, instead of asking to add custom attributes to instrumentation libraries, which are supposed to follow the OpenTelemetry Semantic Conventions.

IIUC you're saying that RPC metrics must never have any attributes other than those listed in the semantic conventions?

There may be other ways of addressing your use case.

Yes, recommendations are welcome. I believe this current set of PRs exists because the view was that this was an acceptable solution, since it was also done for the http handler.

pellared · 2024-11-26T23:06:07Z

IIUC you're saying that RPC metrics must never have any attributes other than those listed in the semantic conventions?

I do understand that there are use cases where users may want to add custom attributes. E.g. see: open-telemetry/opentelemetry-specification#4298. However, I am not sure if these capabilities (to explicitly add additional attributes) should be included in the instrumentation libraries. It may be an SDK capability.

I would not say must never, but I would just avoid it if possible. Nothing in the OpenTelemetry Specification says that instrumentation libraries should offer such a capability.

Do you know if any other languages (e.g. Java, .NET, Python) allow adding custom metrics to the instrumentation libraries? As far as I remember it is not possible in .NET.

this was an acceptable solution since it was also done for the http handler.

I am not sure it is actually good that it was added. Notice that otelhttp is experimental (not stable).

jsok · 2024-11-26T23:20:36Z

Appreciate the response @pellared

We seem to already have a few ways to add metric attributes:

OTEL_RESOURCE_ATTRIBUTES env var (and resource detectors in general) at an SDK level
The statshandler also has WithMetricAttributes() at an instrumentation library level

Making a clear distinction as to why per-request metric attributes are a bridge too far, versus resource-wide attributes, would help the case for closing this PR and issue.

dmathieu · 2024-11-27T09:30:53Z

Related: open-telemetry/opentelemetry-specification#4311

krak3n · 2024-11-28T11:04:37Z

I think it comes down to the purpose of these instrumentation libraries.

Are they meant to be the one stop shop where we can come and get the officially maintained instrumentation from OTEL? Or are they meant to be the bare minimum instrumentation that serves as an example of how you could do HTTP/gRPC instrumentation?

If the former, then there is clearly a use case where attributes are not known until the request is handled, and the SDK should provide a way to do that, regardless of whether we shoot ourselves in the foot with cardinality. I don't think limitations in other languages should influence the feature set of another.

If the latter, then that should be made clear in the documentation, and it's up to Platform Engineers such as myself to use this as a template to provide the instrumentation to the teams I support.

Both are fine approaches and I'm totally down for writing my own implementation to service the teams I support. Of course I would much rather use an officially maintained SDK, takes the weight off of me lol.

pellared · 2024-11-28T11:46:28Z

Are they meant to be the one stop shop where we can come and get the officially maintained instrumentation from OTEL?

This.

I want to get a full understanding (and Go SIG as well as Spec SIG agreement) on what the scope (and constraints) of instrumentation libraries is. Thus I started a discussion in the OTel Specification. I want to document the common expectations of instrumentation libraries and a consistent usage experience across instrumentation libraries in OTel Go Contrib. It should also make contributions easier.

pellared · 2024-12-03T17:28:58Z

Based on today's OTel Spec SIG meeting:
Adding custom hooks to instrumentation libraries is acceptable, especially given they have access to contextual information that is not accessible to the processors (e.g. request payload).

pellared · 2024-12-03T17:30:28Z

instrumentation/google.golang.org/grpc/otelgrpc/stats_handler.go

+			// If a MetricAttributesFn is defined call this function and update the gRPCContext with the metric attributes
+			// returned by this function.
+			if f := c.MetricAttributesFn; f != nil {
+				attrs := f(ctx, rs.Payload)


What about stats.OutPayload?

Another PR similar to this one had a different function for each rs.(type). We could go down that road, or use one function where the caller has to type-assert themselves.
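The two shapes mentioned here can be compared side by side in a small sketch. `InPayload` and `OutPayload` below are local stand-ins for the grpc-go stats types of the same name (both of which carry a Payload field), and `Hooks`, `SingleHook`, `dispatch` and `payloadDirection` are hypothetical names used only for illustration; neither PR's actual API is reproduced.

```go
package main

import "fmt"

// Local stand-ins for grpc-go's stats.InPayload and stats.OutPayload.
type InPayload struct{ Payload any }
type OutPayload struct{ Payload any }

// Hooks is the first shape: one hypothetical callback per event type.
type Hooks struct {
	OnInPayload  func(p *InPayload) []string
	OnOutPayload func(p *OutPayload) []string
}

// dispatch routes an event to the matching typed callback, if one is set.
func dispatch(h Hooks, event any) []string {
	switch e := event.(type) {
	case *InPayload:
		if h.OnInPayload != nil {
			return h.OnInPayload(e)
		}
	case *OutPayload:
		if h.OnOutPayload != nil {
			return h.OnOutPayload(e)
		}
	}
	return nil
}

// SingleHook is the second shape: one callback that receives every event and
// leaves the type assertion to the caller.
type SingleHook func(event any) []string

// payloadDirection is an example SingleHook that does its own type switch.
func payloadDirection(event any) []string {
	switch event.(type) {
	case *InPayload:
		return []string{"direction=in"}
	case *OutPayload:
		return []string{"direction=out"}
	default:
		return nil
	}
}

func main() {
	typed := Hooks{
		OnInPayload: func(*InPayload) []string { return []string{"direction=in"} },
	}
	var single SingleHook = payloadDirection

	fmt.Println(dispatch(typed, &InPayload{}), dispatch(typed, &OutPayload{}))
	fmt.Println(single(&InPayload{}), single(&OutPayload{}))
}
```

The typed variant keeps user callbacks free of type assertions at the cost of a wider option surface; the single hook is smaller but pushes the type switch onto every caller.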

krak3n · 2024-12-07T14:37:24Z

Based on today's OTel Spec SIG meeting:

Adding custom hooks to instrumentation libraries is acceptable, especially given they have access to contextual information that is not accessible to the processors (e.g. request payload).

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

pellared · 2024-12-07T14:48:03Z

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

This

Based on today's OTel Spec SIG meeting:
Adding custom hooks to instrumentation libraries is acceptable, especially given they have access to contextual information that is not accessible to the processors (e.g. request payload).

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

We discussed it during the last Go SIG meeting. I am on PTO as well, but once I am back I plan to create an issue (or even a project) about designing and standardizing an approach for instrumentation libraries' customization. Are you interested in contributing? We lack the velocity at this moment to work on it more actively. If so, please consider joining our next Go SIG meeting.

krak3n · 2024-12-10T07:24:47Z

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

This

Based on today's OTel Spec SIG meeting:

Adding custom hooks to instrumentation libraries is acceptable, especially given they have access to contextual information that is not accessible to the processors (e.g. request payload).

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

We discussed it during the last Go SIG meeting. I am on PTO as well, but once I am back I plan to create an issue (or even a project) about designing and standardizing an approach for instrumentation libraries' customization. Are you interested in contributing? We lack the velocity at this moment to work on it more actively. If so, please consider joining our next Go SIG meeting.

Yeah, I am totally up for contributing. When is the next SIG?

krak3n requested review from dashpole and a team as code owners October 29, 2024 09:23

krak3n force-pushed the feat/dynamic-grpc-labels branch from 370d0b2 to 737f700 Compare October 29, 2024 09:27

feat(otelgrpc): allow setting dynamic per-request metric attributes

e20e5cf

krak3n force-pushed the feat/dynamic-grpc-labels branch from 737f700 to e20e5cf Compare October 29, 2024 12:48

dmathieu reviewed Oct 30, 2024

krak3n changed the title ~~Support adding custom metric attributes to gRPCContext~~ Support Custom Metric Attributes Per Request Oct 30, 2024

pellared reviewed Dec 3, 2024
