Support Custom Metric Attributes Per Request #6281

Open
wants to merge 1 commit into main

@krak3n krak3n commented Oct 29, 2024

What

Adds a new WithMetricsAttributesFn option that allows custom metric attributes to be added on a per-request basis.

Why

So that the standard metrics provided by the gRPC instrumentation package can be annotated on a per-request basis. Our use case is to add attributes to the instruments depending on what the handler is doing with the request.
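A minimal sketch of how such an option could be wired up, assuming the usual functional-options style. The callback signature is inferred from the `f(ctx, rs.Payload)` call in the diff under review; `KeyValue`, `config`, `Option`, `listRequest` and `paginatedAttr` are hypothetical stand-ins (the real package would use `attribute.KeyValue`), so this is an illustration rather than the PR's actual code.

```go
package main

import (
	"context"
	"fmt"
)

// KeyValue is a local stand-in for attribute.KeyValue (assumption, keeps the
// sketch free of external modules).
type KeyValue struct{ Key, Value string }

// MetricAttributesFn is inferred from the call f(ctx, rs.Payload) in the diff.
type MetricAttributesFn func(ctx context.Context, payload any) []KeyValue

// config and Option are hypothetical names for a functional-options setup.
type config struct{ MetricAttributesFn MetricAttributesFn }

type Option func(*config)

// WithMetricsAttributesFn stores fn so it can be called for each payload.
func WithMetricsAttributesFn(fn MetricAttributesFn) Option {
	return func(c *config) { c.MetricAttributesFn = fn }
}

// listRequest is a made-up request message.
type listRequest struct{ Paginated bool }

// paginatedAttr returns a bounded (two-valued) attribute derived from the payload.
func paginatedAttr(_ context.Context, payload any) []KeyValue {
	r, ok := payload.(*listRequest)
	if !ok {
		return nil
	}
	return []KeyValue{{Key: "request.paginated", Value: fmt.Sprint(r.Paginated)}}
}

// example applies the option and merges per-request attributes into a base set.
func example() []KeyValue {
	c := &config{}
	WithMetricsAttributesFn(paginatedAttr)(c)

	attrs := []KeyValue{{Key: "rpc.system", Value: "grpc"}}
	if f := c.MetricAttributesFn; f != nil {
		attrs = append(attrs, f(context.Background(), &listRequest{Paginated: true})...)
	}
	return attrs
}

func main() {
	fmt.Println(example())
}
```

The callback here deliberately returns an attribute with only two possible values, which keeps the resulting metric series bounded.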

@krak3n krak3n requested review from dashpole and a team as code owners October 29, 2024 09:23
@krak3n krak3n force-pushed the feat/dynamic-grpc-labels branch from 370d0b2 to 737f700 Compare October 29, 2024 09:27
@dmathieu
Member

See also #6092 (which is still a draft).

@dmathieu
Member

Could we follow the same pattern as we're doing for otelhttp?
https://github.com/open-telemetry/opentelemetry-go-contrib/pull/5876/files

@krak3n
Author

krak3n commented Oct 29, 2024

Hey @dmathieu

Ah, I wasn't aware of #6092. Yeah, we can use the same pattern. I'm happy to apply that on this PR and take over that work.

@krak3n krak3n force-pushed the feat/dynamic-grpc-labels branch from 737f700 to e20e5cf Compare October 29, 2024 12:48
@krak3n
Author

krak3n commented Oct 29, 2024

@dmathieu pushed up changes.

Had problems getting the TestStatsHandler/Recorded/ClientMetrics tests to pass locally on the rpc.client.request.size metric. I could not figure out for the life of me why all the other scoped metrics worked but this one did not. I'm still stumped by it.

@krak3n
Author

krak3n commented Oct 29, 2024

@dmathieu would it also be acceptable to expose the gRPCContext so that callers can append to its metricAttrs?

We have a use case where we would only know what attributes to add to the metrics while we are in the handler after db / network calls.

@dmathieu
Member

I don't know if we should make metrics so permissive. By their nature, metrics must be low cardinality.
Your request looks like a good way to blow them up with high cardinality attributes.
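To make the cardinality concern concrete, here is a rough sketch of the worst-case arithmetic: an instrument can produce one time series per distinct combination of attribute values, so the upper bound is the product of the distinct values of each attribute. The figures (20 methods, 17 status codes, 10,000 users) and the `user.id` attribute are illustrative assumptions, not measurements from any real service.

```go
package main

import "fmt"

// seriesCount returns the worst-case number of distinct time series for one
// instrument: the product of the number of distinct values per attribute.
func seriesCount(distinctValues map[string]int) int {
	n := 1
	for _, v := range distinctValues {
		n *= v
	}
	return n
}

func main() {
	// Bounded attributes only (illustrative numbers).
	base := map[string]int{
		"rpc.method":           20,
		"rpc.grpc.status_code": 17,
	}
	fmt.Println(seriesCount(base)) // 340

	// Adding one unbounded, request-specific attribute (hypothetical).
	withUser := map[string]int{
		"rpc.method":           20,
		"rpc.grpc.status_code": 17,
		"user.id":              10000,
	}
	fmt.Println(seriesCount(withUser)) // 3400000
}
```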

@krak3n
Author

krak3n commented Oct 29, 2024

Yeah that's fair @dmathieu. How's this PR looking in general?

Member

@dmathieu dmathieu left a comment

I'd recommend running the tests locally to ensure everything passes and help with reviews 😸

if f := c.MetricAttributesFn; f != nil {
	attrs := f(ctx, rs.Payload)

	gctx.metricAttrs = append(gctx.metricAttrs, attrs...)
Member

It doesn't seem like we need to modify gctx here?
This also causes a race problem.

Author

#6092 also modified this; otherwise the metric attributes won't be propagated to the *stats.End call, where the majority of the metrics are recorded. I don't think *stats.End has access to the Payload, although I will check that.

Author

InPayload is the only stats.RPCStats implementation that has access to the request payload, so this is the only time we can call the MetricAttributesFn and append to gRPCContext.metricAttrs. I can put a lock around metricAttrs to protect against races?
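A minimal sketch of the lock suggested above, assuming a mutex lives next to the attribute slice. `rpcState` is a hypothetical stand-in for the package's unexported gRPCContext, and `KeyValue` stands in for `attribute.KeyValue`; the real struct has other fields and may need a different design.

```go
package main

import (
	"fmt"
	"sync"
)

// KeyValue is a local stand-in for attribute.KeyValue (assumption).
type KeyValue struct{ Key, Value string }

// rpcState is a hypothetical stand-in for the unexported gRPCContext.
type rpcState struct {
	mu          sync.Mutex
	metricAttrs []KeyValue
}

// addAttrs appends under the lock so concurrent payload events do not race.
func (s *rpcState) addAttrs(attrs ...KeyValue) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.metricAttrs = append(s.metricAttrs, attrs...)
}

// snapshot returns a copy so readers never share the backing array.
func (s *rpcState) snapshot() []KeyValue {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]KeyValue(nil), s.metricAttrs...)
}

// concurrentAppend simulates n payload events arriving concurrently and
// reports how many attributes were stored.
func concurrentAppend(n int) int {
	s := &rpcState{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.addAttrs(KeyValue{Key: "stream.kind", Value: "upload"})
		}()
	}
	wg.Wait()
	return len(s.snapshot())
}

func main() {
	fmt.Println(concurrentAppend(100)) // 100
}
```

A lock removes the data race, but it does not address the separate design concern of mutating a value that is reachable through a context.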

Member

I'm very hesitant about updating a pointer to a struct that is stored as a context value. That really looks like a smell.
Unfortunately, we can't do things cleanly (provide a new context with the updated value), as we can't provide a new context as return value.
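The constraint can be sketched with two functions that mirror the shapes of `TagRPC` (which returns a context) and `HandleRPC` (which returns nothing) on gRPC's `stats.Handler`. Everything else here (`rpcState`, `stateKey`, the string event names, the `tenant.tier=gold` value) is a hypothetical simplification and not the otelgrpc implementation.

```go
package main

import (
	"context"
	"fmt"
)

// stateKey is the private context key for the per-RPC state.
type stateKey struct{}

// rpcState is a hypothetical stand-in for the per-RPC struct kept in the context.
type rpcState struct{ metricAttrs []string }

// tagRPC mirrors the shape of TagRPC: it can return a derived context, so this
// is the one place where a value can be attached cleanly.
func tagRPC(ctx context.Context) context.Context {
	return context.WithValue(ctx, stateKey{}, &rpcState{})
}

// handleRPC mirrors the shape of HandleRPC: it has no return value, so a
// context derived here would be discarded. Data learned from one event can
// only reach a later event by mutating the shared pointer.
func handleRPC(ctx context.Context, event string) {
	s, _ := ctx.Value(stateKey{}).(*rpcState)
	if s == nil {
		return
	}
	switch event {
	case "in_payload":
		s.metricAttrs = append(s.metricAttrs, "tenant.tier=gold")
	case "end":
		fmt.Println("recording with", s.metricAttrs)
	}
}

// run drives one simulated RPC and returns the attributes seen at the end.
func run() []string {
	ctx := tagRPC(context.Background())
	handleRPC(ctx, "in_payload")
	handleRPC(ctx, "end")
	return ctx.Value(stateKey{}).(*rpcState).metricAttrs
}

func main() {
	run()
}
```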

Author

Ok, this doesn't seem like it's going to be possible then due to the limitations in how the stats.Handler works.

Author

@dmathieu any thoughts on if we should continue with this or not?

Member

I'd like to have other opinions.

Author

Ok, I'll leave it with you @dmathieu 👍

Author

Any updates on this?

@krak3n
Author

krak3n commented Oct 30, 2024

I'd recommend running the tests locally to ensure everything passes and help with reviews 😸

I did, but I only had issues with one test as I mentioned here: #6281 (comment)

@krak3n krak3n changed the title Support adding custom metric attributes to gRPCContext Support Custom Metric Attributes Per Request Oct 30, 2024
@dmathieu
Member

Coming back to this, I don't think it's a good thing to add. Metrics should be low cardinality, and this really opens a big window for folks to shoot themselves in the foot with high cardinality metrics.
If you wish to add request-specific data, tracing is probably what you need.

Unless there are objections, I will close this PR in 24 hours.

@jsok

jsok commented Nov 26, 2024

IMHO developers should decide if the increased cardinality is justified or not.

Maybe I'm doing it wrong but in my experience it's much cheaper to add an attribute to a metric than trace every request if you want a comprehensive view of your requests.

There's an issue and 2 PRs now trying to solve this so there is clearly demand for the capability. I've currently forked this statshandler in my code to add an extra attribute which is less than ideal.
Meanwhile my HTTP services can use the otelhttp handler and add extra attributes just fine.

@dmathieu
Member

What attributes are you adding?

@jsok

jsok commented Nov 26, 2024

What attributes are you adding?

These are domain/service specific attributes in the context.

In the case of our monitoring platform (Datadog) we still have the ability to choose to exclude tags from being indexed, so the cardinality and cost increase can be managed quite easily. OTOH, enabling tracing for every request in order to use trace metrics is prohibitively expensive at scale, hence the preference to do this as a metric attribute.

@pellared
Member

pellared commented Nov 26, 2024

These are domain/service specific attributes in the context.

If there are domain/service specific attributes, then maybe you can create domain/service specific metrics that would use these attributes instead of asking to add custom attributes to instrumentation libraries, which are supposed to follow the OpenTelemetry Semantic Conventions.

There's an issue and 2 PRs now trying to solve this so there is clearly demand for the capability.

There may be other ways of addressing your use case.

@jsok

jsok commented Nov 26, 2024

If there are domain/service specific attributes, then maybe you can create domain/service specific metrics that would use these attributes instead of asking to add custom attributes to instrumentation libraries, which are supposed to follow the OpenTelemetry Semantic Conventions.

IIUC you're saying that RPC metrics must never have any attributes other than those listed in the semantic conventions?

There may be other ways of addressing your use case.

Yes, recommendations are welcome. I believe the current set of PRs exists because the view was that this was an acceptable solution, since it was also done for the http handler.

@pellared
Member

pellared commented Nov 26, 2024

IIUC you're saying that RPC metrics must never have any attributes other than those listed in the semantic conventions?

I do understand that there are use cases where users may want to add custom attributes. E.g. see: open-telemetry/opentelemetry-specification#4298. However, I am not sure if these capabilities (to explicitly add additional attributes) should be included in the instrumentation libraries. It may be an SDK capability.

I would not say must never, but I would just avoid it if possible. Nothing in the OpenTelemetry Specification says that instrumentation libraries should offer such a capability.

Do you know if any other languages (e.g. Java, .NET, Python) allow adding custom metrics to the instrumentation libraries? As far as I remember it is not possible in .NET.

this was an acceptable solution since it was also done for the http handler.

I am not sure it is actually good that it was added. Notice that otelhttp is experimental (not stable).

@jsok

jsok commented Nov 26, 2024

Appreciate the response @pellared

We seem to already have a few ways to add metric attributes:

  • OTEL_RESOURCE_ATTRIBUTES env var (and resource detectors in general) at an SDK level
  • The statshandler also has WithMetricAttributes() at an instrumentation library level

Making a clear distinction as to why per-request metric attributes are a bridge too far compared with resource-wide attributes would help the case for closing this PR and issue.

@dmathieu
Member

@krak3n
Author

krak3n commented Nov 28, 2024

I think it comes down to the purpose of these instrumentation libraries.

Are they meant to be the one stop shop where we can come and get the officially maintained instrumentation from OTEL? Or are they meant to be the bare minimum instrumentation, an example of how you could do HTTP/gRPC instrumentation?

If the former, then there is clearly a use case where attributes are not known until the request is handled, and the SDK should provide a way to do that, regardless of whether we shoot ourselves in the foot with cardinality. I don't think limitations in other languages should influence the feature set of another.

If the latter, then that should be made clear in the documentation, and it's up to Platform Engineers such as myself to use this as a template to provide the instrumentation to the teams I support.

Both are fine approaches and I'm totally down for writing my own implementation to service the teams I support. Of course I would much rather use an officially maintained SDK, takes the weight off of me lol.

@pellared
Member

pellared commented Nov 28, 2024

Are they meant to be the one stop shop where we can come and get the officially maintained instrumentation from OTEL?

This.

I want to get a full understanding (and agreement from the Go SIG as well as the Spec SIG) of what the scope (and constraints) of instrumentation libraries are. Thus I started a discussion in the OTel Specification. I want to document the common expectations from instrumentation libraries and a consistent usage experience across instrumentation libraries in OTel Go Contrib. It should also make contributions easier.

@pellared
Member

pellared commented Dec 3, 2024

Based on today's OTel Spec SIG meeting:
Adding custom hooks for instrumentation libraries is acceptable, especially given they have access to contextual information that is not accessible to the processors (e.g. request payload).

// If a MetricAttributesFn is defined, call this function and update the gRPCContext with the metric attributes
// returned by this function.
if f := c.MetricAttributesFn; f != nil {
	attrs := f(ctx, rs.Payload)
Member

What about stats.OutPayload?

Author

Another PR similar to this one had different functions for each rs.(type). We could go down that road, or have one function where the caller would have to type assert themselves.
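The two shapes being weighed could look roughly like this. `InPayload` and `OutPayload` are local stand-ins for the `*stats.InPayload` and `*stats.OutPayload` types in `google.golang.org/grpc/stats`; the `hooks` struct, `userHook` and the `direction=...` strings are hypothetical names used only to illustrate the trade-off.

```go
package main

import "fmt"

// Local stand-ins for *stats.InPayload and *stats.OutPayload (assumption).
type InPayload struct{ Payload any }
type OutPayload struct{ Payload any }

// Option A: one hook per event type. The library does the type switch and the
// user only ever sees the payload.
type hooks struct {
	OnInPayload  func(payload any) []string
	OnOutPayload func(payload any) []string
}

func (h hooks) attrs(rs any) []string {
	switch rs := rs.(type) {
	case *InPayload:
		if h.OnInPayload != nil {
			return h.OnInPayload(rs.Payload)
		}
	case *OutPayload:
		if h.OnOutPayload != nil {
			return h.OnOutPayload(rs.Payload)
		}
	}
	return nil
}

// Option B: a single hook receives the raw event and the user type-switches.
func userHook(rs any) []string {
	switch rs.(type) {
	case *InPayload:
		return []string{"direction=in"}
	case *OutPayload:
		return []string{"direction=out"}
	}
	return nil
}

func main() {
	h := hooks{OnInPayload: func(any) []string { return []string{"direction=in"} }}
	fmt.Println(h.attrs(&InPayload{}), h.attrs(&OutPayload{}))
	fmt.Println(userHook(&InPayload{}), userHook(&OutPayload{}))
}
```

Option A keeps the gRPC stats types out of user code, while Option B exposes fewer configuration knobs but pushes the type switch onto every caller.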

@krak3n
Author

krak3n commented Dec 7, 2024

Based on today's OTel Spec SIG meeting:

Adding custom hooks for instrumentation libraries is acceptable, especially given they have access to contextual information that is not accessible to the processors (e.g. request payload).

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

@pellared
Member

pellared commented Dec 7, 2024

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

This

Based on today's OTel Spec SIG meeting:
Adding custom hooks for instrumentation libraries is acceptable, especially given they have access to contextual information that is not accessible to the processors (e.g. request payload).

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

We discussed it during the last Go SIG meeting. I am on PTO as well, but once I am back I plan to create an issue (or even a project) about designing and standardizing an approach for instrumentation libraries' customization. Are you interested in contributing? We lack the velocity at this moment to work on it more actively. If so, please consider joining our next Go SIG meeting.

@krak3n
Author

krak3n commented Dec 10, 2024

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

This

Based on today's OTel Spec SIG meeting:

Adding custom hooks for instrumentation libraries is acceptable, especially given they have access to contextual information that is not accessible to the processors (e.g. request payload).

Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.

We discussed it during the last Go SIG meeting. I am on PTO as well, but once I am back I plan to create an issue (or even a project) about designing and standardizing an approach for instrumentation libraries' customization. Are you interested in contributing? We lack the velocity at this moment to work on it more actively. If so, please consider joining our next Go SIG meeting.

Yeah I am totally up for contributing, when is the next SIG?
